Migration of a Library Catalogue into RDA Linked Open Data

Earlier this year, information professionals Gustavo Candela, Rafael Carrasco, and Manuel Marco-Such, from library and university institutions in Spain, published the article “Migration of a Library Catalogue into RDA Linked Open Data” in the Semantic Web Journal. In it, they outline the process of migrating the catalogue of the Biblioteca Virtual Miguel de Cervantes, originally created following MARC 21, into conceptual models following the Functional Requirements for Bibliographic Records (FRBR) and Functional Requirements for Authority Data (FRAD). They then mapped their FRBR and FRAD content to Resource Description Framework (RDF) triples containing bibliographic metadata expressed in Resource Description and Access (RDA), and made the new catalogue available online. Through an interface, users can browse and search the information, and the public data can be linked to other datasets. Much of the process relied on open-source technology.

Before tackling the daunting task of transforming their old records into a new format, they first outlined the challenges of transforming MARC 21 records into FRBR models: missing or inconsistent title information, variable encodings, markup and textual errors, multiple publication statements, unspecified roles, lack of unique identifiers for creators, and analytical cataloguing. They addressed these by implementing steps to normalize and enhance the MARC records. The transformation itself was done in three steps: 1) identification of FRBR entities, 2) extraction of relationships between entities, and 3) semi-automatic clustering of entities.
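To make the normalization and entity-identification steps more concrete, here is a minimal Python sketch. It is not the authors' code: the flat record layout, the field names (`title`, `creator`, `language`, `publisher`, `year`), and the helper names are all assumptions made for illustration, and real MARC 21 records are far richer than this.

```python
import re
import unicodedata


def normalize_text(value):
    """Unicode-normalize, strip leftover markup, and collapse whitespace."""
    value = unicodedata.normalize("NFC", value)
    value = re.sub(r"<[^>]+>", "", value)       # drop stray markup tags
    value = re.sub(r"\s+", " ", value).strip()  # collapse runs of whitespace
    return value.rstrip(" /:;,")                # trim trailing punctuation


def identify_entities(record):
    """Split one flat (hypothetical) record into FRBR-style entities."""
    title = normalize_text(record.get("title", "")) or "[title missing]"
    creator = normalize_text(record.get("creator", "")) or None
    return {
        "work": {"title": title, "creator": creator},
        "expression": {"language": record.get("language", "und")},
        "manifestation": {
            "publisher": normalize_text(record.get("publisher", "")),
            "year": record.get("year"),
        },
    }


# Made-up sample record with the kinds of noise the article mentions.
record = {
    "title": "  Don Quijote de la   Mancha <b>/</b> ",
    "creator": "Cervantes Saavedra, Miguel de,",
    "language": "spa",
    "publisher": "Juan de la Cuesta :",
    "year": "1605",
}
entities = identify_entities(record)
print(entities["work"])
```

The point of the sketch is only that cleaning happens before entities are split out, so that later steps compare like with like.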

The first step required highly detailed mapping between the original metadata and FRBR attributes, as they simultaneously tried to minimize duplication and decomposed complex subject headings to reduce the number of distinct subject entities. The second step identified connections at the level of works (e.g., creator-subject), expressions (e.g., translators, editors), and manifestations (e.g., printer and illustrator). They implemented a web cataloguing interface that lets librarians supervise the transformation and clustering process: they can retrieve, modify, and create relationships and navigate the hierarchy. The last step used data mining to group manifestations and expressions of the same work. The process followed the principles of the Online Computer Library Center (OCLC) FRBR Work-Set Algorithm, which allowed them to identify sets of works based on information in bibliographic and authority records.
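The clustering idea can be sketched roughly as follows. This is a simplified stand-in built around a normalized author/title key, not the OCLC algorithm itself and not the authors' data-mining process; the record layout and the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def work_key(author, title):
    """Build a normalized author/title key for grouping candidate works."""
    def norm(text):
        # Strip accents, lowercase, and replace punctuation with spaces.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        return " ".join(text.split())
    return f"{norm(author)}/{norm(title)}"


def cluster_by_work(records):
    """Group record ids whose author/title keys are identical."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(clusters)


# Made-up manifestation records with inconsistent capitalization and punctuation.
records = [
    {"id": "m1", "author": "Cervantes Saavedra, Miguel de", "title": "La Galatea"},
    {"id": "m2", "author": "CERVANTES SAAVEDRA, Miguel de.", "title": "La galatea."},
    {"id": "m3", "author": "Vega, Lope de", "title": "Fuenteovejuna"},
]
clusters = cluster_by_work(records)
for key, ids in clusters.items():
    print(key, ids)
```

An exact-match key like this would miss translations or editions published under a different title, which is one reason a supervised, semi-automatic step with librarians in the loop makes sense.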

In transforming FRBR to RDA linked open data, they selected a persistent RDF view rather than a transient one, reasoning that bibliographic archives don’t often update their data. They implemented a parser in Java to apply mapping rules between the FRBR database and the RDA vocabulary. For every entity belonging to one of the RDA vocabulary classes, they created an RDF document containing its properties and relationships. If a relationship could not be described with RDA elements, they opted to use other popular vocabularies. They also enriched the dataset semantically and automatically by linking objects to terms in other LOD datasets, for example, linking persons to DBpedia by way of their identifiers in the Virtual International Authority File (VIAF).
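As a rough illustration of what mapping one entity to RDF triples might look like, here is a small Python sketch that emits N-Triples lines for a person. The authors' parser was written in Java and its mapping rules are not reproduced here. The base URI, the class URI, and the name property below are placeholders, the VIAF number is made up, and the use of owl:sameAs for the external link is an assumption, since the summary above does not say which property was used.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Placeholders: stand-ins for the dataset's base URI and the RDA terms.
BASE = "http://example.org/"
PERSON_CLASS = "http://example.org/rda/Person"
NAME_PROPERTY = "http://example.org/rda/nameOfPerson"


def uri(value):
    """Wrap an absolute IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Quote a plain literal, escaping backslashes and double quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def person_triples(person):
    """Return N-Triples lines describing one person entity."""
    subject = uri(BASE + "person/" + person["id"])
    triples = [
        (subject, uri(RDF_TYPE), uri(PERSON_CLASS)),
        (subject, uri(NAME_PROPERTY), literal(person["name"])),
    ]
    if person.get("viaf"):
        # Link out to the authority file when an identifier is known.
        target = uri("http://viaf.org/viaf/" + person["viaf"])
        triples.append((subject, uri(OWL_SAME_AS), target))
    return [" ".join(t) + " ." for t in triples]


# Made-up entity; the VIAF number is not a real identifier.
lines = person_triples({"id": "p1", "name": "Miguel de Cervantes", "viaf": "12345"})
print("\n".join(lines))
```

Once identifiers such as VIAF numbers are in place, links like the last triple can be generated automatically for every entity that has one.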

The automated process allowed them to transform over 200,000 bibliographic records and 70,000 authority entries, resulting in 15 million RDF triples, all published online at www.data.cervantesvirtual.com. The resulting dataset went through a series of validation tests. Currently, Wikidata contains 4,500 links to their dataset, and the links to DBpedia allow users to retrieve information with SPARQL. They also developed a tool for laypersons to browse the LOD.
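For readers who have not used SPARQL, the sketch below shows what a simple query for externally linked entities could look like, and how it would be sent as a `query` parameter in an HTTP GET request. The endpoint address is a placeholder rather than the project's real endpoint, the owl:sameAs pattern is an assumption about how the links are expressed, and the code only builds the request URL without contacting any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint; substitute the real SPARQL endpoint of a dataset.
ENDPOINT = "https://example.org/sparql"


def build_query(limit=10):
    """Return a SPARQL query listing entities and their external links."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?entity ?external WHERE {\n"
        "  ?entity owl:sameAs ?external .\n"
        f"}} LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, build_query(5))
print(build_query(5))
print(url)
```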

It’s incredible to consider how this project has made their library’s information more visible and accessible, especially now that people seek information largely on the Web. Library catalogues expressed as LOD allow both librarians and users not only to find resources faster, but also to see the concepts and relationships that connect them.

 

–Bernadette Patino, INFO 653-02

Pratt Institute School of Information