The Library of Congress BIBFRAME Update Forum took place at the 2018 ALA Midwinter Meeting on Sunday, February 11. With five speakers in total, this session provided a comprehensive overview of the state of BIBFRAME—the standard that will replace MARC—from the point of view of early adopters. Representatives from the Library of Congress (LC), Library Service Platform (LSP) providers, and linked open data projects discussed their experience using BIBFRAME and other frameworks.
Sally McCallum from LC talked about the status of their BIBFRAME Pilot 2, which began in June 2017. Sixty LC catalogers are cataloging directly in BIBFRAME, with approximately 200 catalogers not participating. The pilot covers a wide variety of formats, including books, serials, maps, music, moving images, rare materials, and sound recordings. Part of this project involved creating what McCallum described as a “realistic cataloging environment”: the entirety of LC’s catalog was converted to BIBFRAME for the purposes of the pilot. Catalogers add to this pilot catalog by cataloging materials first in BIBFRAME (using LC’s BIBFRAME editor) and then re-cataloging them in MARC for LC’s production catalog.
In a previous session, McCallum described the scale of this conversion project: 18 million bibliographic records were converted to works, instances, or items, and 1.2 million work authority records were converted to works. This resulted in 19.2 million BIBFRAME work descriptions and 23.7 million BIBFRAME instance descriptions, yielding a huge triple store of more than 4 billion triples; McCallum said she is optimistic that this number can be reduced. Future projects include making the editor more flexible and automating the conversion of BIBFRAME to MARC. In the absence of that automation, LC pilot catalogers must continue to perform duplicate work. In addition, LC is discussing a few possible local policy changes, including keeping works in the MARC bibliographic file rather than the authority file, and reducing the amount of transliteration in bibliographic records so that only access points, and not descriptive fields, are transliterated (currently both are).
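To make the Work/Instance layering behind these numbers concrete, the following is a minimal illustrative sketch in Python of how a single bibliographic record fans out into BIBFRAME-style RDF triples. The URIs, property names, and `marc_to_bibframe_triples` function are simplified stand-ins for illustration only, not LC’s actual conversion output:

```python
# Illustrative sketch: one MARC bibliographic record becomes a BIBFRAME
# Work description plus an Instance description, expressed as RDF-style
# (subject, predicate, object) triples. URIs and properties here are
# simplified stand-ins, not the output of LC's real converter.

BF = "http://id.loc.gov/ontologies/bibframe/"

def marc_to_bibframe_triples(record_id, title, isbn):
    """Return toy triples for a Work and its Instance."""
    work = f"http://example.org/works/{record_id}"
    instance = f"http://example.org/instances/{record_id}"
    return [
        (work, "rdf:type", BF + "Work"),
        (work, BF + "title", title),
        (instance, "rdf:type", BF + "Instance"),
        (instance, BF + "instanceOf", work),   # links the Instance back to its Work
        (instance, BF + "identifiedBy", isbn),
    ]

triples = marc_to_bibframe_triples("12345", "Moby-Dick", "978-0-14-243724-7")
for s, p, o in triples:
    print(s, p, o)
```

At this scale of modeling, each converted record contributes a handful of triples, which is why a catalog of tens of millions of records grows into billions of triples.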
Sebastian Hammer, the co-founder of Index Data, talked about the use of BIBFRAME in FOLIO, a community-built LSP. He described the architecture that underlies FOLIO, and emphasized that this architecture was built with BIBFRAME in mind and based on the concepts that BIBFRAME exploits. Hammer described the reasons that the library community needs an open-source platform in order to move forward with the implementation of BIBFRAME.
Hammer was followed by Amy Pemble of Ex Libris, who described the linked data implementation in Alma, the company’s LSP product. Pemble described the level of community engagement that Ex Libris has encouraged, including the Linked Open Data Working Group, which comprises a variety of academic libraries. In addition, 41 Alma and Primo libraries have been collaborating with Ex Libris to advise on linked data, an effort Pemble described as “putting linked data at the service of libraries.” She highlighted one project in service of this goal: Ex Libris’ 2017 collaboration with Harvard, which produced a MARC-to-BIBFRAME converter that now enables all Alma customers to publish their metadata in BIBFRAME. Pemble described the built-in features of Alma that let customers exploit the benefits of linked data, including both MARC and BIBFRAME views of records, APIs, and user interfaces such as knowledge cards in Primo (the Ex Libris discovery layer). Looking ahead, Pemble said that Ex Libris hopes to implement the ability to catalog natively in BIBFRAME, search linked data endpoints within Alma, and select controlled vocabulary terms, even from non-MARC-based vocabularies.
Michelle Futornick, program manager for the LD4P (Linked Data for Production) project, described the background and status of this endeavor. It is a two-year project funded by the Andrew W. Mellon Foundation with six partner institutions (see the full list, along with project descriptions, at ld4p.org). Each partner institution is working on its own project with a distinctive collection of resources to be cataloged in linked data, focusing on non-book formats that cannot be well described in MARC. For instance, Cornell University is working on a collection of vintage hip-hop LPs, while Princeton University is working on annotations in the library of Jacques Derrida. Each project is also building an extension of BIBFRAME suited to its format or domain. Participants are using tools built both by the partner institutions and by external groups; Princeton University Library, for instance, uses a locally developed annotation markup tool. They are also using various BIBFRAME editors (e.g., LC’s BIBFRAME editor, VitroLib, and CEDAR) and providing feedback to help these editors improve.
Futornick also discussed how project participants are examining changes to workflows under linked data, including how MARC, BIBFRAME, and other linked data can coexist in a discovery layer. She mentioned Biblioportal as a useful source for ontology discovery, visualization, and related tasks. Finally, Futornick discussed the future: the goal is a second project, LD4P2: Pathway to Implementation, which will include developing a linked data editor sandbox. There have already been two community meetings; for further information, see meeting.ld4p.org.
The fifth speaker was John Chapman from OCLC, talking about that organization’s work on works. He explained that work data is crucial for accurate clustering (for instance, VIAF clusters). OCLC is testing models of works and improving FRBR clustering. All of this effort surrounding work records, Chapman explained, is critical both for user discovery and for data mining; bibliographic data alone is not enough either to test the model or to create quality works. Chapman described xR, an OCLC tool that creates synthetic authority records for works: so far, 800,000 work records and 1,150,000 expression records have been created, with only 1,200 created manually and the rest generated automatically. The result is proper work and expression records and proper clustering in VIAF. Chapman demonstrated an example of a record created with the xR tool: the work record for Oligarkh and its three connected expression records. Goals for the future include expanding MARC-based data mining, providing tools and data services, and expanding OCLC’s range of linked data tools.
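The FRBR-style clustering Chapman described can be pictured with a toy example: grouping bibliographic records under a shared work key. The sketch below uses a crude normalized author/title pair as that key; OCLC’s actual work-clustering algorithms are far more sophisticated, and the `work_key` and `cluster_works` functions here are hypothetical illustrations only:

```python
from collections import defaultdict

# Toy FRBR-style clustering: group bibliographic records into "works" by a
# normalized author/title key. This only illustrates the idea; real
# work-clustering at OCLC scale uses much richer matching logic.

def work_key(record):
    """Normalize author and title into a crude work identifier."""
    norm = lambda s: " ".join(s.lower().split())
    return (norm(record["author"]), norm(record["title"]))

def cluster_works(records):
    """Bucket records sharing a work key into one cluster."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec)].append(rec)
    return clusters

records = [
    {"author": "Melville, Herman", "title": "Moby-Dick", "lang": "eng"},
    {"author": "melville, herman", "title": "Moby-Dick", "lang": "fre"},
    {"author": "Austen, Jane", "title": "Emma", "lang": "eng"},
]

clusters = cluster_works(records)
print(len(clusters))  # two distinct works: the two Moby-Dick records cluster together
```

The two Moby-Dick records, despite differing capitalization and language of expression, land in one work cluster, which mirrors why work-level records help discovery interfaces collapse many manifestations of the same work.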