Finding the Right API for You: A Technical Services Workflow Perspective: 2017 Annual Conference

In this program, speakers discussed how they are using APIs to improve and automate cataloging, acquisition, metadata enhancement, and holdings maintenance workflows in technical services. Some of the terminology used in the program includes:

  • API: Application Programming Interface—a method of sending a query using a defined protocol and receiving the result in a defined format.
  • EDIFACT: Electronic Data Interchange For Administration, Commerce, and Transport—an international standard for data exchange for trade and commerce.
  • FTP: File Transfer Protocol—a standard protocol to transfer computer files.
  • JSON: JavaScript Object Notation—a type of format for data interchange.
  • LCNAF: Library of Congress Name Authority File.
  • SACD: Super Audio CD—a high-resolution audio disc format that offers better audio quality than a standard CD.
  • SQL: Structured Query Language—a language used to query and manage data in a database.
  • SRU: Search/Retrieve via URL—a standard search protocol for internet search queries.
  • URI: Uniform Resource Identifier—a string of characters used to identify a resource. For Linked Data purposes, URIs are used to refer to things on the web.
  • VIAF: Virtual International Authority File—a combination of various name authority files. Hosted by OCLC.
  • XSLT: eXtensible Stylesheet Language Transformations—a language used to transform XML documents into other formats.

API for Music Cataloging: Leveraging Expert Community Data Sources

Lucas Mak, Metadata and Cataloging Librarian, Michigan State University (MSU) Libraries

MSU Libraries received a gift for their music collection of more than 700,000 items in various formats (CDs, SACDs, vinyl). The gift posed a problem: the cataloging team has limited capacity for original cataloging and needed to figure out how to leverage data that are already available to speed up the cataloging process. The Head of Cataloging, who happens to be a musician as well, identified two data sources: Discogs and MusicBrainz. Both sites have similar characteristics: they are crowd-sourced expert community sites, contain rich metadata (e.g., track lists, artist credits, labels, year of publication), and already have links to name identifiers such as Wikipedia, VIAF, and LCNAF. Both sites also support API transactions and provide JSON output.

MSU Libraries use Discogs for their GoldenRods Music Collection and Romani Music Collection (see their Discogs profile here); both are vinyl collections. The JSON output from Discogs was reformatted into a generic XML format, which was then transformed into MARCXML using XSLT for further processing, such as searching for LCNAF identifiers and inserting additional MARC elements. Finally, MarcEdit transforms the MARCXML into MARC records.
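To make the pipeline concrete, here is a minimal sketch (not MSU's actual code) of the first step: pulling one Discogs release as JSON and reshaping a few fields into a generic XML document that an XSLT stylesheet could then turn into MARCXML. The endpoint and the title, year, artists, and tracklist field names are assumptions drawn from the public Discogs release API documentation; verify them before relying on this.

```python
# Sketch: fetch a Discogs release and reformat selected JSON fields as generic XML.
import requests
import xml.etree.ElementTree as ET

DISCOGS_RELEASE_URL = "https://api.discogs.com/releases/{id}"

def fetch_release(release_id: int) -> dict:
    """Retrieve one release from the public Discogs API as JSON."""
    resp = requests.get(
        DISCOGS_RELEASE_URL.format(id=release_id),
        headers={"User-Agent": "CatalogingDemo/0.1"},  # Discogs expects a User-Agent
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def release_to_xml(release: dict) -> ET.Element:
    """Reformat a few JSON fields into a generic XML record for later XSLT processing."""
    record = ET.Element("record")
    ET.SubElement(record, "title").text = release.get("title", "")
    ET.SubElement(record, "year").text = str(release.get("year", ""))
    for artist in release.get("artists", []):
        ET.SubElement(record, "artist").text = artist.get("name", "")
    for track in release.get("tracklist", []):
        t = ET.SubElement(record, "track", position=track.get("position", ""))
        t.text = track.get("title", "")
    return record

if __name__ == "__main__":
    xml = release_to_xml(fetch_release(249504))  # arbitrary public release id
    print(ET.tostring(xml, encoding="unicode"))
```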

From the MusicBrainz resource, they pulled in three core entities with various useful data points from each: release information (UPC, ASIN, title, artist, date, country, label, etc.), recording information (title, artist credits, duration), and artist information (name in direct order, sort-name in inverted order, type, ISNI, links to external sites). Those data points were then mapped to the relevant MARC21 fields. Each entity in the MusicBrainz relational database is assigned an MBID (MusicBrainz identifier), which can be used to query additional information about that entity. For example, one can use the UPC code to get the Release ID, then use the Release ID to look up additional information such as artists, recordings, or labels, and then use the Recording ID to get more information such as track artist, the artist's role, or a related work. The retrieved Artist ID can in turn be used to query additional information such as external sites about the artist, its Wikipedia entry (if any), or even the VIAF record for the authorized name. The MusicBrainz database schema can be found here.
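The lookup chain above can be sketched against the public MusicBrainz web service (version 2). In the minimal sketch below, the endpoint paths, the inc parameters, and the JSON keys are assumptions drawn from the public documentation, and the one-call-per-second limit mentioned in the next paragraph is respected with a simple sleep.

```python
# Sketch: UPC -> Release MBID -> release details -> artist external links.
import time
import requests

MB = "https://musicbrainz.org/ws/2"
HEADERS = {"User-Agent": "MBLookupDemo/0.1 (cataloging@example.edu)"}  # placeholder contact

def mb_get(path: str, **params) -> dict:
    """One MusicBrainz call; sleep afterwards to respect the 1 call/second/IP limit."""
    params["fmt"] = "json"
    resp = requests.get(f"{MB}/{path}", params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    time.sleep(1)
    return resp.json()

def release_id_from_upc(upc: str) -> str:
    """Step 1: use the UPC (barcode) to find the Release MBID."""
    hits = mb_get("release", query=f"barcode:{upc}")
    return hits["releases"][0]["id"]

def release_details(release_mbid: str) -> dict:
    """Step 2: use the Release MBID to pull artist credits, recordings, and labels."""
    # Spaces between inc values are encoded as "+" in the query string.
    return mb_get(f"release/{release_mbid}", inc="artist-credits recordings labels")

def artist_links(artist_mbid: str) -> list[str]:
    """Step 3: use the Artist MBID to pull links to external sites (Wikipedia, VIAF, ...)."""
    artist = mb_get(f"artist/{artist_mbid}", inc="url-rels")
    return [rel["url"]["resource"] for rel in artist.get("relations", []) if "url" in rel]
```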

Lucas pointed out that MusicBrainz has some limitations. For example, it only allows one API call per second per IP address; there is no controlled list of genres; the place of publication is given only as a country-level ISO code (which then needs to be converted to the corresponding MARC country code); and there is no transliteration for non-Roman script data. However, the API does help speed up record creation in the catalog with less human work.

Automating OCLC Holdings with OCLC & Alma APIs

Erin Grant, Head of Metadata Services, and Alex Cooper, Data Analyst, Emory University Libraries.

Emory University Libraries, consisting of ten libraries with six OCLC symbols, migrated from the Aleph integrated library system (ILS) to the Alma library services platform in December 2015. The migration meant changing cataloging workflows as well as plenty of data cleanup work. One example was publishing holdings to OCLC. While Alma can publish holdings directly to OCLC, the process of deleting or withdrawing records became more complicated because of the number of OCLC symbols they used. The solution was to create scripts that use APIs to streamline the process. This approach gave them the benefit of automating routine tasks and optimizing workflows. It also provides more granular control, because they can customize the solution down to the item details, and it allows them to leverage Alma Analytics reports for custom services.

Their previous workflow for deleting OCLC holdings was inefficient and required many hand-offs. Originally, for withdrawn monographs, the stacks staff deleted the bibliographic records and sent the list of OCLC numbers to the cataloging team, which then manually removed the OCLC holdings. The new approach triggers an application when the withdrawal process begins. The automated steps are as follows (a rough scripting sketch follows the list):

  • Use the Withdrawn and Deleted Alma Analytics reports to produce the list of OCLC numbers.
  • Use the Alma Analytics API to retrieve the list of those OCLC numbers.
  • For deleted records, use Alma SRU to check the catalog and make sure no duplicate holdings remain in Alma.
  • Use a SQL query to send an e-mail report containing the list of OCLC numbers for quality control, confirming the records have been deleted from the catalog.
  • Use the WorldCat Metadata API to remove the holdings from WorldCat.
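As an illustration of how such a pipeline might be scripted, here is a minimal sketch (not Emory's actual code, which is linked below) covering the first steps: pulling OCLC numbers from an Alma Analytics report and checking Alma over SRU before a number is queued for removal. The report path, API key, SRU hostname, column tag, and the alma.other_system_number index are placeholders or assumptions; the WorldCat Metadata API removal step is omitted because the exact call depends on the API version and the library's credentials.

```python
# Sketch: Alma Analytics API report retrieval + Alma SRU duplicate-holdings check.
import requests
import xml.etree.ElementTree as ET

ALMA_API = "https://api-na.hosted.exlibrisgroup.com/almaws/v1"
ALMA_SRU = "https://example-inst.alma.exlibrisgroup.com/view/sru/01EXAMPLE_INST"  # placeholder
API_KEY = "your-alma-api-key"                                                      # placeholder
REPORT_PATH = "/shared/Example/Reports/Withdrawn OCLC Numbers"                     # placeholder

def oclc_numbers_from_report() -> list[str]:
    """Retrieve the Analytics report and collect the column holding OCLC numbers."""
    resp = requests.get(
        f"{ALMA_API}/analytics/reports",
        params={"path": REPORT_PATH, "limit": 1000, "apikey": API_KEY},
        timeout=60,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    # Which ColumnN carries the OCLC number depends on how the report is built.
    return [el.text for el in root.iter() if el.tag.endswith("Column1") and el.text]

def alma_still_has_holdings(oclc_number: str) -> bool:
    """SRU check: does any Alma record still carry this OCLC number?"""
    resp = requests.get(
        ALMA_SRU,
        params={
            "version": "1.2",
            "operation": "searchRetrieve",
            "query": f"alma.other_system_number={oclc_number}",
            "maximumRecords": 1,
        },
        timeout=60,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    hits = root.find(".//{http://www.loc.gov/zing/srw/}numberOfRecords")
    return hits is not None and int(hits.text) > 0

if __name__ == "__main__":
    to_remove = [n for n in oclc_numbers_from_report() if not alma_still_has_holdings(n)]
    print(f"{len(to_remove)} OCLC numbers ready for WorldCat holdings removal")
```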

Another API project under way is setting OCLC monographic holdings. Currently, holdings are set by various methods: they can be added manually by catalogers or student employees, or added when e-book sets are acquired through WorldShare Collection Manager. They are still testing the API, with the goal of resolving inconsistencies in how holdings are set.

The source code for deleting OCLC holdings can be found here and the source code for setting OCLC holdings can be found here.

Leveraging LC’s Linked Data API

Matthew Miguez, Metadata Librarian, Florida State University

Presenting remotely, Matthew started by showing a graph visualizing the growth of digital collections on their new institutional repository (IR) platform over the course of 2015. The graph showed steady growth, and by early 2016 growth increased dramatically, fueled by the migration of their ETDs (Electronic Theses and Dissertations) into the new platform. Over the same period, subject access degraded. Further analysis showed that the growth of the collection affected the quality of subject access, since no subject analysis had been done for the ETD collection when the records were originally created. Since the new digital repository platform provides a single search across all collections, they investigated the possibility of automating and proofing subject access across its contents for better access.

The Library of Congress provides a Linked Data API that can be accessed here. Typically, one uses the service by sending a URI (Uniform Resource Identifier) and retrieving the controlled heading. In their case, they had a list of keyword terms (created by the authors and their departments) and needed a way to turn those terms into URIs in order to get the controlled headings. Matthew did this by using the pattern id.loc.gov/vocabulary/label/term, where vocabulary is the path identifying the vocabulary or authority file being searched (for example, authorities/subjects) and term is the label being looked up.
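As a rough illustration, the lookup can be scripted as a simple HTTP request. This is a minimal sketch assuming the authorities/subjects vocabulary path and that the service answers a matched label with a redirect to the authority URI (and, in some responses, x-uri and x-preflabel headers); confirm the details against the current id.loc.gov documentation.

```python
# Sketch: resolve a keyword label to an id.loc.gov authority URI.
import requests

def lookup_heading(term: str, vocabulary: str = "authorities/subjects") -> str | None:
    """Return the id.loc.gov URI for a label, or None if the label is not established."""
    resp = requests.get(
        f"https://id.loc.gov/{vocabulary}/label/{term}",
        allow_redirects=False,
        timeout=30,
    )
    if resp.status_code in (301, 302, 303):
        # A matched label redirects to the authority; the URI may also appear in "x-uri".
        return resp.headers.get("x-uri") or resp.headers.get("location")
    return None  # a 404 means the label did not match an established heading

if __name__ == "__main__":
    print(lookup_heading("Clothing and dress"))  # sample established LCSH heading
```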

With this method, they can send the keywords to find the controlled headings and then add those headings to the records to enhance subject access to their collection. The result is almost a plug-and-play subject reconciliation service. They use it to edit all loaded IR records. As a tool in the IR submission workflow, users can still submit documents with their own keywords, and the IR staff can run the process to pull in the authorized terms.

The source code for the work can be found here and the subject reconciliation program can be found here.

Acquisitions API

Wen Ying Lu, Head of Cataloging, Santa Clara University Library

In 2012, Santa Clara University Library started using GobiExport Plus to generate bibliographic and order records in their Millennium/Sierra ILS and transmit the EDIFACT order records to the GOBI system. The process required a lot of staff time: staff had to retrieve the information for titles selected for ordering, review it, and then submit the orders either one at a time or as a batch load. This could take from five minutes to overnight (sometimes up to 48 hours) and had to be repeated via FTP for each of the sub-accounts on their system. In February 2016, they decided to become a beta test partner with Innovative Interfaces, Inc. and YBP (now GOBI) for the Acquisitions API (also known as the GOBI API) and fully implemented it in July 2016.

The API approach allows the system to transfer data in real time, which helps streamline their workflow, saves staff time, and makes newly ordered titles discoverable in the catalog more quickly. With the API in place, once a title is selected, the bibliographic record shows up in the ILS right away, and the order record is automatically created and sent back to GOBI. Additional benefits include reducing human error, increasing data accuracy, and decreasing the ergonomic risk of repetitive tasks. They also share with GOBI the holdings they did not order through GOBI, to reduce the potential for duplicate orders.

Lu pointed out that the implementation still has some limitations: only bibliographic records are delivered via the API (thus necessitating the delivery of full bibliographic records at the time of order); once the order record is created, it cannot be repurposed for alternative processing; and the API can only handle one order at a time. She also offered some considerations for other libraries: whether the Acquisitions API is right for the library, whether the library can accept those limitations, and whether the existing workflow can be revised to take advantage of the API's benefits.
