Metadata Interest Group at ALA Midwinter 2011

Metadata Interest Group Meeting
ALA Midwinter Meeting
Jan. 9, 2011 8-10 AM

Links to the actual presentations will be provided when they are available.

The first speaker was Corey A. Harper, who presented “Linked Library Data: 2010-2011 Update.” His talk was designed to provide an update on linked data activity since ALA Annual 2010. Coverage included:

  • Preconference announcement of the W3C incubator group
  • National library activities in linked data
  • Dublin Core 2010 in Pittsburgh
  • Archives and museums activities in linked data
  • Authority work
  • IFLA work
  • Linkypedia – not quite linked data but illustrates why the linked data movement is so important
  • Vision for using library linked data

Harper began with the four principles, as announced by Tim Berners-Lee:
1. Use URIs as names for things
2. Use http URIs so people can look up the names
3. When someone looks up a URI, give back information
4. Include links to related URIs
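
The four principles can be sketched in a few lines of Python. This is only an illustration: the `example.org` URIs, the labels, and the in-memory `WEB` dictionary are hypothetical stand-ins for real HTTP lookups and do not come from Harper's presentation.

```python
# Hypothetical in-memory "web". Every thing is named by an HTTP URI
# (principles 1 and 2), looking a URI up returns information about it
# (principle 3), and that information links to related URIs (principle 4).
WEB = {
    "http://example.org/id/person/1": {
        "label": "Example Author",
        "links": ["http://example.org/id/work/1"],
    },
    "http://example.org/id/work/1": {
        "label": "Example Work",
        "links": [
            "http://example.org/id/person/1",
            "http://example.org/id/subject/1",
        ],
    },
    "http://example.org/id/subject/1": {
        "label": "Example Subject",
        "links": [],
    },
}


def look_up(uri):
    """Stand-in for an HTTP GET: return the description published at a URI."""
    return WEB[uri]


def follow_links(start_uri):
    """Visit every URI reachable from start_uri, breadth first."""
    seen = []
    queue = [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(look_up(uri)["links"])
    return seen


labels = [look_up(uri)["label"] for uri in follow_links("http://example.org/id/person/1")]
print(labels)
```

Starting from a single URI, a client that knows nothing else about the data can discover the related work and subject simply by following links.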

Once this announcement was made, linked data began to grow exponentially, which Harper illustrated by showing how the cloud of linked data has expanded. Libraries really came on board between 2009 and 2010. At this point, the cloud has grown so much that it is no longer manually maintained but is automatically generated from data published through the Comprehensive Knowledge Archive Network.

Libraries all know the value of library data, both bibliographic and authority, and these have been the first areas libraries have looked at for publishing linked data. Linked data provides a standardized set of mechanisms for exposing library data so that it works with other data on the web and so that libraries can use non-library data in turn. Examples of library data that are available:

  • Bibliographic ontology
  • RDA (Resource Description and Access)
  • FRBR (Functional Requirements for Bibliographic Records) – both official and unofficial (some bibliographic data has been developed outside of libraries)
  • ISBD (International Standard Bibliographic Description)
  • VIAF (Virtual International Authority File) and MADS (Metadata Authority Description Schema)

Harper then provided an update on which entities have been publishing linked data since April 2010. His list included national libraries, such as the German national library, the Hungarian national library, and the British Library, which has just made its data available as an RDF download. Getting the data out there is the big first step for libraries. Additionally, a lot of authority work has been done in relation to linked data. VIAF has revamped how it publishes linked data, based on work at the DCMI 2010 conference. It now clusters authority records and offers different views of the same data. Harper showed an example of how Bob Dylan is represented as both a subject heading and a FOAF (Friend of a Friend) name, which makes the data available in both a library version and a FOAF representation. At http://id.loc.gov, the MARC code lists for countries, geographic areas, and languages have been added. There have also been early efforts to manage precoordination in subject headings by expressing MADS as RDF; this work is currently an open draft for public comment. A mirror of some of the Library of Congress data at http://lcsubjects.org is also trying to manage precoordinated headings.

At the World Wide Web Consortium, a new “incubator group” has been created to address library linked data: the W3C LLD XG (Library Linked Data Incubator Group). Incubator groups are discussion groups designed to come up with strategic directions for a specific issue; the usual outcome is a report with recommendations to the W3C. The group’s members are researchers, consultants, and librarians, including many from national libraries. As part of their work, they have collected over 50 use cases to find out what kinds of applications library linked data could support: publishing bibliographic data, dealing with authority data, archival needs, etc. They are mining the use cases for functional requirements and design patterns, with a report due to be issued in Summer 2011. All deliberations of the group are public and people are welcome to follow along: http://www.w3.org/2005/Incubator/lld/wiki/Main_Page.

Harper next provided a brief overview of initiatives and activity in relation to RDA. All RDA elements, roles, and vocabularies have been registered in the Open Metadata Registry and are represented as SKOS. Additionally, the IFLA FRBR and ISBD elements are all registered. IFLA is reviewing and consolidating all of the FRBR reports to reconcile conflicts and update FRBR, with work under way to represent it as RDF. Ultimately, the aim is for RDA to be a multilingual work.

Additional Developments

  • The Open Metadata Registry has continued to grow. It was formerly the National Science Digital Library Registry, but its scope has grown beyond that. The Open Metadata Registry provides a vocabulary service and allows users to take URIs and assign different views of the same data, allowing the registry to become international in scope.
  • Linked Open Copac Archives Hub (LOCAH) is a UK-based project, funded by the JISC. The project is working on making available EAD data from the Archives Hub and bibliographic data (MODS) from Copac (which themselves are both JISC-funded services) as linked data. (thanks to Pete Johnson for clarifying).
  • The Europeana project and its Europeana Data Model (EDM) aim to represent museum objects across Europe. EDM builds upon the OAI-ORE (Open Archives Initiative Object Reuse and Exchange) model. The project is trying to aggregate the different descriptions of digital surrogates and link them to the actual resource in question.

Harper ended his talk with a description of Linkypedia: http://linkypedia.inkdroid.org/. Although not linked data, it is an alpha project done by Ed Summers in his spare time. Summers is harvesting all of the links used as citations in every Wikipedia article. Many of the citations point to library, museum, and archives information. Linkypedia is designed to find out which articles cite a particular source (e.g., how many Wikipedia articles cite the NARA website?). Soon libraries will be able to enter their own site to find out which articles cite it. There is an additional set of views to see what other citations are contained within the same article. The principles of the project are topical hubs, aboutness, shared interest, and ways to link cultural heritage community information together. It follows the same principles as linked data.

Some discussion following the formal presentation noted the following:

  • Publishers are also working with linked data, for example the New York Times, BBC and Reuters. Large publishers and news agencies are starting to get into this space.
  • Rhonda Marker noted a connection between linked data and new requirements of the National Science Foundation to develop data management plans. As a result, libraries need to be more explicit about rights management and about the relationships between data sets, journal articles, and other related items. Linked data principles should help manage those relationships. Harper responded that Dublin Core has a new working group looking at metadata provenance, that is, the history and change history of the metadata itself, which may help with managing datasets by documenting how they have been curated.

The second speaker was Oliver Pesch of EBSCO Information Services (filling in for Mike Giarlo), who spoke about “Institutional Identifiers: NISO I2 Working Group.”

Pesch began by providing some working group history. It was founded in 2008, is chaired by Grace Agnew of Rutgers University and Pesch, and is composed of members from all sectors of the library supply chain. The mission of the I2 Working Group is to create a robust, scalable, interoperable standard to uniquely identify institutions and describe the relationships between them. As requirements, it should be lightweight to manage, reusable by business sector registries, and interoperable with legacy applications.

Pesch explained why such an identifier is important. The identity of an institution is critical to any information model, and needs to be global, interoperable, unambiguous, and unique. The Working Group is trying to develop a central registry to assign the identifier and store core metadata to identify the institution, provide look-up services to see if an institution has been identified, and provide an API for programmers. Distributed sets of business applications would be able to use this data. Pesch showed a chart of how the central registry could be used by various registration agencies, with business applications being developed by the registration agency.

There is a draft list of metadata elements to identify institutions, including a main identifier, variant identifiers, affiliated institutions, etc. In determining the identifier, the Working Group looked at many existing identifiers for institutions, e.g., OCLC symbols, MARC codes, SAN (Standard Address Number), ONIX, etc. The closest they found is the International Standard Name Identifier (ISNI), which provides public identification of any entity involved in creation, production, management, and content distribution chains. The actual identifier is a 16-digit number with a check character. ISNI is working with VIAF to leverage ISNI in the VIAF authority files. An alternative is to use HTTP GET with a base URL to identify the institution, or to create a RESTful URL.
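
To make the "check character" idea concrete, here is a minimal sketch of how such a character can be computed and verified. It assumes the ISO 7064 MOD 11-2 scheme, which is the check scheme associated with ISNI; the function names are hypothetical, and the sample digits are chosen only because they validate under this scheme. Nothing here reflects the design the I2 Working Group will finally adopt.

```python
def isni_check_character(first_15_digits):
    """Compute the 16th character under ISO 7064 MOD 11-2 (assumed scheme)."""
    if len(first_15_digits) != 15 or not (
        first_15_digits.isascii() and first_15_digits.isdigit()
    ):
        raise ValueError("expected exactly 15 ASCII digits")
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_isni(identifier):
    """Validate a 16-character identifier, ignoring spaces and hyphens."""
    compact = identifier.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16:
        return False
    body, check = compact[:15], compact[15]
    if not (body.isascii() and body.isdigit()):
        return False
    return isni_check_character(body) == check


print(is_valid_isni("0000 0002 1825 0097"))
```

The check character lets any system in the supply chain reject a mistyped identifier locally, without a round trip to the central registry.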

Pesch ended the presentation by providing some scenarios for how the registry could be used for ILL requests and for ordering subscriptions for different portions of an institution. More information about the I2 Working Group can be found on NISO’s site: http://www.niso.org/workrooms/i2.

Posted by Kristin Martin

Posted in ALA Midwinter 2011 | 1 Comment

News from the front: briefings from RDA test participants

ALA program organized by ALCTS at Midwinter Conference 2011

RDA testing has been completed. In this session, a group of test participants shared their experience in an informal panel discussion. Representatives from academic, museum, and school libraries, ILS vendors, and LIS faculty presented a lively and informative discussion of their individual experiences, the issues they encountered during testing, and the insights they gained.

Beecher Wiggins, the Director for Cataloguing at the Library of Congress, gave a brief update on the status of RDA testing by the national libraries and other test participants. The test finished just two weeks before the ALA Midwinter conference 2011. 7,000 RDA records have been created in the Library of Congress catalogue. MARC 21 was still being used to represent the data. The informal online survey of RDA testers closed on Jan. 6, 2011. At the ALA Annual conference 2011 in New Orleans, a decision will be made: RDA will be adopted, adopted at a later time, or adopted only for some fields. Whatever the decision, mixed data from AACR2 and RDA is certain to appear in the future.

Christopher Cronin, the Director of Metadata and Cataloguing Services at the University of Chicago Library, shared the library’s test plan and experience. In their test plan, they decided:

  • Who would be involved in the test
  • How much staff time would be spent
  • What RDA elements should be tested

Original and copy cataloguers for a variety of formats as well as cataloguing department heads participated in the test. Christopher listed how many hours were spent on:

  • Preparing ILS
  • Creating records
  • Displaying records
  • Reviewing LC and PCC rules and local policy
  • Training staff
  • Library-wide presentation
  • Testers meetings
  • Post-test analysis

Six RDA elements were tested:

  • Other title information (RDA 2.3.4)
  • Copyright date (RDA 2.11)
  • ISSN of series (RDA 2.12.8)
  • ISSN of subseries (RDA 2.12.16)
  • Media type (RDA 3.2)
  • Source consulted (RDA 29.6)

In total, they created 1,283 RDA records, including 617 monographs, 598 maps, 23 serials, 19 sound recordings, etc., as well as 800 authority records and 18 Dublin Core records.

What they liked about RDA:

  • 37X in authority records
  • Expression of relations
  • Getting rid of abbreviations
  • Treatment of reproductions
  • No rule of three

What they disliked about RDA:

  • Changing established headings (comparing with AACR2)
  • Copyright date in 260$c
  • 33X fields in bibliographic records (utility of this data was unrealized in MARC and their system)
  • Navigating within search results was difficult because the RDA Toolkit lacked indexing.

Christopher concluded that RDA had minimal impact on original cataloguing as long as they followed the national standards, but had a large impact on copy cataloguing because decisions had to be made on whether to accept imported RDA records as-is, whether to accept them sometimes or all the time, whether to correct poor copy (coded RDA but missing core elements), whether to upgrade AACR2 records to RDA records, and so on.

Christopher and his colleagues were going to continue the test and the post-test analysis. They would train cataloguers to create RDA authority records, review the survey from participating cataloguers, review created records, evaluate policy decisions and evaluate impact on authority processing.

Maritta Coppieters, the Vice President at Backstage Library Works, reported on their RDA test. Their cataloguers created both AACR2 records and RDA records for the same items in order to compare the results. Maritta shared her concerns from the point of view of a library service provider.

  • Assumption: RDA is more expensive in their business than AACR2 because
      • There are so many options, so they will have to cater for different requirements from different institutions.
      • Relaxed restrictions mean more varied metadata will be input into records.
      • It is unclear whether AACR2 records should be upgraded to RDA records.
  • Reality: RDA will slow down cataloguing because
      • Lack of tools, examples and templates
      • Difficulty of search and organization of RDA Toolkit

Panel speakers from other participating institutions briefly shared their experience, including Penny Baker, the Collections Management Librarian at Sterling and Francine Clark Art Institute, Williamstown, MA; Richard Hasenyager, Director of Library Services for the North East Independent School District, San Antonio, TX; and Sylvia Hall-Ellis, Associate Professor, Library and Information Science Program, University of Denver.

Overall, all the speakers showed a positive attitude toward RDA and agreed that testers were able to work with the RDA rules, especially those with little or no cataloguing experience. The most challenging issues were that the RDA Toolkit was confusing to navigate and that the RDA terminology lacked annotation, making it hard to understand.

At the Q & A session, Beecher warned the audience to be cautious about converting their local records from AACR2 to RDA because OCLC has the master records.

Because results from the testing were not yet available, they were not discussed during the panel. The working group will start analyzing the test data and will present the results at ALA Annual Conference 2011.

Posted in ALA Midwinter 2011 | 2 Comments

ALA Midwinter 2011: Best Bets for Metadata Librarians and Call for Bloggers

Below is a list of metadata and digital library-friendly sessions for ALA Midwinter 2011. Planning to attend a session or already reporting on a session? Think about blogging it here! If you would like to blog any of the sessions, please contact Kristin Martin at kmarti@uic.edu with your name, e-mail address, and preferred session. A link is provided to the ALA Conference Scheduler and to fuller descriptions, when available. See a session not on here that you think would be of interest? Suggest it!

I’ve tried to be as inclusive as possible with the sessions, as metadata is a cross-disciplinary topic within library and information science. Sessions of interest include metadata, digital projects, digital technology, and cataloging, and come from many different groups within ALA. Note that many of the sessions are sponsored through LITA, which has its own blog and is also looking for bloggers. They are listed here for interest and I will link to write-ups following the conference.

Friday, January 7, 2011

10:30am – 12:00pm

FRBR Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 25 C
Conference Scheduler: http://connect.ala.org/node/119619
ANO description

1:30pm – 3:30pm

News from the front: Briefings from RDA test participants
San Diego Convention Center (SDCC): Room 26 A/B
Conference Scheduler: http://connect.ala.org/node/119761
ANO description

3:30pm – 5:30pm

Cataloging and Classification Section Executive Committee Forum (ALCTS CCS)
San Diego Convention Center (SDCC): Room 26 A/B
Conference Scheduler: http://connect.ala.org/node/120015
ANO description

4:00pm – 5:15pm

Competencies and Education for a Career in Cataloging Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 28 B
Conference Scheduler: http://connect.ala.org/node/120536

Saturday, January 8, 2011

8:00am – 10:00am

Technical Services in Academic Libraries Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 23 C
Conference Scheduler: http://connect.ala.org/node/119548
ANO description

10:30am – 12:00pm

Catalog Form and Function Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 24 A
Conference Scheduler: http://connect.ala.org/node/120218

Digital Libraries Interest Group (LITA)
Hilton San Diego Bayfront (HIL): Aqua 300
Conference Scheduler: http://connect.ala.org/node/120578

Electronic Resources Interest Group (ALCTS CRS)
San Diego Convention Center (SDCC): Room 31 B
Conference Scheduler: http://connect.ala.org/node/119608
ANO description

1:30pm – 3:30pm

Catalog Management Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 30 C
Conference Scheduler: http://connect.ala.org/node/119648
ANO description

Cataloging Norms Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 25 C
Conference Scheduler: http://connect.ala.org/node/119633
ANO description

Digital Conversion Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 30 D
Conference Scheduler: http://connect.ala.org/node/119691

JPEG2000 Interest Group (LITA)
Marriott Hotel & Marina (MAR): Warner Center
Conference Scheduler: http://connect.ala.org/node/119664

4:00pm – 5:30pm

Intellectual Access To Preservation Metadata Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 24 A
Conference Scheduler: http://connect.ala.org/node/119684
ANO description

Collaborative Digitization Discussion Group (ASCLA)
San Diego Convention Center (SDCC): Room 23 B
Conference Scheduler: http://connect.ala.org/node/120372

Holdings Update Forum–Holdings information in Electronic Content Access
San Diego Convention Center (SDCC): Room 30 C
Conference Scheduler: http://connect.ala.org/node/119926

MARC Formats Interest Group (ALCTS, LITA)
San Diego Convention Center (SDCC): Room 26 A/B
Conference Scheduler: http://connect.ala.org/node/120043
ANO description

Sunday, January 9, 2011

8:00am – 10:00am

Digital Preservation Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 07 A
Conference Scheduler: http://connect.ala.org/node/119686
ANO Description

Metadata Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 05 B
Conference Scheduler: http://connect.ala.org/node/119544
ANO description
Blogger: Kristin Martin

Top Technology Trends (LITA)
San Diego Convention Center (SDCC): Room 26 A/B
Conference Scheduler: http://connect.ala.org/node/120454

10:30am – 12:00pm

Cataloging and Classification Research Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 33 A
Conference Scheduler: http://connect.ala.org/node/120137
ANO Description

Next Generation Catalog Interest Group (LITA)
Hilton San Diego Bayfront (HIL): Sapphire M
Conference Scheduler: http://connect.ala.org/node/119596

US RDA Test Participants Forum
San Diego Convention Center (SDCC): Room 07 A
Conference Scheduler: http://connect.ala.org/node/120553

Continuing Resources Standards Update Forum
Hilton San Diego Bayfront (HIL): Aqua 314
Conference Scheduler: http://connect.ala.org/node/119936
ANO description

1:30pm – 3:30pm

Authority Control Interest Group Meeting (ALCTS)
San Diego Convention Center (SDCC): Room 11 A
Conference Scheduler: http://connect.ala.org/node/119574

4:00pm – 5:30pm

Creative Ideas in Technical Services Discussion Group Meeting (ALCTS)
San Diego Convention Center (SDCC): Room 30 A
Conference Scheduler: http://connect.ala.org/node/120296

PCC Participants’ Meeting
San Diego Convention Center (SDCC): Room 26 A/B
Conference Scheduler: http://connect.ala.org/node/119667
ANO Description

Role of the Professional Librarian in Technical Services Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 31 A
Conference Scheduler: http://connect.ala.org/node/119676

Monday, January 10, 2011

8:00am – 10:00am

Heads of Cataloging Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 07 A
Conference Scheduler: http://connect.ala.org/node/120021
ANO description: http://www.ala.org/ala/mgrps/divs/alcts/resources/ano/v21/n4/event/ig.cfm

10:30am – 12:00pm

Forum (ALCTS)
San Diego Convention Center (SDCC): Room 28 A/B
Conference Scheduler: http://connect.ala.org/node/120029
ANO description

1:30pm – 3:30pm

Continuing Resources Cataloging Committee Update Forum (ALCTS CRS)
San Diego Convention Center (SDCC): Room 11 A
Conference Scheduler: http://connect.ala.org/node/119808
ANO description

Technical Services Workflow Efficiency Interest Group (ALCTS)
San Diego Convention Center (SDCC): Room 30 A
Conference Scheduler: http://connect.ala.org/node/119611
ANO description

Posted in ALA Midwinter 2011 | Leave a comment

Metadata Interest Group at ALA Midwinter 2011

Please join us for an interesting meeting and discussion at the ALA Midwinter Meeting in 2011 in San Diego:

Sunday, January 9th from 8:00 am to 10:00 am (San Diego Convention Center, Room 05 B)

Corey Harper (Metadata Services Librarian, New York University) will update the group on events following the 2010 Linked Data pre-conference, including special sessions held at Annual ’10 and DC2010 in Pittsburgh, and the current work of the W3C Linked Library Data group.

Mike Giarlo (Digital Library Architect, Penn State University) will talk about the linked data design proposed by the NISO I2 (Institutional Identifiers) Working Group.

Following the presentations, we will hold a brief business meeting.

Posted in ALA Midwinter 2011 | Leave a comment

Metadata Interest Group Meeting ALA 2010: Linked Data

Metadata Interest Group Meeting
The Metadata Interest Group met on Sunday, June 28, and had two speakers.

Summaries of the presentations are below. A link to the full presentations is available at: http://connect.ala.org/node/107906

Linked Data and Controlled Vocabularies on the Web
Rebecca Guenther, Library of Congress

Ms. Guenther described a project underway at the Library of Congress to provide access to the Library of Congress’s controlled vocabularies using the Resource Description Framework (RDF). First, she gave an overview of controlled vocabularies and their uses. Controlled vocabularies control values, reduce ambiguity, provide for synonym control, allow for validation, and establish formal relationships among terms. They can be simple, like enumerated lists (e.g., a drop-down menu), or complex (e.g., full thesauri with multiple relationships). The Library of Congress (LC) maintains standards that contain controlled vocabularies, including:

  • LCSH/NAF
  • TGM
  • MARC controlled lists (e.g., ISO 639-2 language codes)
  • MODS/METS/MIX/PREMIS controlled lists

Controlled vocabularies are currently represented in a variety of ways:

  • Metadata format like MARC authority records
  • XML schemas, e.g., enumerated list
  • RDF/XML and RDFS (i.e., semantic web)
  • SKOS (Simple Knowledge Organization System)
  • MADS (MODS for authority records)

Guenther focused on using SKOS at http://id.loc.gov. SKOS is an RDF application used to express knowledge organization systems such as classifications, thesauri, and taxonomies. It allows distributed, decentralized management of SKOS vocabularies through linked data-inspired applications. It requires uniform resource identifiers (e.g., HTTP URIs). The data model in place at id.loc.gov provides a concept scheme; logical groupings of concepts; and labeling properties, annotation properties, and associative properties.

SKOS was selected because its defined element set is more relevant to controlled vocabularies than RDF or OWL (Web Ontology Language) alone. It is easy to transform MARC authority records into SKOS and show broader and narrower relationships, and it enables web services using the URIs.
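
As a rough illustration of the MARC-to-SKOS transformation described above, the sketch below turns a simplified authority record into N-Triples. The record layout, the `example.org` URIs, and the function names are hypothetical; only the SKOS and RDF property URIs are real. The field mapping in the comments (150 to `skos:prefLabel`, 450 to `skos:altLabel`, 550 with $w g to `skos:broader`) is one plausible mapping and is not taken from the presentation.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical, heavily simplified authority record.
record = {
    "uri": "http://example.org/authorities/1",
    "heading": "Example topic",                       # MARC 150
    "see_from": ["Sample topic"],                     # MARC 450
    "broader": ["http://example.org/authorities/2"],  # MARC 550 with $w g
}


def uri(value):
    """Format a URI as an N-Triples resource."""
    return "<" + value + ">"


def literal(text, lang="en"):
    """Format a string as a language-tagged N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


def to_skos_ntriples(rec):
    """Return the record as a list of N-Triples lines."""
    subject = uri(rec["uri"])
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(rec["heading"])),
    ]
    for label in rec["see_from"]:
        triples.append((subject, uri(SKOS + "altLabel"), literal(label)))
    for broader in rec["broader"]:
        triples.append((subject, uri(SKOS + "broader"), uri(broader)))
        # The narrower link is the inverse of the broader link.
        triples.append((uri(broader), uri(SKOS + "narrower"), subject))
    return [" ".join(triple) + " ." for triple in triples]


ntriples = to_skos_ntriples(record)
print("\n".join(ntriples))
```

Because every heading has its own URI, the broader and narrower links become statements that any web client can follow, which is what makes URI-based web services possible.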

Guenther also provided some additional information about “linked data,” which is a feature of the semantic web where links are made between resources. It goes beyond hypertext links because it allows links between concepts. Wikipedia defines it as a “term used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web.” Id.loc.gov is a web service for shared vocabularies. It should reduce maintenance and make comprehensive information about controlled terms openly available, and it has been a testing ground for semantic web technologies.

Id.loc.gov went live in April 2009, with more vocabularies added in 2010: LCSH, TGM, the MARC code list for relators, and PREMIS controlled vocabularies. Data is open and continuously updated, and can be bulk downloaded in RDF. Searches can bring up terms by ID or label information, and multiple vocabularies can be searched at once. A demonstration of the site revealed the visualization tree, the suggested terminology tab, and links to similar concepts in other vocabularies, such as the French RAMEAU subject headings.

The data from id.loc.gov has been put to use in several projects, including:

  • University of Pennsylvania online books
  • University of Virginia auto suggest feature
  • Freebase.org
  • National libraries of Sweden and France

In the future LC will be adding some additional vocabularies:

  • MARC code list for language (ISO 639-2) and other ISO 639 lists
  • MARC code list for countries and geographic areas
  • Additional PREMIS controlled vocabularies
  • Name authorities, which will be a challenge because they don’t fit into SKOS very well, so LC is looking at a different markup

Other avenues for the future include a MADS OWL schema to enable identification of facets within names and subjects, expanded information on subdivisions, and additional relator terms to enhance existing relationships.

The technical infrastructure for id.loc.gov:

  • Django (Python)
  • LCSH uses MySQL, with SKOS RDF generated at the time of request; it mainly operates like a relational database with MARC mapped to tables
  • Everything else is in an RDF triplestore (a Python library that uses MySQL), with XML converted to SKOS RDF/XML before ingest
  • Programmatic queries using SPARQL


VIVO: A Research-Focused Discovery Tool

Sara Russell-Gonzalez, University of Florida

Russell-Gonzalez discussed VIVO, an open source semantic web application that enables the discovery of research and of researchers. It is designed for researchers, students, administrators, and donor/funding agencies, and it provides profiles for researchers. Originally developed at Cornell University Libraries to support the life sciences, it was redesigned in 2007 to be a semantic web application and can cover all disciplines. The University of Florida got involved with a 2009 NIH grant to create National Networking with VIVO. VIVO is designed in part to answer the following questions:

  • With resources online, researchers no longer visit the library, so how do you know what your researchers are doing and how can you be involved in the research process?
  • How can researchers form collaborations with researchers in other disciplines or students learn about potential advisors?
  • How can administrators know their strengths and weaknesses for strategic planning?

VIVO gathers data from a variety of sources, although all of it is public. As much as possible is done automatically, drawing from internal and external sources. Because each school is different, each school has its own local VIVO instance. Local sources can include the institutional repository, human resources databases, the institutional grants database, the faculty reporting tool, etc. National sources are mostly abstracting and indexing services, like PubMed. All data is mapped into an RDF structure. Compliance with semantic web standards enables a national network across all VIVO instances around the country.

Data is stored in RDF triples, with reflexive relationships (i.e., relationships are reflected in both directions). Consequently it grows quickly in size. The VIVO core ontology is used to describe people, organizations, activities, publications, events, interests, grants, and other relationships, with support for local extensions and FOAF (Friend of a Friend). HTTP URIs identify objects, and the data is exposed through SPARQL endpoints.
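
The effect of storing each relationship in both directions can be sketched as follows. The property names (`authorOf`, `hasAuthor`, and so on) and the identifiers are hypothetical placeholders, not terms from the VIVO ontology, and a plain Python set stands in for the triple store.

```python
# Hypothetical property names paired with their inverses.
INVERSE = {
    "authorOf": "hasAuthor",
    "hasAuthor": "authorOf",
    "memberOf": "hasMember",
    "hasMember": "memberOf",
}


def add_statement(store, subject, predicate, obj):
    """Add a triple and, when an inverse property is known, its mirror image."""
    store.add((subject, predicate, obj))
    inverse = INVERSE.get(predicate)
    if inverse is not None:
        store.add((obj, inverse, subject))


store = set()
add_statement(store, "person:1", "authorOf", "article:1")
add_statement(store, "person:1", "memberOf", "dept:1")
# Asserting the inverse explicitly adds nothing new.
add_statement(store, "article:1", "hasAuthor", "person:1")

print(len(store))  # 4
```

Two assertions produce four stored triples, which is why a store built this way grows quickly; the payoff is that a profile page for the article can find its author without a reverse search.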

There are multiple challenges to using the semantic web approach, such as determining the level of granularity, scalability as the database grows, provenance of the data, and keeping data up-to-date. Disambiguation, particularly of authors, may be one of the biggest challenges. From a political standpoint, determining when data should be removed is another issue (e.g., what happens when a faculty member leaves?).

There is one year left on the project with some upcoming enhancements:

  • Adding the ability to produce CVs and biosketches
  • Forming collaborations with publishers to bring in additional external data sources
  • Developing visualization capabilities

VIVO is still looking for schools to get involved, for data providers, and for application developers to interface with VIVO. The first national conference for VIVO is August 12-13 at the New York Hall of Science. VIVO’s website is: http://vivoweb.org/

Reported by Kristin Martin

Posted in ALA Annual 2010 | 1 Comment

Apples and Oranges

This is a report on the Metadata Interest Group program, Converging Metadata Standards in Cultural Institutions: Apples & Oranges, which occurred Saturday, June 26, 2010, at 8:00 a.m. during the ALA Annual Meeting in Washington.

The three presentations provided some good insights into metadata challenges and possible courses of action. The first two dealt more closely with the theme suggested by the title of the session (aggregating heterogeneous metadata from differing sources including libraries, museums, and archives), while the third, a research report on an assessment of metadata from a cost/benefit perspective, brought in the strain of understanding user behavior and letting it guide metadata practice. This mixture came together like a good salad, as evidenced by some lively questions and answers at the end.

Danielle Plumer, Coordinator for the Texas Heritage Online project of the Texas State Library and Archives Commission, described the metadata education component of this cooperative digitization grant project. Over 30 institutions are partnering in 10 projects, each of which is creating at least 1,000 metadata records; they were offered training at various locations in project management, legal issues, metadata standards and crosswalks (particularly content standards), controlled vocabularies, and digital preservation management. Much of the training was adapted for this project’s audience (many of whom are not librarians and faced a steep learning curve) from modules developed by the Library of Congress, Cornell, and others, and will be further modified into an online learning format and made available to anyone through Amigos Library Services. Danielle had some interesting comments on discoveries arising from her work with these diverse institutions: there’s nothing wrong with using MARC to describe cultural objects; LCSH is the most commonly used standard vocabulary, but it is often poorly understood and some systems display it poorly; and metadata decisions are often driven not by the needs of a project but by the limitations of the system or software used to create and store the metadata.

Ching-Hsien Wang, Chief Information Officer at the Smithsonian Institution, presented “Striving in Library, Archives, & Museum Converging Landscape: The power of working together.” The Smithsonian Institution, which encompasses 20 libraries, 19 museums, and 14 archives, has recently launched a “one-stop” search center (http://collections.si.edu/search/), which will provide, for the first time, the ability to search across all collections. It currently includes 4.6 million records and 445,000 images from 40 data sources, encompassing highly diverse types of materials (e.g. books, postage stamps, audio of interviews…). In addition to simple search, the interface provides faceted browsing by object type, media, topic, name, date, place, and data source, along with many advanced features. “Metadata made it all happen.” They began by combining records from 8 Horizon databases, all MARC but with many differences, and as a result of that effort decided that metadata standardization needed to happen as they moved to incorporate data from the scientific and museum databases as well. They overcame the challenges of defining common data elements, and data typing for those elements, through collaborative discussions with data providers, while respecting the diverse perspectives and traditions of different institutions. Much of the work to create unified indexes and presentation was done by programmers massaging and transforming the metadata; some MARC fields were omitted or, as with LCSH, “taken apart.” Sometimes assumed values for a particular context had to be supplied for the aggregation. They are at the end of the first phase but have much more to do: they are working on iPhone and georeferencing applications and hierarchical facets, and are bringing each data source online one by one. Catalogers are learning and modifying their practices (and remedying existing metadata) as a result of seeing the outcomes.
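
For readers curious what this kind of metadata “massaging” might look like at the code level, here is a minimal Python sketch. It is not the Smithsonian’s code. The field names (subjects, object_type, data_source), the sample record, and the default values are all hypothetical, and it assumes that LCSH subdivisions arrive delimited by a double hyphen. It illustrates two of the steps mentioned in the talk: taking a subject string apart into separate facet terms, and supplying assumed values that a source leaves implicit.

```python
# Hypothetical sketch only: the element names and defaults are illustrative
# assumptions, not the Smithsonian's actual schema or code.

def split_lcsh(heading):
    """Split a subject string such as 'A--B--C' into its component terms."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def normalize(record, source_defaults):
    """Map one source record onto a small set of common data elements."""
    topics = []
    for heading in record.get("subjects", []):
        for term in split_lcsh(heading):
            if term not in topics:  # keep each facet term once, in order
                topics.append(term)

    common = {
        "title": record.get("title", "").strip(),
        "topic": topics,
        "object_type": record.get("object_type"),
        "data_source": record.get("data_source"),
    }

    # Supply assumed values that are implicit in the source's own context,
    # e.g. a library catalog that never states its records describe books.
    for key, value in source_defaults.items():
        if not common.get(key):
            common[key] = value
    return common


library_record = {
    "title": "Stamps of the world ",
    "subjects": [
        "Postage stamps--United States--History",
        "Postage stamps--Collectors and collecting",
    ],
}
defaults = {"object_type": "Books", "data_source": "Example Library Catalog"}

result = normalize(library_record, defaults)
print(result)
```

In a real aggregation the per-source defaults and the list of common elements would come out of the kind of collaborative discussions with data providers that Wang described, rather than being hard-coded.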

Joyce Celeste Chapman, Library Fellow at NCSU Libraries, presented “Assessing metadata and incorporating user feedback,” a report on a research study she conducted to compare the time spent creating specific EAD elements with both user behavior and user opinion on the usefulness of those elements. This was a small-scale study, and Joyce was careful to state that the sample was not random and the results were not generalizable, but that the indications from them could nevertheless be useful and point to areas where a change of emphasis in metadata creation could benefit users. She also mentioned, as a larger context, the recently released Final Report of the Task Force on Cost/Value Assessment of Bibliographic Control (http://connect.ala.org/files/7981/costvaluetaskforcereport2010_06_18_pdf_77542.pdf). Discussion in the later question-and-answer period pointed out the difficulty of separating the effect of metadata practice from the effect of the discovery interface when trying to derive data on user impact. Chapman’s study was able to sidestep this problem to some degree by presenting a generic interface where each metadata element was chosen and presented separately, but she pointed out that the attempt to isolate metadata elements could result in a disjointed user experience. Timing for EAD field creation (Abstract, Bio/historical note, Scope/Content note, Subject Headings, Collection Inventory, and Other) was collected from 9 metadata creators at two institutions. A sample of end users was given 5 different tasks and their choices were analyzed; some were also interviewed on the relative importance they would place on data elements. The most striking finding was that the Collection Inventory, while taking lots of time to create, also had high importance to users, while the Biographical note, which was also time-consuming, was not ranked highly by users. There were also observations about the usefulness and duplication of information between the Abstract and the Scope/Content note. The user research group at NCSU is considering next steps for additional metadata assessment measures across different metadata schemes and methodologies.

In the question-and-answer period, there were requests for sharing of metadata massaging tools and code (even if very institution-specific, people thought there’d be value in seeing how others are doing things at the code level); a general recognition that dates and subjects are among the most challenging elements to deal with in aggregating metadata; acknowledgment of the challenges in synchronizing and updating aggregations, especially when the source systems don’t provide timestamps; and agreement that interaction between library, museum, and archive folks often results in learning for all, and that all contexts could benefit from more detailed research into metadata’s value for users, however difficult that may be to assess.

Posted in ALA Annual 2010

ALA Annual 2010: Best Bets for Metadata Librarians and Call for Bloggers

Below is a list of metadata and digital library-friendly sessions for ALA Annual 2010. Planning to attend a session or already reporting on a session? Think about blogging it here! If you would like to blog any of the sessions, please contact Kristin Martin at kmarti@uic.edu with your name, e-mail address, and preferred session. Fuller descriptions are linked when available. See a session not listed here that you think would be of interest? Suggest it!

I’ve tried to be as inclusive as possible with the sessions, as metadata is a cross-disciplinary topic within library and information science. Sessions of interest include metadata, digital projects, digital technology, and cataloging, and come from many different groups within ALA. Note that many of the sessions are sponsored through LITA, which has its own blog and is also looking for bloggers. They are listed here for interest, and I will link to write-ups following the conference.

Friday Sessions

10:30 AM – 12:00 PM on 06/25
FRBR Interest Group
Location: MAY in Chinese BR
Unit/Subunit: ALCTS

3:30 PM – 5:15 PM on 06/25
Cataloging and Classification Forum (CCS)
Location: HIL in Lincoln
Unit/Subunit: ALCTS – CCS

4:00 PM – 5:15 PM on 06/25
Electronic Resources Management Interest Group
Location: HIL in Fairchild
Unit/Subunit: LITA, ALCTS

4:00 PM – 5:15 PM on 06/25
Competencies and Education for a Career in Cataloging Interest Group
Location: JW in Commerce
Unit/Subunit: ALCTS – CCS

Saturday Sessions

8:00 AM – 10:00 AM on 06/26
Technical Services Managers in Academic Libraries Interest Group Program
Location: MAD in Constitution
Unit/Subunit: ALCTS

8:00 AM – 10:00 AM on 06/26
Grassroot Prog.: Digital Initiative for College Libraries
Location: WCC in 141
Unit/Subunit: ALA

8:00 AM – 10:00 AM on 06/26
Converging Metadata Standards in Cultural Institutions: Apples & Oranges
Location: WCC in Ballroom B
Unit/Subunit: ALCTS
Blogger: Laura Akerman

10:30 AM – 12:00 PM on 06/26
Catalog Form and Function Interest Group
Location: HIL in Columbia 5
Unit/Subunit: ALCTS – CCS

10:30 AM – 12:00 PM on 06/26
Developing a Sustainable Digitization Workflow
Location: WCC in 146C
Unit/Subunit: LITA

1:30 PM – 3:30 PM on 06/26
Image Resources Interest Group
Location: CAP in Pan American
Unit/Subunit: ACRL

1:30 PM – 3:30 PM on 06/26
Catalog Management Interest Group
Location: HIL in Fairchild
Unit/Subunit: ALCTS – CCS

1:30 PM – 3:30 PM on 06/26
Cataloging Norms Interest Group
Location: HIL in Columbia 5
Unit/Subunit: ALCTS – CCS

1:30 PM – 3:30 PM on 06/26
Multiple Formats and Multiple Copies in a Digital Age: Acceptance, Tolerance, Elimination
Location: WCC in 147B
Unit/Subunit: ALCTS – CMDS, RUSA – CODES

1:30 PM – 3:30 PM on 06/26
Digital Conversion Interest Group
Location: JW in Capitol BR H/J
Unit/Subunit: ALCTS – PARS

4:00 PM – 5:30 PM on 06/26
MARC Format Interest Group
Location: HIL in Kalorama
Unit/Subunit: LITA, ALCTS

4:00 PM – 5:30 PM on 06/26
Holdings Update Forum: “Next Generation OPACs: Making the Most of Local Holdings Data”
Location: JW in Grand BR IV
Unit/Subunit: ALCTS – CRS

Sunday Sessions

8:00 AM – 10:00 AM on 06/27
Digital Preservation Interest Group
Location: JW in Grand BR I/II
Unit/Subunit: ALCTS – PARS

8:00 AM – 10:00 AM on 06/27
Digital Library Technology Interest Group
Location: HIL in Fairchild
Unit/Subunit: LITA

8:00 AM – 10:00 AM on 06/27
Cataloging and Beyond: The Year of Cataloging Research
Location: WCC in 147A
Unit/Subunit: ALCTS – CCS, RUSA – RSS, LITA

8:00 AM – 10:00 AM on 06/27
Metadata Interest Group
Location: WCC in 202A
Unit/Subunit: ALCTS – CCS,LCTS
Blogger: Kristin Martin

8:00 AM – 12:00 PM on 06/27
Digitization: Preserving and Open Access to African American Collections
Location: WCC in 152B
Unit/Subunit: ACRL – AFAS

10:30 AM – 12:00 PM on 06/27
Cataloging and Classification Interest Group: Social tagging in libraries
Location: HIL in Columbia 2
Unit/Subunit: ALCTS – CCS

10:30 AM – 12:00 PM on 06/27
Intellectual Access to Preservation Metadata
Location: JW in Capitol BR H/J
Unit/Subunit: ALCTS – PARS

10:30 AM – 12:00 PM on 06/27
Open to Change: Open Source and Next Generation ILS and ERMS
Location: WCC in 146C
Unit/Subunit: ALCTS – AS, ALCTS – CRS

10:30 AM – 12:00 PM on 06/27
To Protect and Serve: Is Digitization Good for Your Historical Collections?
Location: REN in Renaissance West A/B
Unit/Subunit: RUSA – HS

10:30 AM – 12:00 PM on 06/27
Internet Resources and Services Interest Group
Location: HIL in Columbia 10
Unit/Subunit: LITA

10:30 AM – 12:00 PM on 06/27
With Great Power Comes Great Responsibility: Building a Support Infrastructure for an Open-Source ILS
Location: HIL in Fairchild
Unit/Subunit: LITA

10:30 AM – 12:00 PM on 06/27
MODS and MADS: Current implementations and future directions
Location: WCC in 143B/C
Unit/Subunit: LITA

1:30 PM – 3:30 PM on 06/27
RDA Update Forum
Location: JW in Grand BR I/II
Unit/Subunit: ALCTS – CCS

1:30 PM – 5:30 PM on 06/27
Authorized Genre, Forms and Facets in RDA
Location: HIL in Lincoln
Unit/Subunit: LITA, ALCTS

4:00 PM – 5:30 PM on 06/27
Standards Interest Group (LITA)
Location: HIL in Columbia 1
Unit/Subunit: LITA

4:00 PM – 5:30 PM on 06/27
Institutional Repositories in Action: Success Stories from the Federal World
Location: WCC in 202A
Unit/Subunit: FAFLRT

4:00 PM – 5:30 PM on 06/27
Creative Ideas in Technical Services Interest Group Meeting
Location: MAY in Chinese BR
Unit/Subunit: ALCTS

Monday Sessions

8:00 AM – 10:00 AM on 06/28
Boot Camp for the 21st Century Metadata Manager
Location: WCC in 150B
Unit/Subunit: AFL – OLAC, ALCTS – CCS

8:00 AM – 10:00 AM on 06/28
Heads of Cataloging Interest Group
Location: WCC in 140A/B
Unit/Subunit: ALCTS – CCS

10:30 AM – 12:00 PM on 06/28
Got Data? New Roles for Libraries in Shaping 21st Century Research/Presidents Program
Location: WCC in Ballroom B
Unit/Subunit: ALCTS

1:30 PM – 3:30 PM on 06/28
Next Generation Catalog Interest Group
Location: HIL in Columbia 8
Unit/Subunit: LITA
