Saturday, February 4, 2012

(Controlled Vocabularies + Authority Files)*Software=Interoperability

All of this week’s readings demonstrated the apparent need for authority control as part of cataloguing best practices. The work of creating and maintaining authority files using controlled vocabularies or a thesaurus is, as far as Gorman is concerned, imperative for achieving 100% precision and 100% recall in information retrieval within large databases. However, not everyone is interested in achieving perfection, and so the Dublin Core metadata terms, which offer the less-than-satisfactory results Gorman describes when free text is used to search the Web, are sufficient for many purposes.
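To make Gorman’s target concrete, here is a toy calculation of precision and recall; the documents and relevance judgments are invented for illustration:

```python
# Toy search result: which documents are actually about the topic,
# and which ones a free-text keyword search returned.
relevant = {"doc1", "doc2", "doc3"}
retrieved = {"doc1", "doc2", "doc4", "doc5"}

true_positives = relevant & retrieved
precision = len(true_positives) / len(retrieved)  # share of results that are relevant
recall = len(true_positives) / len(relevant)      # share of relevant items that were found

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# precision = 0.50, recall = 0.67 -- authority control aims to push both toward 1.00
```

Free-text searching typically trades one measure against the other; Gorman’s point is that controlled access points are what let a large database score well on both at once.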

It was also apparent that each type of community, be it a library, archive, or museum, has its own cataloguing requirements for bibliographic records to satisfy its particular user needs. Salo’s piece on the quality of metadata harvesting tools in institutional repositories brought into focus the difficulty that uncontrolled names produce when collocating the scholarly articles of a chosen author. Salo lays the blame at two very different doorsteps: the lack of standardization in institutional repository metadata, which forgoes authority control mechanisms, and software design that does not facilitate the resolution of authority problems.
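A small, invented sketch shows why an authority file matters for collocation; the name forms and records below are hypothetical:

```python
from collections import defaultdict

# Hypothetical authority file: every variant form maps to one authorized heading.
authority_file = {
    "Smith, J.": "Smith, Jane, 1960-",
    "Jane Smith": "Smith, Jane, 1960-",
    "Smith, Jane": "Smith, Jane, 1960-",
}

# Repository records as deposited, with the author's name entered three ways.
records = [
    {"title": "Article A", "creator": "Smith, J."},
    {"title": "Article B", "creator": "Jane Smith"},
    {"title": "Article C", "creator": "Smith, Jane"},
]

collocated = defaultdict(list)
for rec in records:
    heading = authority_file.get(rec["creator"], rec["creator"])
    collocated[heading].append(rec["title"])

# All three articles now gather under one heading; without the lookup,
# they would file under three different names and never collocate.
print(dict(collocated))
```

Software that surfaced and applied such a lookup at deposit time would answer both of Salo’s complaints at once.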

I could not find the reading listed in the syllabus (Lanzi, E. (1998). Standards: What role do they play? What, why and how of vocabularies. In Introduction to Vocabularies: Enhancing Access to Cultural Heritage Information, ed. E. Lanzi. Los Angeles, CA: Getty Information Institute, pp. 8-27), but I did find another piece on controlled vocabularies from the Getty Information Institute that made a strong argument for using community-oriented vocabulary in conjunction with authority files for better retrieval. Tillett mentioned one of the Getty’s thesauri, the Union List of Artist Names, which the museum uses to control the name variations of an entity. Increasing precision and recall for a non-expert searcher at the Getty is handled by a software interface that uses the controlled lists to suggest terms for the individual to use. The same sort of suggestion mechanism is at work in a Google search.
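As a rough sketch of how such a suggestion interface might work (the vocabulary entries are placeholders, not actual ULAN headings):

```python
# Suggest controlled terms that match what the searcher has typed so far.
vocabulary = [
    "Cranach, Lucas, the Elder",
    "Cranach, Lucas, the Younger",
    "Caravaggio, Michelangelo Merisi da",
]

def suggest(prefix: str, terms: list) -> list:
    """Return controlled-list terms beginning with the typed prefix."""
    prefix = prefix.casefold()
    return [t for t in terms if t.casefold().startswith(prefix)]

print(suggest("cran", vocabulary))
# ['Cranach, Lucas, the Elder', 'Cranach, Lucas, the Younger']
```

The searcher never has to know the authorized form in advance; the controlled list meets them halfway.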

Authority control is only part of the solution. Software programming that increases interoperability is another part. Together they will chip away at the truth of the quip “Garbage In, Garbage Out.”

Libraries-Metadata-Interoperability


I enjoyed reading Robert Darnton’s The Library in the New Age, in which he emphasizes one continuity in the nature of information: its instability. It was an interesting perspective. We cannot, however, divert our attention from the fact that the technology used to digitally organize and preserve this unstable text is itself rapidly changing. The flux of technology is the crux of the problem. This was, in fact, point number four of Darnton’s argument: Google may disappear or be overshadowed by another company or technology, rendering the digital data inaccessible. It did the heart of this self-taught bookbinder good to hear Darnton say, “The best preservation system ever invented was the old-fashioned, pre-modern book.” But as they say about a lot of modern stuff, “They just don’t make them like they used to.”

The article on metadata sharing across the different information disciplines by Elings and Waibel (Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums) reinforced the need for standards when creating data structures and organizing data content, which allows for data sharing and aggregation across libraries, archives and museums. I was reminded of the different view museums have of information in their vision to engage the public in the discovery of cultural materials through their collections. As Elings and Waibel point out, museums do not just describe materials for search and retrieval; museums organize interpretations of objects that lend to the authenticity of the objects in their collections, and they reach out to the general public ultimately to bring the patron into the museum.

A few years ago, while gathering information for an undergraduate research project on museum conservation practices, I experienced the fruits of database interoperability when I found an online project co-sponsored by the Getty’s conservation department and the Courtauld Gallery in the UK. The Getty was doing conservation work on a couple of Lucas Cranach the Elder paintings for the Courtauld Gallery. One museum database held the conservation information; the other held the historical information. Together they developed an amazing inside look into the mysteries of art conservation and introduced possibly thousands of people to cultural treasures they may never be able to view in person: information from two different databases brought together by a fascinating interface that allowed self-directed investigation. I’m all for information professionals striving to bring continuity to information organization and retrieval.
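Underneath an aggregation like that usually sits a field-by-field crosswalk between the two institutions’ schemas. A minimal sketch, with invented field names standing in for the actual standards:

```python
# A museum record in its local schema (field names are made up for illustration).
museum_record = {
    "object_title": "Adam and Eve",
    "artist": "Cranach, Lucas, the Elder",
    "creation_date": "1526",
}

# Crosswalk: map each local field onto a shared element set such as Dublin Core.
crosswalk = {
    "object_title": "dc:title",
    "artist": "dc:creator",
    "creation_date": "dc:date",
}

shared_record = {crosswalk[field]: value for field, value in museum_record.items()}
print(shared_record)
# {'dc:title': 'Adam and Eve', 'dc:creator': 'Cranach, Lucas, the Elder', 'dc:date': '1526'}
```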

Saturday, January 21, 2012

The genius of FRBR and FRAD


     The genius of FRBR and FRAD is the relationships that exist between the entities that are tracked in bibliographic and authority records. This was sensed by Cutter, expanded upon by Lubetzky, realized in FRBR and FRAD, and finally functionally utilized by RDA. With the dawn of the Internet, universal interoperability of databases would naturally have been a goal for any organization that collected, organized, maintained and shared large amounts of information. FRBR and FRAD are the first of the theoretical, system-neutral models that elegantly “map the relationships” between the recorded data to match the search criteria of users. The advocates of these conceptual models understand the need to control the input of data placed in a container that can be read and manipulated by a computer.
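To see how those mapped relationships might look to a machine, here is a loose sketch of the FRBR Group 1 entities as data structures; FRBR is a conceptual model, so the classes and the sample data are only an illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Manifestation:          # a physical embodiment: a particular publication
    publisher: str
    year: str

@dataclass
class Expression:             # a realization of the work: a text, a translation
    language: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:                   # the distinct intellectual creation itself
    title: str
    expressions: list = field(default_factory=list)

work = Work("Hamlet")
expression = Expression("English")
expression.manifestations.append(Manifestation("Penguin", "1980"))
work.expressions.append(expression)

# A search that finds the work can follow the relationships down to
# every expression and manifestation (and, in full FRBR, to each item).
print(work.title, "->", work.expressions[0].manifestations[0].publisher)
```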
      The first of these containers was the MARC record. As the software and hardware technologies that read and move content around the Web evolve, a standard that is flexible and easily encoded is needed. RDA fits the need by distributing data into “discrete data elements,” allowing for easier encoding than the AACR2 standard. Ultimately, RDA can be used to create resource databases that are machine-readable, moving libraries away from the MARC record container.
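The contrast between the two containers can be sketched roughly like this; both sample records are invented:

```python
# MARC packs several pieces of data into one coded field with subfield markers,
# which software must parse before it can do anything with the content:
marc_245 = "245 10 $a The library in the new age / $c Robert Darnton."

# RDA-style discrete data elements keep each piece separately addressable:
rda_elements = {
    "title_proper": "The library in the new age",
    "statement_of_responsibility": "Robert Darnton",
}

print(rda_elements["title_proper"])  # no subfield parsing required
```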
      I believe that all the effort to move information management towards models such as FRBR and FRAD is in anticipation of the Semantic Web. Machine-readable descriptive bibliographic and authority records that can express what we know about the content of a record will allow a machine to process knowledge itself. If the goal of a catalog is to help the user “find, identify, select and obtain” relevant resources, then information professionals should be preparing for the day when software can “intelligently” retrieve and present works of any type for every user.
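One common way to picture such machine-processable knowledge is as subject-predicate-object statements; the sketch below uses placeholder identifiers, not any real vocabulary:

```python
# Each statement is a triple a machine can store and traverse.
triples = [
    ("ex:Hamlet", "ex:createdBy", "ex:Shakespeare"),
    ("ex:Shakespeare", "ex:authorizedHeading", "Shakespeare, William, 1564-1616"),
    ("ex:Hamlet", "ex:hasExpression", "ex:Hamlet_English"),
]

def objects(subject: str, predicate: str) -> list:
    """Answer: what do we know about this entity via this relationship?"""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("ex:Hamlet", "ex:createdBy"))  # ['ex:Shakespeare']
```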

Universal Cataloguing System


      The ongoing struggle to organize the world’s information into a universal catalogue of knowledge has generated a variety of approaches. There are those who believe that interoperability on an international scale can only be achieved through a strict set of cataloguing rules paired with a controlled-vocabulary thesaurus such as the US National Library of Medicine’s MeSH. Another approach is the hierarchical metadata used in archives. Still others preach the utility of the user-created metadata of folksonomies.
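The payoff of a thesaurus like MeSH is query expansion down its hierarchy; a toy sketch, with an invented broader/narrower tree rather than actual MeSH data:

```python
# Invented fragment of a broader-term -> narrower-terms hierarchy.
narrower = {
    "Heart Diseases": ["Arrhythmias", "Myocardial Ischemia"],
    "Myocardial Ischemia": ["Myocardial Infarction"],
}

def expand(term: str) -> set:
    """Collect a heading plus all of its narrower headings, recursively."""
    found = {term}
    for child in narrower.get(term, []):
        found |= expand(child)
    return found

print(expand("Heart Diseases"))
# A search on the broad heading also retrieves records indexed under the narrow ones.
```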
      It would seem that one size does not fit all. This is best illustrated in the field of museum curation and the use of metadata to search and retrieve information. Museums must gather, maintain and organize information about their collections for the use of the museum and scholarly researchers as well as the general public. On one level, the need for authenticity and completeness of content, for example in preservation information, is acute: structured and controlled access points would better facilitate research on and maintenance of art objects and artifacts, but the investment in time and cost to develop such a database is high. The question of how much information the general public wants or needs to successfully utilize the museum’s holdings must also be addressed. Would the Dublin Core Metadata Element Set suffice to serve the general public? Or could user-generated metadata be incorporated to encourage greater use?
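For comparison, a record built from a handful of the fifteen Dublin Core elements might look like the following; the object data and tags are invented:

```python
# A simple record using a few Dublin Core elements.
dc_record = {
    "dc:title": "Portrait of a Woman",
    "dc:creator": "Cranach, Lucas, the Elder",
    "dc:date": "1526",
    "dc:type": "Image",
    "dc:description": "Oil on beech panel.",
}

# User-generated tags could sit alongside the core elements rather than replace them.
folksonomy_tags = ["renaissance", "portrait", "green background"]
```

A thin element set like this serves discovery well; what it cannot carry is the structured preservation detail the conservators need.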
      It seems that the same problem of how to open a collection to all people plagued the information specialists of the 17th and 18th centuries just as it does today. Gabriel Naudé believed that a good library had two catalogues: one based on a systematic organization of disciplines and another that allowed an alphabetical search by author. I question whether a universal system for organizing and retrieving information is desirable, much less achievable.