The unexpected boon of impermanence

In a world of CCTV cameras on every corner and smartphones in every hand, we’re living in a time with unprecedented amounts of data. Unfortunately, it isn’t always easy to see how to parse and use this data in a meaningful way. Enter Documenting the Now (DocNow), a joint initiative among Washington University in St. Louis, the University of California, Riverside, the University of Maryland, and Carnegie Mellon University. DocNow is a platform designed “around supporting the ethical collection, use, and preservation of social media content.” Across the messy web of big data, archivists and activists are linking hands to tackle human-rights-related issues through a community-centric archive.

I learned about this initiative from an article on ABC News Australia’s website, “Meet the digital librarians saving social media posts to protect human rights.” Essentially, DocNow captures social media posts as a way to shed light on otherwise unseen aspects of human rights abuses. For example, after the 2014 killing of Michael Brown in Ferguson, Missouri, the Documenting the Now team looked to Twitter as a way to document the event. In two weeks, more than 13 million tweets were collected and archived. However, the team struggled with the fact that this is an impermanent archive, one outside their control: because it is a community-based archive, users can delete their tweets (although they cannot edit them). As we are reading about authority control for class next week, I found this particularly timely, since DocNow clearly (and knowingly) lacks authority control. We can see the unique challenges that come to light without such control in place.

Despite the impermanence of the archive, the lack of a central authority or “truth” can also present an unexpected boon: the ability to piece together several different accounts of an event, using the audience’s own wording, or tagging. Interestingly, though, DocNow’s goal isn’t to collate posts in order to present a neutral or authoritative view, but rather to “understand the biases in each particular view of the event.” Another advantage of the lack of authority control in this case is that the event is more likely to bubble up to the surface and be visible to the wider public. The more people who tweet at or about a human rights violation, the more likely others are to hear about it. And since the public can use whatever tags they find appropriate, the data becomes more discoverable, and more relevant. A great example of the power of this platform comes from Professor Jay Aronson of Carnegie Mellon University, who worked on a Ukrainian legal case “focusing on deaths during the Euromaidan protests,” where the compiled video greatly aided the prosecution of riot police.

Social media isn’t just a pastime: the world’s posts are another form of “documents” for us to organize into meaningful information.

Submitted by Lindsay Menachemi, LIS653-01, Fall 2017

Posted in Archives, Classification, Open Data

What happens when a “database” has no identifiers: the ATF’s gun registry

The ATF’s Nonsensical Non-Searchable Gun Databases, Explained

This week’s readings were about the importance and logic behind the creation of authority control files and controlled vocabularies in order to have consistent, searchable records. This article discusses how a lack of standardized naming resources and record consolidation for the ATF’s gun registry causes a myriad of issues when legally purchased guns are under investigation or involved in crimes.

Due to current legislation, the ATF’s registry cannot be made searchable or consolidated into a database: “The ATF’s record-keeping system lacks certain basic functionalities standard to every other database created in the modern age. Despite its vast size, and importance to crime fighters, it is less sophisticated than an online card catalog maintained by a small town public library.” In fact, the only digital component of the registry is the image files, whose naming system can only be traced if the physical copy of the purchase record can be found among the hundreds of boxes in the ATF’s archive. The article explains: “The ATF processes a high number of trace requests: 372,992 last year. The agency says a trace takes on average four to seven business days to complete. If not for the ban on consolidating data into a searchable system, the ATF could create a database that allows it to immediately check the sales history of any gun used in a crime.”

The creation of a database with authority files for gun dealers, an archive of all gun models with standardized vocabulary, and a record (paired with an institutionally created unique identifier) of every legal gun purchased would drastically lessen the time it takes for the ATF to track down ownership records, greatly improving the speed of federal investigations, perhaps ultimately saving lives.
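As a rough illustration of the kind of system proposed above, here is a minimal Python sketch assuming a hypothetical schema: an authority file of dealers keyed by an institution-assigned ID, a small controlled vocabulary of model names, and purchase records indexed by model and serial number together (a serial number alone may not be unique across manufacturers), so that a trace becomes a single lookup. Every name, ID, and value is invented for illustration; nothing here describes an actual ATF system, which, per the article, current law prevents from being built.

```python
from dataclasses import dataclass

# Hypothetical authority file: one canonical record per dealer,
# keyed by an institution-assigned identifier.
DEALERS = {
    "DLR-0001": "Example Sporting Goods, Springfield",
    "DLR-0002": "Example Pawn & Gun, Riverside",
}

# Hypothetical controlled vocabulary of standardized model names.
MODEL_VOCABULARY = {"EXAMPLECO MODEL 19", "EXAMPLECO MODEL 22"}


@dataclass(frozen=True)
class Purchase:
    record_id: str      # unique identifier assigned to each purchase record
    dealer_id: str      # must match an entry in DEALERS
    model: str          # must be a term from MODEL_VOCABULARY
    serial_number: str
    sale_date: str      # ISO 8601 date string, e.g. "2017-03-14"


class Registry:
    """Purchase records indexed by (model, serial number) for one-step traces."""

    def __init__(self) -> None:
        self._index = {}

    def add(self, purchase: Purchase) -> None:
        # Authority control: reject names that are not in the authority file
        # or the controlled vocabulary instead of storing free-text variants.
        if purchase.dealer_id not in DEALERS:
            raise ValueError(f"unknown dealer ID: {purchase.dealer_id}")
        if purchase.model not in MODEL_VOCABULARY:
            raise ValueError(f"model not in vocabulary: {purchase.model}")
        key = (purchase.model, purchase.serial_number)
        self._index.setdefault(key, []).append(purchase)

    def trace(self, model: str, serial_number: str) -> list:
        """Return the sales history of one gun, oldest sale first."""
        history = self._index.get((model, serial_number), [])
        # ISO 8601 date strings sort chronologically as plain strings.
        return sorted(history, key=lambda p: p.sale_date)


if __name__ == "__main__":
    registry = Registry()
    registry.add(Purchase("R-1", "DLR-0002", "EXAMPLECO MODEL 19", "SN123", "2016-05-02"))
    registry.add(Purchase("R-2", "DLR-0001", "EXAMPLECO MODEL 19", "SN123", "2015-01-20"))
    for p in registry.trace("EXAMPLECO MODEL 19", "SN123"):
        print(p.sale_date, DEALERS[p.dealer_id])
```

The design choice mirrors the post’s argument: because every record points at an authority entry and a vocabulary term rather than free text, a trace is a dictionary lookup instead of a search through boxes of paper.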

We spend a great deal of time in class discussing the importance of knowledge organization and how effective databases, metadata schemas, OPACs and interoperable records have changed the way people all over the world access information, and I think this article is an incredible and weighty example of how creating authorities and searchable databases could have significant real-world implications.

Submitted by Drew Facklam

Tagged with: , , , , , , , , , , ,
Posted in Archives, Cataloging, Classification

Classifying the Cookbook

For the last two summers, I have worked at a cooking school’s library, amidst a collection of cookbooks that numbers in the thousands. My interest is reference, which in this case usually goes like this. A student dressed in kitchen whites dashes in breathlessly and announces: “Chef asked me to look for a recipe for Béarnaise!” It’s my job to track down his sauce recipe as quickly, and as well, as possible, so that he can dash back to the kitchen to, presumably, make French sauce.

While the library was well laid out—according to Dewey, with books filed under chef names, ingredients, and types of cooking or country of origin—working there made me think about how flawed even the best system of classification could be. A student would ask for a recipe for madeleine cookies, and I’d steer them to baking, even find them an excellent recipe by top pastry chef Dorie Greenspan—meanwhile wondering, though, if an even better recipe was stowed unread under the “chefs” section, maybe in a Jacques Pepin or even a Thomas Keller cookbook. With time for browsing limited, the chef cookbooks tended to be less often read, and the ingredient-driven books (in sections for Pasta, Salads, etc.) tended to circulate better. I wondered about those excellent but less accessible recipes, so rarely read or used.

If you cook, you know that a cook’s library, no matter how treasured the volumes, tends to contain a lot of background books that are rarely used. Now that recipes are online, classified, and more searchable, any recipe is in theory more accessible. As Joni Mitchell sang, there’s “something lost and something gained”: there’s a beauty to using the old cookbooks, with their handwritten notes in the margins and the stains of past feasts on their photos.

Here’s how some of the food world’s stars classify their cookbook collections:

Cookbook Libraries

Submitted by Rose Kernochan, 653-01

Posted in Uncategorized

“Nation Building Through Information Sharing”: Nancy Dupree and the Afghanistan Center at Kabul University


Dupree, right, and her colleague show an archived paper of the Taliban [from Al Jazeera obituary] [Massoud Hossaini/Reuters]

On September 10, 2017, the Afghanistan Center at Kabul University (ACKU) announced that its founder, Nancy Dupree, had died at the age of 89. Dupree, an American who spent many years in Afghanistan, collected photographs, sound recordings, newspapers, magazines, and other documents from her arrival in 1962 to the early 2000s. These materials document Afghanistan from the peaceful 1970s through the Soviet invasion and the Soviet-backed Kabul government to Taliban rule.

ACKU, founded in 2006, had been Dupree’s vision since 1989, when she and her husband worked on the ACBAR Resource and Information Centre (ARIC). ACBAR was the Agency Coordinating Body for Afghan Relief, which was active in Peshawar, Pakistan, supporting Afghan refugees. According to the ACKU website, ARIC was a “central depository for reports and surveys generated by NGOs, bilateral humanitarian organizations and UN agencies… The purpose was to facilitate coordination of humanitarian aid to Afghan refugees and to avoid duplication.” This commitment to collection and organization is exemplified in the Center’s motto: “nation building through information sharing.” ACKU also has a program for building libraries in communities outside of Kabul.



Nancy Hatch Dupree at the Afghanistan Center at Kabul University in 2014. [from New York Times obituary] [Massoud Hossaini/Associated Press]

This motto of the ACKU is echoed by an anecdote that Omara Khan Massoudi, former director of Afghanistan’s National Museum, told The New York Times for Dupree’s obituary. During the years of Taliban rule, Dupree returned to Kabul to install metal doors to help protect the surviving cultural artifacts in the National Museum. During this time she told Massoudi, “A nation stays alive if its culture stays alive.”

I am interested in this story because we often hear about documents and artifacts taken out of their cultural context by anthropologists, archaeologists, and historians to be organized and preserved for study in museums and archives. Dupree, by contrast, seems to have been invested in keeping this collection in Afghanistan, rather than in an institution in the United States, because of its significance to Afghan culture. The organization of materials in libraries, museums, and archives is not done in a vacuum. It is a worthwhile endeavor that can not only assist users in accessing an information resource, but also empower them in their use of it.

Sources consulted:

New York Times obituary

Washington Post obituary

Afghanistan Center at Kabul University website

Submitted by Alice Griffin (653-03)

Posted in Archives, Libraries, Library

Sad Songs, Artificial Intelligence and Gracenote’s Quest to Unlock the World’s Music

So, who doesn’t love a sad song? Gracenote wants to help you find them, using machine listening and machine learning technologies to recognize and classify genres, and even curate collections globally.

I stumbled upon this article after briefly researching CD-Text and iTunes metadata. I was inspired by the memory of days wasted as a kid entering song titles, artists, albums, and genres into empty cells in iTunes. I felt triumphant. iTunes didn’t know the music I was listening to, and neither did Winamp. To this day, iTunes cannot read CD-Text alone. It requires records to be mastered as CD-Rs and then submitted to Gracenote or AllMusic, two of the world’s largest music databases, in order to display an album’s full digital content on iTunes, Windows Media Player, or any other music service.

Gracenote, a large digital entertainment company, is taking this one step further. It is mainstreaming machine listening and machine learning technologies to find songs that fit a single mood and curate them into a ‘mood profile’. Gracenote is a massive database with detailed, deep metadata that powers the music platforms of Apple, Microsoft, and Amazon, as well as their respective cloud recognition technologies. This article briefly discusses the leap from traditional algorithms to neural machine learning networks. Yet at their most basic, these new technologies still rely on the original identification method developed for CD databases, which uses the duration of each track and the overall album length as its input data. Commercial CDs preserve only a chronological list of track lengths, known as the album’s “Table of Contents” (TOC). A database compiles this list of track durations and matches it against the TOCs (albums) it has stored; then, with the help of Internet services like Gracenote, the album identified by its exact timings is returned to the user with more detailed information: MP3 metadata, track titles, album art and more.
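To make the Table of Contents idea concrete, here is a minimal Python sketch of the disc ID used by the original CDDB/freedb protocol, which condenses a disc’s track start times and total playing time into a single 32-bit number. It illustrates TOC-based matching in general; whether Gracenote’s current service still computes IDs this way is an assumption this post does not establish, and the album in the lookup table is invented. Track lengths are simplified to whole seconds.

```python
from __future__ import annotations

FRAMES_PER_SECOND = 75   # audio CDs address time in 1/75-second frames
LEAD_IN_FRAMES = 150     # track 1 conventionally starts 2 seconds in


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n (used in the CDDB checksum)."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def toc_from_durations(durations_s: list[int]) -> tuple[list[int], int]:
    """Turn track lengths (seconds) into track start offsets and a lead-out offset, in frames."""
    offsets = []
    position = LEAD_IN_FRAMES
    for seconds in durations_s:
        offsets.append(position)
        position += seconds * FRAMES_PER_SECOND
    return offsets, position


def cddb_disc_id(offsets: list[int], leadout: int) -> int:
    """CDDB1/freedb-style disc ID: checksum byte, total playing time, track count."""
    checksum = sum(digit_sum(off // FRAMES_PER_SECOND) for off in offsets)
    total_s = leadout // FRAMES_PER_SECOND - offsets[0] // FRAMES_PER_SECOND
    return ((checksum % 0xFF) << 24) | (total_s << 8) | len(offsets)


# Hypothetical lookup table standing in for a remote database such as Gracenote.
ALBUM_DB = {
    0x18027303: {"album": "Example Album", "tracks": ["One", "Two", "Three"]},
}


def identify(durations_s: list[int]) -> dict | None:
    """Return stored album metadata for a disc, or None if its timings match nothing."""
    offsets, leadout = toc_from_durations(durations_s)
    return ALBUM_DB.get(cddb_disc_id(offsets, leadout))


if __name__ == "__main__":
    offsets, leadout = toc_from_durations([200, 185, 242])
    print(f"{cddb_disc_id(offsets, leadout):08x}")  # → 18027303
    print(identify([200, 185, 242]))
```

Changing a single track by one second produces a different ID, which is why this style of lookup only works when the timings match exactly.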

Screenshot of Gracenote website

To keep it simple, a machine learning system builds algorithms from training sets that learn and make predictions, and it can be either supervised or unsupervised. Gracenote, for instance, began with a taxonomy of “100 vibes and moods” and used a supervised system: a self-learning system whose training set keeps the algorithms from straying too far from the original input. An unsupervised system, by contrast, could result in The Go-Go’s showing up in the middle of your ‘mercurial mood mix’. Additionally, there are endless complexities to capturing recorded music and identifying moods through the ‘ear’ of a machine. I think Marcus Cremer, VP of Research at Gracenote, put it best: when “unsupervised, Gracenote’s system could for example decide to pay attention to compression artifacts, and match them to moods, with Cremer joking that the system may decide: ‘It’s all 96 kbps, so this makes me sad.’”

Compressed mp3s make me sad too.
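To show what “supervised” means in practice, here is a minimal Python sketch of one of the simplest supervised classifiers, nearest centroid: each mood label from a fixed taxonomy gets an average feature vector computed from hand-labeled examples, and a new track receives whichever label is closest. The three features and all numbers are invented, the labels are borrowed from the vibe names Gracenote has published, and Gracenote’s real system uses neural networks rather than anything this simple. Because predictions can only be labels that humans supplied, the taxonomy bounds what the system can say about a song.

```python
from math import dist
from statistics import fmean

# Hypothetical hand-labeled training set: (features, mood label).
# Features are invented 0-1 scores for (tempo, energy, brightness).
TRAINING_SET = [
    ((0.30, 0.20, 0.10), "plaintive"),
    ((0.25, 0.15, 0.20), "plaintive"),
    ((0.80, 0.90, 0.70), "sexy stomper"),
    ((0.85, 0.80, 0.60), "sexy stomper"),
    ((0.40, 0.30, 0.60), "soft, sensual, & intimate"),
    ((0.35, 0.25, 0.70), "soft, sensual, & intimate"),
]


def train(examples):
    """Supervised step: average the labeled vectors for each mood into a centroid."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the mood whose centroid is nearest by Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    centroids = train(TRAINING_SET)
    print(predict(centroids, (0.28, 0.18, 0.15)))  # → plaintive
```

An unsupervised method would instead group tracks by whatever patterns dominate the raw features, which is how something like bitrate could end up standing in for “sad.”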

Here are examples of Gracenote’s most recent attempts at “vibe taxonomies”: “sexy stomper”, “plaintive”, and “soft, sensual, & intimate”. Would Gracenote read this song as sad with its twangy guitar and steady drive forward?

Therefore, Gracenote’s move to use machine listening technologies (those that are unassisted, with no human interaction) and machine learning algorithms to further identify and curate mood profiles without the human ear or context is quite a leap, and, well, maybe a little sad. According to the article, this technology has been in development for over ten years and is now being ‘taught’ to create and curate our music experience. Obviously no human will ever be able to listen to and categorize every song ever recorded, but the race to document and categorize everything is overwhelming and seems, in some ways, to leave little to be discovered by human effort. Our likes are all machine fed.

While this article is delightful and silly, it points to the immensity of information systems and architectures and the roles they play in our near-constant music consumption. I’ll speak for myself and say that I had never thought much about what happened to the information I entered into iTunes, or about how Spotify could possibly make the mistake of putting these songs together on a suggested playlist. I scoff at the mishap, and now, instead of unknowingly blaming Spotify for this flagrant mistake, I’ll shake my head, despair at the direction supervised machine learning is going, and question to what extent software can truly detect a “sad song”.

Gracenote Screenshot

Submitted by: C McLaughlin LIS-653-01 Fall/2017

Sad Songs, Artificial Intelligence and Gracenote’s Quest to Unlock the World’s Music


Posted in Archives, Classification, Knowledge Structures

OCLC: replace FAST heading “Illegal aliens”

In July 2014, students and librarians from Dartmouth College submitted a proposal to change the Library of Congress Subject Heading (LCSH) Illegal aliens and associated headings such as Children of illegal aliens. LC rejected that proposal in December 2014. The American Library Association (ALA) approved a resolution in January 2016 “urging the Library of Congress to change the subject heading Illegal aliens to Undocumented immigrants” (link to resolution [pdf]). In March 2016, LC announced that in response to constituent requests, they would change the Illegal aliens heading and replace it with two headings, Noncitizens and Unauthorized immigration, but some Republican members of the House of Representatives objected to that decision. LC has been seeking constituent input since May 2016, but their decision seems to have been delayed indefinitely due to political concerns. (See the Cataloging News column from Cataloging and Classification Quarterly volume 54 number 7 for a timeline of recent efforts to change the heading.)
The Library of Congress has been pressured to maintain a heading which ALA Council agreed was “dehumanizing, offensive, inflammatory, and even a racial slur.” While FAST is primarily based on LCSH, there is no reason why OCLC cannot make the ethical choice that LC has been constrained from making.
The July 2016 final report from the SAC Working Group on the LCSH Illegal aliens includes a summary of thoughtful recommended LCSH changes. The report’s summary of recommendation states “This report concurs with the Library of Congress decision to change the subject heading Aliens to Noncitizens, but recommends that Illegal aliens be replaced with Undocumented immigrants where appropriate. In cases where the subject heading Illegal aliens has been assigned to works about nonimmigrants, more specific terms should be assigned.”
We understand that some manual reassignment of headings may need to occur with this change, contrary to the automated nature of FAST assignment. However, we feel this is a unique, time-sensitive case, where there is a moral imperative to replace wording which is disrespectful and harmful to many of our users. With this heading receiving national attention in venues such as The New York Times, this is an opportunity to show our users that the vocabulary we use provides “the highest level of service to all library users through appropriate and usefully organized resources” (from Section I of the ALA Code of Ethics).
We, the undersigned, request that OCLC change the FAST headings which use derogatory Illegal aliens terminology and replace them with the headings Noncitizens and Undocumented immigrants or more specific headings as recommended in the SAC report. [source]

Posted in Cataloging, Classification, Libraries

Beginning Stages of Topic/Group Selection for LIS-653-01 [FA17]


photograph by Robin Miller

Topic suggestions include: Tattoo Cataloging, Wikipedia+GLAM, Cataloging of Dance, Linked Open Data for Art Objects, Fan and Amateur Archives and Wikis, Linked Open Data for Community Organizations, Adoption of Wikidata in Libraries, Archiving Stolen Objects, Linked Open Data for Social Good, Folksonomies and Hashtags for Images on Instagram.


photograph by Sarah Adams

– submitted by Sarah Adams

Posted in Research Projects

Most Americans – especially Millennials – say libraries can help them find reliable, trustworthy information

Americans struggle to determine what news and information sources they should trust and how to discern reliable information online. They worry that fake news is sowing confusion about current events. And many express a desire to get help.

About six-in-ten adults (61%) say they would be helped at least somewhat in making decisions if they got training on how to find trustworthy information online, according to a new analysis of Pew Research Center survey data from 2016. What’s more, a majority of Americans say public libraries are helpful as people try to meet their information needs.

About eight-in-ten adults (78%) feel that public libraries help them find information that is trustworthy and reliable and 76% say libraries help them learn new things. Also, 56% believe libraries help them get information that aids with decisions they have to make.

On each of these questions, Millennials (those ages 18 to 35 in 2016) stand out as the most ardent library fans. Young adults, whose public library use is higher than that of older Americans, are particularly likely to say the library helps them with information.

A large majority of Millennials (87%) say the library helps them find information that is trustworthy and reliable, compared with 74% of Baby Boomers (ages 52 to 70) who say the same. More than eight-in-ten Millennials (85%) credit libraries with helping them learn new things, compared with 72% of Boomers. And just under two-thirds (63%) of Millennials say the library helps them get information that assists with decisions they have to make, compared with 55% of Boomers.

While the library is seen as one useful resource, the survey also found that 55% of adults say that training to gain confidence in using computers, smartphones and the internet would help in making decisions.

Blacks and Hispanics are more likely than whites to believe training would help them, both in how to use online resources and in gaining confidence with digital tools. Similarly, those with less than a high school diploma are more likely than those with at least a bachelor’s degree to think training would help. And women are slightly more likely than men to express this view.

Many Americans believe public libraries help them in other ways, according to the survey:

65% say libraries help them grow as people.
49% think libraries help them focus on things that matter in their lives.
43% believe libraries help them cope with a busy world.
38% say libraries help them cope with a world where it’s hard to get ahead.
27% think libraries help them protect their personal data from online thieves.

Beyond generational differences, there are some other demographic differences in views on how libraries can help. Women are somewhat more likely than men to report that libraries help them find information that is trustworthy and reliable (82% vs. 75%), learn new things (80% vs. 73%), grow as a person (69% vs. 61%), focus on things that matter in their lives (54% vs. 44%) and cope with a busy world (47% vs. 38%).

Hispanics are especially likely to say that the public library helps them learn new things, grow personally and focus on things that matter in their lives compared with smaller shares of blacks and whites who say the same. Hispanics are also more likely than whites and blacks to say the library aids them in coping with a busy world or with a world where it is hard to get ahead.

Those with less than a high school diploma are more likely than college graduates to say libraries help them in several areas: helping them focus on things that matter in their lives (63% vs. 46%), coping with a busy world (55% vs. 37%), coping with a world where it is hard to get ahead (54% vs. 30%) and protecting their personal data from online theft (48% vs. 18%).

Note: See full topline results here (PDF). Read more about Americans’ engagement with libraries and library resources in a 2016 Pew Research Center report.

Note: This report was made possible by The Pew Charitable Trusts, which received support for the project through a grant from the Bill & Melinda Gates Foundation. The findings and conclusions contained within are those of the authors and do not necessarily reflect positions or policies of the Bill & Melinda Gates Foundation.

Chelsea Fritz
LIS 653

Posted in Libraries

Treasures as Archives


(To follow up on Lindsay’s blog post on Harvey)

A common hypothetical question you may have entertained: if there were a fire in your house and you had 5 minutes to escape, what (aside from living beings) would you take with you? If you are lucky, you will never have to fully answer this question. However, many who have survived a catastrophic event, such as the victims of last week’s Hurricane Harvey, have done just that.

The New York Times published an article on six survivors of Harvey and what they chose to keep in the face of losing all of their possessions. The article included photographs by Tamir Kalifa of the objects and their owners (staged in front of mountains of debris, presumably theirs). The objects are a photo booth strip, a set of tea cups, a Turkish rug, a child’s homemade Father’s Day card, a ceramic lamp, and urns filled with the ashes of the owner’s three dogs.

Each owner was interviewed about why they chose to keep the object and what it meant to them. The first thing to note is that, while none of these objects is likely to have been the most costly item in its owner’s home, each was the most valuable to that person, either because it was too specific to be replaceable (the Father’s Day card) or because it represented an emotional attachment that exceeded its monetary value (the ceramic lamp).

This makes me think of what we discussed in class this week about personal archives and the inherent or implied value in “ephemera”. What is valuable to our personal archive, and what is valuable enough to include in a more general-purpose archive? Now that these objects are included in a New York Times article on a historic event, their value and relevance evolve and multiply.

To my mind, the most interesting items saved were the three urns holding the ashes of LaVern Cox’s dead dogs (also worth noting: not the actress Laverne Cox). Everything was about to be destroyed, and she literally chose to save ashes, something that had already been incinerated. In her words, “they were like our children,” and even in the form of ash they represented a life beyond the object itself.

The other thing to note about this article is that, as an archive in and of itself, it is inconsistent in its documentation. The photographs of each owner and their object follow the same structure (even down to how they are shot), except for Shirley Hines’ cup. The article notes that the broken cup in the photo is the one that was thrown away, while the others, which are undocumented, were kept. While I am guessing this was an aesthetic choice for the sake of visual impact, this inconsistency undermines the article’s usefulness as a record of artifacts.

Micaela Walker

LIS 653-01 Fall 2017

Posted in Archives, Cataloging, Classification, Uncategorized

Pratt Institute School of Information