• Metadata for the Masses

    Many classification systems suffer from an inflexible top-down approach, forcing users to view the world in potentially unfamiliar ways.

    When I’m shopping for wine online, the standard methods of presentation mean little to me — what’s the difference between a California cabernet, a French merlot, an Australian shiraz? If I were in a store, I could find a salesperson to navigate this sea of options, asking questions like, “What goes with grilled fish?” or “Can you recommend a fruity red wine?” However, online wine stores lose customers who don’t understand highly specialized wine categorizations.

    But what if we could somehow peek inside our users’ thought processes to figure out how they view the world? One way to do that is through ethnoclassification [1] — how people classify and categorize the world around them.

    Addressing a Problem in Classification

    We’re beginning to see ethnoclassification in action on the social bookmarks site Del.icio.us and the photo-sharing site Flickr. Both services encourage users to apply their own freely listed tags to content, and others can then use those tags when looking for content. See a web page that looks interesting, but don’t have time to read it? Post it to Del.icio.us with a tag that will help you find it again.

    Let’s consider another classification challenge. When I’m looking for documents on Adaptive Path’s intranet (which I helped design), I’m often frustrated because I can’t find items that I know should be there. There are a number of reasons why: picking topics from a pull-down menu is arduous, the topics we currently employ are not sufficient, and updating the tool with new topics is too time-consuming. Productivity declines as we hunt for documents with cryptic filenames.

    The primary benefit of free tagging is that we know the classification makes sense to users. It can also reveal terms that “experts” might have overlooked. “Cameraphone” and “moblog” are newborn words that are already among Flickr’s most popular; such adoption speed is unheard of in typical classifications. For a content creator who is uploading information into such a system, being able to freely list subjects, instead of choosing from a pre-approved “pick list,” makes tagging content much easier. This, in turn, makes it more likely that users will take time to classify their contributions.

    Not a Perfect Solution

    Clearly, such tagging systems are not a panacea; they present many potential drawbacks. With no one controlling the vocabulary, users develop multiple terms for identical concepts. For example, if you want to find all references to New York City on Del.icio.us, you’ll have to look through “nyc,” “newyork,” and “newyorkcity.”

    You may also encounter the inverse problem: users employing the same term for disparate concepts. “Flow,” for instance, can mean either optimal creative experience or the movement of a fluid.

    What’s more, sometimes the tagging is simply wrong. A search on “archeology” turns up, alongside the appropriate articles about human settlements, entries on dinosaurs and primitive microbes. While such muddling is fine for a casual service, it makes user-driven tagging problematic for things like organization-wide document repositories.

    Bringing Order to Chaos

    The potential benefits of free tagging should encourage practitioners to address such shortcomings. In looking for a real-world analog, I thought of the foot-worn paths that appear in a landscape over time. Called “desire lines,” these trails demonstrate how a landscape’s users choose to move, which is often not on the paved paths. A smart landscape designer will let wanderers create paths through use, and then pave the emerging walkways, ensuring optimal utility.

    Desire Lines
    Photo from Phil Gyford.

    Ethnoclassification systems can similarly “emerge.” Once you have a preliminary system in place, you can use the most common tags to develop a controlled vocabulary that truly speaks the users’ language.
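    As a rough illustration of using the most common tags as a starting point, here is a minimal Python sketch. The sample data, the function name candidate_vocabulary, and the min_count cutoff are all assumptions made for this example; neither Del.icio.us nor Flickr is known to work this way.

```python
from collections import Counter


def candidate_vocabulary(tagged_items, min_count=2):
    """Return (tag, count) pairs for tags used on at least min_count items.

    The min_count cutoff is an arbitrary, hypothetical threshold.
    """
    counts = Counter()
    for tags in tagged_items:
        # Normalize case and whitespace, and count each tag once per item.
        counts.update({tag.strip().lower() for tag in tags})
    # most_common() sorts from the most used tag to the least used.
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]


# Hypothetical sample data: one list of free tags per uploaded item.
items = [
    ["nyc", "photo"],
    ["NYC", "cameraphone"],
    ["newyork", "photo"],
    ["moblog", "cameraphone", "photo"],
]

vocabulary = candidate_vocabulary(items)
for tag, n in vocabulary:
    print(tag, n)
```

    Tags that fall below the cutoff, such as “newyork” here, are not discarded for good. They are candidates for linking to a more popular equivalent, which is a decision a person still has to make.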

    Use the tags to understand how people consider the content at hand. Then you can “pave” the best paths to ensure findability — say, by explicitly linking “nyc,” “newyork,” and “newyorkcity.” You can also align these tags with more formal schemes, thus enhancing the utility of both. Tools like the Getty Thesaurus of Geographic Names Online could be used to make explicit relationships between “nyc,” “manhattan,” and “harlem.”
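    One simple way to “pave” such a path is a lookup table that maps each variant tag to a single preferred tag. The sketch below assumes a hand-built table (PREFERRED), hypothetical helper names, and made-up bookmarks; it illustrates the idea rather than describing how any existing service links tags.

```python
# Hypothetical synonym table: each variant points to one preferred tag.
PREFERRED = {
    "nyc": "nyc",
    "newyork": "nyc",
    "newyorkcity": "nyc",
}


def canonical(tag):
    """Map a free tag to its preferred form; unknown tags pass through."""
    key = tag.strip().lower()
    return PREFERRED.get(key, key)


def find_items(items, query):
    """Return titles of items carrying any tag equivalent to the query."""
    wanted = canonical(query)
    return [
        title
        for title, tags in items.items()
        if any(canonical(tag) == wanted for tag in tags)
    ]


# Made-up bookmarks, each with the free tags its poster chose.
bookmarks = {
    "Subway map": ["nyc", "transit"],
    "Pizza guide": ["newyork", "food"],
    "Skyline photos": ["NewYorkCity"],
    "Fluid dynamics primer": ["flow"],
}

matches = find_items(bookmarks, "newyork")
print(matches)
```

    Because the original tags are left untouched and only the lookup is normalized, users keep their own words while a search on any one variant reaches all three.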

    As with any novel approach, further experimentation is necessary to draw out the most value for both creators and users. I haven’t even addressed the intriguing social aspects of such tools, which will likely have greater impact than the personal ones. I’m excited to see what arises!

    [1] Ethnoclassification, to the best of my knowledge, was coined by Susan Leigh Star for her Digital Libraries conference workshop “Slouching Toward Infrastructure.”


