Cleaning up the tags

As you probably know, the best way to get a bargain on eBay is to stumble on an item that has been poorly described. People searching for a pirate doll, for example, will not find an item that has been described as “priate doll”, and hence its eventual sale price will probably be much lower than that of an item correctly described.

Is the same true of information? In an article in D-Lib online magazine (a DARPA-sponsored publication about digital librarianship), two digital librarians discuss whether we should make any effort to “tidy up” folksonomies.

“On this scene enter – winged, horned, and spined –
A longlegs, a moth, and a dumbledore

[Hardy] might have instead written “A crane fly, a moth and a bee”, had he been willing to foresake the opportunity to instill a little local colour, but his choice to use dialect or common names was inspired, and the poem benefits from it. However, a search engine would not.”*

It's undeniably true that the way items are tagged on sites like Flickr is very haphazard. This is often because words are mis-spelled due to carelessness or a particular idioglossary, and any such tags are unlikely to be useful to other people unless the mis-spelling is common enough to be statistically significant. Another reason for the variation is that there is no widespread convention on whether to use singular or plural words for a tag; looking for “goose” on Flickr will not find images tagged only with “geese”.
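One way to picture the mis-spelling problem is as a frequency question: a rare tag that looks almost identical to a very common tag is probably a typo for it. The sketch below is only an illustration of that idea, not a description of how Flickr or any other site works. The tag counts, the function name `suggest_corrections` and the thresholds `min_common` and `cutoff` are all invented for the example; the similarity test comes from Python's standard `difflib` module.

```python
from collections import Counter
from difflib import get_close_matches


def suggest_corrections(tag_counts, min_common=10, cutoff=0.8):
    """Map each rare tag to the most similar common tag, if there is one.

    A tag is "common" when it has been used at least `min_common` times.
    Both thresholds are arbitrary values chosen for this illustration.
    """
    common = [tag for tag, n in tag_counts.items() if n >= min_common]
    suggestions = {}
    for tag, n in tag_counts.items():
        if n >= min_common:
            continue  # common tags are assumed to be spelled as intended
        matches = get_close_matches(tag, common, n=1, cutoff=cutoff)
        if matches:
            suggestions[tag] = matches[0]
    return suggestions


# Hypothetical tag counts, made up for this example.
counts = Counter({"pirate": 120, "doll": 95, "priate": 2, "goose": 40, "geese": 35})
print(suggest_corrections(counts))
```

Note that this does nothing for “goose” and “geese”: both are common and both are spelled correctly, so relating them would need something else, such as a table of known plurals.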

The third common reason for the variation in tags for a particular thing is the one referred to in the example of Thomas Hardy's poem. Searching for “dumbledore” will give you a lot of hits about Harry Potter and some, but far fewer, about bees. Tags are applied in many languages, and even though most are in English, English has such an enormous vocabulary that most words have synonyms.

The librarians who wrote the article recognise the difficulty of educating or coercing all users to use more useful tags, although they believe that regular users will naturally tend to use the same tags as each other because these are the tags they see most frequently in searches (a Power Law effect).

My own opinion is that user behaviour is unlikely to get any better than it is today. In fact the more people who use tagging systems, the higher the proportion of naive tags there will be. However, Flickr has shown that value can be added to tags by using statistical methods to enhance searching. Search for “apple” and you will get results divided automatically into images of computers and images of fruit, simply because of the additional tags that are commonly applied to those two discrete sets of images.
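To make the “apple” example concrete, here is a minimal sketch of grouping search results by the other tags they carry. It is a toy, greedy, single-pass approach that I am assuming purely for illustration; it is not Flickr's actual method, and its output can depend on the order of the items. The photo ids, the tags, the function name `cluster_by_cotags` and the `min_shared` threshold are all hypothetical.

```python
def cluster_by_cotags(items, query, min_shared=1):
    """Group the items tagged with `query` by overlap in their other tags.

    `items` maps an item id to a set of tags. An item joins the first
    existing cluster with which it shares at least `min_shared` co-tags;
    otherwise it starts a new cluster.
    """
    clusters = []  # list of (co-tags seen so far, member ids)
    for item_id, tags in items.items():
        if query not in tags:
            continue
        cotags = set(tags) - {query}
        for cluster_tags, members in clusters:
            if len(cotags & cluster_tags) >= min_shared:
                cluster_tags.update(cotags)
                members.append(item_id)
                break
        else:
            clusters.append((cotags, [item_id]))
    return [members for _, members in clusters]


# Hypothetical photos and tags, made up for this example.
photos = {
    "p1": {"apple", "mac", "laptop"},
    "p2": {"apple", "fruit", "orchard"},
    "p3": {"apple", "laptop", "desk"},
    "p4": {"apple", "fruit", "pie"},
    "p5": {"pear", "fruit"},
}
print(cluster_by_cotags(photos, "apple"))
```

With this sample data the computer photos and the fruit photos end up in separate groups, even though nobody tagged them “computer” or “fruit (the food)” on purpose.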

If you Google “geese” you will get some pages that only contain the word “goose”. Google is sophisticated enough to know that the two words are closely related. We can learn to deliver useful search results based on tags of relatively poor quality. All that is needed is a critical mass of tags to begin with. Let's start tagging stuff!

* Interesting to note that even in this short extract there are two words, “foresake” and “instill”, that my spell checker quibbles with. Even librarians have cultural differences.


2 Responses to “Cleaning up the tags”

  1. 1 Dominic Sayers November 30, 2006 at 13:43

    Hi NatC, and thanks for the comment. Your blog is very interesting – I have added it to my aggregator. It seems like tagging is one of the points at which your interest in linguistics and non-deterministic systems meets.

  2. 2 NatC November 28, 2006 at 14:04

    As to what is a good tagging habit… I’m not sure that we would want everyone to tag in the same way, in the same manner that I would not want everyone to speak international English. We have lots of words and synonyms because each of them has specific cultural aspects attached to it that go far beyond what a dictionary entry can tell us. And that can be useful during a tag search as well.
    Also, as a user, one of my preferred ways to look for entries of interest has always been to look for a tag, select interesting entries tagged that way, and look at how other people had tagged these entries. This way I expand my horizons and start looking at tags I did not think of at first but that may bring interesting additional meaning or cultural specificities (even if it is simply jargon from experts in the subject).
