Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff

Forgive me if this is an old idea, but I wanted to throw it out there and see if it has any mileage.

Sentiment Analysis is a technique widely used in marketing, and especially social media, to get a measure of the popularity of a brand or product. My question is whether the same techniques could be used to find the very best stuff in cultural collections, based on what people are sharing and talking about online.

The problem is this: I regularly use online collections, whether via a web interface or increasingly through APIs, but almost always end up with a wide range of results that I have to scour through to extract what I would call the decent stuff. Yes, many collections provide tools where I can drill down by facets like date or keyword, or by whether a digitised version is available, but what about the less tangible measures around quality and interest – in other words the ‘wow factor’?

Flickr is the one tool I know where this is actually available – if you do a search or use many of the API methods you can sort by what they call ‘interestingness’, a mystical measure based on an unknown formula that involves the number of Likes, Comments, Views, Tags and no doubt other factors. They even tried to patent the concept of interestingness and, like Google with its rankings, have no doubt continuously tweaked the algorithm over time.

So, whether it’s a small museum collection, or millions of records in aggregators like Europeana, DPLA or Trove, has anyone tried to do this for cultural collections and if not, how could it be done?

I’ve used the term sentiment analysis in the title of this post but I think it’s actually rather simpler than that, so here are a few examples of quantitative metrics I feel could provide the basis for this seemingly qualitative measure.

Web analytics – most collections will have Google Analytics or something similar measuring page views. Can we assume that the most visited objects are likely to be the best ones?

Referrals – taking one particular element of analytics, if people are being directed to specific objects from external sources (or even internal ones) surely this means that those objects have something of interest about them?

Social media – in a similar vein, if someone posts a link to an object, for example on Twitter, Facebook, Pinterest, Instagram or any other social channel, even if that doesn’t result in any referrals it’s another sign that someone, for whatever reason, has identified it as being noteworthy. And thinking back to the actual practice of sentiment analysis, if the original post or any replies/comments use emotional, positive words then those would have to score even more highly.
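
To make the social media idea a little more concrete, here is a rough Python sketch of how a single post might be scored: one point for the mention itself, plus a bonus for each positive word it contains. The word list, the function names and the bonus value are all made up for illustration; real sentiment analysis tools use much larger lexicons or trained models.

```python
import re

# Hypothetical, tiny word list for illustration only; a real lexicon would be far larger.
POSITIVE_WORDS = {"amazing", "beautiful", "love", "stunning", "wonderful", "wow"}


def positive_word_count(text: str) -> int:
    """Count how many words in a post appear in the positive word list."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for word in words if word in POSITIVE_WORDS)


def mention_score(text: str, base: float = 1.0, bonus_per_word: float = 0.5) -> float:
    """Score one mention: a base point for the share plus a bonus per positive word.

    The base and bonus values are arbitrary assumptions, not tuned figures.
    """
    return base + bonus_per_word * positive_word_count(text)
```

A plain link with no comment would score the base value, while a post gushing about the object would score higher.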

So if we can extract those metrics and combine them in clever ways, won’t we be able to identify all the great stuff?
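
As a starting point, one simple way to combine the metrics might look like the Python sketch below. It assumes the counts for each object have already been gathered from analytics and social channels, and it uses a weighted sum of log-scaled counts so that one hugely visited object doesn't swamp everything else. The field names and weights are hypothetical guesses, not a tested formula.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectMetrics:
    """Counts for one collection object (hypothetical field names)."""
    object_id: str
    page_views: int
    referrals: int
    social_mentions: int


# Hypothetical weights: assume a share says more than a referral,
# and a referral says more than a plain page view.
WEIGHTS = {"page_views": 1.0, "referrals": 2.0, "social_mentions": 3.0}


def interest_score(m: ObjectMetrics) -> float:
    """Weighted sum of log-scaled counts; log1p keeps zero counts at zero."""
    return (
        WEIGHTS["page_views"] * math.log1p(m.page_views)
        + WEIGHTS["referrals"] * math.log1p(m.referrals)
        + WEIGHTS["social_mentions"] * math.log1p(m.social_mentions)
    )


def rank_objects(metrics: list[ObjectMetrics]) -> list[ObjectMetrics]:
    """Return the objects sorted from highest to lowest score."""
    return sorted(metrics, key=interest_score, reverse=True)
```

With these weights, an object with fewer page views but a handful of referrals and shares can outrank one that simply gets a lot of traffic, which is roughly the behaviour I would hope for.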

Without going into detail, here are some interesting links to explore: