Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff

catalogues, cultural collections, databases, objects, quality, records, sentiment analysis, social media

Forgive me if this is an old idea, but I wanted to throw it out there and see if it has any mileage.

Sentiment Analysis is a technique widely used in marketing, and especially social media, to get a measure of the popularity of a brand or product. My question is whether the same techniques could be used to find the very best stuff in cultural collections, based on what people are sharing and talking about online.

The problem is this: I regularly use online collections, whether via a web interface or, increasingly, through APIs, but almost always end up with a wide range of results that I have to scour to extract what I would call the decent stuff. Yes, many collections provide tools that let me drill down by facets such as date or keyword, or by whether a digitised version is available, but what about the less tangible measures of quality and interest – in other words, the ‘wow factor’?

Flickr is the one tool I know of where this is actually available – if you do a search or use many of the API methods you can sort by what they call ‘interestingness’, a mystical measure based on an unknown formula that involves the number of Likes, Comments, Views, Tags and no doubt other factors. They even tried to patent the concept of interestingness and have no doubt, like Google with its rankings, continuously tweaked the algorithm over time.

So, whether it’s a small museum collection, or millions of records in aggregators like Europeana, DPLA or Trove, has anyone tried to do this for cultural collections and if not, how could it be done?

I’ve used the term sentiment analysis in the title of this post but I think it’s actually rather simpler than that, so here are a few examples of quantitative metrics I feel could provide the basis for this seemingly qualitative measure.

Web analytics – most collections will have Google Analytics or something similar measuring page views. Can we assume that the most visited objects are likely to be the best ones?
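
As a rough sketch of the simplest version of this, the snippet below sums page views per object from an analytics export and sorts them. The export format, the made-up rows and the "/object/<id>" URL pattern are all assumptions for illustration; a real Google Analytics export would need mapping onto this shape first.

```python
from collections import Counter

# Made-up analytics export: one (page_path, pageviews) row per page per day.
# The "/object/<id>" URL pattern is a hypothetical convention.
rows = [
    ("/object/1001", 120),
    ("/object/2002", 45),
    ("/object/1001", 80),
    ("/search?q=teapot", 300),
    ("/object/3003", 60),
]

def object_views(rows, prefix="/object/"):
    """Sum page views per object, ignoring non-object pages such as searches."""
    totals = Counter()
    for path, views in rows:
        if path.startswith(prefix):
            totals[path[len(prefix):]] += views
    return totals

# Most viewed objects first.
ranked = object_views(rows).most_common()
print(ranked)  # [('1001', 200), ('3003', 60), ('2002', 45)]
```

Note that the busy search page is excluded, so only object pages compete for the top spots.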

Referrals – taking one particular element of analytics, if people are being directed to specific objects from external sources (or even internal ones) surely this means that those objects have something of interest about them?
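
One way to pull this out of the logs might look like the sketch below, which counts referrals per object and splits them into external and internal by the referrer's hostname. The domain, the (object_id, referrer_url) log format and the sample entries are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical domain of the collection site; referrers from it count as internal.
OWN_DOMAIN = "collections.example.org"

# Made-up referral log: (object_id, referrer_url) pairs.
referrals = [
    ("1001", "https://www.pinterest.com/pin/123"),
    ("1001", "https://collections.example.org/search?q=clock"),
    ("2002", "https://en.wikipedia.org/wiki/Clock"),
    ("1001", "https://t.co/abc"),
    ("2002", ""),  # direct visit, no referrer recorded
]

def referral_counts(referrals, own_domain=OWN_DOMAIN):
    """Count external and internal referrals per object, skipping direct visits."""
    counts = defaultdict(lambda: {"external": 0, "internal": 0})
    for object_id, referrer in referrals:
        host = urlparse(referrer).hostname  # None when there is no referrer
        if not host:
            continue
        kind = "internal" if host == own_domain else "external"
        counts[object_id][kind] += 1
    return dict(counts)

print(referral_counts(referrals))
```

Keeping the two counts separate means the internal ones can be weighted differently later, since a link from our own search page says less than one from Pinterest.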

Social media – in a similar vein, if someone posts a link to an object, for example on Twitter, Facebook, Pinterest, Instagram or any other social channel, even if that doesn’t result in any referrals it’s another sign that someone, for whatever reason, has identified it as being noteworthy. And thinking back to the actual practice of sentiment analysis, if the original post or any replies/comments use emotional, positive words, then those objects should score even more highly.
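
A crude illustration of both ideas: find object links in post text, count the mentions, and add a lexicon-based sentiment score (positive words minus negative words). The URL pattern and the tiny word lists are invented for the example; serious work would use an established sentiment lexicon or model, and replies/comments could simply be appended to the post text before scoring.

```python
import re

# Hypothetical URL pattern for object pages on the collection site.
OBJECT_URL = re.compile(r"collections\.example\.org/object/(\d+)")

# Tiny hand-made word lists, for illustration only.
POSITIVE = {"amazing", "beautiful", "love", "wow", "stunning"}
NEGATIVE = {"boring", "ugly", "hate", "dull"}

def sentiment(text):
    """Positive minus negative word count: a very crude lexicon-based score."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def score_mentions(posts):
    """For each object linked in a post, count mentions and total the sentiment."""
    scores = {}
    for text in posts:
        for object_id in set(OBJECT_URL.findall(text)):
            mentions, total = scores.get(object_id, (0, 0))
            scores[object_id] = (mentions + 1, total + sentiment(text))
    return scores

posts = [
    "Wow, this teapot is stunning http://collections.example.org/object/1001",
    "Bit dull tbh collections.example.org/object/2002",
    "Love it! https://collections.example.org/object/1001",
]
print(score_mentions(posts))  # {'1001': (2, 3), '2002': (1, -1)}
```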

So if we can extract those metrics and combine them in clever ways, won’t we be able to identify all the great stuff?
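
The ‘clever ways’ are the hard bit, but a plain starting point might be to rescale each metric to 0..1 so they are comparable, then take a weighted sum. The metric values and weights below are made up; choosing weights that match what people actually consider ‘great stuff’ is exactly the open question.

```python
# Made-up per-object metrics, of the kind the steps above might produce.
metrics = {
    "1001": {"views": 200, "referrals": 2, "mentions": 2, "sentiment": 3},
    "2002": {"views": 45, "referrals": 1, "mentions": 1, "sentiment": -1},
    "3003": {"views": 60, "referrals": 0, "mentions": 0, "sentiment": 0},
}

# Illustrative weights only.
WEIGHTS = {"views": 0.3, "referrals": 0.3, "mentions": 0.2, "sentiment": 0.2}

def min_max(values):
    """Rescale a {key: number} dict to 0..1 so different metrics are comparable."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}

def combined_scores(metrics, weights=WEIGHTS):
    """Weighted sum of the normalised metrics for each object."""
    normalised = {
        name: min_max({obj: m[name] for obj, m in metrics.items()})
        for name in weights
    }
    return {
        obj: sum(weights[name] * normalised[name][obj] for name in weights)
        for obj in metrics
    }

scores = combined_scores(metrics)
for obj, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(obj, round(score, 3))
```

Min-max scaling is sensitive to one runaway object; ranks or log-scaled counts are alternatives worth trying.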

Without going into detail, here are some interesting links to explore:

10 Comments

jamesinealing says:

18 February 2014 at 4:28 pm

My musing for today: Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff http://t.co/IdyVHnBJCZ

Chris says:

18 February 2014 at 4:39 pm

Interesting, and I can’t think of any examples of what you’re talking about in the museum/gallery world.

Just to build on what you’ve said, I think your examples would be more likely to identify the stuff that’s already been identified as great stuff (if you see what I mean). That said, there’s still plenty of value in doing that – you never know what activity might be going on without you knowing. Or it might be that a small network has discovered something and you can highlight it for a new audience.

For interestingness, I think it’d be better to look for items that get a disproportionate % of comments/shares per visit. That’s the stuff that’s causing a reaction. A greater % of referrals (as opposed to site/organic search) might point towards that too.
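
Chris’s reactions-per-visit idea could be sketched as below. One wrinkle: an object with two visits and one share would top a raw ratio, so this version shrinks each object’s rate towards the site-wide rate. That smoothing, the `prior_visits` knob and the sample numbers are assumptions added for the example, not part of the suggestion itself.

```python
def engagement_rates(stats, prior_visits=50):
    """Comments+shares per visit, shrunk towards the site-wide rate.

    prior_visits (an assumed tuning knob) is how many 'average' visits each
    object is treated as having before its own data starts to dominate.
    """
    total_reactions = sum(s["comments"] + s["shares"] for s in stats.values())
    total_visits = sum(s["visits"] for s in stats.values())
    site_rate = total_reactions / total_visits
    return {
        obj: (s["comments"] + s["shares"] + prior_visits * site_rate)
        / (s["visits"] + prior_visits)
        for obj, s in stats.items()
    }

stats = {
    "1001": {"visits": 5000, "comments": 10, "shares": 40},  # popular, raw rate 1%
    "2002": {"visits": 400, "comments": 12, "shares": 28},   # smaller, raw rate 10%
    "3003": {"visits": 2, "comments": 0, "shares": 1},       # tiny sample, raw 50%
}
rates = engagement_rates(stats)
for obj in sorted(rates, key=rates.get, reverse=True):
    print(obj, f"{rates[obj]:.3f}")
```

With these numbers the mid-sized object that reliably provokes a reaction comes out on top, ahead of both the heavily visited one and the two-visit fluke.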

mia_out says:

18 February 2014 at 5:02 pm

RT @jamesinealing: My musing for today: Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff http://t.co…

OpenGLAM says:

18 February 2014 at 7:04 pm

RT @jamesinealing: My musing for today: Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff http://t.co…

rdhyee says:

18 February 2014 at 7:27 pm

RT @jamesinealing: My musing for today: Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff http://t.co…

James says:

18 February 2014 at 8:57 pm

Chris, thanks for your comment, it’s great to hear someone else’s thoughts. It’s equally interesting that you don’t believe any museum/gallery is currently doing this, so I wonder where equivalent models exist? I’m sure online retailers must use this sort of data extensively.

I do see what you mean about ‘identifying stuff that has already been identified’ but my question, and I think a key point in my thinking, is whether this sort of identification is a) automated and b) captured in a way that is fed back into the core system. My experience is also that it’s a very long tail with museum objects, both in time and numbers. By this I mean that there are typically huge numbers of items in every collection that will interest equally diverse audiences; and also that that interest will vary over time, both in an immediate sense (when something or somewhere perhaps hits the news) and as longer trends (based on people’s interests).
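
The ‘hits the news’ case could be caught automatically with something like the check below, which flags an object when its recent average daily views sit several standard deviations above its own history. The window length, threshold and sample series are illustrative assumptions.

```python
from statistics import mean, pstdev

def is_spiking(daily_views, recent_days=7, threshold=3.0):
    """Flag an object whose recent views sit well above its own baseline.

    daily_views: oldest-first list of view counts, one per day (assumed input).
    True when the mean of the last `recent_days` exceeds the baseline mean by
    more than `threshold` baseline standard deviations.
    """
    baseline, recent = daily_views[:-recent_days], daily_views[-recent_days:]
    if len(baseline) < 2:
        return False  # not enough history to judge
    spread = pstdev(baseline) or 1.0  # avoid dividing by zero on flat history
    return (mean(recent) - mean(baseline)) / spread > threshold

quiet = [5, 6, 4, 5, 7, 5, 6, 5, 4, 6] + [6, 5, 7, 5, 6, 4, 5]
in_the_news = [5, 6, 4, 5, 7, 5, 6, 5, 4, 6] + [40, 55, 60, 38, 30, 25, 20]
print(is_spiking(quiet), is_spiking(in_the_news))  # False True
```

Because each object is compared with its own history, a rarely seen item in the long tail can be flagged just as readily as a star object.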

As for the last point about interestingness, it would be a fun task playing with those sorts of metrics to see what sorts of results they give.

Chris says:

18 February 2014 at 11:13 pm

Publishers seem to do a lot of this kind of thing, at least to the extent that they’ve all got a ‘popular’ widget on their homepage somewhere (with amusing results in the case of the BBC’s ‘man marries goat’ perennial fave). Someone once pointed out that the difference between the iPlayer’s ‘Featured’ and ‘Most Popular’ lists can be quite jarring.

I assume those are mostly calculated on (page)views rather than anything else, but I could be wrong. Ah – forums are another. Reddit gives options like hot, rising and top (which point to popularity over different periods of time), along with controversial and gilded, which highlight where people have reacted to content.
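
The difference between ‘top’ and ‘hot’ can be illustrated with a time-decayed score: each interaction counts for less the older it is, so a burst of recent activity can outrank a much larger but stale total. This is not Reddit’s actual formula, just a simple exponential-decay sketch in the same spirit; the 24-hour half-life and sample events are assumptions.

```python
from datetime import datetime, timedelta, timezone

def hot_score(events, now, half_life_hours=24.0):
    """Sum of interactions, each weighted by how recently it happened.

    events: timestamps of interactions (views, shares, comments) for one object.
    Each interaction loses half its weight every `half_life_hours`.
    """
    score = 0.0
    for ts in events:
        age_hours = (now - ts).total_seconds() / 3600
        score += 0.5 ** (age_hours / half_life_hours)
    return score

now = datetime(2014, 2, 18, 17, 0, tzinfo=timezone.utc)
old_favourite = [now - timedelta(days=30)] * 100        # lots of activity, long ago
rising = [now - timedelta(hours=h) for h in range(10)]  # a little, very recently

for name, events in [("old favourite", old_favourite), ("rising", rising)]:
    # 'top' here is simply the all-time count; 'hot' is the decayed score.
    print(name, "top:", len(events), "hot:", round(hot_score(events, now), 2))
```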

James says:

14 March 2014 at 10:32 am

Just to note that when I first published this I had a brief dialogue with a few people on Twitter which can be found here – https://twitter.com/jamesinealing/status/435813116258304000

As part of that Mia Ridge supplied a couple of interesting links to related posts she had published, one dating back to 2007!

EpiphanyLboro says:

14 March 2014 at 2:00 pm

Great post. I’m currently doing my PhD on a topic very similar to this and presenting a paper about it at Museums and the Web in a couple of weeks. The abstract is here: http://mw2014.museumsandtheweb.com/proposals/introducing-the-epiphany-project-discovering-the-intrinsic-value-of-museums-by-analysing-social-media/ … the thought of mashing up any data I find with the collection info hadn’t escaped me – indeed I think you need to use the collection info to find the Social Media data in the first place.

There’s one other researcher I know of (though there are probably quite a few I’ve missed) who has specifically looked at Sentiment Analysis for museum exhibitions. Elena Villaespesa, from the Tate and studying at Leicester, presented some of that work at M&W last year: http://mw2013.museumsandtheweb.com/proposals/diving-into-the-museum-social-media-stream/

If you find any more people looking into this stuff, do let me know!

James says:

14 March 2014 at 2:57 pm

Thanks David, it’s interesting to see those two papers. It will also be really interesting to see any results from them: although they aren’t looking at the specific issue of online collections that I was focussing on here, the methods they use would obviously be directly applicable. I’m also trying to sound out some contacts in the commercial sector to see whether they use such measures, for example to spot trends and highlight specific products in an automated way. Let’s keep in touch.
