Linking collections – aiding discovery through Europeana

This blog post supports an ignite talk given at the Europeana AGM, Riga, November 2016 (slides)

Quick link: install Chrome extension demonstrator

Having worked for two years at Europeana, but now working for Imperial War Museums, a data provider, I wanted to ask the question:

What can data providers get back from Europeana, both for themselves and for their users?

When we provide data to Europeana, do we think about what data we can get back? What have others provided that would be of interest to our own users? Individual objects often have rich information but are often dead ends. I want to know more. What else is around here? What happened on the same day? Did this artist paint other paintings? Where can I see all these things?

In the context of the Imperial War Museums’ (IWM) collections there are some fascinating possibilities given the geographic coverage of Europeana, extending the corpus of content relating to major conflicts. This could range from official documents, maps and photographs of battles from both sides, through to personal artefacts that relate to the service of an individual.

What are the possibilities? What approaches can we adopt? What tools could help?

Europeana Labs has an Apps showcase that aims to highlight the ways that Europeana content and the API have been used. It is clear that many stand-alone discovery tools exist, but from the perspective of data providers there are just a handful of examples, typically where a search box is provided alongside the collections, or there’s an option within search results that allows a user to also explore Europeana.

Whilst such signposting appears useful, I wonder what actual use it gets, and what value it brings? At an individual record level I’m not aware of any sites (though I stand to be corrected!) where there is a link to the Europeana record, or indeed in most cases even any acknowledgement that the item is in Europeana. And yet this is where I see there being a great opportunity to tap into the data and tools that Europeana provides.

To explore this idea I have hacked a very quick demonstrator in the form of a Chrome browser extension. The concept is very simple – running on collections sites (currently a small sample) it detects if the item is in Europeana, retrieves the metadata through the Europeana API, and then based on recorded metadata for time, place and people (and broader concepts) it allows the user to display matches from across Europeana.

THE BATTLE OF RIGA, SEPTEMBER 1917 (Q 23904) General Oskar von Hutier, the Commander of German Eight Army, after the capture of Riga on 3rd September 1917. Copyright: © IWM. Original Source: http://www.iwm.org.uk/collections/item/object/205184140

Take this IWM image of General Oskar von Hutier, the Commander of German Eight Army, after the capture of Riga on 3rd September 1917. Not just the title but, crucially, the metadata tells us it was created on 1917-09-03 in Riga, Latvia and shows Hutier, Oskar Von.

© IWM (Q 23904)
Search results matching key terms through the Europeana API

Through the Europeana API we can then fetch the matches for each of these, and using some quite simple code and basic design give the user the ability to view them. So in this case they can find (at the time of writing) 285 things also relating to 3rd September 1917 (including many newspapers from across Europe), 8,271 items close to the centre of Riga, and 8 items, including photographs and articles, about Oskar von Hutier.
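As a rough illustration of the three lookups described above (date, place, person), here is a minimal Python sketch that only builds the search URLs and does not call the network. It is not the extension's actual code. The endpoint, the `wskey` parameter, the `when:` and `who:` query fields, the `pl_wgs84_pos_lat` / `pl_wgs84_pos_long` range fields, the helper names and the approximate Riga coordinates are all assumptions to be checked against the current Europeana API documentation.

```python
from urllib.parse import urlencode

# Assumed Search API endpoint; verify against the current Europeana docs.
SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"


def search_url(query: str, api_key: str, rows: int = 12) -> str:
    """Build a search URL for a single query string (no request is made)."""
    params = {"wskey": api_key, "query": query, "rows": rows}
    return SEARCH_URL + "?" + urlencode(params)


def date_query(iso_date: str) -> str:
    """Match items recorded against one day, e.g. 1917-09-03 (assumed field)."""
    return f'when:"{iso_date}"'


def person_query(name: str) -> str:
    """Match items recorded against a person name (assumed field)."""
    return f'who:"{name}"'


def place_query(lat: float, lon: float, delta: float = 0.1) -> str:
    """Match items inside a box of +/- delta degrees around a point."""
    return (
        f"pl_wgs84_pos_lat:[{lat - delta:.4f} TO {lat + delta:.4f}] AND "
        f"pl_wgs84_pos_long:[{lon - delta:.4f} TO {lon + delta:.4f}]"
    )


if __name__ == "__main__":
    # Hypothetical values taken from the Riga example; the key is a placeholder.
    for query in (
        date_query("1917-09-03"),
        place_query(56.95, 24.10),
        person_query("Hutier, Oskar Von"),
    ):
        print(search_url(query, api_key="YOUR_API_KEY"))
```

Each URL could then be fetched and the returned items shown alongside the object; how close "close to the centre of Riga" should be is controlled here by the hypothetical `delta` parameter.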

It’s as simple as that, but hopefully opens up a world of possibilities for cross-collection discovery!

If you want to see for yourself, install the Chrome browser extension and try some of the following links:

Imperial War Museums

Other collections

What next?

This has been implemented very simply and very quickly. Using better linked data approaches, entity recognition, similarity algorithms, and no doubt a whole host of other clever things, I’m sure it could be much improved. It could run as a browser extension, but equally the idea could be applied on specific sites so all users see results. It could also be adapted to specific themes, for example to limit the corpus from which results are retrieved to one subject such as 1914-18, or art.

Some background

It is worth mentioning that some similar tools do already exist, most notably the Eexcess Chrome extension. However, the aim here is to provide a demonstration of a lightweight tool focused on collection objects.

Some technical details

What about collections items that are not in Europeana? The idea can be extended by extracting concepts through other methods, either present in the page itself (as long as they are structured and identifiable) or through a collection’s own API.  The tool also does a little bit of its own interpretation of data to enhance results, something that could be easily expanded. An example is that if no date is present it parses the title and description to extract any date that can be detected. Another example is that the Imperial War Museums collections contain placenames but not geotags, so it looks up the placenames using pre-fetched results from the Google Maps API.
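To make the date fallback concrete, here is a small Python sketch of how a date might be pulled out of a title or description when none is recorded. It is an illustration only, not the extension's code: the `extract_date` helper and its regular expression are hypothetical, and it only handles English dates written as day, month name, year.

```python
import re
from datetime import date
from typing import Optional

MONTHS = {
    name: number
    for number, name in enumerate(
        [
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december",
        ],
        start=1,
    )
}

# Matches forms such as "3rd September 1917" or "3 September 1917".
DATE_RE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_date(text: str) -> Optional[date]:
    """Return the first full date found in the text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    try:
        return date(int(year), MONTHS[month.lower()], int(day))
    except ValueError:
        # Impossible dates such as "31 February 1917" are ignored.
        return None


if __name__ == "__main__":
    caption = "General Oskar von Hutier after the capture of Riga on 3rd September 1917."
    print(extract_date(caption))
```

A month and year on their own (as in "SEPTEMBER 1917") are deliberately skipped here, since a search on an exact day needs all three parts.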

What are the challenges?

The discovery can only be as good as the metadata. Put succinctly, the better the metadata the more focused the results and the less noise. From the object point of view, if something has only been recorded as a photograph taken in Paris, then all the user can expect to get back is photographs, and other things from Paris. But let’s assume the item has been catalogued in detail, for example it’s a photograph of the Ancien Couvent de l’Abbaye aux Bois, at 16 Rue de Sèvres by Eugene Atget taken in 1908. You’d expect some great results with all the other items showing this building or by Eugene Atget, or created in 1908. But to achieve that they will all need to have that same metadata, or at least the search methods need to take account of differences in format, spelling, and language. My experience is that this varies extensively between providers, but that’s another reason to strive to improve what we all contribute.

Similarly there are some basic technical issues that mean some providers’ items are difficult to work with. One example is where their collections items are presented with URLs that don’t match the ones provided to Europeana. Sometimes this can be coded for, but that just makes the tool harder to manage and less scalable. Sometimes it’s simply impossible (perhaps the providers could put unique Europeana IDs in their pages?).

Finally, by its very nature a browser plugin is going to be of very limited use as it relies on people discovering it, having the right technology, and installing it. The same code could easily be embedded by providers on their own collections sites, or better still the principle could be taken and developed in a far more integrated way.

Europeana plugins and embedding tools – test page

This post is a test page for various independently developed WordPress plugins that include Europeana driven functionality.

1. CHContext plugin

Sidebar results (see right) displayed based on the page tags (‘kitten’) using the CHContext plugin

2. DPLA & Europeana search plugin

Sidebar search box widgets (see right) created with DPLA & Europeana search plugin by José Fernández

3. EExcess WordPress plugin

Recommendations based on the phrase “art deco” sourced using the EExcess plugin (two using image format, one as simple citation link)

art deco

4. ImageSuite plugin

Image inserted with ImageSuite plugin

Photo Source

5. Europeana Attribution Tool

Attribution using the Europeana Attribution Tool
Europeana Food and Drink project
Made with Europeana

Still Life with Flowers and Fruit

Creator: Sande Bakhuyzen, Gerardina Jacoba van de

Provider: Rijksmuseum

Rights: http://creativecommons.org/publicdomain/mark/1.0/

6. embedr.eu

Search, discover and get flexible embed code for images from embedr.eu

6a. Zoomable image embedded via an iframe from dev.embedr.eu/europeana___2021657__232675/

 

6b. Custom sized static image cropped from the above image (using the IIIF standard) and delivered as an html snippet complete with attribution. In this example the image url is http://iiif.embedr.eu/europeana___2021657__232675/980,1398,507,617/400,487/0/native.jpg

Detail of ‘Brunet, Rôle de Mr. P??PIN dans Romainville, Vaudeville grivois’ | creator unknown | Museon | Some rights reserved.
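The cropped image URL in 6b follows the IIIF Image API pattern of identifier, region, size, rotation, then quality and format. As a minimal sketch, the Python helper below (a hypothetical function, not part of embedr.eu) rebuilds that URL from its parts; the `native` quality keyword is simply the one that appears in the URL above.

```python
from typing import Tuple


def iiif_image_url(
    base: str,
    identifier: str,
    region: Tuple[int, int, int, int],
    size: Tuple[int, int],
    rotation: int = 0,
    quality: str = "native",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF image URL: base/id/x,y,w,h/width,height/rotation/quality.fmt"""
    x, y, w, h = region          # crop rectangle in pixels of the full image
    width, height = size         # pixel size of the delivered image
    return (
        f"{base}/{identifier}/{x},{y},{w},{h}/"
        f"{width},{height}/{rotation}/{quality}.{fmt}"
    )


if __name__ == "__main__":
    print(
        iiif_image_url(
            "http://iiif.embedr.eu",
            "europeana___2021657__232675",
            region=(980, 1398, 507, 617),
            size=(400, 487),
        )
    )
```

Changing only the `region` and `size` arguments gives a different crop or scale of the same source image without any image editing.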

7. culturepics.org

Embed code using image with caption option from culturepics.org/?x=500&y=300&collection=2022608

Dovrebane
Source: Norsk Folkemuseum on Europeana

8. Europeana search widget

Customisable search widget (now deprecated) – more info at http://labs.europeana.eu/api/search-widget

9. LRE collection from European Schoolnet

Made from data harvested from Europeana

Testing the Europeana Search Widget

Disclaimer: I work for Europeana. But this is still great and I would have blogged about it anyway!

I was prompted by a new blog post from my former Kew colleague Anna Saltmarsh – Plants to pixels: enhancing access to Kew’s herbarium collections – to have a closer look at the Europeana search widget. It can deliver targeted search results directly on external pages – everything from private blogs to institutional data provider websites. There’s a really handy wizard that lets you create your own widget, with different themes and styles to suit most needs. Crucially, though, you can also tap into the power of the Europeana API to control what is displayed and what your users can then search for.

Here’s an example of the code that allows you to quickly and easily search Kew’s content, in this case looking for palms:

<script type="text/javascript" src="http://www.europeana.eu/portal/themes/default/js/eu/europeana/min/EuSearchWidget.min.js?sw=true&query=palm&qf=DATA_PROVIDER:{Royal+Botanic+Gardens%2C+Kew}&withResults=true&theme=dark&v=2"></script>
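The behaviour of the widget is controlled entirely by the query string on that script URL. As a small aid to reading it, this Python sketch (the `widget_params` helper is hypothetical) decodes the parameters so the search term, the provider filter and the display options can be seen separately. What each option does inside the widget is not documented here, so treat the parameter meanings as assumptions based on their names.

```python
from urllib.parse import parse_qs, urlsplit

WIDGET_SRC = (
    "http://www.europeana.eu/portal/themes/default/js/eu/europeana/min/"
    "EuSearchWidget.min.js?sw=true&query=palm"
    "&qf=DATA_PROVIDER:{Royal+Botanic+Gardens%2C+Kew}"
    "&withResults=true&theme=dark&v=2"
)


def widget_params(src: str) -> dict:
    """Decode the widget's query string into a plain name -> value mapping."""
    parsed = parse_qs(urlsplit(src).query)
    # Each parameter appears once in this URL, so keep the first value only.
    return {name: values[0] for name, values in parsed.items()}


if __name__ == "__main__":
    for name, value in widget_params(WIDGET_SRC).items():
        print(f"{name} = {value}")
```

Here `query` carries the search term (palm) and `qf` restricts results to one data provider, which is what limits the widget to Kew’s content.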

And a live example, looking at user-contributed content to the Europeana 1914-18 project:

Embeddable images from external sources – a few tests

With today’s announcement from Getty of an embeddable viewer, I thought I’d test a few related services and see how they work at both a technical and practical level. This page is just to demonstrate them, and I’ll write up results and opinions after I’ve had a chance to test them here.

Getty Images – Embed Viewer

Getty have announced a new embed code. It uses an iframe and comes with a standard size and format and a ‘robust’ set of terms and conditions. Further details and instructions at www.gettyimages.co.uk/Creative/Frontdoor/embed

Getty Images – WordPress plugin

With all the hype of the Embeddable Viewer launch, it seems the official Getty Images WordPress plugin has been rather overlooked (just over 400 downloads in 2 months is low for a mainstream WordPress plugin). It’s easy to find images and use them, but what’s with the watermark, and the very first line of the conditions says “Grant of License. Getty Images grants to you, for a period of thirty (30) days, a non-exclusive, non-sublicensable, non-transferable and non-assignable right to use the image and/or film preview file you have selected and any derivatives or copies (collectively, the “Licensed Material”), on your personal computer and, in the case of film, in any test, sample, comp or rough cut evaluation materials. The Licensed Material may only be used in materials for personal, noncommercial use and test or sample use, including comps and layouts.” So can I even use this one here? I’m confused!

EDIT: After writing this and trying to publish the post I then got the message “WARNING: You may not publish posts with Getty Images comps. Download the image first in order to include it into your post.” which kind of explains this. So it’s really just a tool for publishers to find content and create drafts with a view to purchasing a license. That explains it I guess. But I wonder if they have missed a trick – why not add an option to the plugin which allows users to add the simple embed code into a WordPress post?

Flickr

Flickr provides embed code (using an iframe like Getty) or an html snippet (all subject to the owner’s privacy settings, but in the case of Flickr Commons both of these are available throughout). The player gives title and attribution, the ability to favourite (requires Flickr login), and also the navigation to other images, presumably adjacent images from the same user. Oddly the html code option has title and alt text for the link and image respectively, but does not visibly display the image title or owner.

Flickr embed:

Flickr html:

Dr William Bland, ca. 1845 / photographed by George Goodman

Pinterest

Pinterest is a slightly different case as it’s not a primary source for images, but I thought it was still interesting. They use an approach more like Facebook Like and Twitter buttons, displaying the Pin using javascript.
EDIT: I’m finding that the Pinterest code has a habit of corrupting when editing in WordPress, so it doesn’t look like it’s very WP friendly!
 

CulturePics

A month or two back I put together a small hack at culturepics.org using Flickr and Europeana images for people to create and download or link to at the specific size they needed. Providing easy-to-grab html snippets was a key feature.

Miss Sarah Hodges of Salem
Source: George Eastman House on Flickr Commons

Europeana

Thanks to David Haskiya from Europeana (see comment below) I’ve been pointed to this example from Europeana exhibitions of an experimental embed code.

Creative Commons License

Any more?

Any more suggestions of good (and bad) examples are welcome, and if you leave them in the comments I’ll add them in here.

Sentiment Analysis for Cultural Collection objects – aka how to identify the good stuff

Forgive me if this is an old idea, but I wanted to throw it out there and see if it has any mileage.

Sentiment Analysis is a technique widely used in marketing, and especially social media, to get a measure of the popularity of a brand or product. My question is whether the same techniques could be used to find the very best stuff in cultural collections, based on what people are sharing and talking about online.

The problem is this: I regularly use online collections, whether via a web interface or increasingly through APIs, but almost always end up with a wide range of results that I have to scour through to extract what I would call the decent stuff. Yes, many collections provide tools where I can drill down by facets like date or keyword, or maybe if there is a digitised version available, but what about the less tangible measures around quality and interest – in other words the ‘wow factor’?

Flickr is the one tool I know where this is actually available – if you do a search or use many of the API methods you can sort by what they call ‘interestingness’, a mystical measure based on an unknown formula that involves the number of Likes, Comments, Views, Tags and no doubt other factors. They even tried to patent the concept of interestingness and have no doubt, like Google’s rankings, continuously tweaked the algorithm over time.

So, whether it’s a small museum collection, or millions of records in aggregators like Europeana, DPLA or Trove, has anyone tried to do this for cultural collections and if not, how could it be done?

I’ve used the term sentiment analysis in the title of this post but I think it’s actually rather simpler than that, so here are a few examples of quantitative metrics I feel could provide the basis for this seemingly qualitative measure.

Web analytics – most collections will have Google Analytics or something similar measuring page views. Can we assume that the most visited objects are likely to be the best ones?

Referrals – taking one particular element of analytics, if people are being directed to specific objects from external sources (or even internal ones) surely this means that those objects have something of interest about them?

Social media – in a similar vein, if someone posts a link to an object, for example on Twitter, Facebook, Pinterest, Instagram or any other social channel, even if that doesn’t result in any referrals it’s another sign that someone, for whatever reason, has identified it as being noteworthy. And thinking back to the actual practice of sentiment analysis, if the original post or any replies/comments use emotional, positive words then those would have to score even more highly.

So if we can extract those metrics and combine them in clever ways, won’t we be able to identify all the great stuff?
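One very simple way of combining such metrics is sketched below in Python, purely to show the shape of the idea. Everything in it is an assumption: the `ObjectMetrics` fields, the weights, and the choice of a logarithm to stop one heavily viewed object swamping everything else. It is not Flickr’s formula or anyone else’s.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectMetrics:
    object_id: str
    page_views: int
    referrals: int
    social_mentions: int


# Arbitrary illustrative weights: a share is treated as a stronger signal
# than a referral, and a referral as stronger than a plain page view.
WEIGHTS = {"page_views": 1.0, "referrals": 2.0, "social_mentions": 3.0}


def interest_score(metrics: ObjectMetrics) -> float:
    """Weighted sum of log-scaled counts; log1p(0) is 0, so unused objects score 0."""
    return sum(
        weight * math.log1p(getattr(metrics, name))
        for name, weight in WEIGHTS.items()
    )


def rank(objects: List[ObjectMetrics]) -> List[ObjectMetrics]:
    """Return objects ordered from most to least 'interesting'."""
    return sorted(objects, key=interest_score, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = [
        ObjectMetrics("a", page_views=1000, referrals=0, social_mentions=0),
        ObjectMetrics("b", page_views=100, referrals=20, social_mentions=10),
        ObjectMetrics("c", page_views=0, referrals=0, social_mentions=0),
    ]
    for item in rank(sample):
        print(item.object_id, round(interest_score(item), 2))
```

With these made-up figures the object that has been shared and linked to outranks the one with ten times the page views, which is the kind of behaviour the weights are meant to encourage.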

Without going into detail, here are some interesting links to explore:
