Visualising the entire dataset from Operation War Diary by time and place in a two-minute animation.
When I was given access to the growing corpus of data being crowdsourced through Operation War Diary, I wanted a quick way to understand what it contained. Beyond the names of those mentioned, the data is focused on dates and locations, so I wanted a map that would change through time. While looking for examples I stumbled across this animated heatmap of collision data from New York City and knew it would work perfectly.
Getting the data into the correct format was the first challenge. The dataset I had at the time contained nearly 300,000 records, with over 10,000 distinct place names mentioned, from 10th Avenue to Zwartelen. Needless to say, with historic and highly localised data, geocoding these was a bit hit and miss (I used the Google API), but I just wanted a dataset to experiment with, so the 5,400 locations I gleaned were fine for this purpose. A key point was that they accounted for 160,000 diary entries, so the sample was good enough. An interesting side effect was that in many cases geocoding harmonised the data: there were 10 different spellings of Poperinge alone, and the modern version wasn’t the most common!
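To make the harmonising side effect concrete, here is a minimal Python sketch of the idea, not the code actually used for the animation. It assumes the geocoder results have already been saved as a lookup from each transcribed place name to a coordinate pair (or `None` where geocoding failed). The names `geocoded`, `entries`, `harmonise` and `coverage` are hypothetical, and the sample coordinates and dates are approximate, illustrative values rather than real Operation War Diary data.

```python
from collections import Counter, defaultdict

# Hypothetical geocoder output: transcribed place name -> (lat, lon),
# or None where no match was found. Coordinates are illustrative only.
geocoded = {
    "Poperinge": (50.855, 2.726),
    "Poperinghe": (50.855, 2.726),
    "Poperinghe.": (50.855, 2.726),
    "Zwartelen": None,
}

# Hypothetical diary entries: (place name as transcribed, ISO date).
entries = [
    ("Poperinghe", "1915-04-22"),
    ("Poperinghe", "1915-04-23"),
    ("Poperinge", "1915-05-01"),
    ("Poperinghe.", "1915-05-02"),
    ("Zwartelen", "1915-02-21"),
]


def harmonise(entries, geocoded, precision=3):
    """Group spelling variants that geocode to the same rounded coordinates."""
    variants = defaultdict(Counter)
    for place, _date in entries:
        coords = geocoded.get(place)
        if coords is None:
            continue  # place could not be geocoded, so it is left out
        key = (round(coords[0], precision), round(coords[1], precision))
        variants[key][place] += 1
    return variants


def coverage(entries, geocoded):
    """Fraction of entries whose place name was successfully geocoded."""
    if not entries:
        return 0.0
    located = sum(1 for place, _ in entries if geocoded.get(place) is not None)
    return located / len(entries)


variants = harmonise(entries, geocoded)
for coords, spellings in variants.items():
    # The most frequent historic spelling need not be the modern one.
    print(coords, spellings.most_common())
print(f"coverage: {coverage(entries, geocoded):.0%}")
```

Grouping by rounded coordinates is only one way to collapse variants; it would also merge genuinely different places that sit very close together, so the `precision` value is a judgment call.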
The animation presents just the data for casualties and, obviously, only those where the place could be geotagged. I am acutely conscious that at each stage of this process the sampling and the processing may have led to biases and inaccuracies, but this is presented simply to whet the appetite, as I feel there is so much potential in this data.