Mining Data With Limited Tools

In my last post, I laid out some of the challenges of working with the current metrics dashboard and the data exporting process for ASK. Despite the limitations of available metrics, edited snippets, and overwhelming amounts of data, I was able to work around many of these issues in one way or another (mostly with a lot of Google Docs and spreadsheets). However, each workaround then presented new challenges of its own.

With regard to limited dashboard metrics, there were some easy (and some not so easy) fixes. To get a better sense of where people were using the app, I was able to look at chat stats by the various locations where beacons fired. I compiled these numbers by year, as well as all-time, into Google Sheets and could make simple visualizations that way.

Google Sheets allows for basic data viz, but the data has to be updated manually, which is a time-consuming process.

However, with this solution came new (currently unsolved) challenges. For example, there is no way to update this data in real time and, as you can see, my chart reflects data from January 29, 2019. Additionally, locations are only attributed to chats if visitors use the app. If they use the texting feature (about 70% of ASK usage over the last year), there is no location data available.
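To give a concrete sense of what that compilation involved, here is a minimal sketch in Python of tallying chats by beacon location and year. The filename and column names are my own placeholders, not the dashboard's actual export format.

```python
# A minimal sketch (not the actual ASK tooling) of tallying chats by beacon
# location and year from a hypothetical CSV export with "date" and
# "beacon_location" columns, assuming dates formatted like 2019-01-29.
import csv
from collections import Counter

counts = Counter()
with open("chats_by_location.csv", newline="") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        year = row["date"][:4]             # hypothetical column name
        location = row["beacon_location"]  # hypothetical column name
        counts[(year, location)] += 1

# Print a simple year/location tally that could be pasted into Google Sheets.
for (year, location), n in sorted(counts.items()):
    print(f"{year}\t{location}\t{n}")
```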

For other metrics and visualizations, I was able to generate chat stats and export them as a CSV. This gave me a spreadsheet of chat ID, start time, date, chat duration, host name(s), exhibition location, and device type. With some hefty data manipulation, I was able to parse out device type variations and chat locations, and to show chats and records by year.

Basic data viz allows us to see overall trends for further exploration, even if the data can't be updated in real time.

Again we see limitations with this solution; it is another instance of manual data manipulation leading to dated visualizations (October 2018 in this case). Additionally, there are huge spikes in 2018, both for SMS (text-messaging) chats and for overall chats recorded. These spikes likely reflect the popularity of Bowie Trivia; however, there is no easy way to remove those records from the exported CSV file.
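For readers curious what that "hefty data manipulation" looks like in practice, here is a rough sketch of the sort of wrangling involved, using Python's standard library. The filename, column names, and device buckets are assumptions for illustration, not the actual export schema.

```python
# A rough sketch of normalizing device-type variations in the exported
# chat-stats CSV and counting chats by year and device.
# Column names ("date", "device_type") are assumptions; the real export may differ.
import csv
from collections import Counter, defaultdict

def normalize_device(raw):
    """Collapse device-type variations (e.g. 'iPhone 6s', 'iphone') into buckets."""
    raw = raw.lower()
    if "iphone" in raw or "ipad" in raw:
        return "iOS"
    if "android" in raw:
        return "Android"
    if "sms" in raw or "text" in raw:
        return "SMS"
    return "other"

by_year = defaultdict(Counter)
with open("chat_stats_export.csv", newline="") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        year = row["date"][:4]
        by_year[year][normalize_device(row["device_type"])] += 1

for year in sorted(by_year):
    print(year, dict(by_year[year]))
```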

For some challenges there were no simple solutions. Edited snippet content became a challenge that just had to be accepted as an overarching limitation. The most efficient way to analyze what visitors were asking, collection by collection, was through snippets because, unlike chats, snippets are tagged by collection. As mentioned earlier, chat locations are based on beacons firing through the app. So a chat could have an attributed location that does not necessarily reflect where the object being asked about is actually located. For example, a visitor could take a picture of an object on view in the Ancient Egyptian Art galleries but send the question from the European galleries; the chat would then be marked as occurring in the European galleries despite actually being about an Egyptian work. However, when the ASK team processes snippets, they tag the artworks with their associated collection, making it easier to look specifically at what people were asking based on the type of object. This also brings me to my next challenge: combining and exporting data.

In order to read through the different snippets associated with each collection, I had to sort the dashboard metric ‘How many snippets have been created (via collections)?’ by each collection and export the snippets four times, once per snippet editorial status (draft, team approved, curator approved internal, and web approved). Through this method I was able to analyze the various questions asked by collection and draw out themes by collection (which I will come back to later).
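As an illustration, here is a minimal sketch of how the four per-status exports for a single collection might be stitched back together into one file for reading. The filenames and columns are hypothetical stand-ins for whatever the dashboard actually produces, and the sketch assumes all four exports share the same columns.

```python
# A minimal sketch of combining the four per-status snippet exports
# (draft, team approved, curator approved internal, web approved) for one
# collection into a single CSV. Filenames and columns are hypothetical.
import csv

statuses = ["draft", "team_approved", "curator_approved_internal", "web_approved"]
combined = []
for status in statuses:
    with open(f"egyptian_snippets_{status}.csv", newline="") as f:  # hypothetical
        for row in csv.DictReader(f):
            row["editorial_status"] = status  # record which export the row came from
            combined.append(row)

with open("egyptian_snippets_all.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=combined[0].keys())
    writer.writeheader()
    writer.writerows(combined)
```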

Of course, within this framework, I had to be selective in what I attempted to do with the available data, since there is a great deal of content and my time on the project is limited. In my next post, I’ll present some of the initial findings that came out of these workarounds.
