Extremely smart people dedicated to the field of machine learning have made tools that are not only better, but far more accessible than they have been in the past. We don’t have anyone at the Brooklyn Museum who’s an expert in machine learning, but because of the improvements in machine learning tools, we don’t have to be. If you’ve been following our series of blog posts, you’ll remember we talked previously about the accuracy issues we’ve had with iBeacons and the problems they pose for us, primarily that decreased accuracy in iBeacon results means delayed response times to visitor questions. Since providing a seamless, personal, educational, and ultimately extremely rewarding experience is the foundation ASK Brooklyn Museum is built upon, anything we can do to improve the efficiency of the system is a project worth taking on.
One of the tools we’ve leveraged to help us toward this goal is Elasticsearch, a full-text search server. While not strictly a machine learning tool, it uses many NLP algorithms (which are machine learning based) in its backend to do more intelligent text searching and match similar phrases. This means that instead of doing a ‘dumb’ search that has to match exact phrases, we can do fuzzier searching that locates similar phrases. For example, if we have a block of text containing the phrase ‘Where are the noses?’ and we search for the phrase ‘What happened to the noses?’, that block of text would be returned near the top of the results.
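To make the idea of matching similar phrases concrete, here is a toy sketch that ranks a few pieces of text by how many words they share with a search phrase. This is not how Elasticsearch works internally (its text analysis and relevance scoring are far more sophisticated), and the function names and sample texts other than the two "noses" phrases are our own illustrative assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_score(query, document):
    """Jaccard similarity: shared distinct words divided by all distinct words."""
    q, d = set(tokenize(query)), set(tokenize(document))
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rank(query, documents):
    """Return the documents sorted from most to least similar to the query."""
    return sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)


# Hypothetical sample texts; only the "noses" phrases come from the post.
snippets = [
    "Who made this painting?",
    "Where are the noses?",
    "How old is this statue?",
]

results = rank("What happened to the noses?", snippets)
print(results[0])  # prints "Where are the noses?"
```

An exact-phrase search would find nothing here, while even this crude word-overlap score puts the noses question first because the two phrases share "the" and "noses".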
This particular use case is exactly what we were looking for when we needed to solve a problem with our snippets. We’ve talked about snippets in a previous post, but just to recap, snippets are pieces of conversations between visitors and our audience engagement team about works of art in the museum. Because they are useful not just for highlighting great conversations but also as a sort of knowledge base, snippets have become an integral part of ASK. This means we’d like to create as many snippets as we can to grow this knowledge base and help spread the wisdom gleaned from them. However, over the course of this process it’s easy to accidentally create snippets for the exact same question, which clutters the system with duplicates. This is problematic not just for search results but also because all snippets go through a curatorial approval process, and seeing the same snippets repeatedly creates unnecessary extra work for everyone involved.
In addition to solving the problem with duplicates, cleaning up the system means we can much more accurately track the most commonly asked questions. All of this happens as part of a seamless process during snippet creation. When an audience engagement team member begins the snippet creation process, the dashboard automatically queries our Elasticsearch server to look for similar or duplicate snippets. These search results show up next to the snippet editor, which makes it easy to quickly see whether duplicates already exist. If a duplicate snippet does exist, the team member simply clicks a “+1” counter next to the snippet. This increments a number attached to the snippet, which we can then use for various metrics we track in the system.
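The flow described above can be sketched roughly as follows. This is a minimal illustration and not our dashboard's actual code: the field name `question`, the `Snippet` class, and the `plus_one_count` attribute are all hypothetical. The dictionary follows the shape of a standard Elasticsearch `match` query, which could be sent as the body of a search request.

```python
from dataclasses import dataclass


def build_duplicate_check_query(draft_text, field="question", size=5):
    """Build a search body asking for the snippets most similar to the draft.

    `field` is a hypothetical name for wherever the snippet text is indexed.
    """
    return {
        "size": size,  # only the top few candidates are shown next to the editor
        "query": {"match": {field: {"query": draft_text}}},
    }


@dataclass
class Snippet:
    """Hypothetical in-memory stand-in for a stored snippet."""
    snippet_id: int
    question: str
    plus_one_count: int = 0

    def plus_one(self):
        """Record one more occurrence of this question and return the new total."""
        self.plus_one_count += 1
        return self.plus_one_count


# A team member starts a new snippet, and the dashboard builds the search.
body = build_duplicate_check_query("What happened to the noses?")

# Suppose the search surfaces this existing snippet and the team member
# clicks "+1" instead of creating a duplicate.
existing = Snippet(snippet_id=1, question="Where are the noses?")
existing.plus_one()
print(existing.plus_one_count)  # prints 1
```

Keeping the count on the snippet itself means the most commonly asked questions can be found later simply by sorting on that number.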
Just based on our short time using machine learning tools, it’s clear how powerful these tools are in the here and now. We’ve talked about how they’re already improving our response times, metrics, and knowledge base, but that may just be the tip of the iceberg. The AI revolution is coming, and as tools get more sophisticated yet simpler to use, the question isn’t whether you’re going to use them, but when and how.
As the lead developer at the Brooklyn Museum, James helps maintain and grow internal applications and cloud infrastructure, and is currently developing the API behind the Bloomberg Connects project. He previously ran a tech startup in Singapore before joining the museum in early 2013 and backpacked through Asia during his more adventurous years. He's also an avid gamer and budding game developer/pixel artist.