Performance Optimization, Not Premature Optimization

At the Brooklyn Museum, we like to take inspiration from many things. After recently watching “Mad Max: Fury Road,” we realized to make our servers go faster, we should have a dedicated staff member spit gasoline into a combustion engine connected to one of our servers…vroom vroom!

Nitrous makes cars go faster. Servers not so much.

Nitrous makes cars go faster. Servers not so much.

All jokes aside, for most consumer/public facing apps, performance is a very serious consideration. Even if your app has the best design, bad performance can make your Ferrari feel like a Pinto. While performance can mean many things in a technical context, in this post I’ll just be talking about the performance of our back-end.

As I mentioned in an earlier post, we use an internal API to power both our mobile app and the dashboard our Visitor Engagement team uses to chat with visitors. This API has to be able to not just handle requests, but do it in a very performant way. This is especially true given the nature of ASK which revolves around a real-time conversation between visitors and our Visitor Engagement team.

When taking performance into consideration, it’s easy to fall into one of the deadly programming sins: premature optimization. Premature optimization is what happens when you try to optimize your code or architecture before you even know if, when, and where you have bottlenecks. To hedge against this, the solution is simple: benchmark your application. Since I’m just talking about the back-end in this post, application in this context means our API.

When we benchmark our API, we’re not just benchmarking the webserver the API is being served from; we’re benchmarking every component the API is comprised of. This includes (but not limited to) our webserver, API code, database, and networking stack. While our back-end is relatively simple by industry standards, you can see from this list that there are still many components in play that can each have an impact on our performance. With so many factors to account for, how do we narrow down where the bottlenecks are?

Similarly to power plants, back-end servers also need to be able to meet peak demand.

Similarly to power plants, back-end servers also need to be able to meet peak demand. photo credit: 2009-09-10 AKW Gundremmingen via photopin (license)

Well first we have to ask ourselves, “What is an acceptable level of performance?” This is a question you can’t answer fully unless you also add the variable of time to the equation. Similarly to the way power utility companies determine how much electricity they need to generate, we also look at the same thing: peak load (see also: Brown Outs).┬áPeak load is simply how much load do you anticipate having during the busiest times? If we know our system can handle peak load, then nothing more needs to be done in terms of optimization.

In practice, our real bottlenecks are most likely to be the human element of the equation: our Visitor Engagement team. Since we only have a few working at any given point in time, and the fact that quality answers can sometimes take a little while to come up with, having too many people asking questions and not enough people answering can be our worst bottleneck at times. That being said, when we’re optimizing for a certain load average on our backend, we didn’t want to just aim for that number; we wanted to aim a bit higher to give ourselves some cushion.

So how do we actually figure out where our bottlenecks are? In essence, this is a basic troubleshooting problem. If something is broke, where is it broke? Often times the simplest way to figuring this out is by isolating each component from each other and benchmarking each by itself. Once you have a baseline for each, you can then figure out where the bottleneck lies. Depending on what the actual bottleneck is, the solution can vary wildly and can be a massive engineering effort depending on the scale at which your application operates at. I recommend reading engineering blog posts from Facebook, Netflix, and other companies dealing with extremely large scales to get a better sense of what goes into solving these type of technical problems.

At the end of the day our number one priority is providing a great experience for our visitors. Our back-end is just one piece of the overall effort that goes into make sure that happens, and when it’s running well, nobody should notice it at all. Kind of like a well-oiled machine running quietly in the background…so cool, so chrome….

Start the conversation