Artillery load testing toolkit

Decision

We initially decided to use Artillery for load and stress testing our back-end systems, both to verify that our infrastructure and systems work as expected and to better understand our current performance limits.

While setting up the project, some limitations and open issues became apparent: TypeScript support for custom processors is only experimental, there are bundling issues when deploying tests via AWS Lambda, and test deployments to AWS Fargate fail at random. Artillery has proven useful for our current testing use case, deploying tests via AWS Fargate with a CSV payload, but our current decision is to explore alternatives as well.

Problems

Currently, we have no reliable way to load test our API endpoints or our back-end systems in general. We have mainly relied on bash scripts that were run locally from a developer machine. This approach provided neither distributed load testing at scale (for example, simulating a customer sending a large number of requests per second over an extended period of time) nor performance metrics reports for our tests. It was also not possible to set up continuous load testing scenarios.

Context

We ran into major issues with one of our customers because we could not handle the number of requests per second they sent to one of our endpoints. They sent requests at a rate that we had given them, a rate that we had, to the best of our knowledge, verified against our pre-production environment with shell-script-based load tests.

We could see that our system coped with the load for a certain amount of time but eventually ran out of memory. This highlighted two needs. First, we need to be able to run sustained, production-grade load tests against our endpoints for long enough to simulate production scenarios, which does not seem feasible locally with shell scripts. Second, we need to be able to run tests against our production environment, because the database response times we saw in the pre-production environment differed vastly from those in production, which was unexpected and made the whole situation even worse.

Options

We did not consider other solutions at this time. Artillery was simply something we had heard of or read about, and at first glance it fulfilled all our requirements in terms of project maturity, GitHub issue management, feedback from others and npm downloads. We needed something quickly and felt comfortable giving it a try. We might decide in the future to use another tool instead.

Reasoning

We chose the Artillery testing toolkit because, for one, it is the only tool we looked at. Beyond that, it is well documented, easy to implement, and allows us to run distributed load tests using AWS services (Lambda and Fargate). It provides a straightforward way to set up testing scenarios, and each scenario can be deployed to AWS with the Artillery CLI, with automatic clean-up of all created resources. It also comes with integrated performance metrics, can be integrated with other tools (e.g. for monitoring and observability, so that we can correlate other metrics), and can be extended with custom integrations written in JavaScript.
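As a rough illustration of the JavaScript extension point, a custom processor function might look like the sketch below (written in TypeScript, for which Artillery support is only experimental). It assumes the callback-style signature (context, events, done) for custom functions and a context.vars object holding scenario variables. The names makeRunTagger, loadTestRunId and virtualUserNo are hypothetical and chosen for this example only.

```typescript
// Minimal local types for this sketch; Artillery's real context object has more fields.
interface ScenarioContext {
  vars: Record<string, unknown>;
}
type Done = (err?: Error) => void;

// Hypothetical helper: builds a hook that tags every virtual user with a run ID,
// so that requests can carry it and be correlated with other metrics.
function makeRunTagger(runId: string) {
  let userCounter = 0;
  return function tagVirtualUser(context: ScenarioContext, _events: unknown, done: Done): void {
    userCounter += 1;
    context.vars.loadTestRunId = runId;
    context.vars.virtualUserNo = userCounter;
    done(); // hand control back to the scenario runner
  };
}

// "example-run" is a placeholder value, not a real run identifier.
const tagVirtualUser = makeRunTagger("example-run");
```

The tagged variables could then be referenced in a scenario, for instance in a request header, so that load test traffic can be told apart in monitoring. To be picked up by Artillery, the function would need to be exported from the processor file.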

Consequences

How do we implement this change?

As a starting point, we will create a separate project, "Artillery", and install Artillery as a dependency. This project will hold the YAML files for the testing scenarios. We will set up a default VPC for this project in our AWS Development account so that we can deploy our test scenarios with AWS ECS and run them from there. In the future we might move Artillery into the projects that are actually being tested and create the VPC needed for ECS deployments elsewhere.

Who will implement the change?

The Collect team will set up the project and the corresponding infrastructure and create a first test scenario.

How do we teach this change?

Once we have gained some experience with the testing toolkit, we will offer the other teams a session or workshop to share what we have learned. This will include a brief introduction to best practices for production load testing, especially with regard to understanding the associated risks (e.g. overloading our system and negatively affecting daily operations and customers) and knowing what to do if the worst case happens (e.g. how to kill a running test scenario).

What could go wrong?

The decision to use this toolkit comes with the intention to test our production environment, to get a more realistic understanding of the performance of our systems and their limitations. The toolkit also enables us to put considerable load on our system. We could accidentally overload our system to a degree that causes issues for customers or incurs unnecessary infrastructure costs. Since we did not consider other solutions at this time, it might also turn out that this solution is not what we expected and does not suit our needs. Another possible risk is unintentionally exposing secrets, such as API keys, in the test YAML files.
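One way to reduce the risk of committing secrets is a simple check over the scenario files before they are deployed. The sketch below is a heuristic only: the list of sensitive key names is an assumption made for this example, and it treats any value containing a {{ ... }} template placeholder as safe. The function name findHardcodedSecrets and the sample content are hypothetical.

```typescript
// Matches lines whose key looks sensitive and captures the value.
// The key list is an illustrative assumption, not an exhaustive one.
const SENSITIVE_KEY =
  /^\s*(?:-\s*)?["']?[\w-]*(?:api[-_]?key|token|secret|password|authorization)[\w-]*["']?\s*:\s*(.+)$/i;

// Returns the 1-based line numbers where a sensitive-looking key has a literal
// value instead of a {{ template }} placeholder.
function findHardcodedSecrets(yamlText: string): number[] {
  const flagged: number[] = [];
  yamlText.split("\n").forEach((line, index) => {
    const match = SENSITIVE_KEY.exec(line);
    if (match !== null && match[1].indexOf("{{") === -1) {
      flagged.push(index + 1);
    }
  });
  return flagged;
}

// Made-up scenario fragment: line 5 holds a literal key, line 6 uses a placeholder.
const sampleScenario = [
  "config:",
  "  target: https://example.test",
  "  defaults:",
  "    headers:",
  "      x-api-key: abc123",
  '      authorization: "Bearer {{ apiToken }}"',
].join("\n");

const hardcodedSecretLines = findHardcodedSecrets(sampleScenario);
```

A check like this could run before each deployment and abort if any line is flagged. It does not replace proper secret management.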

What do we do if something goes wrong?

One risk of using Artillery for production load testing is that we could accidentally overload our system. To mitigate this risk, we decided to deploy our testing scenarios to AWS Fargate only, which allows us to kill the corresponding container to abort a testing scenario, similar to how we can kill a scenario when running it locally. If we deployed a testing scenario to AWS Lambda, we would not be able to kill a running Lambda function if need be; it would run until it completed its task or reached the total execution time limit of 15 minutes. We will document in the project README how to go about killing a testing container on AWS Fargate.
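Until the README is written, the abort step can be sketched as follows. The AWS CLI provides an ecs stop-task command that takes --cluster, --task and an optional --reason. The helper below only assembles that command so it can be reviewed or logged before it is executed. The cluster name and task ID are placeholders, not the names Artillery actually uses, and would first have to be looked up (e.g. with aws ecs list-tasks).

```typescript
// Builds the argument list for `aws ecs stop-task`.
// Cluster name and task ID are placeholders in this sketch.
function buildStopTaskArgs(cluster: string, taskId: string, reason: string): string[] {
  if (cluster.trim() === "" || taskId.trim() === "") {
    throw new Error("cluster and taskId must be non-empty");
  }
  return ["ecs", "stop-task", "--cluster", cluster, "--task", taskId, "--reason", reason];
}

// Renders the arguments as a single shell line, quoting values that contain whitespace.
function renderCommand(args: string[]): string {
  return ["aws"]
    .concat(args.map((arg) => (/\s/.test(arg) ? JSON.stringify(arg) : arg)))
    .join(" ");
}

const stopCommand = renderCommand(
  buildStopTaskArgs("example-cluster", "example-task-id", "Aborting load test")
);
```

Keeping the command construction separate from its execution makes it easy to print the exact command for whoever has to abort a test under pressure.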

What is still unclear?

Currently we run Artillery only manually, using the run script to deploy when we need to carry out a specific load testing scenario for a very specific endpoint. It is still unclear how to approach and set up continuous load testing, if at all. Artillery offers integration with various CI tools, e.g. GitHub Actions, and can be run, for example, when a branch is merged into main or at a scheduled time, similar to a cron job. The latter would be preferable so that we do not load test at random times, possibly during peak hours. It is also still unclear how beneficial continuous load testing would be overall, and what specifically to test so that it provides a representative picture of our system as a whole.
We don't have any experience in setting up the tests, so we will need to explore how to create testing scenarios that fit our needs.
It's also still unclear whether we want to use Artillery long-term, since we haven't looked at alternatives because we needed something quickly that did the job.
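If load tests are ever triggered automatically, the concern about peak hours could be addressed with a small guard that runs before the test starts. The sketch below assumes a single daily peak window expressed in UTC hours. The function name and the window boundaries are hypothetical, since our actual peak hours have not been determined.

```typescript
// Returns true when `when` falls outside the peak window
// [peakStartHourUtc, peakEndHourUtc). Windows that wrap past midnight
// (e.g. 22 to 6) are supported.
function isOutsidePeakHours(when: Date, peakStartHourUtc: number, peakEndHourUtc: number): boolean {
  const hour = when.getUTCHours();
  const inPeak =
    peakStartHourUtc <= peakEndHourUtc
      ? hour >= peakStartHourUtc && hour < peakEndHourUtc
      : hour >= peakStartHourUtc || hour < peakEndHourUtc;
  return !inPeak;
}

// Example with an assumed peak window of 08:00 to 18:00 UTC.
const mayRunNow = isOutsidePeakHours(new Date(), 8, 18);
```

A CI job could evaluate such a guard first and skip the load test when it returns false.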

Related ADRs

ADRs