Benchmarking Real-Time Serverless Applications with owperf

Benchmarking real-time workloads requires suitable tooling for measuring event-to-response time. For serverless workloads based on Apache OpenWhisk, the CLASS project developed such a tool, called owperf. In this article we present a short introduction to owperf and its capabilities.

Introduction

The CLASS project aims to deliver real-time big-data analytics. This implies responding to events with analytics computations, distributed across the compute continuum (from edge to cloud), that yield the desired insights. In the cloud, the CLASS architecture employs the Apache OpenWhisk serverless platform [1] as a polyglot, event-driven programming model. This way, different analytics tools, depending on the developer’s choice, can be invoked in response to events.

However, maintaining real-time response to events further requires that the responding code executes under certain latency constraints. For example, response code execution may need to complete within a certain deadline after the triggering event occurs, or it may need to execute at a certain frequency, e.g., in response to a stream of events. Proving such real-time properties typically involves an empirical performance evaluation, in which events are generated, e.g., at a specific frequency, and the response time is measured and analyzed, looking at its average, extremes and variance (“jitter”).

While Apache OpenWhisk (OW) has long provided tooling for performance evaluation of direct, synchronous function invocations, based on wrk and gatling [2], it lacked comparable tooling for event-driven invocations. Bridging this gap, CLASS has developed such a tool for OpenWhisk, called owperf [3]. The tool was presented to the OW community, accepted, and then merged into the core OW distribution. In the rest of this article, we present a short overview of owperf.

Meet owperf

owperf provides several unique capabilities previously not found in OW performance tooling. First and foremost, it allows benchmarking of rules, which bind events to actions (functions) in OW. Note that unlike a function invocation, a rule’s event-response execution flow does not return to a caller, because there is no caller, so a standard RTT (Round-Trip Time) cannot be collected. Therefore, in order to measure the total time from an event trigger firing to the response finishing, owperf cross-references¹ the local time at which the event trigger is invoked with the response processing timestamps recorded in OW’s activation database. This technique leads to two more unique owperf capabilities: the ability to benchmark asynchronous invocations (i.e., invocations that do not return to the caller), and detailed profiling of all benchmarks, including the overhead of OW itself, by leveraging the detailed timestamps in the OW activation DB. All this useful data is obtained without instrumenting OpenWhisk internals with monitoring code.
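
To get a feel for the raw data owperf works with, one can inspect an activation record directly with the standard wsk CLI. The sketch below is illustrative only: the activation ID is a placeholder, and the exact set of fields in the record may vary with the OW version.

        # list the most recent activation and note its ID
        wsk activation list --limit 1
        # fetch the full activation record; it carries the start/end/duration
        # timestamps (and waitTime/initTime annotations for actions) that owperf
        # cross-references with the local time of the trigger invocation
        wsk activation get <activation-id>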

owperf is a CLI tool, written in node.js, with a rich set of options. Let’s look at using it to benchmark events and rules through a simple example:

                ./owperf.sh -a rule -w 4 -i 1000 -S -T -e myEvent

Let’s break this down:

  1. owperf.sh is the executable (a bash script that wraps the main owperf.js code).
  2. -a rule specifies that this benchmark is of rules (events-to-actions).
  3. -w 4 specifies that owperf should spin up 4 concurrent client workers.
  4. -i 1000 specifies that the benchmark will last until the master worker (one of the client workers is always the master, controlling the benchmark) has finished 1000 test iterations.
  5. -S specifies “no setup”, meaning that owperf should use a custom setup of event, action and rules prepared by the user, instead of creating one itself. This is a typical choice for benchmarking serverless applications, combined with specifying the event (for a rule benchmark) or the action (for an action invocation benchmark).
  6. -T specifies “no teardown”, meaning that owperf should not remove the setup after the test. Again, a typical choice for benchmarking pre-made applications.
  7. -e myEvent specifies that the owperf workers should invoke the specific event trigger called “myEvent”.

So what does the resulting benchmark look like in this example? It is a rule (event-response) benchmark, using an event trigger called “myEvent”, set up and connected to actions via rules, all prepared by the user as part of the application. The arrival process is 4 concurrent streams of event triggers (one per client worker). The default invocation frequency per worker is once every 200 msec (5 per second), so we expect an average load of 4x5=20 events per second. The benchmark lasts 1000 iterations of the master worker, around 5-6 minutes overall. Note that the benchmark only starts after a warmup period, in which each worker performs an initial sequence of invocations. The net benchmark time after the warmup is roughly 1000/5=200 seconds (just over 3 minutes), and there is a quiescence stage of 1 minute at the end, letting all response executions complete and the activation records become ready for processing. All the aforementioned parameters, including invocation frequency, warmup, benchmark duration and post-processing quiescence, are configurable via the CLI, along with many others.

Putting it all together

Let’s reconsider our example in a full use-case. A developer starts by setting up her OpenWhisk application: she deploys actions, event triggers, rules and packages. Specifically, she deploys the “myEvent” trigger and binds it to an action through a rule.
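
For illustration, a minimal setup of this shape could be created with the standard OpenWhisk wsk CLI. This is only a sketch: the action code file, action name and rule name below are hypothetical; only the trigger name “myEvent” comes from our example.

        # deploy the response code as an action (hello.js is a placeholder file)
        wsk action create myAction hello.js
        # create the event trigger used in the benchmark
        wsk trigger create myEvent
        # bind the trigger to the action, so that each event fires the action
        wsk rule create myRule myEvent myAction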

After running some system tests to make sure the application works, our developer is now ready to do some performance tests. She downloads owperf to her laptop and makes sure it is set up according to the owperf page [3]. Next, she launches the example command and examines its output.

As its result, owperf generates a large CSV record of data that contains both input fields (the benchmark settings) and output fields (the measured results). Additionally, owperf prints a CSV header for the record. So, our developer opens her favorite spreadsheet and imports into it both the CSV header and the data record. If she runs additional benchmarks, she can silence all output except the data record (using the -q CLI option), making it simple to add more rows to the spreadsheet table.
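
As an illustration, follow-up runs could be collected into a single file along these lines. This is a sketch only: results.csv is a hypothetical file that already holds the imported CSV header and first record, and the varied worker count in the second command is just for the example.

        # -q silences all output except the CSV data record,
        # so each run appends exactly one more row to the results file
        ./owperf.sh -a rule -w 4 -i 1000 -S -T -e myEvent -q >> results.csv
        ./owperf.sh -a rule -w 8 -i 1000 -S -T -e myEvent -q >> results.csv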

Now, she can reason about the results and make some calculations. Here are a few fields of interest:

  • output.oea.{min/max/avg/std} – statistics of OEA (Overhead of Entering Action) – the time it takes from the origination of the event to the start of the response.
  • output.ad.{min/max/avg/std} – statistics of AD (Action Duration) – the time it takes for the response action to execute.
  • output.attempts.tp – rate (throughput) of incoming invocations per second (arrival rate). In our example, this should be close to 20 requests/sec, as discussed above.
  • output.invocations.tp – rate of completed requests per second (service rate). This depends on how much throughput your OpenWhisk deployment or service can sustain (but it cannot exceed the arrival rate, of course).
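
From these definitions, the average end-to-end response time per event is roughly output.oea.avg + output.ad.avg. If the spreadsheet route is not convenient, the same fields can also be pulled straight out of the collected CSV, for example with a small awk script. This is a sketch: it assumes results.csv starts with the CSV header line and that fields contain no embedded commas.

        # map header names to column indices, then print selected fields per run
        awk -F',' 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
                   { print $col["output.oea.avg"], $col["output.ad.avg"], $col["output.attempts.tp"], $col["output.invocations.tp"] }' results.csv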

There are, of course, many more interesting fields, such as errors, more detailed profiling, etc. Check the owperf page [3] for details.

Conclusions

In this article we presented the owperf tool and its use for performance evaluation of OW applications. However, there are many more use-cases for owperf. For example, it can be used to micro-benchmark your OW deployment or cloud service, in order to tune it to your needs. Furthermore, owperf can be used to construct multi-workload and multi-tenant test scenarios, in which multiple owperf instances work concurrently to stress and benchmark a complex system (a small sketch of such a scenario follows below). And there is more: dig into the owperf page [3] for details. Just download and start hacking!
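
For instance, a simple two-workload scenario could be assembled by launching two owperf instances side by side. This is a sketch only: both trigger names and both result files are hypothetical, and the flags are the ones discussed earlier in this article.

        # two concurrent event workloads against the same OpenWhisk deployment
        ./owperf.sh -a rule -w 4 -i 1000 -S -T -e frontendEvent -q >> frontend.csv &
        ./owperf.sh -a rule -w 2 -i 1000 -S -T -e backendEvent -q >> backend.csv &
        wait   # let both benchmarks run to completion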

¹ Cross-referencing timestamps in a distributed system is, expectedly, subject to clock skew issues. However, owperf assumes that standard measures such as NTP synchronization are implemented across all machines, so that the actual clock difference is limited to several (<10) msec, which is an acceptable error.

Bibliography

[1] Apache OpenWhisk, "Apache OpenWhisk is a serverless, open source cloud platform," Apache Foundation, [Online]. Available: http://openwhisk.apache.org/. [Accessed 21 June 2018].

[2] Apache OpenWhisk Community, "Apache OpenWhisk - Performance Tests," [Online]. Available: https://github.com/apache/openwhisk/tree/master/tests/performance. [Accessed January 2020].

[3] E. Hadad, "owperf - A performance evaluation tool for Apache OpenWhisk," IBM Research, 2019. [Online]. Available: https://github.com/IBM/owperf. [Accessed March 2019].