SLA Predictor to provide Real-time guarantees from the CLASS Cloud Computing Platform

Introduction

In the last part of the project, we performed several tests to validate the scalability features of the CLASS Cloud Computing Platform. We used a test COMPSs workflow application that simulates a big-data analytics workload in a distributed environment. COMPSs workflows distribute their computation tasks across a number of workers (i.e., slaves, instances, pods) that execute in parallel. The initial number of workers is defined by the COMPSs master at launch time, and Rotterdam scales it in or out during execution, depending on the assessment made by the SLA Manager component.

Description of the scenarios

We ran our tests in two different scenarios to compare the execution times of the workflows with and without auto-scaling and thus analyse the effect of the scaling overhead on the final execution time:

	Scenario I: COMPSs workflow without auto-scaling. In this scenario, Rotterdam’s adaptation engine is disabled; thus, it does not scale the number of workers out or in at run time.	Scenario II: COMPSs workflow with auto-scaling. We use the same COMPSs master application, and Rotterdam with its adaptation engine enabled. An SLA is created for each workflow at deployment time.
Description of the scenario	The COMPSs master application uses Rotterdam’s REST API to launch a workflow in the Cloud. (1) The master defines the number of workers that execute the parallel tasks. (2) Rotterdam creates these workers (pods) in the cluster at launch time. (3) COMPSs executes the computation tasks in parallel on these workers.	The first steps are the same as in Scenario I. (4) Missed-deadline metrics generated by the COMPSs master application are stored in Prometheus. These values are related to the expected total execution time of the workflow. The SLA is verified against this metric. (6) If there is a missed deadline (a computation task takes longer to complete than expected), the SLA notifies Rotterdam, which scales out the number of workers. (7) The COMPSs master application redistributes the computation tasks across the new worker nodes before continuing the execution of the workflow.
Description of the experiment	We run the same COMPSs workflow in a “static” environment (a fixed number of workers for each execution and the same conditions for each cluster), without using the features provided by Rotterdam. First, we tried with only one worker per workflow, then with three, and so on.	First, we execute the workflow with only one initial worker before starting to apply the autoscaling features. Then, we perform the same test starting with three initial workers and scaling out to
Results
Analysis of results	The average workflow duration is 608 seconds with only one worker. It decreases to 584 seconds with three workers in parallel. With more than three workers, the execution time drops markedly: 466 seconds with 6 workers and 388 seconds with 12 workers. We observe that there is a limit to the initial number of workers beyond which the total duration stops decreasing. For this workflow, the limit appears to be around 12 to 15 workers.	1. Executing the workflow with one initial worker and scaling out to three workers during execution takes about 70 seconds longer than executing the same workflow with one worker and no scaling. Only when scaling out from one worker to nine or more do we start to see an improvement in execution time. 2. We observe no benefit in execution time when scaling out the workflow from three initial workers to six final workers. In fact, the execution time ends up slightly higher than without scaling, as in the previous tests. But when we scale out to nine or 12 workers, we see a small improvement in the final execution time.
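
To make the diminishing returns of the static runs easier to see, the reported averages can be turned into speedup and parallel-efficiency figures. The sketch below uses only the four durations quoted in the analysis; the function names are ours and purely illustrative.

```python
# Average static-run durations reported in the analysis (seconds),
# keyed by the number of workers.
durations_s = {1: 608, 3: 584, 6: 466, 12: 388}

baseline_s = durations_s[1]


def speedup(workers: int) -> float:
    """Speedup relative to the single-worker run."""
    return baseline_s / durations_s[workers]


def efficiency(workers: int) -> float:
    """Parallel efficiency: speedup divided by the number of workers."""
    return speedup(workers) / workers


for n in sorted(durations_s):
    print(f"{n:>2} workers: {durations_s[n]} s, "
          f"speedup {speedup(n):.2f}x, efficiency {efficiency(n):.0%}")
```

Even with 12 workers the speedup stays well below 2x, which is consistent with the observation that adding workers stops paying off at some point for this workflow.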

Description of the scaling overhead

We have identified the following factors that explain why scaling out doesn’t always yield better final execution times.
1.   The metric that controls the scaling (missed deadline), produced by COMPSs, appears several minutes after the workflow execution starts.
2.   There is a (configurable) gap of 20-30 seconds between the appearance of the metric that controls the scaling and its detection by the SLA.
3.   Rotterdam needs some time to create the new workers and services required by the COMPSs workflow.
4.   The COMPSs master application takes time to detect the newly available workers and to redistribute the tasks across them.

Altogether, these "scalability" tasks take 60 to 90 seconds. Long-running workflows benefit from the Cloud Data Analytics Scalability, but for medium-duration applications, like the one used in these tests, the improvement in the final execution time obtained by scaling falls short of what we would like.
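
A rough back-of-the-envelope model can illustrate why a late scale-out may not pay off. The sketch below is our own simplification, not the method used in the tests: it assumes work progresses at the initial rate until the scale-out is triggered, that no work is done during the overhead window, and that the remaining work then proceeds at the rate of the larger static configuration. The trigger time (180 s) and overhead (75 s) are assumed values picked from the ranges mentioned above, not measurements.

```python
def scaled_run_estimate(t_initial_s: float, t_target_s: float,
                        scale_at_s: float, overhead_s: float) -> float:
    """Estimate the total duration of a run that scales out part-way through.

    t_initial_s: static duration with the initial number of workers
    t_target_s:  static duration with the final number of workers
    scale_at_s:  elapsed time when the scale-out is triggered (assumed)
    overhead_s:  time lost to detection, worker creation and redistribution
    """
    if not 0 <= scale_at_s <= t_initial_s:
        raise ValueError("scale_at_s must fall within the initial run")
    # Fraction of the work still pending when the scale-out starts.
    remaining_fraction = 1 - scale_at_s / t_initial_s
    return scale_at_s + overhead_s + remaining_fraction * t_target_s


def scaling_pays_off(t_initial_s: float, t_target_s: float,
                     scale_at_s: float, overhead_s: float) -> bool:
    """True if the estimated scaled run beats the unscaled static run."""
    estimate = scaled_run_estimate(t_initial_s, t_target_s,
                                   scale_at_s, overhead_s)
    return estimate < t_initial_s


# 608 s and 584 s are the static averages for 1 and 3 workers reported earlier;
# the 180 s trigger time and 75 s overhead are assumptions for illustration.
estimate = scaled_run_estimate(608, 584, scale_at_s=180, overhead_s=75)
print(f"1 -> 3 workers: about {estimate:.0f} s versus 608 s without scaling")
```

Under these assumptions the model agrees in direction with the measurements (scaling from one to three workers ends up slower than not scaling), but it is not fitted to the data and should not be expected to reproduce the measured values.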

Architecture with time guarantee

As explained before, scaling out the number of workers doesn’t always lead to a better execution time. The overhead of the scaling process can affect elasticity, especially in short-running applications. Also, missed deadlines might not be the optimal metric to drive QoS. As we focus on developing cloud services that move towards real-time response, we should address this issue.
If we could predict the optimal initial number of workers for a target execution time, we would avoid any horizontal scale-out that does not improve the final execution time. In this section we describe how we have improved our SLA Manager with an intelligent module, called SLA Predictor, that helps us anticipate missed deadlines by obtaining the optimal number of workers for a desired execution time in a given state of the system.

Rotterdam and SLA Predictor integration

We have added a new service to the SLA Manager. This new service is responsible for calling the Machine Learning subsystem, SLA Predictor, to get the recommended number of workers for a COMPSs workflow based on a desired total execution time.

Figure 1: Rotterdam and SLA architecture with SLA Predictor

The QoS requirements are defined by a new metric called “execution_time”, which expresses the desired total execution time for the workflow. The following diagram shows the new sequence for deploying COMPSs workflows using the new ML capabilities provided by the SLA Predictor component:

Figure 2: New sequence diagram of COMPSs workflows

The COMPSs master application specifies the initial number of workers and total execution time as QoS parameters.
As before, Rotterdam calls the SLA Manager to create the agreement with the new QoS parameters.
The SLA Manager calls the SLA Predictor to verify the QoS parameters, that is, to check whether the workflow can be completed in the desired execution time with the initial number of workers.
The SLA Predictor replies to the SLA Manager with the optimal number of workers for the desired execution time in the current state of the cluster, based on previous executions.
This information is sent back to Rotterdam.
Rotterdam deploys the initial number of workers advised by the SLA Predictor (which might differ from the initial number of workers specified by the COMPSs master).
Rotterdam creates the SLA for this execution time. If the execution time is not being met, more workers will be deployed.
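
The call from the SLA Manager to the SLA Predictor in the sequence above could look roughly like the following sketch. The `/predictSLA` path and the `workers` and `exectime` query parameters are taken from the sample execution logs further down; everything else is an assumption for illustration: the base URL is hypothetical, and we assume the response body is just the recommended number of workers as plain text (the logs only show that a 1-byte response was generated).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL: the real host and port are not given in this article.
PREDICTOR_BASE_URL = "http://sla-predictor.example:8080"


def build_predict_url(workers: int, exectime_ms: int,
                      base_url: str = PREDICTOR_BASE_URL) -> str:
    """Build the GET URL seen in the predictor logs."""
    query = urlencode({"workers": workers, "exectime": exectime_ms})
    return f"{base_url}/predictSLA?{query}"


def recommended_workers(workers: int, exectime_ms: int, fetch=None) -> int:
    """Ask the predictor for the number of workers to deploy.

    `fetch` takes a URL and returns the response body as text. It is
    injectable so the function can be exercised without a live service.
    Assumption: the body is a bare integer.
    """
    if fetch is None:
        def fetch(url: str) -> str:
            # The logged requests took 12-15 s, so allow a generous timeout.
            with urlopen(url, timeout=30) as response:
                return response.read().decode()
    body = fetch(build_predict_url(workers, exectime_ms))
    return int(body.strip())
```

Rotterdam would then deploy the returned number of workers instead of the number originally requested by the COMPSs master.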

In the next section you can see a description of the improvement in the execution time obtained with this new architecture.

Validation of the architecture with time guarantee

We have run a set of tests using this new component (SLA Predictor). The objective of these experiments is to show the impact of using the SLA Predictor before deploying the application in the Cloud and to compare it with the previous experiments. Instead of scaling out the application during execution, we now use ML techniques to predict the behaviour of the workflow in the Cloud and adjust the initial number of workers before launching the application.

	Scenario III: COMPSs workflow with SLA Predictor
Description of the scenario	We run the same COMPSs workflow application as in Scenarios I and II. We call the SLA Predictor with these parameters: • Initial number of workers: 3 • Desired total execution time: 540000 ms. The result of the call to the SLA Predictor is 6 workers. Rotterdam and the SLA Manager create an SLA whose constraint is to execute the workflow in 540000 ms or less (time guarantee) with 6 workers.	We run the same COMPSs workflow application as in Scenarios I and II. We call the SLA Predictor with these parameters: • Initial number of workers: 3 • Desired total execution time: 480000 ms. The result of the call to the SLA Predictor is 9 workers. Rotterdam and the SLA Manager create an SLA whose constraint is to execute the workflow in 480000 ms or less (time guarantee) with 9 workers.
Results	Real execution times (in ms) obtained with 6 workers	Real execution times (in ms) obtained with 9 workers
Sample Execution Logs	[(6, (470001, 1000000)), (9, (420001, 1000000))] [pid: 48|app: 0|req: 23/39] 192.168.7.28 () {34 vars in 488 bytes} [Tue Mar 16 08:20:37 2021] GET /predictSLA?workers=3&exectime=540000 => generated 1 bytes in 12138 msecs (HTTP/1.1 200) 2	[(6, (450001, 470000)), (9, (410001, 420000))] [pid: 49|app: 0|req: 17/40] 192.168.7.28 () {34 vars in 488 bytes} [Tue Mar 16 08:36:04 2021] GET /predictSLA?workers=3&exectime=480000 => generated 1 bytes in 14645 msecs (HTTP/1.1 200) 2 headers in 78 bytes (1 switches on core 0)
Description of results	• The desired execution time (540000 ms) is met on average (t=526776 ms). • From the logs provided by the SLA Predictor, we can see that some of these last experiments were executed at the most stressed level of the system (when running the first experiments we didn’t have this information, which the SLA Predictor provides). • In only one case (#5) is the execution time higher than the target value (540000 ms), so the SLA Manager generates a violation to scale.	• The desired execution time (480000 ms) is met on average (t=476695 ms). • From the logs provided by the SLA Predictor, we can see that the execution times predicted by the SLA Predictor tend to be more optimistic than the real final execution times obtained in the experiments. • In only two cases (#1 and #5) is the execution time higher than the target value (480000 ms), so the SLA Manager generates a violation to scale.
Conclusions	• The average execution time always fulfils the SLA. • The results obtained are constrained by: o The level of stress of the cluster and network infrastructures. o The initial data used for training the model. • The SLA Predictor selects the minimum number of workers to avoid scaling and its overhead, which sometimes results in a slightly higher execution time than desired.
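
The distinction made in the results, where the SLA is met on average while individual runs can still exceed the target, can be expressed as a small check. This is a minimal sketch with a hypothetical helper name; it is not the SLA Manager's actual evaluation logic, and it takes the measured times as input rather than embedding any data.

```python
from statistics import mean


def evaluate_execution_time_sla(times_ms, target_ms):
    """Summarise a series of runs against an execution_time target.

    Returns a tuple (average_ms, average_meets_target, violating_runs),
    where violating_runs lists the 1-based run numbers whose execution
    time is higher than the target.
    """
    if not times_ms:
        raise ValueError("at least one measured run is required")
    average_ms = mean(times_ms)
    violating_runs = [run for run, t in enumerate(times_ms, start=1)
                      if t > target_ms]
    return average_ms, average_ms <= target_ms, violating_runs
```

Called with the measured execution times of a series of runs and its target (540000 ms or 480000 ms in Scenario III), it returns the average, whether that average meets the target, and which runs would have triggered a violation.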

Measurement of the improvement from scenario II to scenario III

To assess the benefits of the SLA Predictor, we must compare the results of Scenario II (Rotterdam with auto-scaling and deadline misses as QoS target) with the results of Scenario III (Rotterdam with SLA Predictor and execution_time as QoS target). In both cases, on average, avoiding scaling improves the final execution time.

When scaling from 3 to 6 workers is avoided, we see an improvement of 67132 ms in the total time to perform the workflow. The improvement grows to 81293 ms when scaling from 3 to 9 workers is avoided.
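
To put these improvements in relative terms, they can be combined with the Scenario III averages reported earlier (526776 ms with 6 workers and 476695 ms with 9 workers). The sketch below assumes that each improvement is the difference between the Scenario II and Scenario III average execution times; the Scenario II averages are therefore derived values, not figures quoted in this article.

```python
# Reported Scenario III averages and reported improvements (ms),
# keyed by the final number of workers.
scenario_iii_avg_ms = {6: 526776, 9: 476695}
improvement_ms = {6: 67132, 9: 81293}


def implied_scenario_ii_avg_ms(final_workers: int) -> int:
    """Scenario II average implied by the reported improvement (derived)."""
    return scenario_iii_avg_ms[final_workers] + improvement_ms[final_workers]


def relative_improvement(final_workers: int) -> float:
    """Improvement as a fraction of the implied Scenario II average."""
    return improvement_ms[final_workers] / implied_scenario_ii_avg_ms(final_workers)


for workers in sorted(improvement_ms):
    print(f"3 -> {workers} workers: implied Scenario II average "
          f"{implied_scenario_ii_avg_ms(workers)} ms, "
          f"improvement {relative_improvement(workers):.1%}")
```

Under that assumption, the implied Scenario II averages are 593908 ms and 557988 ms, so deploying the predicted number of workers from the start shortens the workflow by roughly 11% and 15% respectively.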

Conclusion

We can conclude that with the SLA Predictor and the new QoS metric (execution time vs deadline misses), we avoid the generation of violations in most cases, and we guarantee the execution time even in stressed states of the system. If you want to know more about the steps taken in the design of the SLA Predictor, you can check the article (Designing a context-aware ML subsystem: SLA Predictor) in this blog.