
Deploying and managing COMPSs workflows with Rotterdam

Previous newsletters introduced COMPSs workflows for describing complex big-data analytics workloads, and the CLASS Cloud Computing Platform based on Rotterdam and OpenShift/Kubernetes. This newsletter looks at how COMPSs workflows are deployed and managed in OpenShift (OKD) using Rotterdam, and walks through an example COMPSs workflow deployment to illustrate the whole process.

Rotterdam is a Container as a Service (CaaS) facade that simplifies the deployment and lifecycle management of containerized applications and cloud data analytics workloads on container orchestration platforms through API calls, abstracting the cloud infrastructure details away from developers. It includes SLALite, a lightweight implementation of an SLA system that is responsible for enforcing QoS parameters, including real-time requirements. SLALite uses Prometheus to collect and evaluate the metrics.

In the following example, the components shown in the figure above are already deployed in an OpenShift cluster that is part of the cloud infrastructure. The example includes a COMPSs application that connects to Rotterdam to launch and manage a big-data workload. The COMPSs framework is responsible for monitoring the execution of the workflow and for reporting missed deadlines to Prometheus through the Pushgateway application, so that the resulting violations can eventually reach Rotterdam.

Workflow deployment

To launch a workflow (or a COMPSs task) on the CLASS cloud with Rotterdam, we first need to define a QoS template or use one of the default templates included in Rotterdam. These templates are used at runtime to generate the SLAs associated with the execution. If one of these SLAs is violated, Rotterdam takes the action defined in the corresponding template, such as scaling out or in the number of replicas of the deployment (corresponding to COMPSs workers) that generated the violation.

QoS templates

In this example we will use the following QoS template which is already included in the list of Rotterdam’s default QoS templates:

{
    "type": "app-compss",
    "guaranteeName": "DeadlinesMissed_1",
    "maxAllowed": 0,
    "action": "scale_out",
    "scaleFactor": 1.5,
    "guarantees": [{
        "name": "deadlines_missed",
        "constraint": "deadlines_missed < 1"
    }]
}

This QoS template is identified by “DeadlinesMissed_1” and defines a metric called “deadlines_missed” that must always be lower than one. In other words, if there is at least one missed deadline, the SLALite component generates a violation. Once this violation is generated, Rotterdam responds by scaling out the number of replicas (COMPSs workers).
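To make the semantics of the "guarantees" entry concrete, the following is a minimal sketch of how a constraint such as "deadlines_missed < 1" can be checked against an observed metric value. It assumes constraints always have the simple form "metric operator number"; the function name is_violated and the supported operator set are hypothetical and do not reflect SLALite's actual parser.

```python
import operator
import re

# Hypothetical subset of comparison operators a simple constraint may use.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Assumed constraint shape: "metric op number", e.g. "deadlines_missed < 1".
# Two-character operators are listed first so "<=" is not matched as "<".
_CONSTRAINT_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def is_violated(constraint: str, metrics: dict) -> bool:
    """Return True if the observed metric value breaks the constraint."""
    match = _CONSTRAINT_RE.match(constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    name, op, threshold = match.groups()
    value = metrics[name]
    # The constraint states what must hold; a violation is when it does not.
    return not _OPS[op](value, float(threshold))


template = {
    "guaranteeName": "DeadlinesMissed_1",
    "guarantees": [{"name": "deadlines_missed", "constraint": "deadlines_missed < 1"}],
}

for observed in (0, 1, 3):
    violated = any(
        is_violated(g["constraint"], {"deadlines_missed": observed})
        for g in template["guarantees"]
    )
    print(observed, "->", "violation" if violated else "ok")
```

With zero missed deadlines the constraint holds; a single missed deadline is already enough to produce a violation, matching the behaviour described above.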

Launching the workflow

After the QoS template has been defined, the COMPSs application deploys the workflow using Rotterdam. Internally, COMPSs builds a JSON document like the one in the snippet below and sends it to Rotterdam through a REST API:

{
    "name": "compss-workflow",
    "dock": "class",
    "qos": [{"qosid": "DeadlinesMissed_1"}],
    "replicas": 4,
    "image": "example/compss-test",
    "ports": [44240],
    "command": ["/bin/sh"],
    "args": ["-c", "while true; do echo hello; sleep 10;done"]
}

This JSON example defines a COMPSs workflow based on a Docker image called “example/compss-test”, composed of four replicas, which correspond to four COMPSs workers. During the deployment phase, Rotterdam uses the QoS template identified by “DeadlinesMissed_1”, described in the previous section, to generate the corresponding SLA.
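The sketch below shows how a client could assemble this deployment description and wrap it in an HTTP request using only the Python standard library. The Rotterdam URL, the use of POST and the helper names build_deployment and make_request are hypothetical placeholders; the real endpoint and method depend on the Rotterdam installation. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical placeholder: the real Rotterdam host and path depend on the installation.
ROTTERDAM_TASKS_URL = "http://rotterdam.example/api/tasks"


def build_deployment(name, image, replicas, qos_ids, ports, command, args, dock="class"):
    """Assemble a deployment description with the fields shown in the JSON above."""
    return {
        "name": name,
        "dock": dock,
        "qos": [{"qosid": q} for q in qos_ids],
        "replicas": replicas,
        "image": image,
        "ports": ports,
        "command": command,
        "args": args,
    }


def make_request(deployment, url=ROTTERDAM_TASKS_URL):
    """Wrap the deployment as a JSON request (assumed to be a POST); it is not sent here."""
    body = json.dumps(deployment).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


deployment = build_deployment(
    name="compss-workflow",
    image="example/compss-test",
    replicas=4,
    qos_ids=["DeadlinesMissed_1"],
    ports=[44240],
    command=["/bin/sh"],
    args=["-c", "while true; do echo hello; sleep 10;done"],
)
req = make_request(deployment)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen() would send it to a running Rotterdam instance.
```

Keeping the payload construction separate from the transport makes it easy to inspect or log the JSON before it is submitted.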

Missed-deadline violations and scale-out operation

The Figure below shows the actions performed by the CLASS Cloud Infrastructure when the COMPSs framework detects a deadline miss in the execution of the workflow.

Concretely, when the COMPSs application detects a missed deadline, it sends this information to Prometheus through the Pushgateway application (1, 2). The SLALite component, which continuously monitors metrics from Prometheus to check the existing SLAs (3), detects a violation and notifies Rotterdam (4).
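Steps (1, 2) can be illustrated with the Prometheus Pushgateway's HTTP interface, which accepts metrics in the Prometheus text exposition format at a path of the form /metrics/job/<job_name>. The following is a minimal sketch, not the COMPSs implementation: the Pushgateway host and the job name "compss-workflow" are hypothetical, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical Pushgateway address; 9091 is the Pushgateway's default port.
PUSHGATEWAY = "http://pushgateway.example:9091"


def exposition(metric: str, value: float, help_text: str) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    return (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )


def push_request(job: str, metric: str, value: float) -> urllib.request.Request:
    """Build a PUT to /metrics/job/<job>, which replaces that job's metric group."""
    url = f"{PUSHGATEWAY}/metrics/job/{urllib.parse.quote(job, safe='')}"
    body = exposition(metric, value, "Number of deadlines missed by the workflow").encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="PUT",
    )


req = push_request("compss-workflow", "deadlines_missed", 1)
print(req.full_url)
print(req.data.decode())
```

Once the sample is in the Pushgateway, Prometheus scrapes it like any other target, which is how SLALite can then observe deadlines_missed in step (3).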

When Rotterdam is notified of an SLA violation, it takes the actions defined in the corresponding QoS template. In this case, it scales out the number of workers, as shown in the figure below.
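The article does not specify how Rotterdam applies the template's "scaleFactor". The sketch below assumes a multiplicative rule, rounding up on scale-out (and always adding at least one replica) and rounding down on scale-in; the function scaled_replicas, the "scale_in" action name and min_replicas are hypothetical.

```python
import math


def scaled_replicas(current: int, action: str, factor: float, min_replicas: int = 1) -> int:
    """Hypothetical rule: multiply (scale_out) or divide (scale_in) the replica count by factor."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    if action == "scale_out":
        # Round up and always add at least one replica.
        return max(current + 1, math.ceil(current * factor))
    if action == "scale_in":
        # Round down, but never go below min_replicas.
        return max(min_replicas, math.floor(current / factor))
    raise ValueError(f"unknown action: {action!r}")


template = {"action": "scale_out", "scaleFactor": 1.5}
print(scaled_replicas(4, template["action"], template["scaleFactor"]))  # 6
```

Under this assumption, the example deployment of four COMPSs workers would grow to six after the first violation.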

By dynamically adding more workers to the workflow, the aim is to reduce or avoid missed deadlines and thus improve the performance of the COMPSs application.

References

An efficient distribution of big-data analytics workloads across the compute continuum: https://class-project.eu/news/efficient-distribution-big-data-analytics-workloads-across-compute-continuum

CLASS Cloud Computing Platform - first integration in the City of Modena Data Center: https://class-project.eu/news/class-cloud-computing-platform-first-integration-city-modena-data-center

The Origin Community Distribution of Kubernetes that powers Red Hat OpenShift (OKD): https://www.okd.io/