Cloud Computing QoS for Real-Time Data Applications

One of the CLASS project’s goals is to develop cloud computing services capable of guaranteeing the “soft” real-time constraints imposed by the data operations and analytics tasks executed in the cloud. These soft guarantees will complement the “hard” real-time guarantees required by the safety-critical operations executed on the edge (and distributed based on the COMPSs framework [1]). To do so, CLASS cloud computing capabilities will follow a Quality of Service (QoS) approach to ensure that the outputs of the less critical data operations and analytics tasks deployed in the cloud are available at the right time.

Data applications and data-driven services are expected to realize the benefits of big data by building algorithmic logic from data (for example, by means of supervised machine learning techniques) instead of following the traditional systems development life cycle. Continuous adaptation to complex environments, real-time automated decision-making and, ultimately, systems autonomy are among the expected benefits. These applications and services are expected to build their data-driven capability from both data in motion (real-time, streaming) and data at rest (historical, batch).

Figure 1. CLASS vision of combined data-in-motion and data-at-rest analytics distributed along the compute continuum: from the edge of the network (closer to sensors and actuators) to the cloud (where massive infrastructure resources reside).

The European project CLASS explores the distribution of these two kinds of analytics workloads along the compute continuum (see Figure 1): from the cloud (public and/or private) to the edge (connected things and network edge devices). The project’s application domains are smart cities and connected cars (as a precursor to autonomous cars). In these areas, the benefits of distributing tasks along the compute continuum are clear: the reliability, availability and privacy provided by the edge combined with the cost efficiency provided by the cloud. However, two main challenges arise: tasks deployed at the edge (for example, on board the connected car) need to guarantee “hard real-time” responses (e.g. very low latency), while tasks deployed in the cloud need to guarantee certain time-related QoS levels, that is, right-time or “soft real-time” guarantees.

The CLASS project combines two different types of real-time requirements within a unified framework, to meet both the safety and business real-time requirements of the system:

In system components with safety-critical (life-critical) and mission-critical real-time requirements (a.k.a. “hard real-time”), engineers must provide robust evidence about the correct timing behaviour of the system. These system components control functions whose failure could endanger human life or compromise the integrity of the system, for example in the automotive or aerospace domains. For reasons of latency and connectivity, these functions operate at the edge, as in the case of Advanced Driver-Assistance Systems (ADAS), which execute on onboard car computers.
In system components with business-critical real-time requirements, engineers must provide Quality of Service (QoS) guarantees (a.k.a. “soft real-time” guarantees). These components oversee functions that can enhance the overall service but are not fundamental to its correct operation and/or overall success; for example, determining the most time-efficient car route. The data services and operations providing the intelligence for this type of function can be deployed in the cloud for reasons of cost effectiveness. This kind of task may be offloaded from the connected object to the cloud.

In CLASS, “soft real-time” constraints imposed on cloud computing services will be satisfied by following a Quality of Service (QoS) approach implemented by the following interrelated elements:

The Cloud Analytics Service Management (CASM) service will accept the time constraints specified by the engineers as a parameter and will ensure the initial deployment satisfies them.
The Service Level Agreement (SLA) will reflect time constraints such as the Budget (the average computational time of the task), the Deadline (the point in time after which results may no longer be valid) or the Period (the periodicity with which the task needs to be executed).
The Cloud Analytics Service Scalability (CASS) service will receive alerts from the SLA Manager and enforce the SLA by provisioning more cloud resources.
The SLA Manager will continuously monitor the tasks’ time performance and trigger alerts when they do not meet certain service-level objectives (SLOs); these alerts will trigger adaptation mechanisms in both the CASM and CASS components.
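
As an illustration only, the time constraints carried by the SLA and the check performed by the SLA Manager could be sketched as follows. This is not the CLASS implementation: the names TimeConstraints, Alert and check_slo are hypothetical, all values are assumed to be in seconds, the Deadline is assumed to be measured relative to the start of each execution, and the two objectives checked (average time within the Budget, no execution past the Deadline) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class TimeConstraints:
    """Hypothetical SLA time constraints, all expressed in seconds."""
    budget: float    # expected average computation time of the task
    deadline: float  # time after which a result may no longer be valid (assumed relative)
    period: float    # interval between consecutive executions of the task


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert that an SLA Manager could send to CASM and CASS."""
    task: str
    reason: str


def check_slo(task: str, sla: TimeConstraints, durations: List[float]) -> List[Alert]:
    """Compare observed execution durations with the SLA and list violations."""
    alerts: List[Alert] = []
    if not durations:
        return alerts  # nothing observed yet, so nothing to report
    # Assumed objective 1: the average duration stays within the budget.
    if mean(durations) > sla.budget:
        alerts.append(Alert(task, "budget exceeded"))
    # Assumed objective 2: no single execution finishes after the deadline.
    misses = sum(1 for d in durations if d > sla.deadline)
    if misses:
        alerts.append(Alert(task, f"{misses} deadline miss(es)"))
    return alerts


if __name__ == "__main__":
    sla = TimeConstraints(budget=0.2, deadline=0.5, period=1.0)
    for alert in check_slo("route-planner", sla, [0.1, 0.2, 0.6]):
        print(alert.task, "->", alert.reason)
```

In this sketch the monitor only reports violations; deciding how to react (redeploying the task or provisioning more resources) is left to the components that receive the alerts.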

The main design objectives of the CLASS cloud computing services are as follows:

Elasticity: to scale the amount of resources in and out in response to an increase or decrease in the workload, for example, in the number of running tasks or in the strictness of their time constraints.
Intelligent provisioning: to give the right amount of resources to the right tasks at the right time, especially at times of peak demand when tasks compete for the resources.
Cloud-native operations: to support container-based deployment of tasks by using container technology (e.g. Docker [2]) and container orchestration services (e.g. Kubernetes [3]).
Portability: to support different infrastructure virtualization solutions, to avoid vendor lock-in and maximize efficiency, including hybrid and multi-cloud settings combining the resources of data centres (as a private cloud) and public clouds such as AWS or GCP.
Simplified deployment: to use abstractions closer to those of analytics programming models than those from container orchestration platforms (e.g. Kubernetes [3]).
Fully managed: to provide automated and self-optimized resource provisioning, self-scalability and self-healing, letting users pay for what they use (on demand).
Resource orchestration coherence: to ensure the overall coherence of the scheduling of different types of processes: virtual machines, containers or analytics workloads.
Integrated monitoring: to provide means for visualizing analytics tasks’ time performance and for alerting other components of any SLA violation.
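
To make the elasticity and intelligent provisioning objectives more concrete, the following minimal sketch estimates how many workers a set of periodic tasks would need. It is an assumption-laden illustration, not the CLASS scaling policy: the function required_workers, the target_utilization threshold and the worker limits are hypothetical, and the demand of each task is approximated as its Budget divided by its Period.

```python
import math
from typing import Iterable, Tuple


def required_workers(tasks: Iterable[Tuple[float, float]],
                     target_utilization: float = 0.7,
                     min_workers: int = 1,
                     max_workers: int = 10) -> int:
    """Estimate the number of workers for tasks given as (budget, period) pairs in seconds.

    Each task is assumed to keep one worker busy for budget / period of the time.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total demand, expressed in "fully busy workers".
    demand = sum(budget / period for budget, period in tasks)
    # Leave headroom by loading each worker only up to the target utilization.
    wanted = math.ceil(demand / target_utilization)
    # Scale out or in, but stay within the configured limits.
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    workload = [(0.2, 1.0), (0.5, 1.0), (0.9, 1.0)]
    print(required_workers(workload))
```

Calling the function again whenever tasks are added, removed or given tighter constraints yields a new target, which is how scaling in and out would follow the workload under these assumptions.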

References:

[1]    F. Lordan, E. Tejedor, J. Ejarque, R. Rafanell, J. Álvarez, F. Marozzo, D. Lezzi, R. Sirvent, D. Talia and R. M. Badia, “ServiceSs: an interoperable programming framework for the Cloud,” Journal of Grid Computing, vol. 12, no. 1, pp. 67–91, 2014.
[2]    “Docker,” 2018. [Online]. Available: https://www.docker.com/.
[3]    “Production-Grade Container Orchestration - Kubernetes,” 2018. [Online]. Available: https://kubernetes.io/.