ACES Observability & Monitoring

In complex distributed architectures, including those that span multiple clusters or edge environments, deep insight into the performance, health, and interactions of clusters, nodes, and their constituent workloads becomes paramount. Observability is an approach to analyzing and optimizing systems that provides a real-time perspective on all operational data related to applications and infrastructure. It lays the groundwork for the ACES platform to proactively identify and address issues, ensuring seamless operations and optimal resource utilization. However, the complexities of multi-cluster and edge environments, such as those tackled in ACES, change how system behavior, dependencies, and potential bottlenecks can be viewed comprehensively, and consequently how fatal errors are detected, diagnosed, and resolved.

To this end, ACES is defining and realizing a set of open, portable, and expressive data acquisition and knowledge representation models, as well as software that can cover the needs of Cognitive Edge-Cloud services and infrastructure at all levels. Specifically, ACES is developing components that acquire and transform data from available infrastructure, resources, devices, services, users, and applications. This data feeds into knowledge and thereby enables a high level of contextual awareness. We touch on topics such as distributed data collection, data collection reconfiguration approaches, data aggregation, data exchange minimization, peer-to-peer data exchange, and data replication approaches. In the context of autopoiesis, data acquisition is critical because the systems need to sense and deeply understand the world around them in order to operate and act.

In ACES, the University of Ljubljana is designing and developing the Monitoring & Observability framework, a vertical ACES component that spans the overall ACES architecture and its constituent components. The component provides monitoring and observability for the different layers of the software stack at various levels (i.e., the edge, application, network, and cloud layers). It encompasses monitoring, logging, tracing, metrics collection, alerting, anomaly detection and analysis, visualization, and performance analysis. Because it is inherently distributed, the Monitoring & Observability framework considers hierarchical and distributed monitoring and storage, including across multiple clusters. To date, we have defined the component architecture, which consists of four main subcomponents: Monitoring & Observability Core, Push Gateway, Alert Manager, and Data Forwarder. Together they provide the core functionalities of monitoring and telemetry data collection, storage, forwarding, and querying, as well as alerting, to ACES workloads and other ACES components. Additionally, auxiliary subcomponents enable functionalities such as service discovery, data analysis, data export, and visualization. Following the initial implementation and first software release, the component has already been successfully deployed in the test and integration environment. The next steps target the final software release of the component, further use case implementation, and other related piloting activities.
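To make the division of roles among the four main subcomponents more concrete, the following is a minimal in-memory sketch in Python. It is not the ACES implementation: every class, method, and metric name (for example `Core.collect`, `cpu_load`, `edge-node-1`) is a hypothetical stand-in, and the simple threshold rule is only one assumed way alerting could be triggered. The sketch assumes that workloads push samples to the Push Gateway, and that the Core then collects and stores them, hands them to the Data Forwarder, and notifies the Alert Manager when a rule is violated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# NOTE: all names below are hypothetical. They mirror the subcomponent roles
# named in the text, not the actual ACES software or its interfaces.

SeriesKey = Tuple[str, str]  # (source, metric)


@dataclass
class PushGateway:
    """Holds the latest sample pushed by each workload for each metric."""
    samples: Dict[SeriesKey, float] = field(default_factory=dict)

    def push(self, source: str, metric: str, value: float) -> None:
        self.samples[(source, metric)] = value


@dataclass
class AlertManager:
    """Collects alert messages raised by the core."""
    alerts: List[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.alerts.append(message)


@dataclass
class DataForwarder:
    """Stands in for forwarding data to another cluster or a higher tier."""
    forwarded: List[Tuple[str, str, float]] = field(default_factory=list)

    def forward(self, source: str, metric: str, value: float) -> None:
        self.forwarded.append((source, metric, value))


@dataclass
class Core:
    """Collects samples from the gateway, stores them, and applies rules."""
    gateway: PushGateway
    alert_manager: AlertManager
    forwarder: DataForwarder
    rules: Dict[str, float] = field(default_factory=dict)  # metric -> threshold
    store: Dict[SeriesKey, List[float]] = field(default_factory=dict)

    def collect(self) -> None:
        for (source, metric), value in self.gateway.samples.items():
            # Storage: append the sample to the series for this source/metric.
            self.store.setdefault((source, metric), []).append(value)
            # Forwarding: pass every collected sample on.
            self.forwarder.forward(source, metric, value)
            # Alerting: assumed rule form is "value above a fixed threshold".
            threshold = self.rules.get(metric)
            if threshold is not None and value > threshold:
                self.alert_manager.notify(
                    f"{metric} on {source} is {value}, above {threshold}"
                )

    def query(self, source: str, metric: str) -> List[float]:
        return list(self.store.get((source, metric), []))


# Usage: two hypothetical edge nodes push a CPU load sample each.
gateway = PushGateway()
alerts = AlertManager()
forwarder = DataForwarder()
core = Core(gateway, alerts, forwarder, rules={"cpu_load": 0.9})

gateway.push("edge-node-1", "cpu_load", 0.95)
gateway.push("edge-node-2", "cpu_load", 0.40)
core.collect()

print(alerts.alerts)
print(core.query("edge-node-2", "cpu_load"))
```

In this sketch only the sample from `edge-node-1` exceeds the assumed threshold, so a single alert is raised, while both samples are stored and forwarded. A hierarchical, multi-cluster deployment could be modeled by letting one core's forwarder push into another core's gateway, though how ACES actually chains these subcomponents is not specified here.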

The University of Ljubljana is additionally investigating and developing various multi-layer system characterization and aggregation approaches intended to provide richer monitoring and observability context for edge systems. More on that in upcoming news releases. Stay tuned!


