By Joao Costa

While traditional log monitoring solutions require manual rule-based analysis, novel AI methods leverage Natural Language Processing to model log streams and capture normal operating conditions. The novel dynamic approach and the well-established static rule matching are complementary: combining them enables a powerful automated monitoring system that can detect abnormal situations and potential security threats without human intervention. In this blog post we discuss the innovative approach to runtime security monitoring in PIACERE.

To address the need for a monitoring stack for runtime conditions that can feed the self-learning and self-healing mechanisms, PIACERE’s team extended the open-source security platform Wazuh to the needs and specificities of DevSecOps [1]. Based on this, we make available a monitoring system capable of detecting security-related events and incidents in the deployed application’s environment. It is (to the extent possible) deployable automatically and notifies users about security alerts. Wazuh agents are compatible with many platforms, including Windows, Linux, Mac OS X, AIX, Solaris, and HP-UX. The technology consists of Wazuh agents installed on machines, a Wazuh server that orchestrates the agents and collects their data, and an Elasticsearch database with a modified Kibana UI. Wazuh includes several modules, each with specific rules and thresholds for triggering alerts. When an alert is triggered, Wazuh can notify users or other components and generate evidence based on event-driven changes to metrics.
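
The idea of a rule with a threshold can be illustrated with a minimal Python sketch. This is not Wazuh's rule format or API: the `ThresholdRule` class, the `run_rules` helper, the rule name, and the sample log lines are all hypothetical and only show how a pattern plus a frequency threshold inside a time window can turn raw events into an alert.

```python
import re
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThresholdRule:
    """Hypothetical rule: alert when `pattern` matches `threshold` times
    within `window_s` seconds. Not Wazuh's actual rule syntax."""
    name: str
    pattern: str
    threshold: int
    window_s: float
    _hits: deque = field(default_factory=deque, repr=False)

    def observe(self, timestamp: float, line: str) -> bool:
        """Feed one log line; return True if this line triggers an alert."""
        if not re.search(self.pattern, line):
            return False
        self._hits.append(timestamp)
        # Drop matches that have fallen out of the sliding window.
        while self._hits and timestamp - self._hits[0] > self.window_s:
            self._hits.popleft()
        if len(self._hits) >= self.threshold:
            self._hits.clear()  # reset so one burst yields one alert
            return True
        return False


def run_rules(rules, events):
    """Return (timestamp, rule name) for every alert raised over `events`."""
    alerts = []
    for timestamp, line in events:
        for rule in rules:
            if rule.observe(timestamp, line):
                alerts.append((timestamp, rule.name))
    return alerts


# Made-up sample events: (seconds since start, log line).
events = [
    (0.0, "sshd: Failed password for root from 10.0.0.5"),
    (5.0, "sshd: Failed password for root from 10.0.0.5"),
    (7.0, "sshd: Accepted password for alice from 10.0.0.9"),
    (9.0, "sshd: Failed password for admin from 10.0.0.5"),
]
rule = ThresholdRule("ssh-brute-force", r"Failed password", threshold=3, window_s=60.0)
alerts = run_rules([rule], events)
print(alerts)
```

Here the third failed login within the 60-second window raises a single alert; a real deployment would rely on the rules shipped with the monitoring platform rather than hand-written ones like this.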

Alongside this static rule matching approach, PIACERE has been further developing a novel LOg MOnitoring System (LOMOS) to identify patterns and anomalies in logs without manual pre-processing of raw unstructured data. It achieves this by parsing raw logs into structured log templates, which are matched according to a tree structure. During a training period, LOMOS observes the sequence of templates to learn what normal behaviour looks like; during the monitoring period, it assigns an anomaly score to the logs it observes. The LOMOS dashboard displays identified threats with their date, their ranking, and calls for action. The system uses LogBERT algorithms [2], Drain for log template parsing [3], and OpenSearch or Grafana to set up alerts and send them to specific services. LOMOS uses deep learning techniques to analyse system and application logs, providing insights on the status of monitored assets. It computes an anomaly score on sequences of log templates by applying Natural Language Processing (NLP) models to the template IDs, allowing for a more efficient and accurate log analysis.
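
The two stages, template extraction followed by scoring sequences of template IDs, can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the LOMOS implementation: a regular expression that masks tokens containing digits replaces Drain's fixed-depth parse tree, and a smoothed bigram model over template IDs replaces LogBERT. All names (`to_template`, `TemplateSequenceModel`) and the sample log lines are hypothetical.

```python
import math
import re
from collections import Counter


def to_template(line: str) -> str:
    """Mask tokens that contain digits so similar lines share one template.
    Simplified stand-in for Drain, which uses a fixed-depth parse tree."""
    return " ".join("<*>" if re.search(r"\d", tok) else tok for tok in line.split())


class TemplateSequenceModel:
    """Bigram model over template IDs (stand-in for LogBERT)."""

    def __init__(self):
        self.template_ids = {}        # template string -> integer ID
        self.pair_counts = Counter()  # (previous ID, current ID) -> count
        self.prev_counts = Counter()  # previous ID -> count

    def template_id(self, line: str) -> int:
        # Unseen templates receive a fresh ID, also at monitoring time.
        return self.template_ids.setdefault(to_template(line), len(self.template_ids))

    def fit(self, normal_lines):
        """Training period: count template transitions in normal logs."""
        ids = [self.template_id(line) for line in normal_lines]
        for prev, cur in zip(ids, ids[1:]):
            self.pair_counts[(prev, cur)] += 1
            self.prev_counts[prev] += 1

    def score(self, lines) -> float:
        """Monitoring period: mean negative log-probability of the observed
        transitions. Higher means more anomalous."""
        ids = [self.template_id(line) for line in lines]
        pairs = list(zip(ids, ids[1:]))
        if not pairs:
            return 0.0
        vocab = len(self.template_ids)
        total = 0.0
        for prev, cur in pairs:
            # Add-one smoothing keeps unseen transitions at a non-zero probability.
            p = (self.pair_counts[(prev, cur)] + 1) / (self.prev_counts[prev] + vocab)
            total -= math.log(p)
        return total / len(pairs)


# Made-up "normal" traffic: the same three-step pattern repeated.
normal = [
    "Connection from 10.0.0.1 port 22",
    "User id=42 logged in",
    "Session 17 closed",
] * 20
model = TemplateSequenceModel()
model.fit(normal)

ok_window = ["Connection from 10.0.0.7 port 2222", "User id=7 logged in", "Session 99 closed"]
odd_window = ["Session 5 closed", "Session 6 closed", "Kernel panic at 0xdeadbeef"]
ok_score = model.score(ok_window)
odd_score = model.score(odd_window)
print(f"ok={ok_score:.3f} odd={odd_score:.3f}")
```

The window that follows the learned order of templates scores low, while the window with an unfamiliar order and a never-seen template scores noticeably higher. A model such as LogBERT captures much longer context than this bigram counter, but the input (template IDs) and the output (one score per sequence) have the same shape.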

PIACERE’s approach for runtime security monitoring combines the power of rule-based matching with deep learning-based anomaly detection on logs (see figure below). This interplay enables the creation of a monitoring system that can automatically detect deviations from normal operating conditions in real time, including potential security threats. The system analyses infrastructure and application logs captured by the Wazuh agents and runs both static monitoring, based on rule matching, and dynamic monitoring, based on deep learning techniques that compute an anomaly score on sequences of log templates. The technology uses NLP models to capture normal operating conditions and provide valuable insights regarding the current and past status of monitored assets. The NLP approach thus enhances traditional rule-based monitoring solutions and yields a more comprehensive and effective method of detecting security threats at runtime.
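
One way to picture this interplay is a small function that merges the two signals into a single verdict. The sketch below is purely illustrative: the `combine` function, the severity labels, and the anomaly threshold of 2.0 are assumptions made for this example and do not describe how PIACERE actually fuses rule alerts and anomaly scores.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    severity: str                       # "ok", "warning" or "critical" (hypothetical labels)
    reasons: list = field(default_factory=list)


def combine(rule_alerts, anomaly_score, anomaly_threshold=2.0):
    """Merge static rule alerts and a dynamic anomaly score.
    The threshold and the escalation policy are assumptions for illustration."""
    reasons = [f"rule:{name}" for name in rule_alerts]
    is_anomalous = anomaly_score >= anomaly_threshold
    if is_anomalous:
        reasons.append(f"anomaly_score={anomaly_score:.2f}")
    if rule_alerts and is_anomalous:
        severity = "critical"   # both detectors agree
    elif reasons:
        severity = "warning"    # only one detector fired
    else:
        severity = "ok"
    return Verdict(severity, reasons)


print(combine(["ssh-brute-force"], 3.1))
print(combine([], 3.1))
print(combine([], 0.1))
```

The point of such a policy is that an anomaly with no matching rule still surfaces as a warning, which is where the dynamic approach adds coverage beyond static rules.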

Figure: Overall system functionality of static and dynamic security monitoring,
aligning with self-learning and self-healing mechanisms [5]

As IT systems continue to grow in complexity and scale, traditional manual approaches to managing and troubleshooting them are becoming increasingly challenging. This evolution has led to a paradigm shift in the concepts of deployment and monitoring, from single servers to complex ecosystems of networks and servers, necessitating new approaches to protect against security attacks. Self-learning and self-healing IT systems are emerging as potential solutions to address this shift, taking this a step further by automatically detecting and resolving issues in real-time without human intervention. To ensure the business continuity of Infrastructure as Code with respect to pre-selected Non-Functional Requirements, a monitoring component is implemented using a time-series database to create complex statistical variables, train a predictor to identify potential failure patterns, and alert DevSecOps teams of potential risks. PIACERE, with its self-learning and self-healing capabilities, can automatically deploy security monitoring agents, detect security threats, and react according to predefined actions. The suggestions are presented in a user-friendly manner, including the event types, edit options, and timestamp. By utilizing self-learning and self-healing IT systems, businesses can reduce downtime, improve reliability, and ultimately provide a more efficient and secure IT infrastructure.

Runtime security is rapidly gaining the attention of companies and institutions worldwide, which are becoming more aware of their vulnerabilities and of the consequences of attacks in an age of digital transformation that affects workflows across industries. Thanks to the different skillsets and expertise of the PIACERE consortium partners, the novelty developed in this project is already having an impact on domains such as supply-chain cybersecurity and resilience, through a research collaboration with the EC-funded project FISHY (fishy-project.eu). Built upon a mix of highly skilled industrial and academic partners, FISHY aims at delivering a coordinated cyber resilient platform towards establishing trusted supply chains of ICT systems. It does so through novel evidence-based security assurance methodologies and metrics, as well as innovative strategies for risk estimation and vulnerabilities forecasting that leverage state-of-the-art solutions, leading to resilient complex ICT systems. This can be particularly important for smart logistics and the maritime port sector, as confirmed by Prodevelop, the PIACERE use case owner that was also part of this research collaboration. The collaboration resulted in two papers and two workshop presentations, opening the doors to further research work on the topic of this blog post.

The XLAB team will be presenting the details of the technology, the current challenges and the success stories in April at two events: the FastContinuum 2023 Workshop, happening on the 16th in Coimbra, Portugal (https://sites.google.com/view/fastcontinuum-2023, co-located with the International Conference on Performance Engineering 2023), and the 4th International Workshop on Information & Operational Technology Security Systems in Vilanova, Spain (https://drcn2023.upc.edu/IOSEC2023.html, co-located with the International Conference on the Design of Reliable Communication Networks 2023). In the first event we will discuss runtime security in the novel context of DevSecOps, together with the design-time security methods applied in PIACERE (which we wrote about in a previous blog post). In the second event we will discuss the duality between static and dynamic methods for runtime security in the context of supply chain cybersecurity and resilience, and the benefits identified by the use cases in the food sector (in the context of a Farm2Fork initiative) and in smart logistics, particularly with PIACERE’s use case Prodevelop. Two papers are coming out of these studies: the first, “Security in DevSecOps: Applying Tools and Machine Learning to Verification and Monitoring Steps” [4], published with the ACM, and the second, “Runtime security monitoring by an interplay between rule matching and deep learning-based anomaly detection on logs” [5], published with IEEE. Still in April, the PIACERE project is organizing a webinar in collaboration with the Software Forum dedicated to DevSecOps, named “DevOps Innovation in Practice: New lifecycle processes, new applications”, with excellent panelists such as Matija Cankar, Damian Tamburri and Andrey Sadovykh, where security at runtime as well as at design time will be highlighted topics. Join us online or in Coimbra and Vilanova to further discuss the future of runtime security.



[1] Alonso, Juncal, Radosław Piliszek, and Matija Cankar (2022) Embracing IaC through the DevSecOps philosophy: Concepts, challenges, and a reference framework. IEEE Software 40.1: 56-62.

[2] Haixuan Guo, Shuhan Yuan, and Xintao Wu (2021) LogBERT: Log anomaly detection via BERT. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE.

[3] P. He, J. Zhu, Z. Zheng, and M. R. Lyu (2017) Drain: An online log parsing approach with fixed depth tree. In 2017 IEEE international conference on web services (ICWS), pages 33–40. IEEE.

[4] Matija Cankar, Nenad Petrović, Joao Pita Costa, Aleš Černivec, Jan Antič, Tomaž Martinčič and Dejan Štepec (2023) Security in DevSecOps: Applying Tools and Machine Learning to Verification and Monitoring Steps. Proceedings of the International Conference on Performance Engineering 2023, ACM.

[5] Jan Antič, Joao Pita Costa, Aleš Černivec, Matija Cankar, Tomaž Martinčič, Aljaž Potočnik, Hrvoje Ratkajec, Gorka Benguria Elguezabal, Nelly Leligou, Alexandra Lakka, Ismael Torres Boigues and Eliseo Villanueva Morte (2023) Runtime security monitoring by an interplay between rule matching and deep learning-based anomaly detection on logs. Proceedings of the International Conference on the Design of Reliable Communication Networks 2023, IEEE.

[6] Dario Di Nucci, Fabio Palomba, Damian A. Tamburri, Alexander Serebrenik, and Andrea De Lucia. 2018. Detecting code smells using machine learning techniques: Are we there yet?. Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering (SANER): 612–621, IEEE.