What is the CERN Accelerator Logging Service?
The CERN-wide Accelerator Logging Service was born out of the LHC Logging Project, which was launched in 2001. The service first became operational in late 2003 and has since come to be regarded as mission-critical. Its current mandate can be summarized as:
- Manage information to improve accelerator performance.
- Meet requirements for recording beam history.
- Make long-term statistics available for management.
- Avoid duplicate logging efforts.
The Logging Service persists data from more than 2 million pre-defined signals coming from heterogeneous sources. These signals range from core-infrastructure data such as electricity, to industrial data such as cryogenics and vacuum, to beam-related data such as beam positions, currents, and losses.
The Logging Service provides access to logged data for more than 1,000 registered individuals and close to 200 registered custom applications from around CERN.
How is the CERN Accelerator Logging Service made?
It is a "Big Data" system running on a large computing cluster.
Data is persisted in Apache Hadoop using a lambda architecture: the most recent data is kept in HBase, while data older than 30 hours is stored in HDFS as Apache Parquet files. Data is ingested via custom Java data-acquisition infrastructure based on Akka and Apache Kafka, with Spring-based Java processes performing the ETL between Kafka and Hadoop. Approximately 2 TB of data (before compression) are persisted per day.
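As an illustration of the ETL path between Kafka and Hadoop, the sketch below uses PySpark Structured Streaming to decode records from a Kafka topic and append them as day-partitioned Parquet files on HDFS. The production pipeline is implemented as Spring-based Java processes, so the topic name, broker, paths, and record schema used here are purely hypothetical.

```python
# Illustrative sketch only: the production ETL is implemented as Spring-based
# Java processes. This PySpark Structured Streaming job shows the same idea of
# moving records from Kafka into day-partitioned Parquet files on HDFS.
# The topic, broker, paths and record schema are hypothetical.
# (Requires the spark-sql-kafka connector package on the Spark classpath.)
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_date
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()

# Hypothetical schema for one logged signal sample.
schema = StructType([
    StructField("signal", StringType()),
    StructField("value", DoubleType()),
    StructField("stamp", TimestampType()),
])

# Read raw records from a (hypothetical) Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-01:9092")
       .option("subscribe", "acc-logging-signals")
       .load())

# Kafka delivers the payload as bytes: decode it, parse the JSON and
# derive a "day" column to partition the long-term store by.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("rec"))
          .select("rec.*")
          .withColumn("day", to_date(col("stamp"))))

# Append day-partitioned Parquet files to HDFS, the long-term layer of the
# lambda architecture; checkpointing lets the job resume where it left off.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///logging/signals")
         .option("checkpointLocation", "hdfs:///logging/_checkpoints/signals")
         .partitionBy("day")
         .start())

query.awaitTermination()
```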
Data extraction is performed using Apache Spark via various Java and Python APIs. A Jupyter notebook service, fully integrated with the Logging Service, is also available. A generic web application (called TIMBER) is provided to visualize the data. The back-end of this system is based on Java and the Spring framework, while the front-end is based on the latest version of the Angular framework.
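To give a flavour of extraction, here is a minimal PySpark sketch that reads the Parquet files directly and summarises one signal over a time window. In practice users would go through the provided Java/Python APIs, the Jupyter service, or TIMBER; the HDFS path, column names, and signal name below are hypothetical.

```python
# Minimal extraction sketch: read the Parquet store with PySpark and compute
# basic statistics for one signal over a time window. The path, column names
# and signal name are hypothetical, not the service's actual API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("extraction-sketch").getOrCreate()

signals = spark.read.parquet("hdfs:///logging/signals")

# Select a single (hypothetical) signal over one day.
sample = (signals
          .filter(F.col("signal") == "EXAMPLE.BEAM:INTENSITY")
          .filter(F.col("stamp").between("2024-05-01 00:00:00", "2024-05-02 00:00:00")))

# Basic statistics over the selected window.
sample.agg(
    F.count("*").alias("samples"),
    F.min("value").alias("min"),
    F.max("value").alias("max"),
    F.avg("value").alias("mean"),
).show()
```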
Quality and DevOps play an important role, with CI/CD based on GitLab, Sonar, Jenkins, and Ansible, and with MONIT, ELK, and Prometheus for monitoring.
The entire stack is heavily instrumented to support the philosophy of knowing who is doing what, from where, and how long things take. Collectively, this information is intended to help ensure the stability and scalability of the service in spite of ever-increasing demands.
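As a minimal sketch of that who/what/where/how-long idea, the example below instruments a hypothetical extraction entry point with the Python prometheus_client library, so request counts and latencies can be scraped by Prometheus. The metric names, labels, and the extract_data() function are assumptions for illustration, not the service's actual instrumentation.

```python
# A minimal sketch of the "who is doing what, from where, and how long it
# takes" idea, using the prometheus_client library. The metric names, labels
# and the extract_data() entry point are assumptions, not the service's
# actual instrumentation.
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "logging_requests_total",
    "Number of data-extraction requests",
    ["user", "application", "host"],
)
LATENCY = Histogram(
    "logging_request_seconds",
    "Time spent serving data-extraction requests",
    ["application"],
)

def extract_data(user: str, application: str, host: str, query: str) -> list:
    """Hypothetical extraction entry point, instrumented for who/what/where/how long."""
    REQUESTS.labels(user=user, application=application, host=host).inc()
    with LATENCY.labels(application=application).time():
        # ... run the actual query against the back-end here ...
        time.sleep(0.01)  # placeholder for real work
        return []

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    extract_data("jdoe", "timber", "client-host-01", "EXAMPLE.BEAM:INTENSITY")
```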