Since the start of operations in 2013, the ALMA Observatory software has grown considerably in size and complexity, and the requirements for stability, performance and quality have increased. In this work we describe the current status of the infrastructure, tools and practices for software log analysis that have been developed over the years to extract insights into the behavior of the system and to identify common pitfalls and points of failure. Thanks to the design and implementation of a logging infrastructure, temporal-domain analysis and visualization tools, machine learning techniques and other ad hoc solutions, we have been able to speed up troubleshooting, anticipate issues and gain a better understanding of the overall system behavior.