Microsoft Lumos Is Now Open-Source Allowing Monitoring Of Web App Metrics And Quick Detection Of Anomalies By Eliminating False Positives

Microsoft has open-sourced ‘Lumos’, a Python library for automatically detecting and diagnosing metric regressions in “web-scale” applications. The library has reportedly been in active use inside Microsoft Teams and Skype. Essentially, an intelligent ‘anomaly detector’ is now available for web developers to spot and address regressions in key performance metrics while eliminating the majority of false positives.

Microsoft Lumos is now open source. It was already in active use in select Microsoft products, and will now be available to the general web and app development community. The library reportedly allowed engineers to detect hundreds of changes in metrics and reject thousands of false alarms surfaced by anomaly detectors.

Lumos Reduces False-Positive Alert Rate By Over 90 Percent, Claims Microsoft:

Lumos is a new methodology that incorporates existing, domain-specific anomaly detectors rather than replacing them. Microsoft claims the Python library can reduce the false-positive alert rate by over 90 percent. In other words, developers can now confidently go after persistent issues instead of intermittent ones that have no long-term detrimental effect.

The health of online services is usually monitored by tracking Key Performance Indicator (KPI) metrics over time. Engineers conducting ‘Regression Analysis’ require a lot of time and resources to weed out issues which can be indicative of major problems. These problems can result in escalating operational costs and even loss of users if not addressed.

Needless to add, tracking down the root cause of every KPI regression is time-consuming. Moreover, teams often spend a lot of time analyzing an issue only to find it was a mere anomaly. This is where Microsoft Lumos comes in handy. The Python library takes over the work of establishing whether a change is due to a shift in population or a product update, and provides a prioritized list of the variables that matter most in explaining changes in the metric value.
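To make the distinction between a population shift and a product change concrete, here is a minimal sketch in plain Python. It does not use the Lumos API; the function names, the "desktop" and "mobile" segments, and the counts are all made up for illustration. The idea is to recompute the treatment success rate as if its user mix matched the control's, and see how much of the drop survives.

```python
from collections import defaultdict

def segment_stats(rows):
    """Map each segment to (row_count, success_rate) for (segment, ok) rows."""
    counts = defaultdict(lambda: [0, 0])
    for segment, ok in rows:
        counts[segment][0] += 1
        counts[segment][1] += int(ok)
    return {seg: (n, good / n) for seg, (n, good) in counts.items()}

def overall_rate(rows):
    """Plain success rate over all rows."""
    return sum(int(ok) for _, ok in rows) / len(rows)

def rate_with_control_mix(treatment_rows, control_rows):
    """Treatment success rate re-weighted to the control's segment mix."""
    control = segment_stats(control_rows)
    treatment = segment_stats(treatment_rows)
    shared = [seg for seg in control if seg in treatment]
    weight_total = sum(control[seg][0] for seg in shared)
    return sum(control[seg][0] / weight_total * treatment[seg][1] for seg in shared)

def build(desktop_ok, desktop_bad, mobile_ok, mobile_bad):
    """Hypothetical data: one (segment, ok) row per call attempt."""
    return ([("desktop", True)] * desktop_ok + [("desktop", False)] * desktop_bad
            + [("mobile", True)] * mobile_ok + [("mobile", False)] * mobile_bad)

# Per-segment success rates are identical in both sets (95% desktop, 80% mobile);
# only the share of mobile users changes, from 20% to 60%.
control = build(760, 40, 160, 40)
treatment = build(380, 20, 480, 120)

raw_delta = overall_rate(treatment) - overall_rate(control)
adjusted_delta = rate_with_control_mix(treatment, control) - overall_rate(control)
print(f"raw delta: {raw_delta:+.3f}")
print(f"delta after matching the control mix: {adjusted_delta:+.3f}")
```

In this made-up data the raw success rate falls by about six points, yet the mix-adjusted delta is essentially zero, so the apparent regression comes entirely from more mobile users rather than from a product change.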

Microsoft Lumos also serves the wider purpose of understanding the difference in a metric between any two datasets. Interestingly, the library accounts for ‘bias’ between them, and by comparing a control and a treatment data set while remaining agnostic to the time series component, Lumos can investigate the anomalies.

How Does Microsoft Lumos Work?

Microsoft Lumos works on the principles of A/B testing to compare pairs of data sets. The Python library begins by verifying whether the regression in the metric between the data sets is statistically significant. It then follows up with a population bias check and bias normalization to account for any population changes between the two data sets. Lumos decides the issue isn’t worth pursuing if there’s no statistically significant regression in the metric. However, if the delta in the metric is statistically significant, Lumos ranks the features according to their contribution to the delta in the target metric.
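The workflow above can be sketched in a few lines of standard-library Python. This is not the Lumos implementation or its API: the significance check here is an ordinary two-proportion z-test, and the ranking is a simple stand-in heuristic that scores a feature by how unevenly the metric change is spread across its values. The feature names ("platform", "region") and all counts are hypothetical.

```python
import math
from collections import defaultdict

def two_proportion_z_test(ok_a, n_a, ok_b, n_b):
    """Two-sided z-test for a difference between two success rates."""
    pooled = (ok_a + ok_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (ok_b / n_b - ok_a / n_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # erfc gives the two-sided tail

def stats_by_value(rows, feature):
    """Map each feature value to (share_of_rows, success_rate)."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        counts[row[feature]][0] += 1
        counts[row[feature]][1] += int(row["ok"])
    return {v: (n / len(rows), ok / n) for v, (n, ok) in counts.items()}

def success_rate(rows):
    return sum(int(r["ok"]) for r in rows) / len(rows)

def rank_features(control, treatment, features):
    """Stand-in heuristic: a feature scores high when the metric change
    differs a lot between its values instead of being uniform."""
    overall_delta = success_rate(treatment) - success_rate(control)
    scores = {}
    for feature in features:
        c, t = stats_by_value(control, feature), stats_by_value(treatment, feature)
        scores[feature] = sum(
            share * ((rate - c[v][1]) - overall_delta) ** 2
            for v, (share, rate) in t.items() if v in c
        )
    return sorted(scores, key=scores.get, reverse=True)

def make_rows(ok_per_cell, n_per_cell=250):
    """Hypothetical data: n_per_cell rows for every platform/region pair."""
    return [
        {"platform": p, "region": r, "ok": i < n_ok}
        for p, n_ok in ok_per_cell.items()
        for r in ("eu", "us")
        for i in range(n_per_cell)
    ]

control = make_rows({"desktop": 225, "mobile": 225})    # 90% success everywhere
treatment = make_rows({"desktop": 225, "mobile": 175})  # mobile drops to 70%

z, p_value = two_proportion_z_test(
    sum(r["ok"] for r in control), len(control),
    sum(r["ok"] for r in treatment), len(treatment),
)
if p_value < 0.05:
    ranking = rank_features(control, treatment, ["platform", "region"])
else:
    ranking = []  # not significant: nothing worth pursuing
print(ranking)
```

With this data the drop from 90 to 80 percent is significant at the 5 percent level, and "platform" ranks above "region" because the whole change sits in the mobile rows. The population mix is identical in both sets here, so the bias normalization step is skipped; a real analysis would apply it before ranking.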

The Lumos Python library serves as the primary tool for scenario monitoring of hundreds of metrics. Developers and teams conducting performance analysis could monitor and work on the reliability of calling, meetings, and public switched telephone network (PSTN) services at Microsoft. The library is operational on Azure Databricks, the company’s Apache Spark-based big data analytics service. It has been configured to run multiple jobs that are arranged by priority, complexity, and metric type. The jobs complete asynchronously. This means that if the system detects an anomaly, a Lumos workflow is triggered, and the library then analyzes the anomaly and checks whether it is worth pursuing and addressing.
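As a rough illustration of prioritized, asynchronously completing jobs, the sketch below uses Python's asyncio with a priority queue. It says nothing about how Microsoft actually schedules Lumos on Azure Databricks; the metric names, the priorities, and the run_lumos_workflow placeholder are assumptions made for the example.

```python
import asyncio

async def run_lumos_workflow(metric):
    """Placeholder for the analysis step (hypothetical, not the Lumos API)."""
    await asyncio.sleep(0)  # yield control, standing in for real work
    return f"analyzed {metric}"

async def worker(queue, results):
    """Pull the most urgent alert from the queue and analyze it."""
    while True:
        _priority, metric = await queue.get()
        results.append(await run_lumos_workflow(metric))
        queue.task_done()

async def main(alerts, n_workers=2):
    queue = asyncio.PriorityQueue()  # lowest priority number is served first
    results = []
    for priority, metric in alerts:
        queue.put_nowait((priority, metric))
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await queue.join()               # wait until every alert has been handled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

# Hypothetical alerts raised by an upstream anomaly detector: (priority, metric).
alerts = [
    (2, "meeting_join_success"),
    (1, "call_setup_reliability"),
    (3, "pstn_call_drop_rate"),
]
results = asyncio.run(main(alerts))
print(results)
```

The queue decides which alert a free worker picks up next, while the workers themselves run concurrently, so a slow analysis of one metric would not block the others.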

Microsoft has noted that Lumos isn’t guaranteed to catch all regressions in services. Additionally, the service will require a large number of datasets to offer reliable insights. The company is planning to include continuous metrics analysis, perform better feature ranking, and bring in feature clustering as well. These steps should address the primary challenge of multicollinearity in feature ranking.


Alap Naik Desai

A B.Tech Plastics (UDCT) and a Windows enthusiast. Optimizing the OS, exploring software, and searching for and deploying solutions to strange and weird issues are Alap's main interests.