Anomaly detection is a statistical technique that Analytics Intelligence uses to identify anomalies in time-series data for a given metric, and anomalies within a segment at the same point of time.
Identifying anomalies in metrics over time
Intelligence applies a Bayesian state-space time series model to the historic data to predict the value of the most recent datapoint in the time series. The model produces a prediction and a credible interval that we use to evaluate the observed metric.
Using historical data, Analytics Intelligence predicts the value of the metric at the current time period and flags the datapoint as an anomaly if the actual value falls outside the credible interval. For detection of hourly anomalies, the training period is 2 weeks. For detection of daily anomalies, the training period is 90 days. For detection of weekly anomalies, the training period is 32 weeks.
Identifying anomalies within a segment at the same point in time
While time-series based anomaly detection uses historical data to flag a single metric within a single dimension value, we also provide anomaly detection simultaneously over several metrics and dimension values, at a point in time.
In this approach we use principal components analysis (PCA) to leverage the correlation structure of the metrics along with cross-validation to flag anomalies.
First, we identify the set of dimensions and metrics that will undergo the PCA. Based on all possible dimension values, we create multiple segments, then normalize each metric by the number of users within a segment. Next, we run the PCA for those segments and normalized metrics. If any particular segment demonstrates anomalous behavior across any metric and comprises at least 0.05% of the users on that property, we surface those segments as anomalies. Currently we do this analysis every week.