Methodology

Statistical Layer

Metricstab augments its descriptive reports with a small statistical layer so you can separate signal from noise at a glance. This page documents each technique that is currently live: which report it powers, what question it answers, and how to read the resulting insight card. Every technique runs on the same Search Console data you already see in the report, so no new sources are required.

At-a-glance

Technique	Where it appears	What it answers
Mann-Kendall trend test	Traffic Trend	"Is this drift up/down a real trend, or just noise?"
Rolling z-score anomaly detection	Traffic Trend	"Which days deviate from a moving baseline?"
Bootstrap confidence interval on means	Traffic Trend, Top Queries	"What's the realistic band around this average?"
Wilson confidence interval on CTR	Top Queries (and internal scoring)	"Given small sample, what's a defensible band for this CTR?"
One-proportion z-test (CTR vs expected)	CTR Underperformers	"Is this query's low CTR statistically below its position-based benchmark?"
Two-proportion z-test (PoP CTR shift)	Winners & Losers	"Is this CTR change real, or sample-size noise?"

Each insight card is shown only when its underlying test clears the standard 95% confidence threshold (p < 0.05). Results that miss that threshold (p ≥ 0.05) are treated as noise and hidden.

1. Mann-Kendall trend test

Used on the daily clicks (and impressions) series of Traffic Trend to decide whether an apparent upward or downward slope is statistically real or just chart noise. The test is non-parametric: it detects any monotonic trend, linear or not, without assuming the data are normally distributed.

How it works

For every pair of days (i, j) with i < j, score sign(x_j − x_i).
S = sum of all those signs. Strongly positive ⇒ uptrend, strongly negative ⇒ downtrend.
Z = S / sqrt(Var(S)), where Var(S) = n(n − 1)(2n + 5) / 18 for a series of n days with no tied values. The two-sided p-value then follows from the normal CDF.
Slope magnitude reported as the Theil-Sen median slope (median of all pairwise slopes), which is robust to outliers.
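The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Metricstab's actual implementation: the function name `mann_kendall` is hypothetical, the variance formula assumes no tied values, and the Z statistic follows the formula given here without the continuity correction that some implementations apply.

```python
import math
from itertools import combinations
from statistics import NormalDist, median


def mann_kendall(series):
    """Return (S, Z, two-sided p-value, Theil-Sen slope) for a daily series."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    s = 0
    slopes = []
    # Visit every pair of days (i, j) with i < j.
    for i, j in combinations(range(n), 2):
        diff = series[j] - series[i]
        s += (diff > 0) - (diff < 0)      # sign(x_j - x_i)
        slopes.append(diff / (j - i))     # pairwise slope, in units per day

    # Variance of S under "no trend", assuming no tied values (assumption).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = s / math.sqrt(var_s)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Theil-Sen estimator: the median of all pairwise slopes.
    return s, z, p_value, median(slopes)


# Illustrative (made-up) daily clicks.
clicks = [12, 15, 14, 18, 21, 19, 24, 26, 25, 30]
s, z, p, slope = mann_kendall(clicks)
trend = "increasing" if s > 0 else "decreasing" if s < 0 else "none"
print(trend, round(p, 4), round(slope, 2))
```

A series with many repeated values (for example, long runs of zero-click days) would need the tie-corrected variance, which this sketch leaves out.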

Reading the card

"Significant uptrend in clicks" means trend == increasing and p < 0.05. The narration also reports the Theil-Sen slope in clicks/day and the sample length n.

If p ≥ 0.05 no card is shown — the apparent slope is indistinguishable from random fluctuation.

2. Rolling z-score anomaly detection

Per-day check on the same Traffic Trend clicks series. Flags days whose value sits at least 2σ above or below a 14-day trailing baseline.

How it works

For each day, compare its click count against the mean and standard deviation of the previous 14 days.
The z-score measures how many standard deviations the day sits away from that rolling baseline.
Days with |z| ≥ 2.0 are flagged as anomalies. Under a normal distribution only about 5% of days would land that far from the baseline, split between unusually high and unusually low days.

Why a rolling baseline? Because the absolute mean of a growing site keeps changing — the only fair benchmark is "what was normal in the last couple of weeks?".
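As a rough sketch of the per-day check described above, assuming a plain list of daily click counts. The function name `rolling_anomalies`, the use of the sample standard deviation, and the decision to skip days with a perfectly flat baseline are illustrative assumptions, not documented behaviour.

```python
from statistics import mean, stdev


def rolling_anomalies(clicks, window=14, threshold=2.0):
    """Flag days that deviate from the trailing-window baseline.

    Returns a list of (day_index, clicks, z_score, baseline_mean) tuples.
    """
    flagged = []
    for day in range(window, len(clicks)):
        baseline = clicks[day - window:day]   # previous `window` days, today excluded
        mu = mean(baseline)
        sigma = stdev(baseline)               # sample standard deviation (assumption)
        if sigma == 0:
            continue                          # flat baseline: z-score is undefined
        z = (clicks[day] - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((day, clicks[day], round(z, 2), round(mu, 2)))
    return flagged


# Illustrative (made-up) data: two quiet weeks, an ordinary day, then a spike.
clicks = [10, 12] * 7 + [11, 40]
for day, value, z, baseline_mean in rolling_anomalies(clicks):
    print(f"day {day}: {value} clicks, z={z}, baseline mean={baseline_mean}")
```

Because the baseline only looks backwards, the first 14 days of a series can never be flagged in this sketch.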

Reading the card

N anomalous day(s) detected — lists the most recent flagged days with the actual click count, the z-score, and the rolling baseline mean it was compared against. Useful for tying spikes/dips to releases, content launches, or algorithm dates.

3. Bootstrap confidence interval on means

Wraps a 95% confidence band around any mean we display (daily clicks, mean clicks per query, mean position). The method is distribution-free, so it copes with skewed data, although the band becomes less reliable when only a handful of observations are available.

How it works

Resample the underlying daily series with replacement hundreds of times.
Compute the mean of each resample.
Take the 2.5th and 97.5th percentiles of those means → the 95% confidence band.
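The three steps above amount to a percentile bootstrap. The sketch below is one way to write it. The function name `bootstrap_mean_ci`, the 500 resamples, and the fixed seed (used only to make the output reproducible) are assumptions chosen for illustration.

```python
import random
from statistics import mean


def bootstrap_mean_ci(values, n_resamples=500, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Returns (sample_mean, lower_bound, upper_bound).
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    n = len(values)

    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))

    # Read the band off the sorted means (2.5th and 97.5th percentiles for 95%).
    alpha = (1 - confidence) / 2
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return mean(values), lower, upper


# Illustrative (made-up) daily clicks for a low-traffic site.
daily_clicks = [0, 1, 2, 1, 3, 0, 2, 1, 1, 3, 2, 1, 0, 2]
avg, lo, hi = bootstrap_mean_ci(daily_clicks)
print(f"Mean {avg:.1f} clicks/day, 95% CI [{lo:.1f}, {hi:.1f}]")
```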

Reading the card

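"Mean 1.4 clicks/day, 95% CI [1.1, 1.7]". Use the band to judge how well the average itself is pinned down: a narrow band means the mean is well estimated, a wide one means it rests on few or volatile observations. The band describes the average, not the spread of individual days, so a single day outside it is not automatically unusual. If the average of a new period falls well outside the band, it's worth investigating.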

4. Wilson confidence interval on CTR

Used internally on every per-query CTR before we rank or recommend action. The Wilson interval is a binomial-proportion CI that, unlike the textbook normal approximation, behaves well for small n and extreme p (CTRs near 0% or 100%).

Why Wilson, not the textbook formula

The classic normal-approximation interval can produce nonsensical bounds (negative CTRs, or CTRs above 100%) when sample sizes are small or proportions are extreme. The Wilson interval is the modern standard because it stays inside [0%, 100%] and behaves correctly for long-tail queries with only a handful of impressions.
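For reference, the Wilson interval for an observed CTR p̂ over n impressions can be computed as below. This is a sketch of the standard formula rather than the product's internal code, and the function name `wilson_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(clicks, impressions, confidence=0.95):
    """Wilson score interval for a CTR, returned as (lower, upper) fractions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")

    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    p_hat = clicks / impressions
    denom = 1 + z**2 / impressions

    # The centre is pulled towards 50%, more strongly when impressions are few.
    centre = (p_hat + z**2 / (2 * impressions)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / impressions + z**2 / (4 * impressions**2)
    )
    # Clamp only to absorb floating-point rounding at the edges.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# A long-tail query: 1 click out of 10 impressions.
low, high = wilson_interval(1, 10)
print(f"CTR 10.0%, 95% CI [{low:.1%}, {high:.1%}]")
```

For the same 1-in-10 query, the normal approximation gives 0.10 ± 1.96 · sqrt(0.1 · 0.9 / 10) ≈ 0.10 ± 0.19, so its lower bound is a negative CTR. The Wilson bounds stay between 0% and 100%.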

5. One-proportion z-test — CTR vs expected

Powers the Significance column in CTR Underperformers. Tests whether each query's observed CTR is significantly below the site-specific expected CTR for its ranking position (from your own CTR Curve).

What it asks

Null hypothesis: the query's true CTR equals the expected CTR for its ranking position (taken from your own CTR Curve).
Alternative: the true CTR differs from that benchmark (it could be lower or higher). Only a significant result on the low side marks the query as an underperformer.
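A minimal sketch of this test, assuming the expected CTR for the query's position has already been looked up. The function name `one_proportion_ztest` and the example numbers are illustrative only.

```python
import math
from statistics import NormalDist


def one_proportion_ztest(clicks, impressions, expected_ctr):
    """Two-sided z-test of an observed CTR against a benchmark CTR.

    Returns (z, p_value); a negative z means the observed CTR is below benchmark.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 < expected_ctr < 1:
        raise ValueError("expected_ctr must be strictly between 0 and 1")

    observed_ctr = clicks / impressions
    # Standard error computed under the null hypothesis (true CTR == expected_ctr).
    se = math.sqrt(expected_ctr * (1 - expected_ctr) / impressions)
    z = (observed_ctr - expected_ctr) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Same 5% benchmark, two made-up queries with very different sample sizes.
for clicks, impressions in [(20, 1000), (0, 20)]:
    z, p = one_proportion_ztest(clicks, impressions, expected_ctr=0.05)
    underperformer = z < 0 and p < 0.05
    print(clicks, impressions, round(z, 2), round(p, 4), underperformer)
```

In this example the second query has a 0% CTR but only 20 impressions, so it does not reach significance. That illustrates how the threshold filters out low-impression rows.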

Reading the result

A query whose CTR sits significantly below its position-based benchmark (p < 0.05) is treated as a real underperformer — the title or meta description is failing to convert at the rank it already holds, so a rewrite is justified. Queries that don't clear the threshold are not flagged, which kills false positives on low-impression rows.

6. Two-proportion z-test — period-over-period CTR shift

Powers the blue "N statistically significant CTR shifts" card on Winners & Losers. Compares each mover's current-period CTR against its previous-period CTR and reports whether the change is real or sample-size noise.

What it asks

For each mover, “is the CTR change between the previous and current period real, or could a swing of this size easily happen by chance given the sample sizes?”
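That question maps onto a two-proportion z-test. The sketch below uses the common pooled-variance form, which is an assumption, since this page does not say which variant is used. The function name `two_proportion_ztest` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(clicks_prev, impr_prev, clicks_curr, impr_curr):
    """Two-sided z-test for a CTR change between two periods.

    Returns (z, p_value); a positive z means CTR rose in the current period.
    """
    if impr_prev <= 0 or impr_curr <= 0:
        raise ValueError("both periods need at least one impression")

    ctr_prev = clicks_prev / impr_prev
    ctr_curr = clicks_curr / impr_curr

    # Pooled CTR under the null hypothesis that both periods share one true CTR.
    pooled = (clicks_prev + clicks_curr) / (impr_prev + impr_curr)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impr_prev + 1 / impr_curr))
    if se == 0:
        return 0.0, 1.0   # no clicks at all (or all clicks): nothing to test

    z = (ctr_curr - ctr_prev) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up movers: a well-sampled shift and a low-impression one.
print(two_proportion_ztest(50, 1000, 80, 1000))   # 5.0% -> 8.0% on 1000 impressions each
print(two_proportion_ztest(1, 20, 2, 25))         # 5.0% -> 8.0% on a few dozen impressions
```

Both movers show the same three-point CTR change, but only the first has enough impressions for the shift to stand out from sampling noise.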

Reading the card

Movers that clear p < 0.05 are listed inline with their old & new CTR and the p-value. The companion grey card shows how many rows were tested versus how many cleared significance — a quick reality check on the noise floor of the period.

On the roadmap

Techniques planned for the next iterations of the statistical layer:

Changepoint detection — pinpoint the exact date a query or page broke (algorithm update, re-platform, content change).
Seasonality decomposition — split a metric into its underlying trend, weekly seasonality and residual; surface anomalies in the residual rather than the raw series.
Robust anomaly detection — an outlier-tolerant alternative to the rolling z-score for noisier traffic patterns.
Bayesian CTR posterior — report “probability the true CTR is below benchmark” directly, instead of a frequentist p-value.