Welford's Algorithm
Derivation of recurrence relations and efficient implementation for incremental statistical updates.
Abstract
This article derives and demonstrates the online computation of the arithmetic mean and variance through Welford’s algorithm. The method allows incremental updates as new data points arrive, offering improved numerical stability, constant memory usage, and computational efficiency compared to conventional batch approaches.
1. Derivation of Recurrence Relations
Mean
Given observations \(x_1,\dots,x_n\) and the mean after \(n-1\) samples \(\mu_{n-1}\), the mean after \(n\) samples is defined as
\[ \mu_n = \frac{1}{n}\sum_{k=1}^{n} x_k = \frac{1}{n}\big((n-1)\mu_{n-1} + x_n\big) \]
This yields the incremental update formula: \(\mu_n = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}\).
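The update formula can be sketched in a few lines of Python. The function names `update_mean` and `running_mean` are illustrative and do not come from the article; the code assumes that samples arrive one at a time and that the count `n` already includes the new sample.

```python
def update_mean(mean_prev: float, x: float, n: int) -> float:
    """Return the mean of n samples, given the mean of the first n-1 and the new sample x."""
    return mean_prev + (x - mean_prev) / n


def running_mean(values):
    """Fold update_mean over a sequence without storing any earlier samples."""
    mean = 0.0  # placeholder; the first update (n = 1) overwrites it with x_1
    for n, x in enumerate(values, start=1):
        mean = update_mean(mean, x, n)
    return mean
```

For \(n = 1\) the update reduces to \(\mu_1 = x_1\), so the initial value of the mean does not affect the result.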
Variance (Welford’s Form)
Define \(M_{2,n}\) as the sum of squared deviations from the current mean:
\[ M_{2,n} = \sum_{k=1}^{n} (x_k - \mu_n)^2 \]
Using \(\delta = x_n - \mu_{n-1}\) and \(\mu_n = \mu_{n-1} + \delta/n\), the recurrence relation for \(M_2\) is \(M_{2,n} = M_{2,n-1} + \delta(x_n - \mu_n)\).
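The recurrence can be checked by splitting the sum into the first \(n-1\) terms and the last term. The following is a sketch of the intermediate steps, using only the definitions above. For \(k \le n-1\) we have \(x_k - \mu_n = (x_k - \mu_{n-1}) - \delta/n\), and since \(\sum_{k=1}^{n-1}(x_k - \mu_{n-1}) = 0\) the cross term vanishes:
\[ \sum_{k=1}^{n-1} (x_k - \mu_n)^2 = M_{2,n-1} + \frac{(n-1)\,\delta^2}{n^2} \]
The last term satisfies \(x_n - \mu_n = \delta - \delta/n = \frac{n-1}{n}\,\delta\), so \((x_n - \mu_n)^2 = \frac{(n-1)^2}{n^2}\,\delta^2\). Adding the two contributions gives
\[ M_{2,n} = M_{2,n-1} + \frac{(n-1)\,\delta^2}{n^2}\big(1 + (n-1)\big) = M_{2,n-1} + \frac{n-1}{n}\,\delta^2 = M_{2,n-1} + \delta\,(x_n - \mu_n) \]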
The population and sample variances follow as \(\sigma^2_n = \frac{M_{2,n}}{n}\) and \(s^2_n = \frac{M_{2,n}}{n-1}\), respectively. The sample variance is defined only for \(n \ge 2\).
2. Implementation Summary
The algorithm requires only three state variables, all initialized to zero: the sample count \(n\), the running mean \(\mu\), and the accumulated sum of squared deviations \(M_2\). Each update takes constant time and the state occupies constant memory, making the method suitable for high-frequency or streaming data.
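A minimal sketch of such an implementation is shown below. The class name `RunningStats`, its attribute names, and the choice to return NaN when a variance is undefined are assumptions made for illustration; the article does not prescribe them.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Three-variable state for online mean and variance (hypothetical class name)."""
    n: int = 0         # sample count
    mean: float = 0.0  # running mean
    m2: float = 0.0    # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean               # x_n - mu_{n-1}
        self.mean += delta / self.n         # mu_n = mu_{n-1} + delta / n
        self.m2 += delta * (x - self.mean)  # M2 += delta * (x_n - mu_n)

    @property
    def population_variance(self) -> float:
        # Undefined before the first sample; NaN is an illustrative choice.
        return self.m2 / self.n if self.n > 0 else math.nan

    @property
    def sample_variance(self) -> float:
        # Needs at least two samples; NaN is an illustrative choice.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan
```

Typical use is to create one `RunningStats()` instance, call `update(x)` for each incoming value, and read `mean` or `sample_variance` at any point in the stream.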
3. Numerical Validation
Deterministic Dataset
For the dataset [2.1, 3.4, 4.0, 5.2], the results are mean = 3.675 and sample variance = 1.6625. The online and batch computations agree to machine precision.
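The check on this dataset could be reproduced along the following lines. The helper `welford` is a hypothetical name, and Python's standard `statistics` module stands in for the batch computation, which is an assumption about how the comparison was made.

```python
import statistics


def welford(values):
    """Return (n, mean, m2) after one pass of online updates."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


data = [2.1, 3.4, 4.0, 5.2]
n, mean, m2 = welford(data)
online_var = m2 / (n - 1)               # sample variance from the online state

batch_mean = statistics.fmean(data)     # batch reference values
batch_var = statistics.variance(data)

print("mean difference:    ", abs(mean - batch_mean))
print("variance difference:", abs(online_var - batch_var))
```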
Large-Scale Random Dataset
For 10,000 samples drawn from a normal distribution with a large scale parameter (standard deviation), the online algorithm reproduces the batch mean and variance to within floating-point precision.
Streaming Evaluation
Incremental updates were compared to full recomputation every 1,000 samples. The differences for both mean and sample variance remained below \(10^{-10}\), consistent with the algorithm’s numerical stability.
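A streaming comparison of this kind might look like the sketch below. The function name `streaming_check`, the seed, the total of 10,000 samples, and the standard normal input are all assumptions chosen for illustration, not the article's actual test setup. The list `seen` exists only so that the batch reference can be recomputed; the online update itself never reads it.

```python
import random
import statistics


def streaming_check(num_samples=10_000, interval=1_000, seed=0):
    """Return the largest |online - batch| gap seen at the checkpoints, for mean and variance."""
    rng = random.Random(seed)
    seen = []                      # kept only for the batch recomputation
    n, mean, m2 = 0, 0.0, 0.0
    max_mean_diff = 0.0
    max_var_diff = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)    # assumed input distribution
        seen.append(x)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % interval == 0:      # checkpoint: recompute from scratch and compare
            batch_mean = statistics.fmean(seen)
            batch_var = statistics.variance(seen)
            max_mean_diff = max(max_mean_diff, abs(mean - batch_mean))
            max_var_diff = max(max_var_diff, abs(m2 / (n - 1) - batch_var))
    return max_mean_diff, max_var_diff


mean_diff, var_diff = streaming_check()
print("max mean difference:    ", mean_diff)
print("max variance difference:", var_diff)
```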
4. Conclusion
Welford’s algorithm provides a numerically stable and efficient framework for online computation of mean and variance. Its simplicity, robustness, and low computational overhead make it a standard choice in modern statistical and machine learning applications.