Welford's Algorithm


Derivation of recurrence relations and efficient implementation for incremental statistical updates.

Abstract

This article derives and demonstrates the online computation of the arithmetic mean and variance through Welford’s algorithm. The method allows incremental updates as new data points arrive, offering improved numerical stability, constant memory usage, and computational efficiency compared to conventional batch approaches.

1. Derivation of Recurrence Relations

Mean

Given observations \(x_1,\dots,x_n\) and the mean after \(n-1\) samples \(\mu_{n-1}\), the mean after \(n\) samples is defined as

\[ \mu_n = \frac{1}{n}\sum_{k=1}^{n} x_k = \frac{1}{n}\big((n-1)\mu_{n-1} + x_n\big) \]

Since \(\frac{n-1}{n}\mu_{n-1} = \mu_{n-1} - \frac{\mu_{n-1}}{n}\), this yields the incremental update formula \(\mu_n = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}\).
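The mean update can be sketched directly in Python. The function name `update_mean` is hypothetical and used only for illustration. Starting from an arbitrary initial mean of 0.0 is harmless, because at \(n = 1\) the formula returns \(x_1\) exactly.

```python
def update_mean(mean: float, n: int, x: float) -> float:
    """Hypothetical helper: return mu_n given mu_{n-1} (mean), the new count n, and x_n."""
    return mean + (x - mean) / n


data = [2.1, 3.4, 4.0, 5.2]
mean = 0.0  # any starting value works: at n = 1 the update returns x_1 exactly
for n, x in enumerate(data, start=1):
    mean = update_mean(mean, n, x)
print(mean)  # close to 3.675, up to floating-point rounding
```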

Variance (Welford’s Form)

Define \(M_{2,n}\) as the sum of squared deviations from the current mean:

\[ M_{2,n} = \sum_{k=1}^{n} (x_k - \mu_n)^2 \]

Using \(\delta = x_n - \mu_{n-1}\) and \(\mu_n = \mu_{n-1} + \delta/n\), the recurrence relation for \(M_2\) is \(M_{2,n} = M_{2,n-1} + \delta(x_n - \mu_n)\).
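One way to fill in the algebra behind this recurrence is to split off the last term of \(M_{2,n}\) and expand the remaining deviations around \(\mu_{n-1}\), using \(\mu_n - \mu_{n-1} = \delta/n\), \(x_n - \mu_n = \frac{n-1}{n}\,\delta\), and \(\sum_{k=1}^{n-1}(x_k - \mu_{n-1}) = 0\):

\[ \begin{aligned} M_{2,n} &= \sum_{k=1}^{n-1} \Big( (x_k - \mu_{n-1}) - \tfrac{\delta}{n} \Big)^2 + (x_n - \mu_n)^2 \\ &= M_{2,n-1} - \tfrac{2\delta}{n} \sum_{k=1}^{n-1} (x_k - \mu_{n-1}) + (n-1)\,\tfrac{\delta^2}{n^2} + \Big( \tfrac{n-1}{n}\,\delta \Big)^2 \\ &= M_{2,n-1} + \tfrac{(n-1)\,\delta^2}{n^2}\,\big(1 + (n-1)\big) \\ &= M_{2,n-1} + \delta \cdot \tfrac{n-1}{n}\,\delta = M_{2,n-1} + \delta\,(x_n - \mu_n) \end{aligned} \]

The cross term vanishes because deviations from \(\mu_{n-1}\) over the first \(n-1\) samples sum to zero.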

The population and sample variances follow as \(\sigma^2_n = \frac{M_{2,n}}{n}\) and \(s^2_n = \frac{M_{2,n}}{n-1}\), respectively; the sample variance is defined only for \(n \ge 2\).

2. Implementation Summary

The algorithm requires only three state variables: the sample count \(n\), the running mean \(\mu\), and the accumulated sum of squared deviations \(M_2\). Each update takes constant time and memory, which makes the method suitable for high-frequency or streaming data.
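A minimal Python sketch of the three-variable state and its update follows. The class name `Welford` and its method names are hypothetical; returning NaN when a variance is undefined is an assumed convention (raising an exception would be an equally reasonable choice).

```python
import math
from dataclasses import dataclass


@dataclass
class Welford:
    """Running mean/variance accumulator (hypothetical name, for illustration)."""

    n: int = 0          # sample count
    mean: float = 0.0   # running mean mu_n
    m2: float = 0.0     # sum of squared deviations from the current mean, M_{2,n}

    def update(self, x: float) -> None:
        """Fold one new observation into the state in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean               # delta = x_n - mu_{n-1}
        self.mean += delta / self.n         # mu_n = mu_{n-1} + delta / n
        self.m2 += delta * (x - self.mean)  # M2_n = M2_{n-1} + delta * (x_n - mu_n)

    def population_variance(self) -> float:
        """sigma^2_n = M2_n / n; NaN before any data (assumed convention)."""
        return self.m2 / self.n if self.n > 0 else math.nan

    def sample_variance(self) -> float:
        """s^2_n = M2_n / (n - 1); NaN for n < 2 (assumed convention)."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


acc = Welford()
for x in [2.1, 3.4, 4.0, 5.2]:
    acc.update(x)
print(acc.n, acc.mean, acc.sample_variance())
```

Note that `delta` is computed from the old mean while `x - self.mean` uses the updated one; this ordering is what makes the update match the recurrence for \(M_2\).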

3. Numerical Validation

Deterministic Dataset

Dataset: [2.1, 3.4, 4.0, 5.2]. Results: mean = 3.675, sample variance = 1.6625. The online and batch computations agree to machine precision.
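The deterministic check can be reproduced roughly as follows. The article does not say which batch routine it used, so this sketch assumes Python's standard-library `statistics.fmean` and `statistics.variance` (the latter uses the \(n-1\) denominator) as the batch reference; the helper `welford` is hypothetical.

```python
import statistics


def welford(data):
    """Hypothetical helper: one-pass mean and sample variance via Welford's recurrences."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    return mean, m2 / (n - 1)


data = [2.1, 3.4, 4.0, 5.2]
online_mean, online_var = welford(data)
batch_mean = statistics.fmean(data)
batch_var = statistics.variance(data)  # sample variance, n - 1 denominator

print(online_mean, online_var)
print(abs(online_mean - batch_mean), abs(online_var - batch_var))
```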

Large-Scale Random Dataset

For 10,000 samples drawn from a normal distribution with a large scale, the online algorithm reproduces the batch mean and variance to within floating-point precision.
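The article does not give the distribution parameters or the random seed, so the sketch below uses hypothetical values: a mean of \(10^6\) and a standard deviation of \(10^3\), with a fixed seed for repeatability. It compares relative errors against the standard-library batch results. A large offset relative to the spread is the regime where the naive sum-of-squares formula loses precision, so it makes a useful stress case.

```python
import random
import statistics


def welford(data):
    """Hypothetical helper: one-pass mean and sample variance via Welford's recurrences."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2 / (n - 1)


# Hypothetical parameters: the article does not state the mean, scale, or seed.
rng = random.Random(12345)
samples = [rng.gauss(1e6, 1e3) for _ in range(10_000)]

online_mean, online_var = welford(samples)
batch_mean = statistics.fmean(samples)
batch_var = statistics.variance(samples)

rel_mean_err = abs(online_mean - batch_mean) / abs(batch_mean)
rel_var_err = abs(online_var - batch_var) / batch_var
print(f"relative error: mean {rel_mean_err:.2e}, variance {rel_var_err:.2e}")
```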

Streaming Evaluation

Incremental updates were compared with a full recomputation at intervals of 1000 samples. The differences in both the mean and the sample variance remained below \(10^{-10}\), consistent with the algorithm's numerical stability.
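A streaming comparison of this kind might look like the sketch below. The stream length (5,000 standard-normal samples) and the seed are hypothetical, since the article does not specify its stream. At every 1000th sample, the running state is checked against a batch recomputation over everything seen so far.

```python
import random
import statistics

# Hypothetical stream: the article does not specify its length or distribution.
rng = random.Random(2024)
stream = [rng.gauss(0.0, 1.0) for _ in range(5_000)]

n, mean, m2 = 0, 0.0, 0.0
max_mean_diff = max_var_diff = 0.0
for x in stream:
    # Welford update of (n, mean, M2).
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)

    if n % 1000 == 0:
        # Full recomputation over all samples seen so far.
        seen = stream[:n]
        max_mean_diff = max(max_mean_diff, abs(mean - statistics.fmean(seen)))
        max_var_diff = max(max_var_diff, abs(m2 / (n - 1) - statistics.variance(seen)))

print(f"max |mean diff| = {max_mean_diff:.2e}, max |variance diff| = {max_var_diff:.2e}")
```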

4. Conclusion

Welford’s algorithm provides a numerically stable and efficient framework for online computation of mean and variance. Its simplicity, robustness, and low computational overhead make it a standard choice in modern statistical and machine learning applications.
