Axiomatic approach to probability
Probability is a fundamental concept in statistics, mathematics, and many scientific disciplines. It quantifies the uncertainty associated with events drawn from some universe of possible events. However, not everyone agrees on the meaning or interpretation of probability: there are several schools of thought, each providing its own perspective.
Frequentist Interpretation (Classical)
The frequentist interpretation, also known as the classical interpretation, views probability as the long-run relative frequency of an event's occurrence. According to this view, if we repeat an experiment under the same conditions an infinite number of times, the probability of an event is the limit of the proportion of times the event occurs.
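The long-run behaviour is easy to see in a simulation. Below is a minimal sketch (not a formal definition) that estimates the probability of heads for a coin by its relative frequency; the bias `p_true = 0.3` and the trial counts are arbitrary values chosen for illustration.

```python
import random

# Estimate P(heads) for a biased coin by its long-run relative frequency.
# p_true and the trial counts are arbitrary values chosen for illustration.
p_true = 0.3
random.seed(0)

for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < p_true for _ in range(n))
    print(f"n = {n:>9}: relative frequency = {heads / n:.4f}")

# As n grows, the relative frequency settles near p_true = 0.3.
```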
Bayesian Interpretation
The Bayesian interpretation views probability as a measure of degree of belief. Bayesian probability represents a level of certainty relating to the occurrence of an event, based on prior knowledge or information. It is a subjective view of probability, as it allows for the incorporation of new evidence to update our belief about the probability of an event. In the Bayesian framework, probabilities are updated through Bayes' theorem.
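As a concrete sketch of such an update, the snippet below applies Bayes' theorem, \(P(H \mid E) = P(E \mid H)\,P(H) / P(E)\), to a hypothetical diagnostic-test scenario; all the numbers are made up for illustration.

```python
# One Bayesian update via Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
# The numbers describe a hypothetical diagnostic test and are purely illustrative.
prior = 0.01            # P(H): prior belief that the condition is present
sensitivity = 0.95      # P(E | H): probability of a positive test given the condition
false_positive = 0.05   # P(E | not H): probability of a positive test without it

# Law of total probability for the evidence, P(E)
evidence = sensitivity * prior + false_positive * (1 - prior)

posterior = sensitivity * prior / evidence
print(f"posterior P(H | positive test) = {posterior:.3f}")  # ~0.161
```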
Propensity Interpretation
The propensity interpretation of probability is a more philosophical approach that considers probability as the tendency or disposition of a given type of situation to yield an outcome of a certain kind. It is often used in the context of explaining probabilities in physical systems, such as the probability of a radioactive atom decaying.
Geometric Interpretation
Geometric probability is a tool to deal with the problem of infinite outcomes by measuring the number of outcomes geometrically, in terms of length, area, or volume. In basic probability, we usually encounter problems that are "discrete". However, some of the most interesting problems involve "continuous" variables. Dealing with continuous variables can be tricky, but geometric probability provides a useful approach by allowing us to transform probability problems into geometry problems.
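A classic example: for a point chosen uniformly in the unit square, the probability that it falls inside the inscribed quarter circle is the ratio of areas, \(\pi/4\). The sketch below checks this with a simple Monte Carlo estimate; the sample size is an arbitrary choice.

```python
import random
from math import pi

# Geometric probability as a ratio of areas: for a uniform point in the unit
# square, P(point inside the quarter circle of radius 1) = (pi/4) / 1 = pi/4.
random.seed(0)
n = 1_000_000
hits = sum(random.random() ** 2 + random.random() ** 2 <= 1 for _ in range(n))

print(f"estimated probability: {hits / n:.4f}")
print(f"exact value (pi / 4):  {pi / 4:.4f}")
```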
Axiomatic Interpretation
Axiomatic probability is an extension of classical probability theory, with roots in the philosophy of science. The axioms are used to construct a probability model that is consistent and complete, and the resulting framework can be applied to problems in any field of science.
In the history of probability, mathematicians have represented the underlying assumptions of probability in three main ways: classical, frequentist, and subjectivist. The axiomatic approach is a unifying theory that formalizes all of these probabilistic approaches within a single presentation.
Probability can only be assigned to an experiment once its set of possible outcomes is known; the sample space must be specified before the axioms can be applied.
Kolmogorov Axioms
The standard probability axioms are the foundations of probability theory introduced by Russian mathematician Andrey Kolmogorov in 1933. Like all axiomatic systems, they outline the basic assumptions underlying the application of probability to fields such as pure mathematics and the physical sciences, while avoiding logical paradoxes.
In simpler terms, axiomatic probability is another way of describing the likelihood of an event: a number between 0 and 1, where 0, roughly speaking, indicates that the event is impossible and 1 that it is certain.
Let there be a sample space \(\Omega\), an event space \(\mathcal{F}\), and a probability measure \(P\), such that \(P(E)\) is the probability of some event \(E \in \mathcal{F}\). Then we introduce three axioms of probability:
First Axiom
The probability of an event is a non-negative real number:
$$P(E) \in \mathbb{R}, \quad P(E) \ge 0 \text{ for all } E \in \mathcal{F}$$
Second Axiom
The probability that at least one elementary event in the sample space will occur is one:
$$P(\Omega) = 1$$
Third Axiom
The probability of the union of any countable sequence of disjoint (i.e., mutually exclusive) events \(E_1, E_2, E_3, \ldots\) equals the sum of the probabilities of the individual events:
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$$
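On a finite sample space, the axioms are easy to check directly. The sketch below verifies them for one illustrative assignment of weights to outcomes; the sample space and weights are arbitrary choices, and the third axiom is checked in its finite form, which is all that applies here.

```python
from itertools import chain, combinations

# Check the Kolmogorov axioms for a finite probability assignment.
# The sample space and weights are illustrative, not canonical.
omega = {"a", "b", "c"}
weights = {"a": 0.5, "b": 0.3, "c": 0.2}

def prob(event):
    """P(event) as the sum of the weights of its elementary outcomes."""
    return sum(weights[w] for w in event)

# All events: the power set of omega
events = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# Axiom 1: non-negativity
assert all(prob(E) >= 0 for E in events)
# Axiom 2: the whole sample space has probability 1
assert abs(prob(omega) - 1) < 1e-12
# Axiom 3 (finite form): additivity over disjoint events
for E1 in events:
    for E2 in events:
        if E1.isdisjoint(E2):
            assert abs(prob(E1 | E2) - (prob(E1) + prob(E2))) < 1e-12

print("all three axioms hold for this assignment")
```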
How the axiomatic approach solves classical problems
The axiomatic approach avoids the specific conceptual issues of classical, frequentist, and subjectivist interpretations:
- Avoids the "equally likely" assumption (Classical interpretation): The classical interpretation relies on the assumption of equally likely outcomes, which is problematic when outcomes are not symmetric (e.g., a biased coin) or for continuous probability distributions, where outcomes are infinite and cannot be considered "equally likely" in the same way. The axiomatic approach simply requires that probabilities be assigned in a way that is consistent with the axioms, accommodating both equally likely and non-equally-likely cases, as well as finite and infinite sample spaces.
- Avoids circularity and limit issues (Frequentist interpretation): The frequentist interpretation defines probability as the long-run relative frequency in a repeatable experiment. This faces conceptual issues:
  - It is circular to define probability in terms of what happens in the "long run", because the "long run" behavior (convergence) is itself a result derived from the axioms of probability (the Law of Large Numbers).
  - It struggles with single-case events (e.g., "what is the probability that a specific historical event happened?"), as these are not repeatable experiments. The axiomatic framework, being an abstract mathematical structure, allows probabilities to be assigned to single events as long as they adhere to the rules, without specifying how the initial value is determined.
- Avoids subjectivity (Subjectivist interpretation): The subjectivist (or Bayesian) interpretation defines probability as a degree of personal belief. While useful for modeling rational belief, it is inherently subjective and can lead to different individuals assigning different probabilities to the same event based on their personal judgment or experience. The axiomatic approach provides an objective, consistent set of rules that any valid assignment of probabilities (whether based on frequency or belief) must follow, ensuring mathematical consistency regardless of the subjective initial inputs.
Probability & measure theory
Measure theory provides the rigorous mathematical foundation for probability theory, treating probability as a specific type of "measure" on a set of possible outcomes. While measure theory is a general framework for measuring size, length, or volume, probability theory applies this framework to a specific type of measure space: a finite measure space where the measure of the entire space is normalized to equal 1. Key probability concepts like "events," "random variables," and "expected value" are defined through their corresponding measure-theoretic counterparts: "measurable sets," "measurable functions," and "integrals," respectively.
Sigma-Algebras (σ-algebras)
A sigma-algebra \(\mathcal{F}\) is a collection of subsets of \(\Omega\) that contains \(\Omega\) itself and is closed under:
- complementation
- countable unions
- countable intersections
Not every subset of \(\mathbb{R}\) (or of other uncountable sets) can be measured consistently; sigma-algebras restrict attention to sets on which a measure can be defined consistently.
Thus:
A probability space is a measurable space equipped with a measure of total mass 1.
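For a finite \(\Omega\), "countable" unions and intersections reduce to finite ones, so the closure properties can be checked by brute force. Below is a sketch for the sigma-algebra generated by the single set \(\{a\}\) on a three-element sample space (an illustrative choice).

```python
# Verify the sigma-algebra closure properties on a small finite example:
# the sigma-algebra generated by {"a"} on omega = {"a", "b", "c"}.
omega = frozenset({"a", "b", "c"})
F = {frozenset(), frozenset({"a"}), frozenset({"b", "c"}), omega}

# Contains the whole space
assert omega in F
# Closed under complementation
assert all(omega - A in F for A in F)
# Closed under (finite) unions and intersections
assert all(A | B in F and A & B in F for A in F for B in F)

print("F is a sigma-algebra on omega")
```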
Probability measure as a special type of measure
In measure theory, a measure μ satisfies:
- \(\mu(\emptyset)=0\)
- countable additivity for pairwise disjoint sets \(A_1, A_2, \ldots\):
\(\mu\left(\bigcup_{i=1}^{\infty} A_{i}\right) = \sum_{i=1}^{\infty} \mu(A_i)\)
A probability measure P is just a measure with:
$$P(\Omega) = 1 $$
This makes measurable sets correspond to events, and their measures correspond to probabilities.
So a probability space is simply:
$$(\Omega, \mathcal{F}, P)$$
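One way to see this: any finite measure with nonzero total mass can be rescaled into a probability measure. The sketch below normalizes the counting measure on the faces of a die (an illustrative choice) so that \(P(\Omega) = 1\).

```python
# A probability measure is a finite measure normalized to total mass 1.
# Here the counting measure on the faces of a die is rescaled so P(omega) = 1.
omega = {1, 2, 3, 4, 5, 6}

def mu(event):
    """Counting measure: the number of outcomes in the event."""
    return len(event)

def P(event):
    """Probability measure obtained by normalizing mu by mu(omega)."""
    return mu(event) / mu(omega)

print(P(omega))      # 1.0
print(P({2, 4, 6}))  # 0.5, the probability of rolling an even number
```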
Measurable functions → Random variables
In measure theory, a measurable function between measurable spaces is one where inverse images of measurable sets are measurable.
A random variable is precisely a measurable function:
$$X : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}) $$
where \(\mathcal{B}\) is the Borel sigma-algebra on \(\mathbb{R}\).
Why measurability?
Because we want statements like:
$$P(X \le x)$$
to make sense. This probability is:
$$P(X^{-1}((-\infty, x]))$$
Measurability guarantees that this set is in \(\mathcal{F}\).
Thus:
Random variables are measurable functions from a probability space to a measurable target space.
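On a finite probability space this is concrete: a random variable is just a function \(X\) on \(\Omega\), and \(P(X \le x)\) is computed as the probability of the preimage \(X^{-1}((-\infty, x])\). The two-fair-coin setup below is an illustrative choice.

```python
# On a finite probability space, a random variable is a function X: omega -> R,
# and P(X <= x) is the measure of the preimage X^{-1}((-inf, x]).
# Two independent fair coin tosses, an illustrative choice.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P = {w: 0.25 for w in omega}          # uniform probability measure

def X(w):
    """Random variable: the number of heads in the outcome w."""
    return w.count("H")

def cdf(x):
    """P(X <= x), computed as the probability of the preimage of (-inf, x]."""
    preimage = [w for w in omega if X(w) <= x]
    return sum(P[w] for w in preimage)

print(cdf(0))   # 0.25  (only (T, T))
print(cdf(1))   # 0.75
print(cdf(2))   # 1.0
```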
Subadditivity
We want to show:
$$P(A \cup B) \le P(A) + P(B)$$
Let's decompose \(A \cup B\) into disjoint pieces:
$$A \cup B = A \cup (B \setminus A)$$
Now apply countable additivity to the disjoint union:
$$P(A \cup B) = P(A) + P(B \setminus A)$$
Since \(B \setminus A \subseteq B\), monotonicity of \(P\) (the fact that \(C \subseteq D \Rightarrow P(C) \le P(D)\), which itself follows from additivity and non-negativity) gives:
$$P(B \setminus A) \le P(B)$$
Thus:
$$P(A \cup B) = P(A) + P(B \setminus A) \le P(A) + P(B)$$
$$\blacksquare$$
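As a quick numeric sanity check (not part of the proof), the sketch below verifies subadditivity on one roll of a fair die; the events \(A\) and \(B\) are arbitrary examples.

```python
# Numeric sanity check of subadditivity on one roll of a fair die.
# The events A and B are arbitrary examples.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability measure on omega."""
    return len(event) / len(omega)

A = {1, 2, 3}   # "roll at most 3"
B = {2, 4, 6}   # "roll an even number"

print(P(A | B))      # 5/6 ~ 0.833
print(P(A) + P(B))   # 1.0
assert P(A | B) <= P(A) + P(B)
```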
Inclusion-Exclusion Principle
We want to derive:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
We again use the decomposition into disjoint sets:
$$A \cup B = A \cup (B \setminus A)$$
Applying additivity to this disjoint union:
$$P(A \cup B) = P(A) + P(B \setminus A)$$
Now express \(B\) as a disjoint union:
$$B = (A \cap B) \cup (B \setminus A)$$
These two sets are disjoint, so:
$$P(B) = P(A \cap B) + P(B \setminus A)$$
Solve for \(P(B\setminus A)\):
$$P(B \setminus A) = P(B) - P(A \cap B)$$
Finally, we substitute into the previous expression:
$$P(A \cup B) = P(A) + \bigl[P(B) - P(A \cap B)\bigr]$$
$$\boxed{P(A \cup B) = P(A) + P(B) - P(A \cap B)}$$
$$\blacksquare$$
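The same die example (an arbitrary choice) also confirms the inclusion-exclusion identity numerically:

```python
# Numeric check of inclusion-exclusion on one roll of a fair die,
# reusing the arbitrary example events A and B.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability measure on omega."""
    return len(event) / len(omega)

A = {1, 2, 3}   # "roll at most 3"
B = {2, 4, 6}   # "roll an even number"

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)  # both equal 5/6
assert abs(lhs - rhs) < 1e-12
```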