
What is the probability space of typical real univariate probability distributions?

Mathematics Asked by Lars Ericson on December 15, 2021

Postscript to the question below. In trying to learn from the answers below, all of which I am grateful for, I read a historical article on the origins and legacy of Kolmogorov’s Grundbegriffe. This article helped me understand what basic things people were struggling with when this theory was developed. In particular, the long-term trend towards abstraction and foundation in terms of measure theory, and the early-days focus on the connection between the real world and the probabilistic model. I then re-read the answers and comments. I made a comment that started

We can choose $\Omega=\mathbb{R}$ because the domain of the distribution function is $\mathbb{R}$.

This is wrong because the domain of the distribution function is not necessarily mentioned in the declaration of the probability space. I made the convention that random variables are maps $X: \Omega \rightarrow \mathbb{R}$. So the domain of the distribution function is $\mathbb{R}$ by my convention, but that doesn't have anything to do with the probability space. $\Omega$ is a kind of index set. Suppose we are reasoning about the saturation of the color red in grapes. In that case we are thinking about, say, a color level in $S=[0,255)$. Nowhere in the definition of a probability space $(\Omega,\mathcal A,P)$ to support reasoning about $S$ do we need to specify $S$. We do need to demonstrate that there is a 1-1 mapping between $\Omega$ and $S$, i.e. that $\Omega$ can enumerate $S$. Once we have "built" $(\Omega,\mathcal A,P)$, we can put it to work and re-use it for any $S$ which $\Omega$ can enumerate. The probability space $(\Omega,\mathcal A,P)$ is a kind of indexing structure. That for me is the key realization. The key cognitive error comes from labelling $\Omega$ as the sample space and $\mathcal A$ as the event space. The common-sense meaning of those terms implies a connection with the actual samples being reasoned about, when that does not have to be the case. A far less misleading terminology would be to label $\Omega$ as the sample index space or just index space, and $\mathcal A$ as the index set space. This kind of thing is clearly understood in programming languages: if I have an array $A$, then $(i,j)$ is an index, and I don't confuse $(i,j)$ with $A[i,j]$, and I don't confuse the purpose of arrays with the purpose of array indices, although in some contexts I can identify $A[i,j]$ with $(i,j)$.

Short version of the question: How do we formally and correctly define the probability space of the reals which supports the definition of the typical/usual univariate continuous probability distributions, such as uniform and exponential?

Short restatement of the core question that I have: I am hung up on p. 3, section 1.1B of the KPS text. The authors start with an unspecified probability space $(\Omega,\mathcal A,P)$. Two distinct random variables, $X \sim U[a,b]$ and $Y \sim \mathrm{Exp}(\lambda)$, are said to have distribution functions $F_V(x)=P_V((-\infty,x))=P(\{\omega \in \Omega: V(\omega)<x\})$ for $V \in \{X,Y\}$. These are distinct and solved separately as $F_{U[a,b]}(x) = \mathcal H(x-a)\, \mathcal H(b-x)\, \frac{x-a}{b-a} + \mathcal H(x-b)$ and $F_{\mathrm{Exp}(\lambda)}(x)=\mathcal H(x)\, (1-e^{-\lambda x})$, where $\mathcal H(x) = 1$ for $x \geq 0$ and $\mathcal H(x)=0$ for $x<0$. My key question is:

  • What is a solution for the $P$ shared by $X$ and $Y$?

Note: Here are some similar questions on Math Stack Exchange

Comment: I was mistakenly assuming that the text above was taking $\Omega=\mathbb{R}$ because I saw a similar statement somewhere to the effect of "for purposes of discussion let's say the sample space for continuous random variables is $\mathbb{R}^d$". The cited answer to the 2nd question above starts that way but then gets to $[0,1]$. So: I now understand that $[0,1]$ is the "best fit" sample space, along with Lebesgue measure. So the "right" probability space that I was looking for is the Steinhaus space $([0,1],\mathscr B([0,1]), \mu)$ where $\mu$ is the Lebesgue measure restricted to $[0,1]$. 99.999% of my confusion came from

  • Not recognizing that $[0,1]$ is a "big enough" space to enumerate the domain of a continuous map into $\mathbb{R}$. So it's "as good as" $\mathbb{R}$.
  • Making the assumption that the convention was, somehow, somewhere, to identify the sample space for $d$-dimensional continuous random variables with $\mathbb{R}^d$, when the "best fit" answer is $[0,1]^d$.

Longer version of the question:

Following this text,

Let $\Omega$ be a nonempty set, the sample space.

Let a set $\mathcal F$ of subsets of $\Omega$ be a $\sigma$-algebra, so that

  • $\Omega \in \mathcal F$
  • $\Omega \setminus F \in \mathcal F$ if $F \in \mathcal F$
  • $\bigcup_{n=1}^{\infty} F_n \in \mathcal F$ if all $F_n \in \mathcal F$

Let $P: \mathcal F \rightarrow [0,1]$ be a probability measure, so that

  • $P(\Omega) = 1$
  • $P(\Omega \setminus F) = 1-P(F)$
  • $P\big(\bigcup_{n=1}^{\infty} F_n\big) = \sum_{n=1}^\infty P(F_n)$ for pairwise disjoint $F_n \in \mathcal F$

We call the triple $(\Omega, \mathcal F, P)$ a probability space.

Suppose $X:\Omega\rightarrow \mathbb{R}$. We say $X$ is a random variable if $\{\omega \in \Omega : X(\omega) \leq a\}$ is in $\mathcal F$ for every $a \in \mathbb{R}$.

Then the probability distribution function $F_X : \mathbb{R} \rightarrow \mathbb{R}$ is defined for all $x \in \mathbb{R}$ as

$$F_X(x) = P(\{\omega \in \Omega : X(\omega) < x\})$$

Note that $P$ appears unsubscripted in the definition of $F_X$: $P$ does not depend on the particular random variable $X$ whose distribution we are defining. So in that sense it should be possible for the same probability space $(\Omega, \mathcal F, P)$ to underlie the probability distribution functions of multiple distinct random variables $X$ and $Y$ with $X \neq Y$.

For example, let

$$\Omega = \{0,1\}$$
$$\mathcal F = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}$$
$$P = \begin{cases}
\emptyset &\mapsto& 0 \\
\{0\} &\mapsto& \frac{1}{2} \\
\{1\} &\mapsto& \frac{1}{2} \\
\{0,1\} &\mapsto& 1
\end{cases}$$

Let $X,Y: \Omega\rightarrow \mathbb{R}$ be random variables fully defined by

$$X = \begin{cases}
0 &\mapsto& 17 \\
1 &\mapsto& 17
\end{cases}$$

$$Y = \begin{cases}
0 &\mapsto& 42 \\
1 &\mapsto& 42
\end{cases}$$

Then the probability distributions of $X$ and $Y$ are

$$F_X(x) = P(\{\omega\in\Omega:X(\omega)<x\}) = \begin{cases}
x < 17 &\mapsto& 0 \\
x \geq 17 &\mapsto& 1
\end{cases}$$

$$F_Y(x) = P(\{\omega\in\Omega:Y(\omega)<x\}) = \begin{cases}
x < 42 &\mapsto& 0 \\
x \geq 42 &\mapsto& 1
\end{cases}$$

Clearly $X \neq Y$ and $F_X \neq F_Y$. In the above discrete example, if I understand the language correctly, there is a single probability space $(\Omega,\mathcal F,P)$ with a single probability measure $P$ which underlies or supports two distinct probability distributions $F_X$ and $F_Y$ for two distinct random variables $X$ and $Y$.
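This two-point example can be checked mechanically. The sketch below (names are illustrative, exact arithmetic via `fractions`) encodes the space $(\Omega,\mathcal F,P)$ and evaluates both distribution functions with the strict inequality used above:

```python
from fractions import Fraction

# The two-point sample space and its uniform probability measure P.
Omega = {0, 1}

def P(F):
    # P assigns 1/2 to each singleton, so P(F) = |F| / |Omega|
    return Fraction(len(F), len(Omega))

X = {0: 17, 1: 17}  # the constant random variable X
Y = {0: 42, 1: 42}  # the constant random variable Y

def cdf(rv, x):
    # F_rv(x) = P({omega : rv(omega) < x}), strict '<' as in the text
    return P({w for w in Omega if rv[w] < x})
```

For instance `cdf(X, 17)` is `0` and `cdf(X, 18)` is `1`, reproducing the step functions $F_X$ and $F_Y$ from the same measure $P$.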

Now let $(\Omega, \mathcal F, P)$ be a probability space underlying random variables $X$ and $Y$ where:

  • Random variable $X: \Omega \rightarrow \mathbb{R}$ is such that $X$ has the uniform distribution $F_X: \mathbb{R} \rightarrow [0,1]$ with

$$F_X(x) = P(\{\omega\in\Omega:X(\omega)<x\}) = \begin{cases}0 &:& x < a \\
\frac{x-a}{b-a} &:& a \leq x \leq b \\
1 &:& b < x
\end{cases}$$

  • Random variable $Y: \Omega \rightarrow \mathbb{R}$ is such that $Y$ has the exponential distribution $F_Y: \mathbb{R} \rightarrow [0,1]$ with

$$F_Y(x) = P(\{\omega\in\Omega:Y(\omega)<x\}) = \begin{cases}0 &:& x < 0 \\
1-e^{-\lambda x} &:& x \geq 0
\end{cases}$$
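One concrete answer to "what $P$ is shared by $X$ and $Y$" can be probed numerically: take the Steinhaus-style space $((0,1),\mathscr B((0,1)),\lambda)$ and realize both variables as functions of the same $\omega$. A Monte Carlo sketch, with illustrative parameter values $a=2$, $b=5$, $\lambda=1.5$:

```python
import math
import random

a, b, lam = 2.0, 5.0, 1.5  # illustrative parameters

def X(omega):
    # X ~ U[a, b]: affine image of omega ~ U(0, 1)
    return a + (b - a) * omega

def Y(omega):
    # Y ~ Exp(lam): inverse-CDF transform of the same omega
    return -math.log(1.0 - omega) / lam

random.seed(0)
omegas = [random.random() for _ in range(100_000)]  # draws from Lebesgue on (0,1)

x = 3.0
emp_FX = sum(X(w) < x for w in omegas) / len(omegas)  # near (x-a)/(b-a)
emp_FY = sum(Y(w) < x for w in omegas) / len(omegas)  # near 1 - exp(-lam*x)
```

Both empirical distribution functions come from the single measure $\lambda$; only the measurable maps $X$ and $Y$ differ.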

Also, per comment below, one distribution can be supported by multiple probability spaces. (The key understanding here for me is that probability space and probability distribution are separate constructions.)

My questions are (and some answers that I take from my reading of the solutions below):

Q1. Is $(\Omega, \mathcal F, P) = (\mathbb{R}, \mathcal B(\mathbb{R}), \mu)$, where $\mathcal B(\mathbb{R})$ is the Borel $\sigma$-algebra of the reals and $\mu$ is the Lebesgue measure, a probability space which underlies $X$ and $Y$? Answer: No, but the Steinhaus space $([0,1], \mathcal B([0,1]), \mu)$ is good.

Q2. Is it correct to call $(\mathbb{R}, \mathcal B(\mathbb{R}), \mu)$ the standard probability space of the reals? Is there some other standard notation or language for the probability space underlying the usual continuous probability distributions? Answer: No, but the Steinhaus space is a standard space in the Wikipedia sense.

Q3. Is it correct to say that the notion of probability space is independent of and complementary to the notion of probability distribution, and that the notion of probability distribution is always associated with a particular random variable $X$ presented with a supporting probability space $(\Omega, \mathcal F, P)$? Answer: Kind of. One distribution can be accompanied by many probability spaces. One probability space can be accompanied by many distributions. I'm using "accompanied" because the word "supported" may be overloaded in math. I'm looking for some compact synonym of "independent and complementary". The main thing is to demonstrate through examples that the relationship is many-to-many.

5 Answers

In applications of probability theory, the probability space is seldom specified; it sits there in the background. However, at least conceptually, one may still ask what the key characteristics of the underlying space are, based on the kinds of things we are observing and the kinds of things we want to measure.

For theoretical purposes, one often needs to have a precise description of the underlying probability space in order to use known results, verify conditions, or further advance the theory (new theorems, concepts, etc).

It turns out that most theoretical results can be obtained by considering the Steinhaus space $$((0,1),\mathscr{B}(0,1),\lambda)$$ where $\mathscr{B}(0,1)$ is the Borel $\sigma$-algebra on $(0,1)$, and $\lambda$ is the Lebesgue measure (length measure) restricted to the interval $(0,1)$, as the underlying probability space (a canonical probability space of sorts). By that I mean that one can explicitly generate random samples with any prescribed distribution, as well as represent conditional expectation by randomization (generation of uniform distributions).

The problem of existence and generation of stochastic processes is a more subtle problem; however, one may use copies of $((0,1),\mathscr{B}(0,1))$ with a consistent prescription of finite-dimensional distributions to explicitly define a stochastic process on the product of copies of $((0,1),\mathscr{B}(0,1))$ with the prescribed finite-dimensional distributions.

Here is an attempt to give an overview of all this.


  1. Generation of i.i.d. Bernoulli random variables (tossing a fair coin):

First notice that in the Steinhaus space, the function $\theta(x)=x$ is obviously uniformly distributed $U[0,1]$, that is, $\lambda[\theta\leq x] = x$ for all $0<x<1$.

Recall that every $x\in[0,1]$ has a unique binary expansion $$x=\sum_{n\geq1}r_n/2^n$$ where $r_n\in\{0,1\}$, and $\sum_{n\geq1}r_n=\infty$ for $x>0$. For each $n\in\mathbb{N}$, the $n$-th bit map $x\mapsto r_n(x)$ defines a measurable function from $([0,1],\mathscr{B}([0,1]))$ to $(\{0,1\},2^{\{0,1\}})$, where $2^{\{0,1\}}$ is the collection of all subsets of $\{0,1\}$.

Therefore, the map $\beta:[0,1]\rightarrow\{0,1\}^{\mathbb{N}}$ given by $x\mapsto(r_n(x))$ is measurable.

The next result is a mathematical formulation of tossing a fair coin.

Lemma 1: Suppose $\theta\sim U[0,1]$, and let $\{X_n=r_n\circ\theta\}$ be its binary expansion. Then $\{X_n\}$ is an i.i.d. Bernoulli sequence with rate $p=\tfrac12$. Conversely, if $(X_n)$ is an i.i.d. Bernoulli sequence with rate $p=\tfrac12$, then $\theta=\sum_{n\geq1}2^{-n}X_n\sim U[0,1]$.

Here is a short proof:

Suppose that $\theta\sim U(0,1)$. For any $N\in\mathbb{N}$ and $k_1,\ldots,k_N\in\{0,1\}$,
$$\begin{align}
\bigcap^N_{j=1}\{x\in(0,1]: r_j(x)=k_j\} &= \Big(\sum^N_{j=1}\tfrac{k_j}{2^j},\ \sum^N_{j=1}\tfrac{k_j}{2^j}+\tfrac{1}{2^N}\Big]\\
\{x\in(0,1]: r_N(x)=0\} &= \bigcup^{2^{N-1}-1}_{j=0}\big(\tfrac{2j}{2^N},\tfrac{2j+1}{2^N}\big]\\
\{x\in(0,1]: r_N(x)=1\} &= \bigcup^{2^{N-1}-1}_{j=0}\big(\tfrac{2j+1}{2^N},\tfrac{2(j+1)}{2^N}\big]
\end{align}$$
It follows immediately that $\mathbb{P}[\bigcap^N_{j=1}\{X_j=k_j\}]=\tfrac{1}{2^N}=\prod^N_{j=1}\mathbb{P}[X_j=k_j]$. Hence $\{X_n\}$ is a Bernoulli sequence with rate $\tfrac12$.

Conversely, suppose $\{X_n:n\geq1\}$ is a Bernoulli sequence with rate $\tfrac12$. If $\widetilde{\theta}\sim U(0,1)$, then the first part shows that its sequence of bits satisfies $\{\widetilde{X}_n\}\stackrel{\text{law}}{=}\{X_n\}$. Therefore, $$ \theta:=\sum_{n\geq1}2^{-n}X_n\stackrel{\text{law}}{=} \sum_{n\geq1}2^{-n}\widetilde{X}_n=\widetilde{\theta} $$ since $\theta$ is a measurable function of $\{X_n\}$.

All this shows that on the Steinhaus space one can explicitly generate Bernoulli sequences.
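Lemma 1 can also be probed numerically. A sketch (a frequency check, not a proof): draw uniforms, extract binary digits, and confirm that single bits and pairs of bits occur with the Bernoulli and product frequencies.

```python
import random

def bit(x, n):
    # n-th binary digit r_n(x) of x in (0, 1), n = 1, 2, ...
    return int(x * 2 ** n) % 2

random.seed(1)
thetas = [random.random() for _ in range(200_000)]

# Single-bit frequency: should be near 1/2.
p1 = sum(bit(t, 1) for t in thetas) / len(thetas)

# Joint frequency of (r_1, r_2) = (1, 1): should be near 1/4 = 1/2 * 1/2,
# consistent with independence of the bits.
p12 = sum(bit(t, 1) & bit(t, 2) for t in thetas) / len(thetas)
```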


  2. Generation of i.i.d. sequences of uniform distributions:

Once we can generate i.i.d. sequences of Bernoulli random variables defined on the Steinhaus space, we can generate i.i.d. sequences of uniform random variables also defined on the Steinhaus space.

Lemma 2: There exists a sequence $(f_n)$ of measurable functions on $[0,1]$ such that for any $\theta\sim U[0,1]$, $(f_n(\theta))$ is an i.i.d. sequence of random variables with $f_1(\theta)\sim U[0,1]$.

Here is a short proof:

Reorder the sequence $(r_m)$ of binary bit maps into a two-dimensional array $(h_{n,j}:n,j\in\mathbb{N})$, and define the function $f_n:=\sum_{j\geq1}\tfrac{h_{nj}}{2^j}$ on $[0,1]$ for each $n$. From the first Lemma, $\{X_{nj}=h_{nj}\circ\theta\}$ forms a Bernoulli sequence with rate $p=\tfrac12$. Thus, the collections $\sigma(X_{nj}:j\geq1)$ are independent. By the first Lemma, it follows that $(f_n)$ is an i.i.d. sequence of $U[0,1]$ random variables.
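A numerical sketch of Lemma 2: split the bits of one uniform $\theta$ into disjoint subsequences and reassemble each into a new uniform. The reindexing $(n,j)\mapsto 2^{n-1}(2j-1)$ used below is one concrete bijection $\mathbb{N}\times\mathbb{N}\to\mathbb{N}$ (my choice; the lemma only needs some reordering), and the truncation to finitely many bits is an approximation.

```python
import random

def bit(x, m):
    # m-th binary digit of x in (0, 1)
    return int(x * 2 ** m) % 2

def f(n, theta, precision=25):
    # f_n(theta): a uniform built from the n-th disjoint subsequence of bits,
    # taking bit number 2^(n-1) * (2j - 1) as the j-th bit of the result.
    return sum(bit(theta, 2 ** (n - 1) * (2 * j - 1)) / 2 ** j
               for j in range(1, precision + 1))

random.seed(2)
samples = [f(1, random.random()) for _ in range(50_000)]
mean = sum(samples) / len(samples)                      # near 1/2 for U[0,1]
below = sum(s < 0.25 for s in samples) / len(samples)   # near 1/4
```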


  3. Generation of any distribution on the real line:

For any probability space $(\Omega,\mathscr{F},\mathbb{P})$ and random variable $X:(\Omega,\mathscr{F})\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$, the law or distribution of $X$ is the measure $\mu_X$ on $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ defined by $$\mu_X(B)=\mathbb{P}[X\in B],\quad B\in\mathscr{B}(\mathbb{R})$$

One can generate a random variable $Q:((0,1),\mathscr{B}((0,1)),\lambda)\rightarrow(\mathbb{R},\mathscr{B}(\mathbb{R}))$ such that the law of $Q$ is $\mu_X$. This may be done by the "quantile function"

$$Q(t)=\inf\big\{x\in\mathbb{R}: \mathbb{P}[X\leq x]\geq t\big\},\quad 0<t<1$$

$Q$ is non-decreasing, right continuous, and has left limits. More importantly, $Q$ satisfies

$$ F(x):=\mathbb{P}[X\leq x]\geq t \quad\text{iff}\quad Q(t) \leq x $$

From this, it follows that $$\lambda[Q\leq x]:=\lambda\big(\{t\in(0,1): Q(t)\leq x\}\big)=\lambda\big(\{t\in(0,1): t\leq F(x)\}\big)=F(x)$$ and so $Q$ has the same distribution function as $X$.

Particular examples are:

  • $\Phi(x)=\frac{1}{\sqrt{2\pi}}\int^x_{-\infty}e^{-t^2/2}\,dt$. $\Phi$ is continuous and strictly monotone increasing, so it has a continuous and strictly increasing inverse. Then $Q(t)=\Phi^{-1}(t)$, $0<t<1$, is a random variable defined on the Steinhaus space that has the normal distribution.

  • $F(x)=1-e^{-x}$ is strictly monotone increasing and has inverse $F^{-1}(t)=-\log(1-t)$. Then $Q(t)=F^{-1}(t)$ is a random variable defined on the Steinhaus space and has the exponential distribution.
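The exponential case makes the quantile definition concrete: a crude numerical version of $Q(t)=\inf\{x: F(x)\geq t\}$ (an inf over a grid, a sketch rather than production code) agrees with the closed form $-\log(1-t)$.

```python
import math

def F(x):
    # Exponential(1) distribution function
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def Q_grid(t, hi=50.0, steps=1_000_000):
    # Crude numerical Q(t) = inf{x : F(x) >= t}: first grid point where F >= t.
    for i in range(steps + 1):
        x = hi * i / steps
        if F(x) >= t:
            return x
    return float("inf")

t = 0.7
q_closed = -math.log(1.0 - t)  # the closed-form generalized inverse
q_num = Q_grid(t)
```

Since $F$ is increasing, the grid infimum can only overshoot the true $Q(t)$ by at most one grid step.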


  4. Generation of independent sequences of random variables with any prescribed distribution:

Using (2) and (3) we can generate random variables with any distribution (over $(\mathbb{R},\mathscr{B}(\mathbb{R}))$).

Corollary 3. Suppose that $(S_n,\mathscr{S}_n,\mu_n):=(\mathbb{R},\mathscr{B}(\mathbb{R}),\mu_n)$, $n\in\mathbb{N}$, are Borel probability spaces. Then there is a map $F:((0,1),\mathscr{B}((0,1)),\lambda)\rightarrow \big(\prod_nS_n,\bigotimes_n\mathscr{S}_n\big)$ such that the projections $p_n:\mathbf{s}\mapsto s_n$ form an independent sequence of random variables on $\big(\prod_nS_n,\bigotimes_n\mathscr{S}_n,\mu\big)$, $\mu=\lambda\circ F^{-1}$, with $p_n\stackrel{d}{=}\mu_n$.

Here is a short proof:

Lemma 2 provides a $U[0,1]$-distributed i.i.d. sequence $(f_n)$ of random variables defined on the Steinhaus space. Part 3 shows that for each $n$, there is a map $Q_n:(0,1)\rightarrow \mathbb{R}$ such that $\lambda\circ Q^{-1}_n=\mu_n$. The map $F$ given by $x\mapsto(Q_n(f_n(x)))$ has the stated properties.


(1) through (4) illustrate that all the basic tools of probability theory (sampling, the law of large numbers for i.i.d. sequences, the central limit theorem for i.i.d. sequences, among others) can be developed using the Steinhaus space as canonical space.

The next part of the presentation is more subtle and I will skip details by adding references. On one end we illustrate how conditional expectation can be performed by randomization; on the other end, we show how stochastic processes can be constructed.


  5. There is a deep result in measure theory that states that Borel subsets of complete separable metric spaces, as measurable spaces, are measurably isomorphic to $((0,1),\mathscr{B}(0,1))$ (if uncountable) or to a countable subset of $((0,1),\mathscr{B}(0,1))$. This provides another justification for the use of $((0,1),\mathscr{B}(0,1))$ as a canonical measurable space. Spaces that are measurably isomorphic to a Borel subset of $(0,1)$ are called Borel spaces.

In particular, in part (4) we can substitute for $(\mathbb{R},\mathscr{B}(\mathbb{R}),\mu_n)$ other Borel probability spaces, for example $(S_n,\mathscr{B}(S_n),\mu_n)$, where $S_n$ is a complete separable metric space (Polish space) equipped with its Borel $\sigma$-algebra, and $\mu_n$ is a probability measure on $(S_n,\mathscr{B}(S_n))$.


  6. Regular conditional expectation:

Another deep result in probability is the fact that if $(\Omega,\mathscr{F},\mathbb{P})$ is a probability space, $(S,\mathscr{B}(S))$ is a Polish measurable space ($S$ is a Polish space equipped with its Borel $\sigma$-algebra), and $\mathscr{A}$ is a sub-$\sigma$-algebra of $\mathscr{F}$, then there is a stochastic kernel $\nu:\Omega\times\mathscr{B}(S)\rightarrow[0,1]$ from $(\Omega,\mathscr{A})$ to $(S,\mathscr{B}(S))$ such that $$\nu(\omega,A)=\mathbb{P}[X\in A|\mathscr{A}]\qquad \mathbb{P}\text{-a.s.}$$ for all $A\in\mathscr{B}(S)$. Here, the map $\omega\mapsto\nu(\omega,A)$ is $\mathscr{A}$-measurable for any fixed $A$.

This allows for a disintegration formula.

Suppose $(S,\mathscr{S})$ is a Polish measurable space and $(T,\mathscr{T})$ is any measurable space. Let $\mathscr{A}\subset\mathscr{F}$ be a sub-$\sigma$-algebra. Let $X:(\Omega,\mathscr{F})\rightarrow(S,\mathscr{S})$ be a random variable in $S$ (the observation above guarantees that $\mathbb{P}[X\in\cdot\,|\mathscr{A}]$ has a regular version $\nu$). If $Y:(\Omega,\mathscr{A})\rightarrow(T,\mathscr{T})$ and $f:(S\times T,\mathscr{S}\otimes\mathscr{T})\rightarrow\mathbb{C}$ are functions such that $\mathbb{E}[|f(X,Y)|]<\infty$, then
$$\begin{align}
\mathbb{E}[f(X,Y)|\mathscr{A}](\cdot) &=\int_S f(x,Y(\cdot))\,\nu(\cdot,dx)\qquad \text{$\mathbb{P}$-a.s.}\\
\mathbb{E}[f(X,Y)]&=\int_\Omega\Big(\int_S f(x,Y(\omega))\,\nu(\omega,dx)\Big)\mathbb{P}(d\omega)\tag{7}
\end{align}$$
If $\mathscr{A}=\sigma(Y)$ and $\mathbb{P}[X\in dx|\sigma(Y)]=\nu(Y(\omega),dx)$ for some stochastic kernel $\nu$ from $(T,\mathscr{T})$ to $(S,\mathscr{S})$, then
$$\begin{align}
\mathbb{E}[f(X,Y)|\sigma(Y)](\cdot) &= \int_S f(x,Y(\cdot))\,\nu(Y(\cdot),dx) \qquad\text{$\mathbb{P}$-a.s.}\\
\mathbb{E}[f(X,Y)] &=\int_\Omega\Big(\int_S f(x,Y(\omega))\,\nu(Y(\omega),dx)\Big)\mathbb{P}(d\omega)
\end{align}$$
If $X$ and $Y$ are independent, then $\mathbb{P}[X\in dx|\sigma(Y)](\cdot)=\mathbb{P}[X\in dx]$ $\mathbb{P}$-a.s.


  7. Randomization:

Stochastic kernels $\nu$ from any measure space $(T,\mathscr{T})$ to a Borel space $(S,\mathscr{S})$ can also be generated on the Steinhaus space.

Lemma 4. Let $\nu$ be a stochastic kernel from a measure space $(T,\mathscr{T})$ to a Borel space $(S,\mathscr{S})$. There is a function $f:T\times[0,1]\rightarrow S$ such that if $\theta\sim U[0,1]$, then the law of $f(t,\theta)$ is $\nu(t,\cdot)$.

Here is a short proof:

By part (5) it suffices to assume that $(S,\mathscr{S})$ is $((0,1),\mathscr{B}((0,1)))$, for there is a bijection $\phi:((0,1),\mathscr{B}((0,1)))\longrightarrow(S,\mathscr{S})$ such that $\phi$ and $\phi^{-1}$ are measurable, in which case we replace $\nu$ by $\eta(t,B):=\nu(t,\phi(B))$. Let $g:T\times (0,1)\rightarrow \mathbb{R}$ be defined as the quantile transformation $$g(t,s)=\inf\{x\in(0,1): \nu(t,(-\infty,x])\geq s\}$$ Since $g(t,s)\leq x$ iff $\nu(t,(-\infty,x])\geq s$, the measurability of the map $t\mapsto\nu(t,(-\infty,x])$ implies that $g$ is $\mathscr{T}\otimes\mathscr{B}\big((0,1)\big)$-measurable. If $\theta\sim U[0,1]$ (for example, the identity function $\theta(s)=s$ on the Steinhaus space), then $$ \Pr[g(t,\theta)\leq x]=\Pr[\theta\leq\nu(t,(-\infty,x])]=\nu(t,(-\infty,x]) $$ This shows that $g(t,\theta)\sim \nu(t,dx)$. Therefore, for $f:=\phi\circ g$, $f(t,\theta)\sim\nu(t,ds)$.


  8. Existence of stochastic processes:

Suppose $\{(S_t,\mathscr{S}_t):t\in\mathcal{T}\}$ is a collection of Borel spaces. For each $\mathcal{I}\subset\mathcal{T}$, denote by $(S_\mathcal{I},\mathscr{S}_\mathcal{I})=\big(\prod_{t\in\mathcal{I}}S_t, \bigotimes_{t\in\mathcal{I}}\mathscr{S}_t\big)$, and let $p_{\mathcal{I}}:S_\mathcal{T}\longrightarrow S_{\mathcal{I}}$ be the projection $(s_t:t\in\mathcal{T})\mapsto(s_t:t\in\mathcal{I})$. A family of probability measures $\{\mu_\mathcal{J}:\mathcal{J}\subset\mathcal{T},\,\text{$\mathcal{J}$ finite or countable}\}$ on $\mathscr{S}_\mathcal{J}$ is projective if $$ \mu_{\mathcal{J}}\big(\cdot\times S_{\mathcal{J}\setminus\mathcal{I}}\big) =\mu_{\mathcal{I}}\big(\cdot\big),\qquad \mathcal{I}\subset\mathcal{J} $$ for any finite or countable $\mathcal{J}\subset\mathcal{T}$.

A deep theorem due to Kolmogorov establishes the existence of stochastic processes.

Theorem 5. Suppose $\{(S_t,\mathscr{S}_t):t\in\mathcal{T}\}$ is a family of Borel spaces. If $\{\mu_\mathcal{I}:\mathcal{I}\subset\mathcal{T},\,\text{$\mathcal{I}$ finite}\}$ is a projective family of probability measures on $\mathscr{S}_\mathcal{I}$, then there exists a unique probability measure $\mu$ on $\mathscr{S}_\mathcal{T}$ such that $$ \mu\circ p^{-1}_\mathcal{I}=\mu_\mathcal{I} $$ for any finite $\mathcal{I}\subset\mathcal{T}$.

By part 5, all the $(S_t,\mathscr{S}_t)$ can be made into copies of a Borel subset of $(0,1)$ or $\mathbb{R}$. In that case, the canonical space for a stochastic process $\{X_t:t\in\mathcal{T}\}$ can be chosen as $\big((0,1)^\mathcal{T},\mathscr{B}^{\otimes\mathcal{T}}(0,1)\big)$ or $\big(\mathbb{R}^\mathcal{T},\mathscr{B}^{\otimes\mathcal{T}}(\mathbb{R})\big)$.


References:

  1. Kallenberg's Foundations of Modern Probability covers the probabilistic aspects of 1 to 8. His proofs can be considered probabilistic (as opposed to purely measure-theoretic). In particular, his proof of Kolmogorov's extension theorem relies on purely probabilistic constructions.
  2. Parthasarathy's Probability Measures on Metric Spaces is a good reference for the measurable isomorphism theorem that in essence reduces any nice probability space to the measurable space $((0,1),\mathscr{B}(0,1))$.
  3. Leo Breiman's classic Probability also beautifully covers Kolmogorov's extension theorem and many aspects of the points I discussed above.

Answered by Oliver Diaz on December 15, 2021

Since Q1 and Q2 are well answered by other answers, I would like to add some more details about Q3. I hope I correctly grasped the point of your question.


Although the meaning of distribution slightly varies across the literature and is sometimes misused, we can give a satisfactory definition that works in any abstract setting.

Let $X : \Omega \to \mathcal{S}$ be an $\mathcal{S}$-valued random variable from the probability space $(\Omega, \mathcal{F}, P)$ to a measurable space $(\mathcal{S}, \Sigma)$. In other words, it is a measurable function from $(\Omega, \mathcal{F})$ to $(\mathcal{S}, \Sigma)$.1) Then $X$ induces a probability measure $\mu$ on $(\mathcal{S}, \Sigma)$ via2)

$$ \forall E \in \Sigma : \quad \mu(E) = P(X \in E) = P(X^{-1}(E)) = P(\{\omega\in\Omega : X(\omega) \in E\}). $$

Then this $mu$ is called the distribution of $X$.

Example 1. Let $\Omega = \{-1, 0, 1, 2\}$ be equipped with the power-set $\sigma$-algebra $\mathcal{F}=2^{\Omega}$ and the normalized counting measure $P(E) = \frac{1}{4}\#E$. Then

  • $X_1 : \Omega \to \mathbb{R}$ defined by $X_1(\omega) = \omega$ has the distribution $\mu_1$ on $\mathbb{R}$ given by $$ \mu_1(E) = \frac{1}{4} \mathbf{1}_{\{-1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{0 \in E\}} + \frac{1}{4} \mathbf{1}_{\{1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{2 \in E\}} $$ for any Borel subset $E$ of $\mathbb{R}$.

  • $X_2 : \Omega \to \mathbb{R}$ defined by $X_2(\omega) = \omega^2$ has the distribution $\mu_2$ on $\mathbb{R}$ given by $$ \mu_2(E) = \frac{1}{4} \mathbf{1}_{\{0 \in E\}} + \frac{1}{2} \mathbf{1}_{\{1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{4 \in E\}} $$ for any Borel subset $E$ of $\mathbb{R}$.

  • $X_3 : \Omega \to \{0,1,4\}$ defined by $X_3(\omega) = \omega^2$ has the distribution $\mu_3$ on $\mathcal{S}=\{0,1,4\}$ given by $$ \mu_3(E) = \frac{1}{4} \mathbf{1}_{\{0 \in E\}} + \frac{1}{2} \mathbf{1}_{\{1 \in E\}} + \frac{1}{4} \mathbf{1}_{\{4 \in E\}} $$ for any subset $E$ of $\mathcal{S}$.3)
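The pushforward $\mu = P \circ X^{-1}$ in Example 1 can be computed directly by counting preimages. A sketch with illustrative names:

```python
from fractions import Fraction

# The four-point space of Example 1 with normalized counting measure.
Omega = [-1, 0, 1, 2]

def P(A):
    return Fraction(len(A), len(Omega))

def pushforward(X, E):
    # mu(E) = P({omega : X(omega) in E}), the distribution of X
    return P([w for w in Omega if X(w) in E])

X2 = lambda w: w * w
mu2 = lambda E: pushforward(X2, E)
```

Here `mu2({1})` is `1/2` because both $-1$ and $1$ land on $1$, matching the coefficient of $\mathbf{1}_{\{1\in E\}}$ above.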

Example 2. Let $\Omega=[0,1]^2$ be equipped with the probability measure $P$ which is the Lebesgue measure restricted onto $[0, 1]^2$. Then

  • $X_4 : \Omega \to \mathbb{R}$ defined by $$ X_4(\omega_1, \omega_2) = \begin{cases} 0, & \text{if } \omega_1 \in [0, \frac{1}{4}); \\ 1, & \text{if } \omega_1 \in [\frac{1}{4}, \frac{3}{4}); \\ 4, & \text{if } \omega_1 \in [\frac{3}{4}, 1); \\ 2020, & \text{if } \omega_1 = 1; \end{cases} $$ has the same distribution as $X_2$.

  • $X_5, X_6 : \Omega \to \mathbb{R}$ defined by $$ X_5(\omega_1, \omega_2) = \begin{cases} -\log \omega_1, & \text{if } \omega_1 \in (0, 1]; \\ 42, & \text{if } \omega_1 = 0; \end{cases} \qquad X_6(\omega_1, \omega_2) = \begin{cases} -\log (1-\omega_2), & \text{if } \omega_2 \in [0, 1); \\ 1, & \text{if } \omega_2 = 1; \end{cases} $$ have the same distribution, which is the exponential distribution of unit rate. In other words, they induce the same probability measure $\mu_{5}$ on $\mathbb{R}$ defined by $$\mu_{5}(E) = \int_{E} e^{-x} \mathbf{1}_{(0,\infty)}(x) \, \mathrm{d}x $$ for any Borel subset $E$ of $\mathbb{R}$.

    The information about $\mu_5$ may be encoded in a different way using the cumulative distribution function (CDF). The CDF $F_{X_5}$ of $X_5$ is given by $$ F_{X_5}(x) = P(X_5 \leq x) = \mu_5((-\infty, x]) = \begin{cases} 0, & \text{if } x < 0; \\ 1 - e^{-x}, & \text{if } x \geq 0; \end{cases} $$ Of course, we have $F_{X_5} = F_{X_6}$ in this example.

  • Define $X_7 : \Omega \to \mathbb{R}^2$ by $X_7(\omega) = (X_5(\omega), X_6(\omega))$. Then its distribution $\mu_7$ is given by $$ \mu_7(E) = \iint_{E} e^{-x-y}\,\mathbf{1}_{(0,\infty)^2}(x,y) \, \mathrm{d}x\,\mathrm{d}y $$ for any Borel subset $E$ of $\mathbb{R}^2$. It turns out that $\mu_7 = \mu_5 \otimes \mu_5$ is the product of two copies of $\mu_5$, and its probabilistic implication is that $X_5$ and $X_6$ are independent.
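The independence claim of Example 2 can be spot-checked by Monte Carlo on the unit square: a joint probability should factor as the product of the two marginal exponential CDF values. A sketch (the test point $(1, 2)$ is an arbitrary choice of mine):

```python
import math
import random

random.seed(3)
N = 200_000
# omega = (omega1, omega2) drawn from Lebesgue measure on the unit square;
# using 1 - random() puts omega1 in (0, 1], avoiding log(0).
pairs = [(1.0 - random.random(), random.random()) for _ in range(N)]

def X5(w1, w2):
    return -math.log(w1)

def X6(w1, w2):
    return -math.log(1.0 - w2)

# P[X5 <= 1, X6 <= 2] should be close to F(1) * F(2) for F(x) = 1 - e^{-x}.
p_joint = sum((X5(w1, w2) <= 1.0) and (X6(w1, w2) <= 2.0)
              for w1, w2 in pairs) / N
p_prod = (1.0 - math.exp(-1.0)) * (1.0 - math.exp(-2.0))
```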

Example 3. Let $\mu$ be any probability distribution on $\mathbb{R}$, and let $(\Omega, \mathcal{F}, P) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \mu)$. Also define $X_8(\omega) = \omega$. Then $X_8$ has the distribution $\mu$. For this reason, we often consider the notion of distribution without explicit reference to a random variable. For example, the standard normal distribution is the probability measure on $\mathbb{R}$ defined by

$$ E \mapsto \int_{E} \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \, \mathrm{d}x $$

for any Borel subset $E$ of $\mathbb{R}$. In this regard, we may as well say that the word distribution also stands for the honorable title given to a well-studied probability measure on a familiar space.

This construction also tells us that, as long as we are only interested in dealing with a single random variable, the abstract notion of probability spaces is rather redundant and we can stick to this particular realization on $\mathbb{R}$. However, the abstract notion provides great flexibility in developing various concepts under a unified framework and allows us to deal with them systematically.


1) If the term 'measurable space' is not familiar to you, you may regard $(\mathcal{S}, \Sigma)$ as the Euclidean space $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ equipped with the Borel $\sigma$-algebra. Also, you need not worry too much about what it means to be a measurable map at this point.

2) For this reason, $\mu$ is sometimes called the pushforward of $P$ by $X$ and denoted by $\mu = P \circ X^{-1}$.

3) Technically speaking, $\mu_2$ and $\mu_3$ are different distributions. However, they convey the same amount of information, and so such a difference will never affect any conclusion about the 'randomness' of $X_2$ or $X_3$. My personal impression is that the choice $X_3$ seems to be preferred in elementary probability textbooks for its simplicity, whereas $X_2$ is a more common choice in the literature because it allows one to compare different distributions systematically.

Answered by Sangchul Lee on December 15, 2021

First of all, a note on terminology: the (cumulative) distribution function of a random variable $X$ is usually defined as $$F_X(x) = P(\{\omega\in\Omega: X(\omega)\leq x\}).$$ Note here the $\leq$ instead of $<$.

Now let's get to your questions.

Q1: $(\mathfrak{R}, \mathfrak{B}(\mathfrak{R}), \mu)$ is not a probability space, because $\mu(\mathfrak{R}) = \infty.$ Instead, what we usually take is $$([0, 1], \mathfrak{B}([0, 1]), \mu),$$ where $\mu$ is Lebesgue measure restricted to $[0, 1]$. This space can underlie any probability distribution on $\mathfrak{R}.$ Note first of all that the identity function $\omega\mapsto \omega$ itself is a real-valued random variable and that it has a uniform distribution on $[0, 1].$ If we now know two distribution functions $F_X$ and $F_Y,$ then $$X = F^{-1}_X(\omega), \quad Y = F^{-1}_Y(\omega)$$ have distribution functions $F_X$ and $F_Y$ respectively. $F^{-1}_X$ here denotes the generalized inverse of $F_X.$ To see that this is true, see here. This means that this space indeed underlies $X$ and $Y$.

Q2: This space does not satisfy the definition of a standard probability space that you mention, since it is not complete. However, $(\mathfrak{R}, \mathfrak{B}(\mathfrak{R}), P_X)$ can be called a canonical space for the random variable $X$ in the context of stochastic processes. Here, $P_X$ is the distribution of $X$ (which is a measure on $\mathfrak{R}$). That is, $P_X((-\infty, a]) = F_X(a),$ which is enough to define $P_X$ on $\mathfrak{B}(\mathfrak{R}).$ Then the identity $\omega \mapsto \omega$ has distribution $F_X$ on this space. More generally, if you have a sequence of random variables $X_1, ..., X_n,$ the canonical probability space is $(\mathfrak{R}^n, \mathfrak{B}(\mathfrak{R}^n), P_X),$ where $P_X$ is the distribution of the vector $(X_1, ..., X_n),$ defined by $$P_X((-\infty, a_1]\times ... \times (-\infty, a_n]) = P(X_1\leq a_1, ..., X_n\leq a_n).$$ Again, the identity then has the same distribution as the vector $(X_1, ..., X_n).$ So you can generalize this idea to a space for multiple random variables.

Q3: Probability spaces and distributions are not independent, because, as you note, we require probability spaces to be able to define distributions. That is, theoretically, we first construct a probability space $(\Omega, \mathcal{F}, P).$ Then we define a random variable $X: \Omega\to \mathfrak{R}$ and we can consider its distribution function $F_X(x) = P(\{\omega\in\Omega: X(\omega)\leq x\})$. That is, a distribution requires the existence of a probability space with a random variable. In practice, it often suffices to consider only the distribution and forget about the underlying probability space, but not always: especially when you start getting into stochastic processes, you need to be a bit more careful about measurability concerns. Furthermore, note that a distribution is not associated to one particular probability space and random variable; it just requires that there exists one.

In practice, we usually forget about the fact that such a probability space needs to exist, because it turns out that for any potential distribution function $F:\mathfrak{R}\to [0,1]$ that is non-decreasing and right-continuous with $\lim_{x\to-\infty}F(x) = 0$ and $\lim_{x\to\infty}F(x)=1$, there exists a probability space with a random variable such that it has cumulative distribution function $F.$ We have actually already seen this: the construction in Q1 works for any such $F.$ Hence, we can just dream up a function satisfying these requirements and we can be certain that there exists some probability space with a random variable with that function as its distribution function.

Answered by Dasherman on December 15, 2021

Regarding your first question, I am assuming you meant to use the space $[0,1]$ rather than the whole set of reals (otherwise, it would not be a probability space). Besides that, for the most part, it does not matter. More precisely, given any real-valued random variable $X$, you can find a random variable $X'\colon [0,1]\to \mathbf R$ with the same distribution.

The same is true for random variables with values in any standard Lebesgue space, and in particular, any separable metric space. This implies that given any sequence $(X_n)_n$ of random variables $\Omega\to \mathbf{R}$, you can find a sequence $(X_n')_n$ of random variables $[0,1]\to \mathbf{R}$ with the same joint distribution.

On the other hand, it is not hard to see that there is no sequence $(X_\alpha)_{\alpha<\mathfrak{c}^+}$ of nontrivially i.i.d. random variables $[0,1]\to \mathbf{R}$. It should probably not be too hard to argue that there is no such uncountable sequence, even much shorter than $\mathfrak{c}^+$. So restricting the domain of the random variables does restrict the things we can see.

Since the structure of the domain (as opposed to the joint distribution of the variables) is usually mostly immaterial in probability theory, it is usually more convenient to leave the domain unspecified and implicit.

Regarding your second question, if there is a "the" standard probability space, then it would either be $[0,1]$ with the Lebesgue measure or $\{0,1\}^{\mathbf{N}}$ with the usual Haar/coin toss measure. Still, usually, you would speak of "a" standard probability space.

I'm not sure whether I understand your third question. The basic notion is that of a measurable space. Using this, we can define the notion of a measurable function (= random variable), a probability space (= a measurable space with a probability measure), and using those two, we can define the probability distribution (= the pushforward of the probability measure via the random variable). So I would not call these notions independent.

Answered by tomasz on December 15, 2021

Some concepts/definitions that might help:

A probability measure on $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d)\right)$ is called a distribution. The triplet obtained can be called a distribution space to distinguish it from the general probability space.

Typical distributions are built from the Lebesgue measure $\mu$ and $\mathcal{B}(\mathbf{R}^d)$-measurable functions $h:\mathbf{R}^d\rightarrow [0,\infty)$ with $$ \int_{\mathbf{R}^d} h(x)\, \mu(dx) = 1$$ by $$ P_h(B) = \int_B h(x)\, \mu(dx) $$ for all $B\in \mathcal{B}(\mathbf{R}^d)$.
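As an illustrative numerical check (my own sketch, taking $d = 1$ and $h$ to be the standard normal density), $P_h(B)$ for an interval $B$ can be approximated by integrating $h$ over it:

```python
import math

# d = 1 illustration: h is the standard normal density, a valid choice
# since it is nonnegative and integrates to 1 over R.
def h(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Crude midpoint-rule approximation of P_h(B) = integral of h over B,
# for an interval B = (a, b).
def P_h(a, b, n=100_000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

print(P_h(-10, 10))  # approx 1: h is a probability density
print(P_h(-10, 0))   # approx 0.5, by symmetry of h
```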

An example of a distribution that cannot be built this way is Dirac's distribution concentrated at some point $x_0 \in \mathbf{R}^d$:

$$ \delta_{x_0}(B) = 1_{x_0\in B}$$ for all $B\in \mathcal{B}(\mathbf{R}^d)$.

Also, given a probability space $\left(\Omega, \mathcal{F}, P\right)$ and $X:\Omega\rightarrow \mathbf{R}^d$ which is $\mathcal{F}/\mathcal{B}(\mathbf{R}^d)$-measurable, one can build a distribution $P_X$ as follows:

$$ P_X = P \circ X^{-1}, $$

usually called the distribution of $X$ (or law of $X$), which suggests that now one can focus only on the distribution space $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d), P_X\right)$.
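A toy finite example (my own choice, not from the answer) makes the pushforward $P_X = P \circ X^{-1}$ concrete, with a fair die as the underlying space and exact rational arithmetic:

```python
from fractions import Fraction

# Toy probability space: a fair six-sided die (an illustrative choice;
# on a finite Omega the sigma-algebra is just the power set).
Omega = [1, 2, 3, 4, 5, 6]
P = {omega: Fraction(1, 6) for omega in Omega}

# Random variable X: parity of the outcome.
def X(omega):
    return omega % 2

# Pushforward: P_X(B) = P(X^{-1}(B)) for a set B of values.
def P_X(B):
    return sum(P[omega] for omega in Omega if X(omega) in B)

print(P_X({0}))  # 1/2: three even outcomes
print(P_X({1}))  # 1/2: three odd outcomes
```

After this step one can work with $P_X$ on the value space alone and forget the die.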

Note: If $\Omega = \mathbf{R}^d$, $\mathcal{F} = \mathcal{B}(\mathbf{R}^d)$ and $P$ is a distribution, then taking $X$ to be the identity function, $id$, we have:

$$ P_{X} = P.$$

Note 2: Two random variables, possibly defined on different spaces, can have the same distribution (law).

If $X$ is defined on an abstract space $\left(\Omega, \mathcal{F}, P\right)$ as above, it induces the distribution $P_X$.

Then the random variable $id$ defined on $\left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d), P_X\right)$ has the same distribution.

Many models rely on knowing the distribution of a random variable $X$ rather than its explicit form and the probability space on which it is defined.

Note 3: To answer Q3, I guess, we have the following facts:

  1. A distribution space is just a particular case of probability space.

  2. Yes, for a distribution, be it $P_h$ or Dirac type, there is always a random variable on a 'supporting' probability space that induces the same distribution: we take the probability space to be the starting distribution space itself and the random variable to be the identity function.

  3. (Complementing Note 2) If $A,B\in \mathcal{F}$ are different events such that $P(A)=P(B)$, then $$1_A \not= 1_B,$$ but they are random variables with the same distribution, that is

$$ P_{1_A} = P_{1_B}.$$

  4. If $\alpha: \left(\mathbf{R}^d, \mathcal{B}(\mathbf{R}^d)\right) \rightarrow \left(\mathbf{R}^f, \mathcal{B}(\mathbf{R}^f)\right)$ is measurable, then

$$ P_{\alpha \circ X} = P_X \circ \alpha^{-1}. $$
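A small sketch of the point about equal laws (the space $\Omega = \{1,2,3,4\}$ with the uniform measure is my own illustrative choice): two different indicator random variables can nonetheless have the same distribution.

```python
from fractions import Fraction

# Toy space: Omega = {1,2,3,4} with the uniform measure (illustrative choice).
Omega = [1, 2, 3, 4]
P = {omega: Fraction(1, 4) for omega in Omega}

A = {1, 2}
B = {3, 4}  # a different event, but P(A) = P(B) = 1/2

# Indicator random variables 1_A and 1_B, tabulated on Omega.
ind_A = {omega: int(omega in A) for omega in Omega}
ind_B = {omega: int(omega in B) for omega in Omega}

# The random variables differ pointwise...
print(ind_A == ind_B)  # False

# ...but their laws coincide: both are Bernoulli(1/2).
law_A = {v: sum(P[w] for w in Omega if ind_A[w] == v) for v in (0, 1)}
law_B = {v: sum(P[w] for w in Omega if ind_B[w] == v) for v in (0, 1)}
print(law_A == law_B)  # True
```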

Note 4: I finally realized that you are focusing on the distribution function.

A function $F:\mathbf{R}\rightarrow \mathbf{R}$ which is non-decreasing, bounded, left-continuous and for which $$\lim_{x\rightarrow -\infty} F(x) = 0$$ is called a distribution function. This definition stands on its own (no mention of measures).

The following facts can be proven.

Fact: Let $F$ be a distribution function such that $$\lim_{x\rightarrow \infty} F(x) = 1.$$ Let also $m$ be a measure on $\left((0,1), \mathcal{B}((0,1))\right)$ such that $$ m((0,x)) = x $$ for all $x\in (0,1]$ (its existence can be proven). Then there is a non-decreasing function $f:(0,1) \rightarrow \mathbf{R}$ such that the measure $m\circ f^{-1}$ has $F$ as its distribution function, that is

$$ (m\circ f^{-1})((-\infty,x)) = F(x)$$

for all $x\in \mathbf{R}$.

Fact 2: A measure $\mu$ on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$ is perfectly determined by its distribution function $F_\mu$ defined as $$ F_\mu(x) = \mu((-\infty,x)) $$ for all $x\in \mathbf{R}$. That is, if two measures on $(\mathbf{R}, \mathcal{B}(\mathbf{R}))$ have the same distribution function, they coincide.

These facts suggest that specifying the triplet

$$\left(\mathbf{R}, \mathcal{B}(\mathbf{R}), m\circ f^{-1}\right)$$

for some non-decreasing $f$, or rather a distribution function $F$ (with $\lim_{x\rightarrow \infty} F(x) = 1$, for which we know such an $f$ exists), is the essential step in setting up any distribution space.

For a random variable on an abstract probability space, $X:(\Omega, \mathcal{F}, P) \rightarrow (\mathbf{R}, \mathcal{B}(\mathbf{R}))$, as soon as we get $P_X$, the associated distribution, and $F_X$, its distribution function, as defined in the book, we are done (we can forget about $X$, in some sense; basically replace it with the $id$ introduced in Note 2, as it has the same distribution). Note that:

$$ F_X = F_{P_X} $$

with the second term defined above (in Fact 2).

Answered by ir7 on December 15, 2021
