Random variables

Formalizing continuous and discrete random variables, distributions, and related topics

Random Variables

A random variable is a real-valued function defined on the sample space of some experiment. That is, it is a function that maps the outcomes of an experiment to real numbers. It is a variable in that its value depends on the outcome of the experiment, and random in that the experiment on which it is defined is a random process. More formally, a random variable X is a function mapping from a sample space to (typically) the set of real numbers

X : \Omega \mapsto \mathbb{R}

The probability that X takes on a value in a set of real numbers S is

P(X \in S) = P(\{ \omega \in \Omega | X(\omega) \in S \})

Discrete Random Variables

Discrete random variables are random variables that can take on at most a countable number of values.

Probability mass function

The probability mass function (pmf) is a function that gives the probability of a discrete random variable taking on a particular value. The pmf is the primary way in which discrete probability distributions are characterized or defined. For a discrete random variable X

f_X : \mathbb{R} \mapsto [0,1], \;\;\; f_X(x) = P(x) = P(X=x) = P(\{\omega \in \Omega | X(\omega)=x\})

In line with the axioms of probability, it holds that

P(x) \ge 0 \;\; \text{and} \;\; \sum^{\infty}_{i=1}{P(x_i)} = 1
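
As a concrete illustration, here is a minimal Python sketch of a pmf for a fair six-sided die; the die and the set of even outcomes are assumed examples, not something from the notes above.

```python
# Minimal sketch: pmf of a fair six-sided die (an assumed example).
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # f_X(x) = P(X = x)

# The two conditions above: nonnegative masses that sum to one.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# P(X in S) for a set of real values S, here the even outcomes.
S = {2, 4, 6}
print(sum(p for x, p in pmf.items() if x in S))  # 1/2
```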

Cumulative distribution function

The cumulative distribution function (CDF) gives the probability that a random variable takes on a value less than or equal to a given point. For a discrete random variable X, this is just the sum of the pmf over all values at or below x

F_X(x) = P(X \le x) = \sum_{x_i \le x}{P(x_i)}
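
A small sketch of the same idea in code, reusing the fair-die pmf from above (still an assumed example): the CDF at x accumulates all of the mass at or below x.

```python
# Minimal sketch: CDF of the fair-die pmf (assumed example).
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x): sum the mass of all outcomes at or below x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(0), cdf(3), cdf(6))  # 0, 1/2, 1
```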

Expected value

The expected value of a random variable is a probability-weighted average of the values it can take on.

\mathbb{E}[X] = \sum_{\forall x}{x P(x)}
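
A one-line sketch for the fair die (assumed example): the probability-weighted average of its outcomes.

```python
# Minimal sketch: E[X] for the fair-die pmf (assumed example).
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
print(sum(x * p for x, p in pmf.items()))  # 7/2, i.e. 3.5
```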

Expected value properties

  • Linearity of expectation

Expectation of a function of a random variable

For a random variable X and some real-valued function g, the expected value of g(X) is

\mathbb{E}[g(X)] = \sum_{i=1}^{\infty}{g(x_i)p(x_i)}

We first transform the value of X using g, and then make use of the probability mass function of X to weight this new value. This shows, for example, that 𝔼[X²] is not necessarily equal to 𝔼[X]².
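
A short sketch of this computation for the fair die (assumed example), which also makes the last point concrete: 𝔼[X²] and 𝔼[X]² differ.

```python
# Minimal sketch: E[g(X)] via the pmf of a fair die (assumed example).
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g):
    """E[g(X)]: transform each value with g, then weight by its probability mass."""
    return sum(g(x) * p for x, p in pmf.items())

print(expectation(lambda x: x * x))   # E[X^2] = 91/6
print(expectation(lambda x: x) ** 2)  # E[X]^2 = 49/4 -- not the same
```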

Variance

The variance of a random variable is the expected squared deviation of its values from its mean. That is

\text{Var}(X) = \mathbb{E}[(X-\mathbb{E}[X])^2] = \mathbb{E}[X^2]-\mathbb{E}[X]^2

The variance is a common measure of spread, indicating how far (on average, in squared distance) a random variable's values deviate from its mean.
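
A small sketch computing the variance of the fair die (assumed example) both from the definition and from the 𝔼[X²] − 𝔼[X]² form, confirming the two agree.

```python
# Minimal sketch: Var(X) for the fair die, computed two equivalent ways (assumed example).
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())                    # E[X] = 7/2

var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())   # E[(X - E[X])^2]
var_alt = sum(x * x * p for x, p in pmf.items()) - mean ** 2 # E[X^2] - E[X]^2
print(var_def, var_alt)  # both 35/12
```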

Variance properties

  • Covariance
  • Density and Likelihood

Continuous Random Variables

Continuous random variables are random variables whose set of possible values is uncountable.

Probability Density Function

The probability density function (PDF) of a random variable is a function that maps points in the random variable's codomain to values that can be interpreted as the relative likelihood of observing that point. Note that the absolute probability of a continuous random variable taking on any particular value is zero: there are uncountably many possible outcomes, and they cannot all be assigned positive probability (whereas for a PMF with countably infinite possible outcomes, we can assign positive values to every outcome so long as their infinite sum is one). But then how can the total probability sum to one if the probability of every outcome is zero? This is the apparent paradox one typically runs into when first thinking about PDFs.

It turns out we need to shift our focus to a different question. We can't really ask about the probability of any particular value occurring, since that probability is zero (and isn't very useful). We can instead ask tractable questions about the probability that the random variable takes on a value within a range of values, or the probability that it is close to a particular value (dealing in sets of uncountably many outcomes). That is, we can define a function f such that

P(X \in B) = \int_B{f_X(x)dx}

P(a < X < b) = \int_a^b{f_X(x)dx}

Here f is the probability density function of X. f is defined such that the integral between any two points gives the probability that X takes on a value from within that interval. This aligns with the notion that any particular value has no chance of occurring: as the interval shrinks around a single value, the integral on that interval goes to zero. That is,

P(X=a) = P(a < X < a) = \int_a^a{f_X(x)dx} = 0

We can also formulate the scenario where X takes on a value in an infinitesimally small range

P(X=a) \approx P(a < X < a+dx) = \int_a^{a+dx}{f_X(x)dx} = f_X(a)dx

which is like taking the area of a single one of the infinitesimally thin slices used to compute an integral.

f also must integrate to one when considering the entire sample space:

P(-\infty < X < \infty) = \int_{-\infty}^\infty{f_X(x)dx} = 1

In summary, the PDF is a function that, when integrated over an interval, gives the probability of a continuous random variable (having a density given by that PDF) taking on a value in that interval. One can liken a PDF to a velocity curve: integrating the velocity curve between two points in time gives the total distance traveled between those times. If we focus on any particular instant, we still have a notion of instantaneous velocity (the value of the velocity function at that instant), but the integral there is zero; there is no time for that velocity to act, so no movement occurs. This mirrors the situation with a PDF: any particular outcome has probability zero even though the PDF returns a value (a relative likelihood) there, and absolute probabilities only appear when considering an interval of outcomes.
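
To make the interval-based view concrete, here is a small numerical sketch. It assumes the exponential density f(x) = λe^(−λx) for x ≥ 0 as an example (not from the notes above): integrating the density over an interval recovers a probability, the whole density integrates to (approximately) one, and a single point contributes nothing.

```python
# Minimal sketch: probabilities from a PDF by numerical integration (assumed example).
import math

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)  # exponential density on x >= 0

def integrate(g, a, b, n=10_000):
    """Midpoint Riemann-sum approximation of the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

# P(0.5 < X < 1.5), compared against the closed form exp(-lam*a) - exp(-lam*b).
print(integrate(f, 0.5, 1.5), math.exp(-lam * 0.5) - math.exp(-lam * 1.5))

print(integrate(f, 0.0, 20.0))  # ~1.0: total probability over (effectively) all outcomes
print(integrate(f, 1.0, 1.0))   # 0.0: a single point carries no probability
```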

Conditional Probability Density Function

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}

f_Y(y) = \int{f_{X,Y}(x,y)dx}

Note here that y is held fixed while the integral is taken over x; the result is the marginal density of Y evaluated at that fixed y, not a function of both variables.
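
A numerical sketch of these two formulas under an assumed joint density f_{X,Y}(x, y) = x + y on the unit square (an illustrative choice, not from the notes above): fixing y and integrating over x gives the marginal f_Y(y), and dividing the joint by that marginal gives a conditional density that itself integrates to one.

```python
# Minimal sketch: marginal and conditional densities by numerical integration (assumed example).

def joint(x, y):
    return x + y  # a valid joint density on [0, 1] x [0, 1]

def integrate(g, a, b, n=10_000):
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

def marginal_y(y):
    # y is held fixed; the integral runs over x alone.
    return integrate(lambda x: joint(x, y), 0.0, 1.0)

y0 = 0.3
fy = marginal_y(y0)
print(fy)  # ~0.8 (closed form: 0.5 + y)

conditional = lambda x: joint(x, y0) / fy  # f_{X|Y}(x | y0)
print(integrate(conditional, 0.0, 1.0))    # ~1.0: a density in x for fixed y
```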

Joint Probability Distribution

Conditional Probability Distributions

Convolutions