Jason Chen YCHEN148@e.ntu.edu.sg
Copyright Notice
This document is a compilation of original notes by Jason Chen (YCHEN148@e.ntu.edu.sg). Unauthorized reproduction, downloading, or use for commercial purposes is strictly prohibited and may result in legal action. If you have any questions or find any content-related errors, please contact the author at the provided email address. The author will promptly address any necessary corrections.
Sample space: Ω is the set of all possible outcomes in a random phenomenon.
Example 1.1: Throw a coin 3 times, Ω = {hhh, hht, hth, thh, htt, tht, tth, ttt}
Example 1.2: Number of jobs in a print queue, Ω = {0, 1, 2, ...}
Example 1.3: Length of time between successive earthquakes, Ω = {t | t ≥ 0}
Question: What are the differences between Ω’s in these examples?
Answer:
In Example 1.1, the sample space Ω is a finite set consisting of all possible sequences of heads (h) and tails (t) when a coin is tossed three times. This is a discrete and finite sample space.
In Example 1.2, Ω is an infinite countable set representing the number of jobs in a print queue, which can be any non-negative integer. This sample space is also discrete but infinite.
In Example 1.3, Ω is a continuous set representing the possible lengths of time between successive earthquakes, where t can be any non-negative real number. This sample space is continuous and infinite.
The key differences are that Example 1.1 deals with a finite discrete sample space, Example 1.2 with an infinite discrete sample space, and Example 1.3 with a continuous infinite sample space.
Event: A particular subset of Ω.
Example 1.4: Let A be the event that the total number of heads equals 2 when tossing a coin 3 times. A = {hht, hth, thh}
Example 1.5: Let A be the event that fewer than 5 jobs are in the print queue. A = {0,1,2,3,4}
Probability measure: A function P from subsets of Ω to the real numbers that satisfies the following axioms:
P(Ω) = 1.
If A ⊆ Ω, then P(A) ≥ 0.
If A₁ and A₂ are disjoint, then P(A₁ ∪ A₂) = P(A₁) + P(A₂).
More generally, if A₁, A₂, ... are mutually disjoint, then P(∪ᵢ Aᵢ) = Σ P(Aᵢ).
Example 1.6: Suppose the coin is fair. For every outcome ω ∈ Ω, P(ω) = 1/8.
Ω = {hhh,hht,hth,thh,htt,tht,tth,ttt}
Let A be the event that the total number of heads equals 2, as in Example 1.4. Then P(A) = P(hht) + P(hth) + P(thh) = 3/8.
The Law of Total Probability is an important method for calculating the probability of an event by decomposing it over a partition of the sample space.
Definition
If B₁, B₂, ..., Bₙ are mutually disjoint events with B₁ ∪ B₂ ∪ ... ∪ Bₙ = Ω, then for any event A,
P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ),
provided that P(Bᵢ) > 0 for every i, where P(A | Bᵢ) = P(A ∩ Bᵢ) / P(Bᵢ) is the conditional probability of A given Bᵢ.
The formula can be interpreted as "the probability of A is a weighted average of the conditional probabilities P(A | Bᵢ), with weights P(Bᵢ)".
Key Properties
Multiplication Rule:
The probability of the intersection of two events can be written as: P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A).
Bayes' Theorem:
Bayes' theorem is a powerful result that relates the conditional probability of two events: P(B | A) = P(A | B) P(B) / P(A), provided P(A) > 0.
Law of Total Probability:
If B₁, ..., Bₙ partition Ω with P(Bᵢ) > 0 for every i, then P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ).
Let B₁, ..., Bₙ be such a partition. Combining the multiplication rule with the law of total probability gives Bayes' rule in its usual form: P(Bⱼ | A) = P(A | Bⱼ) P(Bⱼ) / Σᵢ P(A | Bᵢ) P(Bᵢ).
Bayes' rule is a method for calculating a "reverse" conditional probability: it lets us update our estimate of the probability of an initial event after observing that another event has occurred. This is crucial for revising probability estimates based on new evidence.
Example 1.8: The three prisoners problem (a variant of the Monty Hall problem)
Three prisoners A, B, and C are on death row. The governor decides to pardon one of them and chooses the prisoner to pardon at random. He informs the warden of this choice but requests that the name be kept secret for a few days.
A tries to get the warden to tell him who had been pardoned. The warden refuses. A then asks which of B or C will be executed. The warden thinks for a while and tells A that B is to be executed. A then thinks that his chance of being pardoned has risen to 1/2. Is he right?
Step 1: Initial Probabilities
Since the governor chooses at random, P(A pardoned) = P(B pardoned) = P(C pardoned) = 1/3.
Step 2: Conditional Probability After Warden’s Information
If A is actually the one pardoned, there is a 1/2 chance that the warden names B as the one to be executed (he may name either B or C). If B is actually the one pardoned, there is a 0% chance that the warden names B. If C is actually the one pardoned, there is a 100% chance that the warden names B, since he can neither reveal that C is pardoned nor tell A about A's own fate.
Step 3: Apply Bayes' Theorem
Using Bayes' Theorem, we can calculate the probability that A is pardoned given that the warden says B will be executed:
P(A pardoned | warden says B) = P(warden says B | A) P(A) / [P(warden says B | A) P(A) + P(warden says B | B) P(B) + P(warden says B | C) P(C)] = (1/2 · 1/3) / (1/2 · 1/3 + 0 · 1/3 + 1 · 1/3) = (1/6) / (1/2) = 1/3.
Thus: A's chance of being pardoned is still 1/3 (it is C whose chance rises, to 2/3), so A is not right.
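A minimal simulation sketch of this argument (not part of the original notes): draw the pardoned prisoner uniformly, let the warden answer according to the rules above, and estimate P(A pardoned | warden says B).

```python
import random

def simulate(n_trials=200_000, seed=0):
    rng = random.Random(seed)
    says_b = 0            # trials in which the warden names B as the one to be executed
    a_pardoned_and_b = 0  # ... and A is in fact the pardoned prisoner
    for _ in range(n_trials):
        pardoned = rng.choice(["A", "B", "C"])
        if pardoned == "A":
            answer = rng.choice(["B", "C"])   # warden may name either at random
        elif pardoned == "B":
            answer = "C"                      # warden cannot name the pardoned prisoner
        else:
            answer = "B"
        if answer == "B":
            says_b += 1
            a_pardoned_and_b += (pardoned == "A")
    return a_pardoned_and_b / says_b

print(simulate())  # ~ 1/3, matching the Bayes calculation (not 1/2)
```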
Two events A and B are independent if P(A ∩ B) = P(A) P(B).
A group of events A₁, A₂, ..., Aₙ is mutually independent if for every subcollection Aᵢ₁, ..., Aᵢₖ, P(Aᵢ₁ ∩ ... ∩ Aᵢₖ) = P(Aᵢ₁) ⋯ P(Aᵢₖ).
Independence is a crucial concept in probability theory, implying that the occurrence of one event does not affect the likelihood of the other.
A random variable is a function from the sample space Ω to the real numbers: it assigns a numerical value X(ω) to each outcome ω ∈ Ω.
Example 1.9:
Importance of Random Variables
Statisticians need random variables because they provide a way to quantify and analyze outcomes of random phenomena. By mapping outcomes to the real line, we can use mathematical tools and techniques to study their distributions, moments, and other properties.
A random variable has a sample space on the real line. This brings special ways to describe its probability measure.
2.1 Discrete vs. Continuous Random Variables
Discrete Random Variable: Can take on only a finite or countably infinite number of values.
Continuous Random Variable: Can take on a continuum of values.
2.2 Functions for Describing Distributions
Probability Mass Function (pmf): For discrete random variables, it gives the probability that the random variable takes on a specific value.
Probability Density Function (pdf): For continuous random variables, it describes the relative likelihood for the random variable to take on a given value.
Cumulative Distribution Function (cdf): Describes the probability that the random variable takes on a value less than or equal to a specific value.
Moment Generating Function (mgf): Provides a way to capture all the moments (mean, variance, etc.) of the distribution.
2.3 Summary of Notations and Functions
 | Discrete | Continuous |
---|---|---|
Univariate r.v. (single random variable) | Probability Mass Function (pmf), Cumulative Distribution Function (cdf), Moment Generating Function (mgf) | Probability Density Function (pdf), Cumulative Distribution Function (cdf), Moment Generating Function (mgf) |
Multivariate r.v. (multiple random variables) | Joint Probability Mass Function (joint pmf), Joint Cumulative Distribution Function (joint cdf), Joint Moment Generating Function (joint mgf) | Joint Probability Density Function (joint pdf), Joint Cumulative Distribution Function (joint cdf), Joint Moment Generating Function (joint mgf) |
3.1 Cumulative Distribution Function (CDF)
A function F(x) = P(X ≤ x), defined for all real x, is called the cumulative distribution function of the random variable X.
3.2 Probability Mass Function (PMF)
A function p(x) = P(X = x) is called the probability mass function (pmf) of the discrete random variable X.
For a discrete random variable X with possible values x₁, x₂, ..., we have p(xᵢ) ≥ 0, Σᵢ p(xᵢ) = 1, and the CDF is F(x) = Σ_{xᵢ ≤ x} p(xᵢ).
3.3 Probability Density Function (PDF)
A function f(x) ≥ 0 with ∫_{−∞}^{∞} f(x) dx = 1 is called the probability density function (pdf) of the continuous random variable X if P(a ≤ X ≤ b) = ∫_a^b f(x) dx for all a ≤ b.
3.4 Relationships Between PDF and CDF
The CDF is obtained from the PDF by integration: F(x) = ∫_{−∞}^{x} f(t) dt.
The PDF is recovered from the CDF by differentiation: f(x) = F′(x) at every point where F is differentiable.
3.5 Interpretation of the PDF
The PDF f(x) is not itself a probability; rather, for small δ, P(x ≤ X ≤ x + δ) ≈ f(x) δ, so f(x) measures how densely probability is packed near x.
3.6 Properties of CDF
If x₁ ≤ x₂, then F(x₁) ≤ F(x₂); that is, F is non-decreasing.
For any x, F is right-continuous: lim_{t↓x} F(t) = F(x).
For any a < b, P(a < X ≤ b) = F(b) − F(a); moreover lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.
There are at most countably many discontinuity points of F.
Conversely, if a function F is non-decreasing, right-continuous, and satisfies the two limit conditions above, then it is the CDF of some random variable.
4.1 Why We Need Joint Distribution for Multivariate Random Variables
Joint distributions are necessary to describe the behavior of multiple random variables simultaneously. They provide a comprehensive view of how the random variables interact with each other.
Example 1.10 (continued from Example 1.9)
Given:
We can describe the joint distribution of
4.2 Joint CDF and Marginal CDF
Joint CDF
The joint cumulative distribution function (CDF) of X and Y is F(x, y) = P(X ≤ x, Y ≤ y).
Marginal CDF
The marginal CDF of X is F_X(x) = P(X ≤ x) = lim_{y→∞} F(x, y); the marginal CDF of Y is obtained analogously by letting x → ∞.
4.3 Joint and Marginal Distribution Functions
Discrete:
Joint Probability Mass Function: p(x, y) = P(X = x, Y = y).
Probability for a Set ( A ): P((X, Y) ∈ A) = Σ_{(x, y) ∈ A} p(x, y).
Joint Cumulative Distribution Function: F(x, y) = Σ_{u ≤ x} Σ_{v ≤ y} p(u, v).
Marginal Probability Mass Function: p_X(x) = Σ_y p(x, y) (and similarly p_Y(y) = Σ_x p(x, y)).
Continuous:
Joint Probability Density Function: f(x, y) ≥ 0 with ∫∫ f(x, y) dx dy = 1.
Probability for a Set ( A ): P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.
Joint Cumulative Distribution Function: F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du.
Marginal Probability Density Function: f_X(x) = ∫_{−∞}^{∞} f(x, y) dy (and similarly for f_Y).
Interpreting Joint Distributions
Joint PMF (Discrete Case): Gives the probability of each possible combination of values of the random variables.
Joint PDF (Continuous Case): Describes the density of the probability distribution over a continuous range of values.
4.4 Joint PMF and its Relationship with Joint CDF
Joint PMF Definition
For discrete random variables X and Y, the joint PMF is p(x, y) = P(X = x, Y = y).
Joint CDF Definition
The joint cumulative distribution function (joint CDF) is F(x, y) = P(X ≤ x, Y ≤ y).
Relationship between Joint PMF and Joint CDF
The joint CDF can be expressed in terms of the joint PMF as follows: F(x, y) = Σ_{xᵢ ≤ x} Σ_{yⱼ ≤ y} p(xᵢ, yⱼ).
Deriving Joint CDF from Joint PMF
Given the joint PMF p(x, y), the joint CDF is obtained by summing the PMF over all values no larger than the point of interest.
Definition of Joint CDF:
The joint CDF is F(x, y) = P(X ≤ x, Y ≤ y).
Expressing in terms of PMF:
This can be expressed as the sum of joint PMFs for all pairs (xᵢ, yⱼ) with xᵢ ≤ x and yⱼ ≤ y: F(x, y) = Σ_{xᵢ ≤ x} Σ_{yⱼ ≤ y} p(xᵢ, yⱼ).
Example
Let's assume X and Y each take the values 0, 1, 2, with joint PMF p(x, y) given by the table below (rows: x, columns: y):
x \ y | 0 | 1 | 2 |
---|---|---|---|
0 | 0.1 | 0.2 | 0.1 |
1 | 0.1 | 0.1 | 0.2 |
2 | 0.1 | 0.05 | 0.05 |
Calculating F(1, 1): sum all p(x, y) with x ≤ 1 and y ≤ 1: F(1, 1) = 0.1 + 0.2 + 0.1 + 0.1 = 0.5.
Calculating F(2, 2): summing every entry of the table gives F(2, 2) = 1, as it must.
The joint PMF therefore determines the joint CDF completely; conversely, the PMF can be recovered from the CDF by taking differences.
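A small numpy sketch (not part of the original notes) reproducing the table above: the joint CDF is built by cumulative summation and the marginals by summing over the other variable.

```python
import numpy as np

# rows indexed by x = 0, 1, 2; columns by y = 0, 1, 2 (the table in the example)
pmf = np.array([[0.10, 0.20, 0.10],
                [0.10, 0.10, 0.20],
                [0.10, 0.05, 0.05]])

cdf = pmf.cumsum(axis=0).cumsum(axis=1)   # F(x, y) = sum of p(u, v) for u <= x, v <= y
p_x = pmf.sum(axis=1)                     # marginal PMF of X
p_y = pmf.sum(axis=0)                     # marginal PMF of Y

print("F(1, 1) =", cdf[1, 1])             # 0.1 + 0.2 + 0.1 + 0.1 = 0.5
print("F(2, 2) =", cdf[2, 2])             # 1.0 (total probability)
print("p_X =", p_x, " p_Y =", p_y)
```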
4.5 Joint PDF and its Relationship with Joint CDF (Continuous Case)
Joint PDF Definition
For continuous random variables X and Y, the joint PDF f(x, y) satisfies f(x, y) ≥ 0, ∫∫ f(x, y) dx dy = 1, and P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.
Joint CDF Definition
The joint cumulative distribution function (joint CDF) is F(x, y) = P(X ≤ x, Y ≤ y).
Relationship between Joint PDF and Joint CDF
The joint CDF can be expressed in terms of the joint PDF as follows: F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du.
Detailed Derivation
Given the joint PDF f(x, y):
Definition of Joint CDF:
The joint CDF is F(x, y) = P(X ≤ x, Y ≤ y).
Expressing in terms of PDF:
This can be expressed as the double integral of the joint PDF over the region {(u, v) : u ≤ x, v ≤ y}: F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du.
Example
Let's assume, for illustration, the joint PDF f(x, y) = e^{−x−y} for x ≥ 0, y ≥ 0 (and 0 otherwise).
Calculating the joint CDF: F(x, y) = ∫₀ˣ ∫₀ʸ e^{−u−v} dv du = (1 − e^{−x})(1 − e^{−y}) for x, y ≥ 0.
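A sketch using sympy that reproduces this calculation for the illustrative density assumed above (f(x, y) = e^{−x−y} on x, y ≥ 0; this density is an assumption for illustration, not one specified in the notes).

```python
import sympy as sp

x, y, u, v = sp.symbols("x y u v", positive=True)
f = sp.exp(-u - v)                      # assumed joint PDF, written in the dummy variables u, v

# joint CDF: integrate the PDF over (0, x] x (0, y]
F = sp.integrate(sp.integrate(f, (v, 0, y)), (u, 0, x))
print(sp.simplify(F))                   # (1 - exp(-x))*(1 - exp(-y))

# recovering the PDF by mixed partial differentiation
print(sp.simplify(sp.diff(F, x, y)))    # exp(-x)*exp(-y) = exp(-(x + y))
```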
Deriving Joint PDF from Joint CDF
Given the joint CDF F(x, y):
Joint PDF:
The joint PDF is obtained by mixed partial differentiation: f(x, y) = ∂²F(x, y)/∂x∂y, wherever this derivative exists.
Example
Assume we have the joint CDF values for certain points as follows:
To find f(x, y) from such tabulated values, approximate the mixed partial derivative by a finite difference: f(x, y) ≈ [F(x+h, y+h) − F(x+h, y) − F(x, y+h) + F(x, y)] / h² for small h.
Marginal distributions are derived from the joint distribution by summing (discrete case) or integrating (continuous case) over the other variables.
5.1 Why We Need Joint Distributions
Joint distributions are essential for understanding the relationships between multiple random variables. They allow us to analyze how variables co-vary and the dependencies between them.
5.2 When We Know the Joint CDF, Can We Obtain Every Marginal CDF?
Yes, if we know the joint CDF, we can derive every marginal CDF by taking the appropriate limits.
5.3 Is the Reverse Statement True?
No, knowing the marginal CDFs does not provide enough information to determine the joint CDF. The joint CDF contains information about the dependencies and interactions between the variables that marginal CDFs alone do not capture.
Proof for the Statement: "When We Know the Joint CDF, We Can Derive Every Marginal CDF by Taking the Appropriate Limits"
For Discrete Random Variables:
Given the joint CDF F(x, y), the marginal CDF of X is F_X(x) = P(X ≤ x) = lim_{y→∞} F(x, y); letting y → ∞ removes the restriction on Y, since Σ_{xᵢ ≤ x} Σ_{all yⱼ} p(xᵢ, yⱼ) = Σ_{xᵢ ≤ x} p_X(xᵢ).
For Continuous Random Variables:
Given the joint CDF F(x, y), again F_X(x) = lim_{y→∞} F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, v) dv du = ∫_{−∞}^{x} f_X(u) du, which is exactly the marginal CDF of X.
Proof for the Statement: "Knowing the Marginal CDFs Does Not Provide Enough Information to Determine the Joint CDF"
To prove this, let's consider two pairs of random variables that have the same marginal CDFs but different joint CDFs.
Example 1: Independent Variables
Suppose X and Y are independent, each taking the values 0 and 1 with probability 1/2. Then p(x, y) = 1/4 for each of the four pairs, and the joint CDF factorizes: F(x, y) = F_X(x) F_Y(y).
Example 2: Dependent Variables
Now consider the case where Y = X, with X again taking the values 0 and 1 with probability 1/2. The marginal CDFs of X and Y are exactly as before, but now p(0, 0) = p(1, 1) = 1/2 and p(0, 1) = p(1, 0) = 0, so the joint CDF is different (for instance F(0, 0) = 1/2 here versus 1/4 in the independent case).
This demonstrates that knowing the marginal CDFs F_X and F_Y alone does not determine the joint CDF F(x, y).
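A quick numeric check of this counterexample (a sketch, not from the notes): both joint PMFs below have identical Bernoulli(1/2) marginals, yet the joints clearly differ.

```python
import numpy as np

independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])   # X, Y independent fair coins
dependent   = np.array([[0.50, 0.00],
                        [0.00, 0.50]])   # Y = X (perfectly dependent)

for name, pmf in [("independent", independent), ("dependent", dependent)]:
    print(name,
          "marginal of X:", pmf.sum(axis=1),
          "marginal of Y:", pmf.sum(axis=0))
# Identical marginals in both cases, yet the joint distributions (and joint CDFs) differ.
```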
6.1 Definition
Random variables X and Y are independent if P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for all sets A and B, or equivalently F(x, y) = F_X(x) F_Y(y) for all x and y.
Discrete Case
For discrete random variables: p(x, y) = p_X(x) p_Y(y) for all x and y.
Continuous Case
For continuous random variables: f(x, y) = f_X(x) f_Y(y) for all x and y.
6.2 Property 1: Independence of Events Defined by X and Y
Two random variables X and Y are independent if and only if the events {X ∈ A} and {Y ∈ B} are independent for all sets A and B, i.e. P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).
6.3 Property 2: Function of Independent Variables
If X and Y are independent random variables, then g(X) and h(Y) are independent for any functions g and h.
Proof: Given two independent random variables X and Y and functions g and h, we show that g(X) and h(Y) are independent.
1. Independence Definition for X and Y
By definition, P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for all sets A and B.
2. Consider the Events for Functions of X and Y
Define the events: {g(X) ∈ C} and {h(Y) ∈ D}.
Here, C and D are arbitrary sets of possible values of g(X) and h(Y).
3. Relate the Events to X and Y
The events can be written as: {g(X) ∈ C} = {X ∈ g⁻¹(C)} and {h(Y) ∈ D} = {Y ∈ h⁻¹(D)}, where g⁻¹(C) = {x : g(x) ∈ C} and h⁻¹(D) = {y : h(y) ∈ D}.
4. Apply the Definition of Independence
Using the independence of X and Y: P(X ∈ g⁻¹(C), Y ∈ h⁻¹(D)) = P(X ∈ g⁻¹(C)) P(Y ∈ h⁻¹(D)).
5. Translate Back to the Functions
This equation translates back to the original functions as: P(g(X) ∈ C, h(Y) ∈ D) = P(g(X) ∈ C) P(h(Y) ∈ D) for all C and D, which is exactly the statement that g(X) and h(Y) are independent. ∎
7.1 Discrete Random Variables
For discrete random variables X and Y with joint PMF p(x, y), the conditional PMF of Y given X = x is p_{Y|X}(y | x) = p(x, y) / p_X(x), defined for x with p_X(x) > 0.
If X and Y are independent, then p_{Y|X}(y | x) = p_Y(y): conditioning on X does not change the distribution of Y.
7.2 Continuous Random Variables
For continuous random variables X and Y with joint PDF f(x, y), the conditional PDF of Y given X = x is f_{Y|X}(y | x) = f(x, y) / f_X(x), defined for x with f_X(x) > 0.
7.3 Remarks
For each fixed x with f_X(x) > 0, f_{Y|X}(· | x) is a genuine PDF in y: it is non-negative and integrates to 1.
Multiplication Law: f(x, y) = f_{Y|X}(y | x) f_X(x).
Law of Total Probability: f_Y(y) = ∫ f_{Y|X}(y | x) f_X(x) dx (with sums replacing integrals in the discrete case).
Independence: X and Y are independent if and only if f_{Y|X}(y | x) = f_Y(y) for all y and all x with f_X(x) > 0.
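A sketch (not from the notes) illustrating these remarks in the discrete case, reusing the joint PMF table from the example in Section 4.4.

```python
import numpy as np

pmf = np.array([[0.10, 0.20, 0.10],
                [0.10, 0.10, 0.20],
                [0.10, 0.05, 0.05]])      # rows: x = 0,1,2; columns: y = 0,1,2

p_x = pmf.sum(axis=1)                     # marginal PMF of X
cond_y_given_x = pmf / p_x[:, None]       # p_{Y|X}(y | x) = p(x, y) / p_X(x)

print(cond_y_given_x.sum(axis=1))         # each row sums to 1: a genuine PMF in y
print((cond_y_given_x * p_x[:, None]).sum(axis=0))  # law of total probability: recovers p_Y
print(pmf.sum(axis=0))                    # marginal p_Y, for comparison
```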
Question: For given random variables X₁, ..., Xₙ and a function g, how do we find the distribution of Y = g(X₁, ..., Xₙ)?
8.1 Method 1: Method of Events
Let Y = g(X). For any set A, P(Y ∈ A) = P(g(X) ∈ A) = P(X ∈ g⁻¹(A)), where g⁻¹(A) = {x : g(x) ∈ A}; the distribution of Y is found by translating events about Y into events about X.
Example 1.11 (Univariate Discrete Random Variable)
Let X be a discrete random variable and Y = g(X). Then for each value y, P(Y = y) = Σ_{x : g(x) = y} p_X(x).
Example 1.12 (Sum of Two Discrete Random Variables)
Let X₁ and X₂ be discrete random variables and Y = X₁ + X₂.
To find the distribution of Y, decompose the event {Y = y} over the possible values of X₁: P(Y = y) = Σ_x P(X₁ = x, X₂ = y − x). If X₁ and X₂ are independent, this becomes the convolution P(Y = y) = Σ_x p_{X₁}(x) p_{X₂}(y − x).
For the exercise, apply the same decomposition with the specific joint PMF given there.
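A minimal numeric sketch of the convolution formula (the two PMFs below are assumed for illustration, not distributions from the notes).

```python
import numpy as np

# X1 takes values 0..3, X2 takes values 0..2, both independent
p1 = np.array([0.1, 0.4, 0.3, 0.2])      # P(X1 = 0..3)
p2 = np.array([1/3, 1/3, 1/3])           # P(X2 = 0..2)

p_sum = np.convolve(p1, p2)              # P(Y = y) for y = 0..5
print(p_sum, p_sum.sum())                # a valid PMF: the entries sum to 1

# brute-force check by enumerating the joint PMF p1[i] * p2[j]
check = np.zeros(len(p1) + len(p2) - 1)
for i, a in enumerate(p1):
    for j, b in enumerate(p2):
        check[i + j] += a * b
print(np.allclose(p_sum, check))         # True
```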
8.2 Method 2: Method of CDF
This is a special case of Method 1. Let Y = g(X). The steps are:
Find the region {x : g(x) ≤ y} corresponding to the event {Y ≤ y}.
Find F_Y(y) = P(Y ≤ y) = P(X ∈ {x : g(x) ≤ y}).
For the continuous case, find the PDF of Y by differentiating the CDF: f_Y(y) = F_Y′(y).
Example for Continuous Case
Let X₁ and X₂ have joint PDF f(x₁, x₂) and consider Y = X₁ + X₂.
Define the region A_y = {(x₁, x₂) : x₁ + x₂ ≤ y}, so that F_Y(y) = ∫∫_{A_y} f(x₁, x₂) dx₂ dx₁ = ∫_{−∞}^{∞} ∫_{−∞}^{y−x₁} f(x₁, x₂) dx₂ dx₁.
Differentiate to find the PDF:
Using Leibniz's rule for differentiation under the integral sign, we get f_Y(y) = ∫_{−∞}^{∞} f(x₁, y − x₁) dx₁.
8.3 Leibniz's Rule
Example for Continuous Case: Finding the PDF of Y = X₁ + X₂
Given two continuous random variables X₁ and X₂ with joint PDF f(x₁, x₂), we find the PDF of Y = X₁ + X₂.
Step-by-Step Derivation
Define the Region: F_Y(y) = P(X₁ + X₂ ≤ y) = ∫_{−∞}^{∞} ∫_{−∞}^{y−x₁} f(x₁, x₂) dx₂ dx₁.
Differentiate to Find the PDF:
The PDF f_Y(y) is obtained by differentiating F_Y(y) with respect to y.
Applying Leibniz's Rule
Leibniz's rule for differentiation under the integral sign states:
d/dy ∫_{a(y)}^{b(y)} g(x, y) dx = g(b(y), y) b′(y) − g(a(y), y) a′(y) + ∫_{a(y)}^{b(y)} ∂g(x, y)/∂y dx.
In our case, the inner integral ∫_{−∞}^{y−x₁} f(x₁, x₂) dx₂ has upper limit b(y) = y − x₁, lower limit a(y) = −∞, and an integrand that does not depend on y.
By applying Leibniz's rule, we get d/dy ∫_{−∞}^{y−x₁} f(x₁, x₂) dx₂ = f(x₁, y − x₁).
Since the lower limit of the inner integral is −∞ (a constant) and the integrand does not depend on y, only the upper-limit term contributes.
Given that, f_Y(y) = ∫_{−∞}^{∞} f(x₁, y − x₁) dx₁; if X₁ and X₂ are independent, this is the convolution ∫ f_{X₁}(x₁) f_{X₂}(y − x₁) dx₁.
Leibniz's Rule for Differentiation Under the Integral Sign
Leibniz's rule for differentiation under the integral sign provides a way to differentiate an integral with variable limits and an integrand that depends on the variable of differentiation.
Leibniz's Rule Statement
The rule states: d/dy ∫_{a(y)}^{b(y)} g(x, y) dx = g(b(y), y) b′(y) − g(a(y), y) a′(y) + ∫_{a(y)}^{b(y)} ∂g(x, y)/∂y dx.
Explanation and Proof
Consider the integral with variable limits: I(y) = ∫_{a(y)}^{b(y)} g(x, y) dx.
Differentiate I(y) with respect to y.
Differentiate inside the integral (where possible): To handle the differentiation, we need to consider three parts:
The variation of the integrand g(x, y) with y,
The variation of the upper limit b(y),
The variation of the lower limit a(y).
Apply the Fundamental Theorem of Calculus and the Chain Rule:
By the Fundamental Theorem of Calculus and the Chain Rule, we get:
Integrand term (differentiation under the integral sign): ∫_{a(y)}^{b(y)} ∂g(x, y)/∂y dx,
Upper limit term: g(b(y), y) · b′(y),
Lower limit term: −g(a(y), y) · a′(y).
Combine the three components: dI/dy = ∫_{a(y)}^{b(y)} ∂g(x, y)/∂y dx + g(b(y), y) b′(y) − g(a(y), y) a′(y).
Application Example
Consider the CDF from the example above, F_Y(y) = ∫_{−∞}^{∞} ( ∫_{−∞}^{y−x₁} f(x₁, x₂) dx₂ ) dx₁.
Using Leibniz's rule:
Inner integral: I(y; x₁) = ∫_{−∞}^{y−x₁} f(x₁, x₂) dx₂, with lower limit a(y) = −∞, upper limit b(y) = y − x₁, and an integrand that does not depend on y.
Differentiate with respect to y:
Applying Leibniz's rule: dI/dy = f(x₁, y − x₁) · b′(y) = f(x₁, y − x₁), since b′(y) = 1 and the other two terms vanish.
Since the limits of the outer integral do not depend on y, we may differentiate under the outer integral sign, giving f_Y(y) = ∫_{−∞}^{∞} f(x₁, y − x₁) dx₁.
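A numeric sanity check of Leibniz's rule (a sketch with an assumed integrand, not one from the notes): take g(x, y) = e^{−xy}, a(y) = 0 and b(y) = y, and compare a finite-difference derivative of I(y) with the Leibniz formula.

```python
import numpy as np
from scipy.integrate import quad

g = lambda x, y: np.exp(-x * y)        # assumed integrand
a = lambda y: 0.0                      # assumed lower limit
b = lambda y: y                        # assumed upper limit

def I(y):
    return quad(lambda x: g(x, y), a(y), b(y))[0]

y0, h = 1.3, 1e-5
numeric = (I(y0 + h) - I(y0 - h)) / (2 * h)          # centred finite difference

# Leibniz: g(b(y), y) b'(y) - g(a(y), y) a'(y) + integral of dg/dy over [a(y), b(y)]
dg_dy = lambda x, y: -x * np.exp(-x * y)
leibniz = (g(b(y0), y0) * 1.0
           - g(a(y0), y0) * 0.0
           + quad(lambda x: dg_dy(x, y0), a(y0), b(y0))[0])

print(numeric, leibniz)   # the two values agree to several decimal places
```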
8.4 Method of PDF for Continuous Random Variables with Differentiable, 1-to-1 Transformations
Statement
For a continuous random variable X with PDF f_X(x) and a transformation Y = g(X), where g is differentiable and 1-to-1 (strictly monotone), the PDF of Y is f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy| for y in the range of g.
Proof
Start with the CDF Method: The CDF of Y is F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y).
Transform to the Probability Statement in Terms of X: if g is increasing, P(g(X) ≤ y) = P(X ≤ g⁻¹(y)) = F_X(g⁻¹(y)); if g is decreasing, P(g(X) ≤ y) = P(X ≥ g⁻¹(y)) = 1 − F_X(g⁻¹(y)).
Differentiate the CDF of Y with respect to y (chain rule): f_Y(y) = f_X(g⁻¹(y)) · d g⁻¹(y)/dy in the increasing case, and f_Y(y) = −f_X(g⁻¹(y)) · d g⁻¹(y)/dy in the decreasing case.
Absolute Value for Monotonic Transformations:
Since d g⁻¹(y)/dy is positive when g is increasing and negative when g is decreasing, the two cases combine into the single formula f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|.
Interpretation of |d g⁻¹(y)/dy|
The term |d g⁻¹(y)/dy| rescales the density to account for how the transformation stretches or compresses intervals.
Rate of Change:
If a small interval of length dy around y corresponds under g⁻¹ to an interval of length |d g⁻¹(y)/dy| dy around x = g⁻¹(y), then the two intervals must carry the same probability mass.
The absolute value ensures the PDF remains non-negative.
Density Adjustment:
When |d g⁻¹(y)/dy| > 1, a small interval in y corresponds to a longer interval in x, so more probability mass is packed into it and the density of Y is scaled up.
When |d g⁻¹(y)/dy| < 1, the opposite happens: the mass is spread more thinly and the density of Y is scaled down.
Example Calculation
Consider a transformation Y = g(X) with g differentiable and strictly monotone.
Find the Inverse Transformation: x = g⁻¹(y).
Calculate the Derivative: d g⁻¹(y)/dy.
Apply the Transformation to the PDF:
If X has PDF f_X(x), then
For each y in the range of g, f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|.
The PDF of Y is zero outside the range of g.
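As a sanity check of this formula, here is a minimal Monte Carlo sketch; the specific transformation Y = e^X with X ~ N(0, 1) is an assumed illustration (giving the log-normal density), not an example taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = np.exp(x)                                    # Y = g(X), g increasing and 1-to-1

# f_Y(y) = f_X(ln y) * |d(ln y)/dy| = phi(ln y) / y   (log-normal density)
f_Y = lambda t: np.exp(-0.5 * np.log(t) ** 2) / (t * np.sqrt(2 * np.pi))

# compare P(1 <= Y <= 2) estimated from the sample with the integral of f_Y over [1, 2]
grid = np.linspace(1.0, 2.0, 200_001)
print(np.mean((y >= 1) & (y <= 2)))              # empirical probability
print(np.sum(f_Y(grid)) * (grid[1] - grid[0]))   # ~ same value from the derived density
```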
8.5 Multivariate Case: Transformation of Continuous Random Variables
General Transformation
Given continuous random variables X = (X₁, ..., Xₙ) with joint PDF f_X(x) and a transformation Y = g(X) = (g₁(X), ..., gₙ(X)).
Assume g is 1-to-1 with differentiable inverse x = h(y), and let J(y) = ∂(x₁, ..., xₙ)/∂(y₁, ..., yₙ) denote the Jacobian matrix of the inverse transformation.
The PDF of Y is f_Y(y) = f_X(h(y)) |det J(y)| for y in the range of g.
Example 1.14: Find the Distribution of a Function of Two Continuous Random Variables
Given X₁ and X₂ with joint PDF f(x₁, x₂), suppose we want the distribution of U = g(X₁, X₂).
Solution
Let V = X₁ be an auxiliary variable, chosen so that the map (X₁, X₂) → (U, V) is 1-to-1.
Find the Inverse Transformation: express (x₁, x₂) in terms of (u, v).
Jacobian Determinant: compute J = det(∂(x₁, x₂)/∂(u, v)).
Joint PDF of (U, V): f_{U,V}(u, v) = f(x₁(u, v), x₂(u, v)) |J|.
Marginal PDF of U: f_U(u) = ∫ f_{U,V}(u, v) dv.
Interpretation of the Jacobian Determinant
The Jacobian determinant measures how the transformation locally rescales area (or volume): a small region of area du dv in the (u, v) space corresponds to a region of area |det J| du dv in the (x, y) space.
Question: The Role of |det J| in the Transformed PDF
The term |det J| plays the same role in the multivariate case that |d g⁻¹(y)/dy| plays in the univariate case.
Scaling Factor:
If a small region in the new variables corresponds to a larger region in the original variables, the density must be scaled up accordingly, and vice versa.
The absolute value ensures that the PDF is non-negative.
Adjustment of Density:
When |det J| > 1, a unit region in (u, v) corresponds to a larger region in (x, y), so more probability mass is packed into it and the density increases.
Conversely, when |det J| < 1, the mass is spread over a relatively larger region and the density decreases.
By considering these factors, we ensure that the probability density functions are correctly adjusted for the transformations applied.
Application in Probability
When transforming random variables, the joint PDF of the new variables is f_Y(y) = f_X(h(y)) |det J(y)|, where x = h(y) is the inverse transformation.
This formula ensures that the probability density is correctly adjusted for the transformation.
Problem 1: Find the distribution of U = X + Y
Step 1: Define the Transformation
We start with two random variables X and Y with joint PDF f(x, y), and set U = X + Y.
We also define a secondary variable V = X, so that the map (X, Y) → (U, V) is 1-to-1.
Step 2: Express the Inverse Transformation
From the definitions of U and V, the inverse transformation is X = V and Y = U − V.
Step 3: Compute the Jacobian Determinant
The Jacobian matrix of (x, y) with respect to (u, v) has entries ∂x/∂u = 0, ∂x/∂v = 1, ∂y/∂u = 1, ∂y/∂v = −1.
The determinant of this matrix is (0)(−1) − (1)(1) = −1, so |J| = 1.
Step 4: Use the Transformation Formula
The joint probability density function (PDF) of (U, V) is f_{U,V}(u, v) = f(x(u, v), y(u, v)) |J|.
Substituting x = v, y = u − v and |J| = 1 gives f_{U,V}(u, v) = f(v, u − v).
Step 5: Find the Marginal PDF of U
To find the marginal PDF of U, integrate out v: f_U(u) = ∫_{−∞}^{∞} f(v, u − v) dv.
This integral gives the distribution of the sum U = X + Y; when X and Y are independent it reduces to the convolution ∫ f_X(v) f_Y(u − v) dv.
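A Monte Carlo sanity check of the marginalisation step (a sketch; the choice of independent Exp(1) inputs is an assumed illustration): for X, Y ~ Exp(1) independent, the formula gives f_U(u) = ∫ f(v, u − v) dv = u e^{−u} for u > 0.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=1_000_000)
y = rng.exponential(size=1_000_000)
u = x + y

f_U = lambda t: t * np.exp(-t)                      # result of the integral above
print(np.mean((u >= 1) & (u <= 2)))                 # empirical P(1 <= U <= 2)
grid = np.linspace(1.0, 2.0, 100_001)
print(np.sum(f_U(grid)) * (grid[1] - grid[0]))      # same probability from f_U, ~0.33
```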
Problem 2: Find the distribution of U = X / Y
Step 1: Define the Transformation
For the second problem, define: U = X / Y and V = Y.
Step 2: Express the Inverse Transformation
The original variables are recovered as X = U V and Y = V.
Step 3: Compute the Jacobian Determinant
The Jacobian matrix of (x, y) with respect to (u, v) has entries ∂x/∂u = v, ∂x/∂v = u, ∂y/∂u = 0, ∂y/∂v = 1.
The determinant of this matrix is (v)(1) − (u)(0) = v, so |J| = |v|.
Step 4: Use the Transformation Formula
The joint PDF of (U, V) is f_{U,V}(u, v) = f(x(u, v), y(u, v)) |J|.
Substituting x = u v and y = v gives f_{U,V}(u, v) = f(u v, v) |v|.
Step 5: Find the Marginal PDF of U
To find the marginal PDF of U, integrate out v: f_U(u) = ∫_{−∞}^{∞} f(u v, v) |v| dv.
This integral gives the distribution of the quotient U = X / Y.
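A sketch checking the quotient formula with assumed standard-normal inputs (the choice of N(0, 1) is an illustration, not taken from the notes): for independent X, Y ~ N(0, 1), the integral evaluates to f_U(u) = 1 / (π(1 + u²)), the standard Cauchy density.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)
y = rng.standard_normal(2_000_000)
u = x / y

f_U = lambda t: 1.0 / (np.pi * (1.0 + t ** 2))      # standard Cauchy density
print(np.mean((u >= -1) & (u <= 1)))                # empirical P(-1 <= U <= 1), ~0.5
grid = np.linspace(-1.0, 1.0, 100_001)
print(np.sum(f_U(grid)) * (grid[1] - grid[0]))      # 0.5 from the Cauchy density
```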
The expectation (or expected value) of a random variable X, denoted E(X), is the probability-weighted average of its possible values.
1.1 For Discrete Random Variables:
For discrete random variables X with PMF p(x): E(X) = Σₓ x p(x), provided the sum converges absolutely (Σₓ |x| p(x) < ∞).
1.2 For Continuous Random Variables:
For continuous random variables, the expectation is given by: E(X) = ∫_{−∞}^{∞} x f(x) dx, provided ∫ |x| f(x) dx < ∞.
1.3 Decision Making Under Uncertainty Using Expectation
Expectation helps in making decisions under uncertainty by providing a criterion to evaluate the outcomes.
Example 1.15:
You pay a fixed price for a ticket to play the following game: a random outcome is drawn;
If the outcome is a winning one, you receive a fixed prize;
Otherwise, you win nothing.
Solution:
The expected winnings are E(W) = (prize) × P(win); the game is worth playing only if E(W) is at least the ticket price.
Question:
If you are offered a discount on the ticket price, should you now play?
The expected value remains the same (the prize and the probability of winning are unchanged);
The cost of the ticket is now lower;
Since the expected value (of the winnings) is now compared against a lower cost, the game can become worth playing even though E(W) itself has not changed.
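A sketch of this expected-value comparison with made-up numbers (the prices and probabilities below are assumptions for illustration, not the values used in Example 1.15).

```python
ticket_price = 1.00          # assumed cost of playing
prize        = 10.00         # assumed amount won on a winning outcome
p_win        = 0.08          # assumed probability of the winning outcome

expected_winnings = prize * p_win            # E(W) = 0.80
print(expected_winnings - ticket_price)      # -0.20: not worth playing at full price

discounted_price = 0.50                      # assumed discounted ticket price
print(expected_winnings - discounted_price)  # +0.30: worth playing at the discount
```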
Mean: μ = E(X).
Variance: Var(X) = E[(X − μ)²] = E(X²) − (E X)².
Covariance: Cov(X, Y) = E[(X − E X)(Y − E Y)] = E(XY) − E(X) E(Y).
Correlation Coefficient: ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y), which always satisfies −1 ≤ ρ ≤ 1.
Variance of a constant and a random variable: Var(a + bX) = b² Var(X).
Variance of a sum: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y); if X and Y are uncorrelated, the covariance term vanishes.
Mean square error: E[(X − c)²] = Var(X) + (E X − c)², which is minimized by taking c = E(X).
Independence and Expectation: If X and Y are independent, then E(XY) = E(X) E(Y).
Linearity of Expectation: E(aX + bY + c) = a E(X) + b E(Y) + c.
Independence and Uncorrelation: If X and Y are independent, then Cov(X, Y) = 0 (they are uncorrelated); the converse is not true in general.
5.1 Definition
The moment generating function of X is M(t) = E(e^{tX}), defined for every t at which the expectation is finite.
5.2 Properties
Existence: The mgf may or may not exist for any particular value of t; it is most useful when it exists in an open interval containing t = 0.
Uniqueness Theorem: If the mgf exists for t in an open interval containing 0, it uniquely determines the distribution of X.
Moments: If the mgf exists in an open interval containing zero, then: E(X^r) = M^{(r)}(0), the r-th derivative of M at 0; in particular E(X) = M′(0) and E(X²) = M″(0).
Linear Transformation: For any constants a and b, M_{a+bX}(t) = e^{at} M_X(bt).
Independence: If X and Y are independent, then M_{X+Y}(t) = M_X(t) M_Y(t).
5.3 Joint Moment Generating Function (Joint MGF)
For random variables X₁, ..., Xₙ, the joint mgf is M(t₁, ..., tₙ) = E[exp(t₁X₁ + ... + tₙXₙ)].
5.4 Properties of Joint MGF
Marginal MGF: the mgf of a single Xᵢ is obtained by setting all other arguments to zero, e.g. M_{X₁}(t) = M(t, 0, ..., 0).
Uniqueness Theorem: Similar to the univariate case, the joint mgf uniquely determines the joint distribution if it exists for (t₁, ..., tₙ) in an open neighbourhood of the origin.
Independence: X₁, ..., Xₙ are independent if and only if M(t₁, ..., tₙ) = M_{X₁}(t₁) ⋯ M_{Xₙ}(tₙ).
Moments: mixed moments are obtained by partial differentiation at the origin: E(X₁^{r₁} ⋯ Xₙ^{rₙ}) = ∂^{r₁+⋯+rₙ} M / (∂t₁^{r₁} ⋯ ∂tₙ^{rₙ}) evaluated at t₁ = ⋯ = tₙ = 0.
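A symbolic sketch of the moment property (the normal mgf M(t) = exp(μt + σ²t²/2) is used as an assumed illustration): differentiating at t = 0 recovers the moments.

```python
import sympy as sp

t, mu, sigma = sp.symbols("t mu sigma", real=True, positive=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)          # mgf of N(mu, sigma^2), assumed example

m1 = sp.diff(M, t, 1).subs(t, 0)                  # E(X)   = mu
m2 = sp.diff(M, t, 2).subs(t, 0)                  # E(X^2) = mu^2 + sigma^2
print(sp.simplify(m1), sp.simplify(m2), sp.simplify(m2 - m1**2))   # variance = sigma^2
```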
Application: Portfolio Returns
Problem Setup
Given: two assets with random returns R₁ and R₂, means μ₁ and μ₂, variances σ₁² and σ₂², and covariance σ₁₂ = Cov(R₁, R₂).
Portfolio: a fraction w of wealth is invested in asset 1 and the remaining 1 − w in asset 2.
Portfolio return: R = w R₁ + (1 − w) R₂.
Expected Return
E(R) = w μ₁ + (1 − w) μ₂, by linearity of expectation.
Portfolio Variance
Using the properties of variance: Var(R) = w² σ₁² + (1 − w)² σ₂² + 2 w (1 − w) σ₁₂.
Portfolio Standard Deviation
σ_R = √Var(R).
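A numeric sketch of these formulas with assumed inputs (the means, variances, covariance and weight below are illustrative, not values given in the notes).

```python
import numpy as np

mu1, mu2   = 0.08, 0.05          # assumed expected returns of the two assets
var1, var2 = 0.04, 0.01          # assumed variances
cov12      = -0.005              # assumed covariance between the returns
w          = 0.6                 # assumed fraction of wealth in asset 1

exp_return = w * mu1 + (1 - w) * mu2
variance   = w**2 * var1 + (1 - w)**2 * var2 + 2 * w * (1 - w) * cov12
print(exp_return, variance, np.sqrt(variance))   # mean, variance, standard deviation
```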
The conditional expectation of Y given X = x, written E(Y | X = x), is the mean of the conditional distribution of Y given X = x.
Discrete case: E(Y | X = x) = Σ_y y p_{Y|X}(y | x).
Continuous case: E(Y | X = x) = ∫ y f_{Y|X}(y | x) dy.
For a joint pdf f(x, y), can we condition on the event {X = x} directly, using P(A | B) = P(A ∩ B)/P(B)?
No, for a continuous X we have P(X = x) = 0, so the elementary definition of conditional probability does not apply; instead we work with the conditional pdf f_{Y|X}(y | x) = f(x, y)/f_X(x).
Why is f_{Y|X}(· | x) a valid pdf in y (for each fixed x with f_X(x) > 0)?
Non-negativity: f_{Y|X}(y | x) = f(x, y)/f_X(x) ≥ 0, since numerator and denominator are both non-negative.
Integrates to 1: ∫ f_{Y|X}(y | x) dy = (1/f_X(x)) ∫ f(x, y) dy = f_X(x)/f_X(x) = 1.
Let's verify this: by definition f_X(x) = ∫ f(x, y) dy, which is exactly the integral appearing above.
Conditional Expectation Calculation
Do it for any fixed x with f_X(x) > 0: E(Y | X = x) = ∫ y f_{Y|X}(y | x) dy; regarded as a function of X, this defines the random variable E(Y | X).
Example Calculation
Given a joint pdf f(x, y), first compute the marginal f_X(x) = ∫ f(x, y) dy, then the conditional pdf f_{Y|X}(y | x) = f(x, y)/f_X(x), and finally E(Y | X = x) = ∫ y f_{Y|X}(y | x) dy.
Important Points
Conditional PDF: f_{Y|X}(y | x) = f(x, y)/f_X(x), defined for x with f_X(x) > 0.
Non-negativity: f_{Y|X}(y | x) ≥ 0.
Integrates to 1: ∫ f_{Y|X}(y | x) dy = 1 for each such x.
Conditional Expectation: E(Y | X = x) = ∫ y f_{Y|X}(y | x) dy.
This ensures that E(Y | X = x) is a genuine expectation taken with respect to a genuine probability density.
Example:
Let
Properties of Conditional Expectation
If X and Y are independent, then E(Y | X) = E(Y).
Let h(x) = E(Y | X = x); then E(Y | X) denotes the random variable h(X).
Law of Total Expectation: E[E(Y | X)] = E(Y).
In particular: E(Y) = Σₓ E(Y | X = x) P(X = x) in the discrete case, and E(Y) = ∫ E(Y | X = x) f_X(x) dx in the continuous case.
Variance Decomposition: Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).
Note: here Var(Y | X) = E[(Y − E(Y | X))² | X], the variance of the conditional distribution of Y given X.
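A simulation sketch of the last two identities (the hierarchical model below, X ~ Exp(1) and Y | X = x ~ N(x, 1), is an assumed illustration, not a model from the notes). For this model E(Y | X) = X and Var(Y | X) = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000
x = rng.exponential(size=n)          # X ~ Exp(1)
y = rng.normal(loc=x, scale=1.0)     # Y | X = x ~ N(x, 1)

print(np.mean(y), np.mean(x))        # law of total expectation: E(Y) = E[E(Y|X)] = E(X), both ~ 1
print(np.var(y), 1.0 + np.var(x))    # Var(Y) ~ E[Var(Y|X)] + Var(E[Y|X]) = 1 + Var(X) ~ 2
```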
The δ method is a statistical technique used to approximate the mean and variance of a function of a random variable. The key idea is to use a Taylor series expansion to linearize the function around the mean of the random variable. This method is especially useful when we know the mean and variance of the original random variable but not the full distribution.
Step-by-Step Derivation:
Given: a random variable X with mean μ = E(X) and variance σ² = Var(X), and a smooth (differentiable) function g; we want approximations for E[g(X)] and Var[g(X)].
4.1 Taylor Expansion:
We start by expanding g(X) in a first-order Taylor series around μ: g(X) ≈ g(μ) + g′(μ)(X − μ).
4.2 Expectation of g(X)
To find the expected value of g(X), take expectations of the Taylor expansion: E[g(X)] ≈ E[g(μ)] + g′(μ) E[X − μ].
Since g(μ) and g′(μ) are constants, E[g(μ)] = g(μ).
Since E(X − μ) = 0, the linear term vanishes; keeping the second-order term of the expansion would give the refinement E[g(X)] ≈ g(μ) + (1/2) g″(μ) σ².
However, for most practical purposes, especially when σ² is small, the first-order approximation is used: E[g(X)] ≈ g(μ).
4.3 Variance of g(X)
Next, we compute the variance of g(X) from the same linearization.
Substituting the Taylor expansion into the variance formula: Var[g(X)] ≈ Var[g(μ) + g′(μ)(X − μ)].
Since g(μ) is a constant, it contributes nothing to the variance.
Using the property that Var(c + aX) = a² Var(X), Var[g(X)] ≈ [g′(μ)]² Var(X − μ) = [g′(μ)]² Var(X).
Substituting Var(X) = σ², we obtain Var[g(X)] ≈ [g′(μ)]² σ².
Summary:
Mean approximation: E[g(X)] ≈ g(μ).
Variance approximation: Var[g(X)] ≈ [g′(μ)]² σ².
Interpretation:
The δ method provides a way to approximate the mean and variance of a transformed random variable g(X) using only μ, σ², and the derivative g′(μ), without knowing the full distribution of X.
These approximations are accurate when σ² is small and g is approximately linear over the range of values that X takes with high probability.
The δ method, often called the delta method, is a statistical technique used primarily to approximate the mean and variance of a function of a random variable. This method relies on a first-order Taylor series expansion around the mean of the random variable to linearize the function.
4.4 Prerequisites for Using the Delta Method
Differentiable Function
The function g must be differentiable at (and near) μ.
Differentiability ensures that we can approximate the function locally by its tangent line, g(x) ≈ g(μ) + g′(μ)(x − μ).
Known Mean and Variance of X
You should know the mean μ = E(X) and the variance σ² = Var(X) of X.
These parameters are crucial for applying the Taylor expansion and subsequently estimating the mean and variance of g(X).
Small Variance Assumption:
The method works best when the variance σ² of X is small, so that X concentrates near μ.
If σ² is large, X ranges over a region where the linear approximation may be poor, and the delta-method approximations can be inaccurate.
Linear Approximation Validity:
The function g should be approximately linear over the range of values that X takes with non-negligible probability.
The accuracy of the δ method deteriorates when g has strong curvature near μ.
Random Variable Concentrated Near Its Mean
The random variable X should be concentrated around its mean μ (for example, a sample mean based on a large sample).
This ensures that the higher-order terms in the Taylor series expansion are negligible, making the linear approximation (first-order expansion) sufficient.
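A sketch comparing the delta-method approximations with simulation for an assumed case (X ~ N(10, 0.5²) and g(x) = log x; these choices are illustrative, not taken from the notes).

```python
import numpy as np

mu, sigma = 10.0, 0.5
g, g_prime = np.log, lambda x: 1.0 / x

# delta-method approximations
mean_approx = g(mu)                           # E[g(X)] ~ g(mu)
var_approx  = (g_prime(mu) ** 2) * sigma**2   # Var[g(X)] ~ g'(mu)^2 * sigma^2

# Monte Carlo "truth" for comparison
rng = np.random.default_rng(4)
gx = g(rng.normal(mu, sigma, size=2_000_000))
print(mean_approx, gx.mean())                 # 2.3026 vs ~2.3013
print(var_approx, gx.var())                   # 0.0025 vs ~0.0025
```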