Lecture 2: Probability and Statistics Part 2

Jason Chen YCHEN148@e.ntu.edu.sg

Copyright Notice

This document is a compilation of original notes by Jason Chen (YCHEN148@e.ntu.edu.sg). Unauthorized reproduction, downloading, or use for commercial purposes is strictly prohibited and may result in legal action. If you have any questions or find any content-related errors, please contact the author at the provided email address. The author will promptly address any necessary corrections.

I. Commonly Used Distributions-Discrete


1. Uniform Distribution $U(a_1,\dots,a_m)$

This is a discrete uniform distribution where each outcome $a_1,\dots,a_m$ is equally likely. Let's analyze the components one by one.

1.1 Probability Mass Function (PMF)

The probability mass function (PMF) for the uniform distribution is given by: $p(x)=\begin{cases}\frac{1}{m}, & x=a_1,\dots,a_m\\ 0, & \text{otherwise}\end{cases}$

Explanation:

Proof: The uniform distribution assigns equal probability to each of the $m$ possible outcomes. Since the sum of all probabilities must equal 1: $\sum_{i=1}^m p(a_i)=1$. Since each outcome $a_i$ has the same probability $\frac{1}{m}$: $\sum_{i=1}^m \frac{1}{m}=\frac{m}{m}=1$. This confirms that $p(x)=\frac{1}{m}$ is the correct PMF.

1.2 Moment Generating Function (MGF)

The moment generating function (MGF) for the uniform distribution is given by: $M(t)=\sum_{j=1}^m \frac{e^{a_j t}}{m}$

Explanation:

The MGF is used to find the moments of the distribution. The first moment (mean) and the second central moment (variance) can be derived from it.

Proof: The MGF is defined as: $M(t)=E[e^{tX}]=\sum_{j=1}^m p(a_j)e^{a_j t}=\sum_{j=1}^m \frac{1}{m}e^{a_j t}=\sum_{j=1}^m \frac{e^{a_j t}}{m}$

1.3 Mean (Expected Value)

The mean (or expected value) $\mu$ is given by: $\mu=E[X]=\sum_{j=1}^m \frac{a_j}{m}=\bar a$

Explanation:

The mean of a uniform distribution is simply the average of all possible outcomes.

Proof: The expected value is calculated as: $\mu=E[X]=\sum_{j=1}^m p(a_j)a_j=\sum_{j=1}^m \frac{1}{m}a_j=\sum_{j=1}^m \frac{a_j}{m}$

1.4 Variance

The variance $\sigma^2$ is given by: $\sigma^2=\sum_{j=1}^m \frac{(a_j-\bar a)^2}{m}$

Explanation:

Variance measures the spread of the outcomes around the mean. It is the average of the squared differences from the mean.

Proof: Variance is calculated as: $\sigma^2=E[(X-\mu)^2]=E[X^2]-\mu^2$

Where $E[X^2]=\sum_{j=1}^m p(a_j)a_j^2=\sum_{j=1}^m \frac{a_j^2}{m}$; therefore, the variance becomes: $\sigma^2=\sum_{j=1}^m \frac{a_j^2}{m}-\left(\sum_{j=1}^m \frac{a_j}{m}\right)^2=\sum_{j=1}^m \frac{(a_j-\bar a)^2}{m}$

1.5 Parameter

This indicates that the parameters $a_i$ are real numbers ($a_i\in\mathbb{R}$), and there are $m$ such parameters.

1.6 Example

Throw a fair die once: A classic example of a uniform distribution is rolling a fair six-sided die, where each face (1 through 6) has an equal probability of $\frac{1}{6}$.
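
A minimal Python sketch (numpy assumed; the sample size and seed are arbitrary) that checks the mean and variance formulas above for the fair-die case $a_1,\dots,a_6=1,\dots,6$:

```python
import numpy as np

# Theoretical mean and variance of the discrete uniform on {1,...,6}.
faces = np.arange(1, 7)
mean_theory = faces.mean()                        # (1/m) * sum(a_j) = 3.5
var_theory = ((faces - mean_theory) ** 2).mean()  # sum (a_j - abar)^2 / m = 35/12

# Compare with a simulation of repeated die rolls.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
print(mean_theory, rolls.mean())   # ~3.5
print(var_theory, rolls.var())     # ~2.9167
```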


2. Bernoulli Distribution B(p)

2.1 Probability Mass Function (PMF)

The probability mass function (PMF) for the Bernoulli distribution is given by:

$p(x)=\begin{cases}p^x(1-p)^{1-x}, & x=0 \text{ or } x=1\\ 0, & \text{otherwise}\end{cases}$

Explanation:

Proof: The PMF for a Bernoulli random variable can be written as: $p(x)=p^x(1-p)^{1-x}$

This expression works for both cases: for $x=1$ it gives $p(1)=p$, and for $x=0$ it gives $p(0)=1-p$.

The sum of the probabilities for all possible outcomes must equal 1: $p(1)+p(0)=p+(1-p)=1$

2.2 Moment Generating Function (MGF)

The moment generating function (MGF) for the Bernoulli distribution is given by: $M(t)=pe^t+1-p$

Explanation:

Proof: The MGF is defined as: $M(t)=E[e^{tX}]=pe^{t\cdot 1}+(1-p)e^{t\cdot 0}=pe^t+(1-p)$. This function can be used to derive the mean and variance of the distribution.

2.3 Mean (Expected Value)

The mean (or expected value) μ for the Bernoulli distribution is: μ=E[X]=p

Explanation:

The mean represents the probability of success in a Bernoulli trial.

Proof: The expected value is calculated as: $\mu=E[X]=p\cdot 1+(1-p)\cdot 0=p$

2.4 Variance

The variance $\sigma^2$ for the Bernoulli distribution is: $\sigma^2=p(1-p)$

Explanation:

The variance measures the spread of the outcomes around the mean. It reaches its maximum when p=0.5, indicating the highest uncertainty.

Proof: Variance is calculated using the formula: $\sigma^2=E[X^2]-\mu^2$

Since $X^2=X$ for $X$ being either 0 or 1: $E[X^2]=p$

Thus, the variance becomes: $\sigma^2=p-p^2=p(1-p)$

2.5 Parameter

$p\in[0,1]$: $p$ is the probability of success in a Bernoulli trial and lies between 0 and 1.

2.6 Example

Toss a coin once, p = probability that head occurs: A typical example is a coin toss, where p represents the probability of getting heads.

If A is an event, then the indicator random variable IA follows the Bernoulli distribution.

An indicator variable IA takes the value 1 if event A occurs and 0 otherwise, which follows a Bernoulli distribution with parameter p=P(A).


3. Binomial Distribution $B(n,p)$

The binomial distribution describes the number of successes in n independent Bernoulli trials, where each trial has a probability p of success. The parameters n and p define the distribution.

3.1 Probability Mass Function (PMF)

The probability mass function (pmf) for the binomial distribution is given by:

$p(x)=\begin{cases}\binom{n}{x}p^x(1-p)^{n-x}, & x=0,1,\dots,n\\ 0, & \text{otherwise}\end{cases}$

Explanation:

Proof: The pmf can be derived as the product of the probability of a specific sequence of $x$ successes and $n-x$ failures, multiplied by the number of such sequences:

$p(x)=\binom{n}{x}p^x(1-p)^{n-x}$, where $\binom{n}{x}=\frac{n!}{x!(n-x)!}$ is the binomial coefficient.

3.2 Moment Generating Function (MGF)

The moment generating function (MGF) for the binomial distribution is:

$M(t)=(pe^t+1-p)^n$

Explanation:

Proof: The MGF for a binomial distribution can be derived by recognizing that the binomial distribution is the sum of n independent Bernoulli random variables:

$M(t)=E[e^{tX}]=E\left[\prod_{i=1}^n e^{tX_i}\right]=\left(E[e^{tX_i}]\right)^n=(pe^t+1-p)^n$; here, the $X_i$ represent the individual (independent) Bernoulli trials.

3.3 Mean (Expected Value)

The mean (or expected value) μ for the binomial distribution is:

μ=E[X]=np

Explanation:

The mean represents the expected number of successes in n trials.

Proof: The mean can be derived from the sum of the expected values of n Bernoulli trials:

$\mu=E\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n E[X_i]=np$

3.4 Variance

The variance $\sigma^2$ for the binomial distribution is: $\sigma^2=np(1-p)$

Explanation:

The variance measures the spread of the number of successes around the mean.

Proof: The variance of a sum of independent Bernoulli trials is the sum of the variances of the individual trials:

$\sigma^2=V\left[\sum_{i=1}^n X_i\right]=\sum_{i=1}^n V[X_i]=np(1-p)$

Where $V[X_i]=p(1-p)$ is the variance of a single Bernoulli trial.

3.5 Parameter

Parameters: $p\in[0,1]$ and $n=1,2,\dots$

3.6 Example

Number of heads when tossing a coin n times: If you toss a fair coin n times, the number of heads (successes) follows a binomial distribution with n trials and p=0.5.
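
A short Python sketch (numpy and scipy assumed; the parameters are arbitrary) comparing simulated coin-toss counts with the $B(n,p)$ formulas:

```python
import numpy as np
from scipy.stats import binom

# Number of heads in n tosses of a fair coin vs. the B(n, p) formulas.
n, p = 10, 0.5
rng = np.random.default_rng(1)
heads = rng.binomial(n, p, size=200_000)

print(heads.mean(), n * p)                        # ~5.0  (mean np)
print(heads.var(), n * p * (1 - p))               # ~2.5  (variance np(1-p))
print(np.mean(heads == 3), binom.pmf(3, n, p))    # empirical vs exact P(X = 3)
```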

Let's work through a one-period binomial model for stock pricing step by step to ensure we have the correct solution.

Problem Statement

We are dealing with a stock market model at time t=0 where there are two investment opportunities:

Purchase and (short-) selling of stocks:

Fixed deposit or loan:

Step-by-Step Solution

Analyzing the Trading Strategy

The future value X(T) at time T is dependent on the trading strategy (f,g):

Risk-Neutral Valuation

In a one-period binomial model, the price of a derivative or a contingent claim can be computed using risk-neutral valuation. The risk-neutral probability q is defined as: $q=\frac{e^{rT}-d}{u-d}$

The expected value of the future price $X(T)$ under the risk-neutral measure is: $E_Q[X(T)]=q\left(fe^{rT}+gp_1^u\right)+(1-q)\left(fe^{rT}+gp_1^d\right)$

The present value $X(0)$ at time $t=0$ should equal the discounted expected future value: $X(0)=\frac{1}{e^{rT}}E_Q[X(T)]$

Given $P_0(0)=1$ and $P_0(T)=e^{rT}$, the price at time $t=0$ becomes: $X(0)=\frac{qfe^{rT}+qgp_1^u+(1-q)fe^{rT}+(1-q)gp_1^d}{e^{rT}}$

Simplifying this expression gives the initial capital required.
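To make the valuation step concrete, here is a minimal Python sketch of risk-neutral pricing in a one-period model; the values of $S_0$, $u$, $d$, $r$, $T$ and the call payoff are hypothetical choices for illustration only, not values taken from the problem above:

```python
import numpy as np

# One-period risk-neutral pricing sketch (all parameter values are illustrative).
S0, u, d = 100.0, 1.2, 0.8      # stock moves to S0*u or S0*d at time T
r, T = 0.05, 1.0                # continuously compounded rate, maturity
K = 100.0                       # strike of a call option used as the claim X(T)

q = (np.exp(r * T) - d) / (u - d)                 # risk-neutral up-probability
payoff_up, payoff_down = max(S0 * u - K, 0), max(S0 * d - K, 0)
X0 = np.exp(-r * T) * (q * payoff_up + (1 - q) * payoff_down)
print(q, X0)   # discounted expected payoff under Q = price at t = 0
```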

Given that the number of upward movements in the stock price follows a B(1,p)-distributed random variable, it means the price makes its single upward move with probability $p$ and the downward move with probability $1-p$.


4. Poisson Distribution P(λ) - Limit of Binomial Distribution

The Poisson distribution arises as a limiting case of the binomial distribution. Let's go through this transition step by step, providing detailed explanations and proofs for each component.

Understanding the Transition from Binomial to Poisson Distribution

Step 1: Start with the Binomial PMF

The binomial distribution for a random variable $X_n\sim B(n,p_n)$ is given by: $P(X_n=x)=\binom{n}{x}p_n^x(1-p_n)^{n-x}$

Step 2: Transition to Poisson

Consider the case where $n$ becomes very large ($n\to\infty$) and $p_n$ becomes very small ($p_n\to 0$) such that $\lambda=np_n$ remains constant. In this scenario, the binomial distribution can be approximated by the Poisson distribution.

We start by rewriting the binomial pmf using this condition: $P(X_n=x)=\binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}$

Thus, the binomial pmf transitions to the Poisson pmf as: $\lim_{n\to\infty}P(X_n=x)=\frac{\lambda^x e^{-\lambda}}{x!}$

This is the Poisson distribution with parameter $\lambda=np_n$.

A key ingredient in this derivation is a well-known limit from calculus:

$\lim_{n\to\infty}\left(1+\frac{a_n}{n}\right)^n=e^{a}$ whenever $a_n\to a$. This result is a fundamental identity used in the derivation of the exponential function and is central to many proofs in probability and calculus.
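
The limit can also be seen numerically. A small Python sketch (scipy assumed; $\lambda=3$ chosen arbitrarily) comparing the $B(n,\lambda/n)$ PMF with the Poisson PMF:

```python
import numpy as np
from scipy.stats import binom, poisson

# B(n, lambda/n) approaches P(lambda) as n grows with lambda fixed.
lam, x = 3.0, np.arange(10)
for n in (10, 100, 10_000):
    err = np.max(np.abs(binom.pmf(x, n, lam / n) - poisson.pmf(x, lam)))
    print(n, err)   # maximum pmf difference shrinks as n increases
```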

Explanation:

4.1 Probability Mass Function (PMF):

$p(x)=\frac{\lambda^x e^{-\lambda}}{x!},\quad x=0,1,2,\dots$ This is derived as shown above, representing the probability of $x$ events occurring in a fixed interval.

4.2 Moment Generating Function (MGF):

The MGF is defined as: $M_X(t)=E[e^{tX}]$. For the Poisson distribution, we have: $M_X(t)=\sum_{x=0}^\infty e^{tx}\frac{\lambda^x e^{-\lambda}}{x!}$

Simplifying using the series expansion of the exponential function: $M_X(t)=e^{\lambda(e^t-1)}$

4.3 Mean (Expected Value):

$E[X]=\lambda$. The mean is simply $\lambda$, representing the average number of events in the interval.

The expected value (mean) $E[X]$ for a Poisson distribution is: $E[X]=\sum_{x=0}^\infty xP(X=x)=\sum_{x=0}^\infty x\frac{\lambda^x e^{-\lambda}}{x!}$

By simplifying this using the properties of sums and derivatives: $E[X]=\lambda$

4.4 Variance:

$Var(X)=\lambda$. For the second moment: $E[X^2]=\sum_{x=0}^\infty x^2P(X=x)=\sum_{x=0}^\infty x^2\frac{\lambda^x e^{-\lambda}}{x!}$

We can rewrite $x^2$ as $x(x-1)+x$: $E[X^2]=\sum_{x=0}^\infty x(x-1)\frac{\lambda^x e^{-\lambda}}{x!}+\sum_{x=0}^\infty x\frac{\lambda^x e^{-\lambda}}{x!}$

The first term $\sum_{x=0}^\infty x(x-1)\frac{\lambda^x e^{-\lambda}}{x!}$ simplifies to: $\sum_{x=2}^\infty \frac{\lambda^2\,\lambda^{x-2}e^{-\lambda}}{(x-2)!}=\lambda^2$

The second term $\sum_{x=0}^\infty x\frac{\lambda^x e^{-\lambda}}{x!}$ is just the expected value $E[X]$, which is $\lambda$.

Thus: $E[X^2]=\lambda^2+\lambda$; finally, the variance is: $Var(X)=E[X^2]-(E[X])^2=(\lambda^2+\lambda)-\lambda^2=\lambda$

4.5 Parameter:

λ>0: λ is the rate parameter, indicating the expected number of occurrences in a given time or space interval.

4.6 Example:

Number of phone calls coming into an exchange during a unit of time:

If λ represents the average number of calls per hour, the number of calls in a given hour follows a Poisson distribution with parameter λ.

Summation of Independent Poisson Random Variables

If $X_i\sim P(\lambda_i)$ are independent, then their sum: $Y=X_1+\cdots+X_k\sim P(\lambda_1+\cdots+\lambda_k)$

This means the sum of independent Poisson-distributed variables is also Poisson-distributed, with the rate parameter being the sum of individual rate parameters.

Proof of the Summation of Independent Poisson Random Variables

Statement: If $X_i\sim P(\lambda_i)$ are independent Poisson random variables, then their sum $Y=X_1+\cdots+X_k$ is also Poisson-distributed with rate parameter $\lambda=\lambda_1+\cdots+\lambda_k$.

1. Definition of the Poisson Distribution: A random variable $X$ is said to follow a Poisson distribution with rate parameter $\lambda$ if its probability mass function (PMF) is given by: $P(X=x)=\frac{\lambda^x e^{-\lambda}}{x!},\quad x=0,1,2,\dots$

2. Moment Generating Function (MGF): The moment generating function (MGF) of a Poisson random variable $X\sim P(\lambda)$ is: $M_X(t)=E[e^{tX}]=\sum_{x=0}^\infty e^{tx}\frac{\lambda^x e^{-\lambda}}{x!}=e^{\lambda(e^t-1)}$

3. MGF of the Sum: Let $Y=X_1+X_2+\cdots+X_k$. The MGF of $Y$ is the product of the MGFs of the independent $X_i$: $M_Y(t)=M_{X_1}(t)M_{X_2}(t)\cdots M_{X_k}(t)$

Substituting the MGF of each $X_i$: $M_Y(t)=e^{\lambda_1(e^t-1)}e^{\lambda_2(e^t-1)}\cdots e^{\lambda_k(e^t-1)}$

Simplifying the expression: $M_Y(t)=e^{(\lambda_1+\lambda_2+\cdots+\lambda_k)(e^t-1)}=e^{\lambda(e^t-1)}$, where $\lambda=\lambda_1+\lambda_2+\cdots+\lambda_k$. Since the MGF determines the distribution, this shows $Y\sim P(\lambda_1+\cdots+\lambda_k)$.
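
A quick empirical check of this summation property (numpy/scipy assumed; the rates are arbitrary):

```python
import numpy as np
from scipy.stats import poisson

# The sum of independent Poissons should again be Poisson.
lam1, lam2 = 2.0, 3.5
rng = np.random.default_rng(2)
y = rng.poisson(lam1, 300_000) + rng.poisson(lam2, 300_000)

for k in range(5):
    # Empirical P(Y = k) vs. the Poisson(lam1 + lam2) pmf.
    print(k, np.mean(y == k), poisson.pmf(k, lam1 + lam2))
```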


5. Negative Binomial Distribution NB(r,p)

The negative binomial distribution describes the number of trials X required to achieve the rth success in a sequence of independent Bernoulli trials, where each trial has a probability p of success.

5.1 Probability Mass Function (PMF)

The PMF for the negative binomial distribution is given by:

$p(x)=\begin{cases}\binom{x-1}{r-1}p^r(1-p)^{x-r}, & x=r,r+1,\dots\\ 0, & \text{otherwise}\end{cases}$

Derivation of the PMF:

  1. Number of Failures: Before the $r$th success, there must be $x-r$ failures.

  2. Success-Failure Pattern: The probability of having $x-r$ failures and $r$ successes (with the last trial being a success) is: $p(x)=\binom{x-1}{r-1}p^r(1-p)^{x-r}$

The binomial coefficient $\binom{x-1}{r-1}$ represents the number of ways to arrange the first $x-1$ trials into $r-1$ successes and $x-r$ failures, before the final success.

5.2 Moment Generating Function (MGF)

The MGF $M_X(t)$ for the negative binomial distribution is given by: $M_X(t)=\left(\frac{pe^t}{1-(1-p)e^t}\right)^r,\quad\text{for } t<-\log(1-p)$

Derivation of the MGF:

  1. The MGF is defined as: MX(t)=E[etX]

  2. By summing over all possible values of $X$: $M_X(t)=\sum_{x=r}^\infty e^{tx}\binom{x-1}{r-1}p^r(1-p)^{x-r}$

  3. This series is simplified using the sum of a geometric series, leading to the compact form:

$M_X(t)=\left(\frac{pe^t}{1-(1-p)e^t}\right)^r$; this function is valid as long as $t<-\log(1-p)$, so that $(1-p)e^t<1$ and the series converges.

5.3 Mean

The mean $E[X]$ of the negative binomial distribution is given by: $E[X]=\frac{r}{p}$

Derivation of the Mean:

From the definition of expectation: $E[X]=\sum_{x=r}^\infty x\binom{x-1}{r-1}p^r(1-p)^{x-r}$; this sum can be calculated directly or derived using the MGF by taking the first derivative with respect to $t$ and evaluating at $t=0$: $E[X]=\left.\frac{dM_X(t)}{dt}\right|_{t=0}=\frac{r}{p}$

5.4 Variance

The variance $Var(X)$ of the negative binomial distribution is given by: $Var(X)=\frac{r(1-p)}{p^2}$

Derivation of the Variance: The variance is given by: $Var(X)=E[X^2]-(E[X])^2$

The second moment $E[X^2]$ can be calculated by taking the second derivative of the MGF: $Var(X)=\left.\frac{d^2M_X(t)}{dt^2}\right|_{t=0}-\left(\frac{r}{p}\right)^2=\frac{r(1-p)}{p^2}$

5.5 Parameter Range: $r=1,2,\dots$ and $p\in(0,1]$.

5.6 Example:

Lottery: If a person must purchase tickets until they achieve the rth winning ticket, the total number of tickets bought follows a negative binomial distribution.
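
A simulation sketch of the $NB(r,p)$ mean and variance formulas (numpy assumed; note that numpy's negative_binomial returns the number of failures, so $r$ is added to recover the total number of trials $X$):

```python
import numpy as np

# Check E[X] = r/p and Var(X) = r(1-p)/p^2 for the number-of-trials version.
r, p = 3, 0.2
rng = np.random.default_rng(3)
trials = rng.negative_binomial(r, p, size=300_000) + r   # failures + r = trials

print(trials.mean(), r / p)                # ~15
print(trials.var(), r * (1 - p) / p**2)    # ~60
```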


6. Hypergeometric Distribution HG(r,n,m)

6.1 Probability Mass Function (PMF)

The PMF of the hypergeometric distribution is given by: $p(x)=\frac{\binom{n}{x}\binom{m}{r-x}}{\binom{n+m}{r}},\quad x=0,1,\dots,\min(r,n)\ \text{and}\ r-x\le m$

Derivation of the PMF:

  1. Combination Calculation: The number of ways to draw $x$ black balls from $n$ black balls is given by $\binom{n}{x}$. Similarly, the number of ways to draw $r-x$ white balls from $m$ white balls is $\binom{m}{r-x}$.

  2. Total Combinations: The total number of ways to draw $r$ balls from $n+m$ balls (black and white) is $\binom{n+m}{r}$.

  3. PMF Expression: The probability of drawing exactly $x$ black balls in $r$ draws is then the ratio of the favorable outcomes to the total outcomes: $p(x)=\frac{\binom{n}{x}\binom{m}{r-x}}{\binom{n+m}{r}}$

6.2 Moment Generating Function (MGF)

The hypergeometric distribution does not have an explicit closed-form moment generating function (MGF). This is because the lack of replacement complicates the analysis of the moments in a way that precludes a simple closed-form expression.

6.3 Mean

The mean $E[X]$ of the hypergeometric distribution is given by: $E[X]=\frac{rn}{n+m}$

Derivation of the Mean:

  1. The mean is derived from the linearity of expectation. Since the hypergeometric distribution models the number of successes (black balls) drawn, the expected value is proportional to the fraction of the total balls that are black.

  2. $n$ is the number of black balls, and $n+m$ is the total number of balls. Thus, the mean number of black balls drawn is: $E[X]=\frac{rn}{n+m}$

6.4 Variance

The variance $Var(X)$ of the hypergeometric distribution is given by: $Var(X)=\frac{rnm(n+m-r)}{(n+m)^2(n+m-1)}$

Derivation of the Variance:

  1. Variance accounts for both the proportion of successes and the finite size of the population.

  2. The formula takes into consideration the fact that the draws are without replacement, which introduces negative dependence between the draws (i.e., drawing one black ball decreases the probability of drawing another black ball).

  3. The variance formula is derived from the second moment and adjusted for the non-independence of the draws: $Var(X)=r\cdot\frac{n}{n+m}\cdot\frac{m}{n+m}\cdot\frac{n+m-r}{n+m-1}$

6.5 Parameters

6.6 Example:

Sampling Industrial Products: If you want to determine the number of defective items (black balls) in a sample drawn from a batch without replacement, the hypergeometric distribution models the probability of finding a specific number of defective items.

Relationship to Binomial Distribution

The hypergeometric distribution can be related to the binomial distribution as the population size becomes large:

$\frac{\binom{n}{x}\binom{m}{r-x}}{\binom{n+m}{r}}\;\longrightarrow\;\binom{r}{x}p^x(1-p)^{r-x}$

Where: $p=\frac{n}{n+m}$ is the probability of success in each draw.

This approximation holds as $n,m\to\infty$ while the ratio $\frac{n}{n+m}=p$ is held fixed.
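
A numerical sketch of this approximation (scipy assumed; $r=10$ and $p=0.3$ are arbitrary):

```python
import numpy as np
from scipy.stats import hypergeom, binom

# As the population grows with n/(n+m) fixed, HG(r, n, m) approaches B(r, p).
r, p = 10, 0.3
x = np.arange(r + 1)
for scale in (10, 100, 10_000):
    n, m = int(scale * p * 10), int(scale * (1 - p) * 10)   # black, white balls
    # scipy's hypergeom takes (population size, number of black balls, draws).
    err = np.max(np.abs(hypergeom.pmf(x, n + m, n, r) - binom.pmf(x, r, p)))
    print(n + m, err)   # discrepancy shrinks as the population grows
```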


II. Commonly Used Distributions-Continuous


1. Uniform Distribution U(a,b)

1.1 Probability Density Function (PDF)

The PDF of the uniform distribution $U(a,b)$ is given by: $f(x)=\begin{cases}\frac{1}{b-a}, & a\le x\le b\\ 0, & \text{otherwise}\end{cases}$

Derivation of the PDF:

1.2 Cumulative Distribution Function (CDF)

The CDF $F(x)$ of the uniform distribution is given by: $F(x)=\begin{cases}0, & x<a\\ \frac{x-a}{b-a}, & a\le x\le b\\ 1, & x>b\end{cases}$

Derivation of the CDF:

1.3 Moment Generating Function (MGF)

The MGF $M_X(t)$ of the uniform distribution $U(a,b)$ is given by: $M_X(t)=\frac{e^{bt}-e^{at}}{t(b-a)},\quad t\in\mathbb{R}$

Derivation of the MGF:

  1. Applying the PDF: $M_X(t)=\int_a^b e^{tx}\frac{1}{b-a}\,dx$

  2. Integration: $M_X(t)=\frac{1}{b-a}\int_a^b e^{tx}\,dx=\frac{1}{b-a}\left.\frac{e^{tx}}{t}\right|_a^b$

  3. Simplifying: $M_X(t)=\frac{e^{bt}-e^{at}}{t(b-a)}$

1.4 Mean

The mean $E[X]$ of the uniform distribution is given by: $E[X]=\frac{a+b}{2}$

Derivation of the Mean:

  1. Applying the PDF: $E[X]=\int_a^b x\frac{1}{b-a}\,dx$

  2. Integration: $E[X]=\frac{1}{b-a}\left.\frac{x^2}{2}\right|_a^b=\frac{1}{b-a}\cdot\frac{b^2-a^2}{2}$

  3. Simplifying: $E[X]=\frac{a+b}{2}$

1.5 Variance

The variance $Var(X)$ of the uniform distribution is given by: $Var(X)=\frac{(b-a)^2}{12}$

Derivation of the Variance:


2. Exponential Distribution EXP(λ)

The exponential distribution is commonly used to model the time between events in a Poisson process, such as the time between arrivals in a queue or the time until a component fails.

2.1 Probability Density Function (PDF)

The PDF of the exponential distribution $EXP(\lambda)$ is given by: $f(x)=\begin{cases}\lambda e^{-\lambda x}, & x\ge 0\\ 0, & x<0\end{cases}$

Derivation of the PDF:

2.2 Cumulative Distribution Function (CDF)

The CDF $F(x)$ of the exponential distribution is given by: $F(x)=\begin{cases}1-e^{-\lambda x}, & x\ge 0\\ 0, & x<0\end{cases}$

Derivation of the CDF:

2.3 Moment Generating Function (MGF)

The MGF $M_X(t)$ of the exponential distribution $EXP(\lambda)$ is given by: $M_X(t)=\frac{\lambda}{\lambda-t},\quad t<\lambda$

Derivation of the MGF:

  1. Applying the PDF: $M_X(t)=\int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx=\lambda\int_0^\infty e^{-(\lambda-t)x}\,dx$

  2. Integration: $M_X(t)=\lambda\cdot\frac{1}{\lambda-t},\quad\text{for } t<\lambda$. This condition $t<\lambda$ ensures the integral converges.

2.4 Mean

The mean $E[X]$ of the exponential distribution is given by: $E[X]=\frac{1}{\lambda}$

Derivation of the Mean:

2.5 Variance

The variance $Var(X)$ of the exponential distribution is given by: $Var(X)=\frac{1}{\lambda^2}$

Derivation of the Variance:

  1. Integration by Parts: Apply integration by parts twice, or use the fact that: $E[X^2]=\frac{2}{\lambda^2}$

  2. Variance: $Var(X)=\frac{2}{\lambda^2}-\left(\frac{1}{\lambda}\right)^2=\frac{1}{\lambda^2}$

2.6 Memorylessness Property

Mathematically: $P(X>s+t\mid X>t)=P(X>s)$, where $X$ is a random variable representing the time until the next event, and $s$ and $t$ are time intervals.

Explanation:

This property implies that the exponential distribution "forgets" how much time has already passed. For example, if a light bulb's lifetime follows an exponential distribution, the probability that it lasts another hour is the same regardless of how long it has already been on.
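A simulation sketch of the memorylessness property (numpy assumed; $\lambda$, $s$ and $t$ are arbitrary):

```python
import numpy as np

# Empirical check of P(X > s + t | X > t) = P(X > s) for the exponential.
lam, s, t = 0.5, 2.0, 3.0
rng = np.random.default_rng(4)
x = rng.exponential(scale=1 / lam, size=1_000_000)

lhs = np.mean(x[x > t] > s + t)     # conditional survival beyond s more units
rhs = np.mean(x > s)                # unconditional survival beyond s
print(lhs, rhs, np.exp(-lam * s))   # all three should be close
```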

2.7 Relationship between Exponential and Poisson Distributions

The exponential distribution is closely related to the Poisson distribution.

Setup:

Key Result:

Reverse Relationship:

Interpretation:

2.8 Interpreting λ

The parameter λ in the exponential and Poisson distributions has an important interpretation.

Exponential Distribution EXP(λ):

Poisson Distribution P(λt):

Intuitive Example:


3. Gamma Distribution Gamma(α,λ)

3.1 Probability Density Function (PDF)

The PDF of the gamma distribution $Gamma(\alpha,\lambda)$ is given by: $f(x)=\begin{cases}\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}, & x\ge 0\\ 0, & x<0\end{cases}$

Gamma Function Γ(α):

The gamma function $\Gamma(\alpha)$ is defined as: $\Gamma(\alpha)=\int_0^\infty y^{\alpha-1}e^{-y}\,dy$

Derivation of the PDF:

3.2 Cumulative Distribution Function (CDF)

The CDF of the gamma distribution does not have a simple closed form like the exponential distribution. However, it can be expressed in terms of the incomplete gamma function: $F(x)=\frac{\gamma(\alpha,\lambda x)}{\Gamma(\alpha)}$

Where $\gamma(\alpha,\lambda x)$ is the lower incomplete gamma function defined by: $\gamma(\alpha,\lambda x)=\int_0^{\lambda x}y^{\alpha-1}e^{-y}\,dy$

3.3 Moment Generating Function (MGF)

The MGF $M_X(t)$ of the gamma distribution is given by: $M_X(t)=\left(\frac{\lambda}{\lambda-t}\right)^\alpha,\quad t<\lambda$

Derivation of the MGF:

3.4 Mean

The mean $E[X]$ of the gamma distribution is given by: $E[X]=\frac{\alpha}{\lambda}$

Derivation of the Mean:

3.5 Variance

The variance $Var(X)$ of the gamma distribution is given by: $Var(X)=\frac{\alpha}{\lambda^2}$

Derivation of the Variance:

3.6 Parameters:

3.7 Example:

Used to model the default rate of credit portfolios in risk management. The gamma distribution is highly flexible and can model a variety of shapes by adjusting the parameters α and λ​. This makes it useful in many applications, especially in fields like finance and risk management.

3.8 Property

We are dealing with the sum of multiple independent and identically distributed (i.i.d.) normal random variables. We want to confirm that the sum of squares of these normal variables follows a Gamma distribution. Specifically, if we have $X_1,X_2,\dots,X_n$ as i.i.d. normal random variables, we want to examine the distribution of $S=\sum_{i=1}^n X_i^2$.

Basic Properties of Normal Distribution

Let $X_i\sim N(\mu,\sigma^2)$ for $i=1,2,\dots,n$. The square of a standard normal random variable $Z_i\sim N(0,1)$ follows a chi-square distribution with 1 degree of freedom: $Z_i^2\sim\chi^2(1)$

For a normal random variable with a non-zero mean, $X_i\sim N(\mu,\sigma^2)$, the distribution of $\frac{(X_i-\mu)^2}{\sigma^2}$ is $\chi^2(1)$.

Step 3: Sum of Squared Normals as Gamma Distribution

We know that the sum of independent chi-square random variables, each with 1 degree of freedom, follows a chi-square distribution with degrees of freedom equal to the number of variables: $S=\sum_{i=1}^n Z_i^2\sim\chi^2(n)$

If each $X_i\sim N(0,\sigma^2)$, then: $\frac{S}{\sigma^2}=\frac{1}{\sigma^2}\sum_{i=1}^n X_i^2\sim\chi^2(n)$

The chi-square distribution with $n$ degrees of freedom is a special case of the Gamma distribution with shape parameter $k=\frac{n}{2}$ and scale parameter $\theta=2$. Specifically: $\chi^2(n)\sim\Gamma\left(\frac{n}{2},\ \text{scale}=2\right)$

Hence, if $X_i\sim N(0,\sigma^2)$, then: $\sum_{i=1}^n X_i^2\sim\Gamma\left(\frac{n}{2},\ \text{scale}=2\sigma^2\right)$

Step 4: Moment Generating Function (mgf) Approach

The moment generating function (mgf) of a chi-square distribution with n degrees of freedom is given by:

$M_{\chi^2(n)}(t)=(1-2t)^{-n/2},\quad t<\frac{1}{2}$

This matches the MGF of a Gamma distribution $\Gamma\left(\frac{n}{2},\ \text{scale}=2\right)$, which is: $M_\Gamma(t)=(1-\theta t)^{-k}$ with $k=\frac{n}{2}$ and $\theta=2$

Step 5: Summing Up

When we sum up the squared i.i.d. normal variables $X_1,X_2,\dots,X_n$, where each $X_i\sim N(0,\sigma^2)$, the distribution of their sum $S=\sum_{i=1}^n X_i^2$ is a Gamma distribution: $S\sim\Gamma\left(\frac{n}{2},\ \text{scale}=2\sigma^2\right)$

This shows that the sum of squares of multiple i.i.d. normal variables indeed follows a Gamma distribution.
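
A simulation sketch of this fact (numpy/scipy assumed; $n$ and $\sigma$ are arbitrary), comparing the simulated sum of squares with the $\Gamma(n/2,\ \text{scale}=2\sigma^2)$ distribution:

```python
import numpy as np
from scipy.stats import gamma

# Sum of squares of n i.i.d. N(0, sigma^2) variables vs. Gamma(n/2, scale=2*sigma^2).
n, sigma = 5, 1.5
rng = np.random.default_rng(5)
s = (rng.normal(0.0, sigma, size=(200_000, n)) ** 2).sum(axis=1)

g = gamma(a=n / 2, scale=2 * sigma**2)
print(s.mean(), g.mean())            # both ~ n * sigma^2
print(s.var(), g.var())              # both ~ 2 * n * sigma^4
print(np.mean(s <= 10.0), g.cdf(10.0))
```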

3.9 Property Chi-Square Distribution of Sample Variance

To prove that $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$, where $S^2$ is the sample variance of $n$ i.i.d. normal random variables, we will use the following steps:

Step 1: Start with the Definition of Sample Variance

Given $X_1,X_2,\dots,X_n$ are i.i.d. normal random variables $X_i\sim N(\mu,\sigma^2)$, the sample mean $\bar X$ is defined as:

$\bar X=\frac{1}{n}\sum_{i=1}^n X_i$. The sample variance $S^2$ is: $S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2$

Step 2: Express the Sum of Squared Deviations

We can express the sum of squared deviations from the mean as:

$\sum_{i=1}^n(X_i-\bar X)^2=\sum_{i=1}^n(X_i-\mu)^2-n(\bar X-\mu)^2$

This identity follows by writing $X_i-\mu=(X_i-\bar X)+(\bar X-\mu)$, expanding the square, and noting that the cross term vanishes because $\sum_{i=1}^n(X_i-\bar X)=0$.

Step 3: Use the Distribution of the Sample Mean

Since $X_i\sim N(\mu,\sigma^2)$, the sample mean $\bar X$ follows: $\bar X\sim N\left(\mu,\frac{\sigma^2}{n}\right)$

We know that $\frac{\sum_{i=1}^n(X_i-\mu)^2}{\sigma^2}\sim\chi^2_n$, and $\bar X$ is independent of the sample variance.

Step 4: Simplify the Expression

Using the fact that $\sum_{i=1}^n(X_i-\bar X)^2$ is independent of $\bar X$, we can express: $\frac{\sum_{i=1}^n(X_i-\bar X)^2}{\sigma^2}=\frac{\sum_{i=1}^n(X_i-\mu)^2-n(\bar X-\mu)^2}{\sigma^2}$

This simplifies to: $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$. The right-hand side is a $\chi^2_n$ variable minus the independent $\chi^2_1$ variable $\left(\frac{\sqrt n(\bar X-\mu)}{\sigma}\right)^2$, which leaves a chi-square distribution with $n-1$ degrees of freedom.
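
A simulation sketch of this result (numpy/scipy assumed; $n$, $\mu$, $\sigma$ are arbitrary):

```python
import numpy as np
from scipy.stats import chi2

# Simulated check that (n-1)S^2/sigma^2 behaves like a chi-square with n-1 df.
n, mu, sigma = 8, 1.0, 2.0
rng = np.random.default_rng(6)
samples = rng.normal(mu, sigma, size=(200_000, n))
stat = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2

print(stat.mean(), n - 1)                        # chi-square mean = df
print(stat.var(), 2 * (n - 1))                   # chi-square variance = 2*df
print(np.mean(stat <= 5.0), chi2(n - 1).cdf(5.0))
```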


4. Normal Distribution N(μ,σ2)

The normal distribution is one of the most important distributions in probability and statistics, often used to model real-world phenomena due to the Central Limit Theorem.

4.1 Probability Density Function (PDF)

The PDF of the normal distribution $N(\mu,\sigma^2)$ is given by: $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}},\quad x\in\mathbb{R}$

Derivation of the PDF:

4.2 Cumulative Distribution Function (CDF)

The CDF $F(x)$ of the normal distribution does not have a closed-form expression. However, it is defined as: $F(x)=P(X\le x)=\int_{-\infty}^x f(u)\,du$

Properties of the CDF:

4.3 Moment Generating Function (MGF)

The MGF $M_X(t)$ of the normal distribution is given by: $M_X(t)=e^{\mu t+\frac{\sigma^2 t^2}{2}},\quad t\in\mathbb{R}$

Derivation of the MGF:

4.4 Mean

The mean E[X] of the normal distribution is simply: E[X]=μ

Derivation of the Mean:

4.5 Variance

The variance Var(X) of the normal distribution is: Var(X)=σ2

Derivation of the Variance:

4.6 Important Notes


5. Chi-Square Distribution χn2

The chi-square distribution is widely used in statistics, particularly in hypothesis testing and confidence interval estimation for variance. It is a special case of the gamma distribution and arises as the distribution of the sum of squared standard normal variables.

5.1 Probability Density Function (PDF)

The PDF of the chi-square distribution with $n$ degrees of freedom is given by: $f(x)=\begin{cases}\frac{1}{2^{n/2}\Gamma(n/2)}x^{\frac{n}{2}-1}e^{-x/2}, & x\ge 0\\ 0, & x<0\end{cases}$

Derivation of the PDF:

Special Case of Gamma Distribution: The chi-square distribution is a special case of the gamma distribution where $\alpha=\frac{n}{2}$ and $\lambda=\frac{1}{2}$. The gamma distribution's PDF is: $f(x)=\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}$

Substituting $\alpha=\frac{n}{2}$ and $\lambda=\frac{1}{2}$ gives: $f(x)=\frac{1}{2^{n/2}\Gamma(n/2)}x^{\frac{n}{2}-1}e^{-x/2}$

5.2 Cumulative Distribution Function (CDF)

The CDF of the chi-square distribution is not expressed in a simple closed form. However, it can be represented using the lower incomplete gamma function $\gamma(\alpha,x)$: $F(x)=\frac{\gamma(n/2,\,x/2)}{\Gamma(n/2)}$

Where $\gamma(\alpha,x)$ is the lower incomplete gamma function: $\gamma(\alpha,x)=\int_0^x t^{\alpha-1}e^{-t}\,dt$

5.3 Moment Generating Function (MGF)

The MGF $M_X(t)$ of the chi-square distribution is given by: $M_X(t)=(1-2t)^{-n/2},\quad t<\frac{1}{2}$

Derivation of the MGF:

Recognizing the integral as the gamma function gives: $M_X(t)=(1-2t)^{-n/2}$

5.4 Mean

The mean E[X] of the chi-square distribution is given by: E[X]=n

Derivation of the Mean:

Sum of Squared Normals: The chi-square distribution with $n$ degrees of freedom is the sum of $n$ independent standard normal random variables squared. Since each $Z_i^2$ has an expected value of 1: $E[X]=E\left[\sum_{i=1}^n Z_i^2\right]=\sum_{i=1}^n E[Z_i^2]=n$

5.5 Variance

The variance Var(X) of the chi-square distribution is given by: Var(X)=2n

Derivation of the Variance:

Variance of Sum of Independent Variables: The variance of the sum of independent random variables is the sum of their variances. Since each $Z_i^2$ has a variance of 2: $Var(X)=Var\left(\sum_{i=1}^n Z_i^2\right)=\sum_{i=1}^n Var(Z_i^2)=2n$

5.6 Important Notes

5.7 Detailed Proof of the Two Properties

1. Sum of Independent Chi-Square Variables

Statement: If $X_1,\dots,X_k$ are independent and $X_i\sim\chi^2_{n_i}$, then the sum $Y=X_1+\cdots+X_k$ follows a chi-square distribution with degrees of freedom $n_1+\cdots+n_k$, i.e., $Y=X_1+\cdots+X_k\sim\chi^2_{n_1+\cdots+n_k}$.

Proof:

  1. Moment Generating Function (MGF) of a Chi-Square Distribution:

    The MGF of a chi-square random variable $X_i\sim\chi^2_{n_i}$ is given by: $M_{X_i}(t)=(1-2t)^{-n_i/2},\quad\text{for } t<\frac{1}{2}$.

  2. MGF of the Sum of Independent Variables:

    Since the $X_i$ are independent, the MGF of the sum $Y$ is the product of their MGFs: $M_Y(t)=M_{X_1}(t)M_{X_2}(t)\cdots M_{X_k}(t)$

    Substituting the MGFs of the chi-square distributions: $M_Y(t)=(1-2t)^{-n_1/2}(1-2t)^{-n_2/2}\cdots(1-2t)^{-n_k/2}$.

  3. Simplifying the Expression: Combine the exponents: $M_Y(t)=(1-2t)^{-\frac{n_1+n_2+\cdots+n_k}{2}}$.

  4. Conclusion: The MGF of $Y$ is identical to the MGF of a chi-square distribution with degrees of freedom $n_1+n_2+\cdots+n_k$. Hence, $Y$ follows a chi-square distribution with these degrees of freedom: $Y\sim\chi^2_{n_1+n_2+\cdots+n_k}$.

2. Relationship with Standard Normal

Statement: If $Z\sim N(0,1)$, then $Z^2\sim\chi^2_1$.

Proof:

  1. PDF of Standard Normal Distribution: The probability density function (PDF) of the standard normal distribution $Z\sim N(0,1)$ is: $f_Z(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}},\quad z\in\mathbb{R}$.

  2. Transforming to a Chi-Square Distribution: Consider the transformation Y=Z2. The distribution of Y can be derived using the change of variables technique.

  3. Change of Variables: Let $y=z^2$, so that $z=\pm\sqrt{y}$. The Jacobian factor for the transformation is $\left|\frac{dz}{dy}\right|=\frac{1}{2\sqrt{y}}$.

    The PDF of $Y=Z^2$ is then: $f_Y(y)=\frac{1}{\sqrt{2\pi}}\left(\frac{e^{-y/2}}{2\sqrt{y}}+\frac{e^{-y/2}}{2\sqrt{y}}\right)=\frac{1}{\sqrt{2\pi y}}e^{-y/2},\quad y\ge 0$.

    Simplifying: $f_Y(y)=\frac{1}{2^{1/2}\Gamma(1/2)}y^{\frac{1}{2}-1}e^{-y/2},\quad y\ge 0$.

  4. Identifying the Chi-Square Distribution: The PDF derived above matches the PDF of the chi-square distribution with 1 degree of freedom $\chi^2_1$: $f_Y(y)=\frac{1}{\Gamma(1/2)\,2^{1/2}}y^{\frac{1}{2}-1}e^{-y/2},\quad y\ge 0$. Therefore: $Y=Z^2\sim\chi^2_1$.


6. F-Distribution Fm,n

The F-distribution arises frequently in the context of analysis of variance (ANOVA) and is used to compare variances. It is the distribution of the ratio of two scaled chi-square distributions.

6.1 Probability Density Function (PDF)

The PDF of the F-distribution with $m$ and $n$ degrees of freedom is given by: $f(x)=\frac{\left(\frac{m}{n}\right)^{m/2}x^{\frac{m}{2}-1}}{B\left(\frac{m}{2},\frac{n}{2}\right)\left(1+\frac{mx}{n}\right)^{\frac{m+n}{2}}},\quad x>0$

Derivation of the PDF:

6.2 Cumulative Distribution Function (CDF)

The CDF of the F-distribution does not have a simple closed-form expression but can be computed using the regularized incomplete beta function $I_x(a,b)$: $F(x)=I_{\frac{mx}{mx+n}}\left(\frac{m}{2},\frac{n}{2}\right)$

6.3 Moment Generating Function (MGF)

The MGF of the F-distribution does not have a closed-form expression, largely due to the complex nature of the distribution. Instead, its characteristic function or the first few moments (mean, variance) are often used to understand its properties.

6.4 Mean

The mean $E[X]$ of the F-distribution is: $E[X]=\frac{n}{n-2}\quad\text{for } n>2$

Derivation of the Mean:

6.5 Variance

The variance $Var(X)$ of the F-distribution is given by: $Var(X)=\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}\quad\text{for } n>4$

Derivation of the Variance:

6.6 Important Notes

  1. Relationship with Chi-square Distribution:

    • If $U\sim\chi^2_m$ and $V\sim\chi^2_n$ are independent, then $\frac{U/m}{V/n}\sim F_{m,n}$ (see the sketch after this list).

  2. Relationship with t-Distribution:

    • If $X\sim t_n$, then $X^2\sim F_{1,n}$.

  3. Inverse F-Distribution:

    • If $X\sim F_{m,n}$, then $X^{-1}\sim F_{n,m}$.
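
A simulation sketch of note 1 (numpy/scipy assumed; the degrees of freedom are arbitrary):

```python
import numpy as np
from scipy.stats import f

# (U/m)/(V/n) with U ~ chi2_m and V ~ chi2_n independent behaves like F_{m,n}.
m, n = 4, 12
rng = np.random.default_rng(12)
ratio = (rng.chisquare(m, 200_000) / m) / (rng.chisquare(n, 200_000) / n)

print(ratio.mean(), n / (n - 2))                 # mean n/(n-2) for n > 2
print(np.mean(ratio <= 2.0), f(m, n).cdf(2.0))   # empirical vs F_{m,n} CDF
```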

6.7 Detailed Proof of the Three Notes

Note 1: Relationship between Chi-Square Distributions and the F-Distribution

Statement: Let $U\sim\chi^2_m$ and $V\sim\chi^2_n$ be independent. Then the ratio $\frac{U/m}{V/n}$ follows an F-distribution with $m$ and $n$ degrees of freedom, denoted as $F_{m,n}$.

Proof:

  1. Chi-Square Distribution:

    • $U\sim\chi^2_m$ implies $U$ is the sum of squares of $m$ independent standard normal variables.

    • $V\sim\chi^2_n$ implies $V$ is the sum of squares of $n$ independent standard normal variables.

  2. Define the Ratio:

    • The random variable $F=\frac{U/m}{V/n}$ is defined.

    • Simplify this to $F=\frac{Un}{Vm}$.

  3. Distribution of the Ratio:

    • The distribution of F can be derived by finding the joint distribution of U and V and using the transformation method.

    • The result is that $F\sim F_{m,n}$, meaning $F$ follows an F-distribution with $m$ and $n$ degrees of freedom.

Note 2: Relationship between the t-Distribution and the F-Distribution

Statement: Let $X\sim t_n$; then $Y=X^2$ follows an F-distribution with 1 and $n$ degrees of freedom, denoted as $F_{1,n}$.

Proof:

  1. t-Distribution:

    • $X\sim t_n$ implies $X=\frac{Z}{\sqrt{V/n}}$, where $Z\sim N(0,1)$ and $V\sim\chi^2_n$ are independent.

  2. Square the t-Variable:

    • Square $X$ to obtain $Y=X^2=\frac{Z^2}{V/n}$.

  3. Recognize the Distribution:

    • $Z^2\sim\chi^2_1$, and hence $Y=\frac{Z^2/1}{V/n}$ is the ratio of a $\chi^2_1$ variable divided by its degree of freedom to an independent $\chi^2_n$ variable divided by its degrees of freedom.

    • Therefore, $Y\sim F_{1,n}$.

Note 3: Reciprocal of F-distribution

Statement: If $X\sim F_{m,n}$, then the reciprocal $X^{-1}$ follows an F-distribution with degrees of freedom swapped, i.e., $X^{-1}\sim F_{n,m}$.

Proof:

  1. Consider $X\sim F_{m,n}$:

    • $X=\frac{U/m}{V/n}$ where $U\sim\chi^2_m$ and $V\sim\chi^2_n$, and both are independent.

  2. Reciprocal of X:

    • Take the reciprocal $X^{-1}=\frac{V/n}{U/m}=\frac{Vm}{Un}$.

  3. Recognize the Distribution:

    • The new random variable $X^{-1}$ has the same form as an F-distribution but with degrees of freedom swapped, so $X^{-1}\sim F_{n,m}$.


7. Mean and Variance of Normal Random Sample

Let's break down the problem step by step to thoroughly understand the concepts and the reasoning behind the degrees of freedom $n-1$ instead of $n$. This will involve some important concepts in statistics, particularly when dealing with normal random samples.

7.1 Sample Mean and Sample Variance

Given a random sample $X_1,X_2,\dots,X_n$ from a normal distribution $N(\mu,\sigma^2)$, the sample mean $\bar X_n$ and sample variance $S_n^2$ are defined as: $\bar X_n=\frac{1}{n}\sum_{i=1}^n X_i,\qquad S_n^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X_n)^2$

7.2 Distribution of the Sample Mean X¯n

The sample mean $\bar X_n$ follows a normal distribution: $\bar X_n\sim N\left(\mu,\frac{\sigma^2}{n}\right)$

This result comes from the fact that the sum of independent normal random variables is also normally distributed. The mean of the sample mean is $\mu$, and the variance of the sample mean is $\frac{\sigma^2}{n}$.

7.3 Independence of X¯n and Sn2

The sample mean $\bar X_n$ and the sample variance $S_n^2$ are independent. This independence is a critical property that comes from the fact that the distribution of $S_n^2$ depends only on the deviations $X_i-\bar X_n$, which are independent of the sample mean.

7.4 Distribution of the Sample Variance Sn2

The sample variance $S_n^2$ is related to the chi-square distribution. Specifically: $\frac{(n-1)S_n^2}{\sigma^2}\sim\chi^2_{n-1}$

This result indicates that the scaled sample variance follows a chi-square distribution with $n-1$ degrees of freedom.

7.5 Distribution of $\sqrt n(\bar X_n-\mu)/\sigma$

The term $\frac{\sqrt n(\bar X_n-\mu)}{\sigma}$ follows a standard normal distribution: $\frac{\sqrt n(\bar X_n-\mu)}{\sigma}\sim N(0,1)$

7.6 t-Distribution and Its Relation

The ratio $\frac{\sqrt n(\bar X_n-\mu)}{S_n}$ follows a t-distribution with $n-1$ degrees of freedom: $\frac{\sqrt n(\bar X_n-\mu)}{S_n}\sim t_{n-1}$

This result is due to the relationship between the normal distribution, chi-square distribution, and the t-distribution.

7.7 Why the Degrees of Freedom is $n-1$ Instead of $n$

The degrees of freedom in the sample variance calculation are reduced by 1 because the sample mean $\bar X_n$ is itself an estimate and is used in calculating the variance. Each sample point $X_i$ contributes one piece of information, but since $\bar X_n$ is calculated from all $X_i$, the degrees of freedom are reduced by 1 to account for the estimation of $\mu$ by $\bar X_n$.

In other words, there is a constraint on the data, as the sum of the deviations from the mean must be zero: $\sum_{i=1}^n(X_i-\bar X_n)=0$. This constraint reduces the effective number of independent pieces of information from $n$ to $n-1$.

7.8 Are $(X_1-\bar X_n,\dots,X_n-\bar X_n)$ Independent?

No, the terms $(X_1-\bar X_n,\dots,X_n-\bar X_n)$ are not independent. This is because they are constrained by the condition: $\sum_{i=1}^n(X_i-\bar X_n)=0$. This constraint creates a dependency among the terms. For example, if you know $n-1$ of these differences, the last one is automatically determined.


8. t-Distribution

Definition: If $Z\sim N(0,1)$ and $V\sim\chi^2_k$, and $Z$ and $V$ are independent, then $\frac{Z}{\sqrt{V/k}}\sim t_k$

1. Understanding the Setup:

2. Definition of the t-Distribution:

The t-distribution with $k$ degrees of freedom is defined as the distribution of the ratio: $\frac{Z}{\sqrt{V/k}}$.

To prove this, we need to show that this ratio has the properties of a t-distribution.

3. Deriving the Ratio:

Given $Z\sim N(0,1)$ and $V\sim\chi^2_k$, consider the ratio: $T=\frac{Z}{\sqrt{V/k}}$.

4. Transforming the Chi-Squared Variable:

Let's express the denominator $\sqrt{V/k}$ in a form that relates to the standard normal distribution: $\frac{V}{k}=\frac{1}{k}\sum_{i=1}^k Z_i^2$.

Since $V/k$ is the average of $k$ squared standard normal variables, it plays the role of a sample variance computed from $k$ observations of a standard normal distribution.

5. Independence and Distribution:

Given that $Z$ is independent of $V$, and $V$ is scaled by $k$, the ratio $T=\frac{Z}{\sqrt{V/k}}$ is distributed according to the t-distribution with $k$ degrees of freedom.

6. Connection to the t-Distribution:

By the definition of the t-distribution, T should have the same characteristics as the t-distribution, namely:

Proving that the statistic $T=\frac{\sqrt n(\bar X_n-\mu)}{S_n}\sim t_{n-1}$

Step 1: Understanding the Setup

We want to show that the statistic $T$ defined as: $T=\frac{\sqrt n(\bar X_n-\mu)}{S_n}$ follows a t-distribution with $n-1$ degrees of freedom.

Step 2: Decomposing the Statistic

First, let's express $T$ in a form that can be related to a known distribution: $T=\frac{\sqrt n(\bar X_n-\mu)}{S_n}=\frac{\sqrt n(\bar X_n-\mu)}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X_n)^2}}$.

Step 3: Standardizing the Sample Mean

The sample mean $\bar X_n$ is normally distributed: $\bar X_n\sim N\left(\mu,\frac{\sigma^2}{n}\right)$.

If we standardize $\bar X_n$, we have: $\frac{\sqrt n(\bar X_n-\mu)}{\sigma}\sim N(0,1)$.

This means the numerator of $T$ follows a standard normal distribution: $Z=\frac{\sqrt n(\bar X_n-\mu)}{\sigma}\sim N(0,1)$.

Step 4: Chi-Square Distribution of the Sample Variance

The sample variance $S_n^2$ is related to a chi-square distribution: $\frac{(n-1)S_n^2}{\sigma^2}\sim\chi^2_{n-1}$. This result follows from the fact that the sum of squared deviations of $n$ independent normal variables from their sample mean follows a chi-square distribution with $n-1$ degrees of freedom.

Step 5: Independence of X¯n and Sn

One key property of normal distributions is that the sample mean $\bar X_n$ and the sample variance $S_n^2$ are independent. This allows us to write the statistic $T$ as: $T=\frac{Z}{\sqrt{\frac{(n-1)S_n^2}{\sigma^2}\cdot\frac{1}{n-1}}}=\frac{Z}{\sqrt{V/(n-1)}}$, where $V=\frac{(n-1)S_n^2}{\sigma^2}$. Since $V$ follows a chi-square distribution with $n-1$ degrees of freedom, the statistic $T$ can be expressed as: $T=\frac{Z}{\sqrt{V/(n-1)}}$.

Step 6: Deriving the t-Distribution

The expression we have for $T$ matches the form of a t-distribution with $n-1$ degrees of freedom: $T=\frac{Z}{\sqrt{V/(n-1)}}\sim t_{n-1}$.

Thus, the statistic $T=\frac{\sqrt n(\bar X_n-\mu)}{S_n}$ follows a t-distribution with $n-1$ degrees of freedom.
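
A simulation sketch of this result (numpy/scipy assumed; $n$, $\mu$, $\sigma$ are arbitrary):

```python
import numpy as np
from scipy.stats import t

# Check that sqrt(n)*(Xbar - mu)/S behaves like a t-distribution with n-1 df.
n, mu, sigma = 6, 0.0, 3.0
rng = np.random.default_rng(7)
samples = rng.normal(mu, sigma, size=(200_000, n))
T = np.sqrt(n) * (samples.mean(axis=1) - mu) / samples.std(axis=1, ddof=1)

for q in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(q, np.mean(T <= q), t(n - 1).cdf(q))   # empirical vs t_{n-1} CDF
```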


III. Types of Convergence


1. Convergence Concepts with the Coin Toss Example

Question 1: Toss a fair coin 2 or 3 times. Can you accurately predict the average appearance of heads?

When you toss a fair coin only a few times (like 2 or 3 times), the outcome is highly variable. For example:

Since there are so few trials, your estimate of the average appearance of heads is not likely to be accurate.

Question 2: Toss a fair coin many times. What will you predict the average appearance of heads?

As you toss the coin more and more times, the Law of Large Numbers tells us that the average number of heads will converge to the true probability of getting a head in a single toss, which is 0.5 for a fair coin.


2. Types of Convergence

2.1 Almost Sure Convergence

Definition: A sequence of random variables $\{Z_n\}$ converges almost surely to a random variable $Z$ if, for any $\epsilon>0$,

$P\left(\lim_{n\to\infty}|Z_n-Z|<\epsilon\right)=1$

This means that as n approaches infinity, the probability that Zn is within any small distance ϵ of Z becomes 1. In other words, Zn will get closer and closer to Z for almost every outcome.

Example: Consider our coin toss example. Let Zn represent the proportion of heads after n tosses. The sequence {Zn} converges almost surely to 0.5 as n increases.

2.2 Convergence in Probability

Definition: A sequence of random variables $\{Z_n\}$ converges in probability to a random variable $Z$ if, for any $\epsilon>0$,

$\lim_{n\to\infty}P(|Z_n-Z|<\epsilon)=1$

This means that as n increases, the probability that Zn is within ϵ of Z approaches 1. However, this does not guarantee that Zn will always get closer to Z in every sequence (unlike almost sure convergence).

Example: Again, using the coin toss, the proportion of heads Zn converges in probability to 0.5. As you toss the coin more times, the probability that the proportion of heads is close to 0.5 gets closer to 1.
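
A small simulation sketch of this concentration (numpy assumed):

```python
import numpy as np

# Proportion of heads after n tosses: Z_n concentrates around 0.5 as n grows.
rng = np.random.default_rng(8)
for n in (10, 100, 10_000):
    z_n = rng.binomial(n, 0.5, size=20_000) / n      # 20,000 replications of Z_n
    print(n, np.mean(np.abs(z_n - 0.5) < 0.05))      # P(|Z_n - 0.5| < 0.05) -> 1
```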

Relationship Between Convergence Concepts

An example of almost sure convergence is the coin-tossing experiment:

Suppose you have a coin, and each toss gives heads (1) or tails (0) with probability $\frac{1}{2}$. Define the random variable $X_n$ as the proportion of heads in the first $n$ tosses. Clearly, as $n$ grows, $X_n$ tends to $\frac{1}{2}$; that is, $X_n$ converges almost surely to $\frac{1}{2}$, because as $n$ goes to infinity the frequency law for coin tossing (the law of large numbers) tells us that the proportion of heads must tend to $\frac{1}{2}$.

However, consider a sequence $Y_n$ defined as follows: $Y_n=0$ when $n$ is even and $Y_n=1$ when $n$ is odd. This sequence does not converge almost surely, because it keeps oscillating between 0 and 1 and never settles on a single value; only its running average $\frac{1}{n}\sum_{i=1}^n Y_i$ tends to $\frac{1}{2}$.

Understanding $n\to\infty$ in the Context of Large Data

In statistical contexts, $n\to\infty$ represents a situation where the sample size becomes very large. As $n$ increases, the observed averages and other statistics derived from the data are expected to converge to their true underlying values due to the Law of Large Numbers and the Central Limit Theorem.


3. Convergence in Distribution

Convergence in distribution (also known as weak convergence) is concerned with the convergence of the cumulative distribution functions (CDFs) of a sequence of random variables.

Formally, $Z_n\xrightarrow{d}Z$ if $\lim_{n\to\infty}F_{Z_n}(x)=F_Z(x)$ at every point $x$ where $F_Z$ is continuous. This means that as $n$ increases, the distribution of $Z_n$ becomes closer to the distribution of $Z$.


4. Properties of the Three Types of Convergence

The following properties explain the relationships between almost sure convergence, convergence in probability, and convergence in distribution:

  1. Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution:

    • If $Z_n\xrightarrow{a.s.}Z$, then $Z_n\xrightarrow{P}Z$.

    • If $Z_n\xrightarrow{P}Z$, then $Z_n\xrightarrow{d}Z$.

    Reasoning: Almost sure convergence is the strongest form, ensuring that the sequence converges almost everywhere. This naturally implies convergence in probability, which in turn implies convergence in distribution, as distributional convergence is a weaker condition.

  2. Convergence in probability implies convergence in distribution:

    • If $Z_n\xrightarrow{P}Z$, then $Z_n\xrightarrow{d}Z$.

    Reasoning: Since convergence in probability ensures that for any small positive ϵ, the probability that |ZnZ| is smaller than ϵ approaches 1 as n increases, this leads to convergence in the distributional sense.

  3. Convergence to a constant:

    • If $Z_n\xrightarrow{d}c$, where $c$ is a constant, then $Z_n\xrightarrow{P}c$.

    Reasoning: Convergence in distribution to a constant implies that the random variables $Z_n$ are becoming increasingly concentrated around the constant value $c$, which is exactly convergence in probability to $c$; for a constant limit, convergence in distribution and convergence in probability coincide.

  4. Convergence preserved by continuous transformations:

    • If $Z_n\xrightarrow{d}Z$ and $g$ is a continuous function, then $g(Z_n)\xrightarrow{d}g(Z)$.

    Reasoning: A continuous transformation of a convergent sequence of random variables preserves the convergence in distribution. The continuous mapping theorem formalizes this concept.


5. Slutsky’s Theorem

Slutsky's Theorem is a powerful result that relates products, sums, and ratios of converging sequences of random variables.

Theorem: If $X_n\xrightarrow{d}X$ and $Y_n\xrightarrow{P}a$ where $a$ is a constant, then: $X_n+Y_n\xrightarrow{d}X+a$, $X_nY_n\xrightarrow{d}aX$, and $X_n/Y_n\xrightarrow{d}X/a$ provided $a\neq 0$.

Reasoning: Slutsky's Theorem combines converging sequences in different manners, demonstrating that convergence in distribution is preserved under certain algebraic operations, provided one of the sequences converges in probability.


6. Limit Theorem for the Delta Method

The Delta Method is used to approximate the distribution of a function of a random variable that is asymptotically normal.

Theorem: Suppose $\sqrt n\,(Y_n-\theta)/\sigma\xrightarrow{d}N(0,1)$. For a function $g$ such that $g'(\theta)\neq 0$, $\frac{\sqrt n\,[g(Y_n)-g(\theta)]}{\sigma|g'(\theta)|}\xrightarrow{d}N(0,1)$

Reasoning: The Delta Method leverages the fact that if Yn converges to θ and is asymptotically normal, then g(Yn) will also be asymptotically normal, with the scaling factor being determined by the derivative of g at θ.
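
A simulation sketch of the Delta Method (numpy/scipy assumed; here $Y_n$ is taken as the mean of Exp(1) data and $g(y)=y^2$, so $\theta=\sigma=1$ and $g'(\theta)=2$; these choices are illustrative only):

```python
import numpy as np
from scipy.stats import norm

# Delta-method sketch: sqrt(n)*(g(Y_n) - g(theta)) / (sigma*|g'(theta)|) ~ N(0,1).
n, theta, sigma = 400, 1.0, 1.0
rng = np.random.default_rng(11)
y_n = rng.exponential(theta, size=(20_000, n)).mean(axis=1)   # 20,000 replications
z = np.sqrt(n) * (y_n**2 - theta**2) / (sigma * abs(2 * theta))

print(np.mean(z <= 1.0), norm.cdf(1.0))   # empirical CDF close to Phi(1)
print(z.std())                            # close to 1
```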


IV. Law of Large Numbers


1. Law of Large Numbers (LLN)

The Law of Large Numbers states that as the number of independent, identically distributed (i.i.d.) random variables $X_1,X_2,\dots,X_n$ increases, the sample average $\bar X_n$ (the average of the first $n$ observations) will converge in probability to the expected value $\mu=E(X_i)$. Mathematically, it can be expressed as:

$\bar X_n=\frac{1}{n}\sum_{i=1}^n X_i\xrightarrow{P}\mu$, where $\bar X_n$ is the sample mean and $\xrightarrow{P}$ indicates convergence in probability.

Proof Outline:


2. Monte Carlo Integration

Monte Carlo integration is a method used to estimate the value of an integral using random sampling.

Goal:

To calculate the integral $I(f)=\int_0^1 f(x)\,dx$ using a Monte Carlo method.

Steps:

  1. Generate Sample Points: Generate $n$ i.i.d. random variables $X_1,X_2,\dots,X_n$ uniformly distributed on $[0,1]$, i.e., $X_i\sim U(0,1)$.

  2. Compute the Sample Mean: Calculate the sample mean of the function values at these points: $\hat I(f)=\frac{1}{n}\sum_{i=1}^n f(X_i)$. This sample mean $\hat I(f)$ serves as the Monte Carlo estimate of the integral $I(f)$.

  3. Apply the Law of Large Numbers: According to the LLN, as $n$ becomes large, the sample mean $\hat I(f)$ will converge in probability to the expected value of $f(X)$, which is the value of the integral $I(f)$: $\hat I(f)\xrightarrow{P}E[f(X)]=\int_0^1 f(x)\,dx=I(f)$
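
A minimal Monte Carlo sketch of these steps (numpy assumed; $f(x)=x^2$ is a hypothetical test function with known integral $1/3$):

```python
import numpy as np

# Monte Carlo estimate of integral_0^1 f(x) dx with f(x) = x^2 (true value 1/3).
def f(x):
    return x ** 2

rng = np.random.default_rng(9)
for n in (100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 1.0, size=n)   # X_i ~ U(0,1)
    print(n, f(x).mean())               # sample mean converges to 1/3 by the LLN
```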


V. Central Limit Theorem

The Central Limit Theorem (CLT) is a fundamental theorem in probability theory. It states that, given a sufficiently large number of independent and identically distributed (i.i.d.) random variables with a finite mean and variance, the distribution of the sum (or average) of these variables approaches a normal distribution as the number of variables increases.


1. Mathematical Formulation

Let $X_1,X_2,\dots,X_n$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. Define $\bar X_n=\frac{1}{n}\sum_{i=1}^n X_i$ as the sample mean and $T_n=n\bar X_n$ as the sum of these variables.

According to the CLT: $\lim_{n\to\infty}P\left(\frac{T_n-n\mu}{\sigma\sqrt n}\le x\right)=\lim_{n\to\infty}P\left(\frac{\sqrt n(\bar X_n-\mu)}{\sigma}\le x\right)=\Phi(x)$, where $\Phi(x)$ is the cumulative distribution function (CDF) of the standard normal distribution $N(0,1)$.

This means that as $n$ becomes large, the standardized sum $\frac{T_n-n\mu}{\sigma\sqrt n}$ (or equivalently, the standardized mean $\frac{\sqrt n(\bar X_n-\mu)}{\sigma}$) converges in distribution to $N(0,1)$.
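
A simulation sketch of the CLT (numpy/scipy assumed; exponential data is chosen deliberately because it is not normal):

```python
import numpy as np
from scipy.stats import norm

# Standardized means of Exp(1) samples approach the N(0,1) distribution.
lam = 1.0                       # Exp(1): mu = 1, sigma = 1
mu, sigma = 1 / lam, 1 / lam
rng = np.random.default_rng(10)
for n in (5, 30, 500):
    xbar = rng.exponential(1 / lam, size=(20_000, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma
    print(n, np.mean(z <= 1.0), norm.cdf(1.0))   # empirical CDF approaches Phi(1)
```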

Given that X1,X2,,Xn are independent and identically distributed (i.i.d.) random variables with mean μ and variance σ2, let's analyze the distribution of the sample mean X¯n when each Xi follows a normal distribution.

Assumptions

Distribution of the Sample Mean X¯n

Since each Xi is normally distributed, the sum of these i.i.d. normal random variables also follows a normal distribution. The key properties of X¯n are:

  1. Mean of X¯n: The expected value of the sample mean X¯n is: E[X¯n]=E[1ni=1nXi]=1ni=1nE[Xi]=1nnμ=μ

  2. Variance of X¯n: The variance of the sample mean X¯n is:

    Var(X¯n)=Var(1ni=1nXi)=1n2i=1nVar(Xi)=1n2nσ2=σ2n

  3. Distribution of X¯n: Since XiN(μ,σ2), the sample mean X¯n follows a normal distribution due to the properties of the normal distribution. Specifically: X¯nN(μ,σ2n)


2. Normal Approximation to Binomial Distribution

Let's consider the binomial distribution as an example of applying the CLT.

Suppose $X_1,X_2,\dots$ are i.i.d. Bernoulli random variables with parameter $p$, i.e., $X_i\sim B(1,p)$. The sum $T_n=X_1+X_2+\cdots+X_n$ is binomially distributed, $T_n\sim B(n,p)$.

The mean and variance of each $X_i$ are: $E(X_i)=p$, $Var(X_i)=p(1-p)$. Therefore, the sum $T_n$ has: $E(T_n)=np$, $Var(T_n)=np(1-p)$

According to the CLT, when $n$ is large enough, the standardized version of $T_n$ can be approximated by a normal distribution: $\frac{T_n-np}{\sqrt{np(1-p)}}\xrightarrow{d}N(0,1)$

This means that the binomial distribution $B(n,p)$ can be approximated by a normal distribution $N(np,\,np(1-p))$ when $n$ is large. This approximation is particularly useful because the normal distribution is easier to work with, especially when dealing with probabilities and percentiles.
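
A quick numerical check of this approximation (scipy assumed; $n$, $p$ and $k$ are arbitrary), using a continuity correction:

```python
import numpy as np
from scipy.stats import binom, norm

# Compare the exact binomial CDF with the normal approximation N(np, np(1-p)).
n, p, k = 400, 0.3, 130
exact = binom.cdf(k, n, p)
approx = norm.cdf((k + 0.5 - n * p) / np.sqrt(n * p * (1 - p)))  # continuity correction
print(exact, approx)
```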

3. Sampling error

  1. Context:

    • Suppose you're interested in knowing the average income μ of families in Singapore.

    • If you could ask every family in Singapore for their income, you would get the true average income μ.

  2. Sampling:

    • Instead of asking every family, you take a random sample of 1000 families and calculate the average income of these 1000 families. Let's denote this sample average by $\bar X_{1000}$.

  3. Sampling Error:

    • The difference between the sample average $\bar X_{1000}$ and the true average $\mu$ is called the sampling error, denoted by $\bar X_{1000}-\mu$.

    • This error arises because the sample might not perfectly represent the entire population.

  4. Assessment of Sampling Error:

    • The sampling error can be assessed by understanding its distribution. Under certain conditions (like the Central Limit Theorem), the sampling distribution of $\bar X_{1000}$ is approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{1000}$, where $\sigma^2$ is the population variance.

    • Therefore, the sampling error can be assessed using confidence intervals or hypothesis testing to determine how close $\bar X_{1000}$ is likely to be to $\mu$.

4. Experimental error

  1. Context

    • Consider an experimental error $\epsilon$, which is the sum of multiple small component errors: $\epsilon=a_1\epsilon_1+a_2\epsilon_2+\cdots+a_n\epsilon_n$.

    • These errors could arise from various sources, such as measurement inaccuracies, variations in raw materials, or differences in experimental conditions.

  2. Why Normal Distribution?

    • According to the Central Limit Theorem (CLT), when these component errors are independent and identically distributed, the sum (or a linear combination) of these errors will tend to follow a normal distribution as the number of components becomes large.

  3. Implication

    • Because of this property, the overall experimental error ϵ can often be assumed to be normally distributed, even if the individual components are not normally distributed.

    • This assumption of normality simplifies statistical analysis and is foundational for many statistical methods, such as regression analysis and hypothesis testing.