Mehmet Emre CETIN

Introduction to the Law of Large Numbers

Updated: Nov 14, 2023


Illustration of the Law of Large Numbers made by DALL-E

Understanding the Law of Large Numbers: Its Implications in Coin Flips, Dice Rolls, and Statistical Hypotheses


The law of large numbers (LLN) is a theorem in probability theory that states that as the number of trials in an experiment approaches infinity, the average of the results will get closer and closer to the expected value. In other words, the LLN says that if you repeat an experiment a large number of times, the average of the results will be very close to the true average of the population.

  1. Flipping a coin: If you flip a coin 10 times, you might not get exactly 5 heads and 5 tails. However, if you flip a coin 1000 times, the proportion of heads will typically be very close to 50%. This is because the LLN states that the proportion of heads will get closer and closer to 50% as the number of flips increases.

  2. Rolling a die: If you roll a die 10 times, you might not get exactly 3.5 as the average result. However, if you roll a die 1000 times, the average will be very close to 3.5. This is because the LLN states that the average of the results of rolling a die will get closer and closer to 3.5 as the number of rolls increases (a short R sketch after this list illustrates both the coin and the die examples).

  3. Estimating the population mean: The LLN can be used to estimate the population mean. For example, if you want to estimate the average height of all adults in the United States, you could randomly sample 1000 adults and calculate the average height of the sample. The LLN states that the average height of the sample will be very close to the average height of the population as the sample size increases.

  4. Testing statistical hypotheses: The LLN can also be used to test statistical hypotheses. For example, you could use the LLN to test the hypothesis that the average height of all adults in the United States is 5 feet 10 inches. You could randomly sample 1000 adults and calculate the average height of the sample. If the average height of the sample is significantly different from 5 feet 10 inches, then you could reject the hypothesis that the average height of the population is 5 feet 10 inches (a small t.test sketch after this list illustrates the idea).
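
To make the coin and die examples concrete, here is a minimal R sketch (separate from the post's main analysis; the variable names are illustrative) that computes the proportion of heads and the average die roll for increasing numbers of trials:

set.seed(1)
for (n_trials in c(10, 1000, 100000)) {
  heads <- mean(sample(c(0, 1), n_trials, replace = TRUE))  # proportion of heads
  die <- mean(sample(1:6, n_trials, replace = TRUE))        # average die roll
  cat("n =", n_trials, "| proportion of heads =", round(heads, 3),
      "| mean die roll =", round(die, 3), "\n")
}

As the number of trials grows, the printed proportion drifts toward 0.5 and the printed die average toward 3.5. The hypothesis-testing example in item 4 can be sketched the same way; the snippet below uses simulated heights (illustrative values, not real survey data) and base R's t.test to test the claim that the population mean is 70 inches (5 feet 10 inches):

set.seed(2)
heights <- rnorm(1000, mean = 69, sd = 3)  # hypothetical sample of 1000 adult heights
t.test(heights, mu = 70)                   # tests H0: population mean = 70 inches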


Data and Methodology

Data:

The analysis revolves around simulated exponential distributions. The primary characteristic of interest in this exponential distribution is its rate parameter, denoted as λ. For this analysis, λ is set at 0.3. The mean (Simmean) and standard deviation (σ) of this distribution are derived as the inverse of λ.
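
As a quick sanity check (a sketch separate from the post's code), the theoretical mean and standard deviation of this exponential distribution can be compared with empirical values from a large random sample:

lambda <- 0.3
1 / lambda                       # theoretical mean and sd, about 3.33
set.seed(500)
x <- rexp(100000, rate = lambda)
c(mean = mean(x), sd = sd(x))    # empirical values, both close to 3.33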

Methodology:

Setting Parameters and Seeds:
  • Parameters for the exponential distribution, including λ, are defined.

  • A seed is set to ensure reproducibility in the random number generation process.

Initial Visualization:
  • 1000 random numbers are drawn from the exponential distribution with rate λ. A histogram showcases the distribution of these 1000 random exponentials.

Sampling and Distribution of Sample Means:
  • For each of the 1000 simulations, a sample of size 50 is drawn from the exponential distribution, and its mean is computed.

  • A histogram is plotted to visualize the distribution of these sample means.

Comparative Analysis with Increasing Simulations:
  • The process of drawing samples and calculating their means is reiterated for varying numbers of simulations: 50, 100, 500, and 1000.

  • For each simulation size:

    • Sample means are computed and stored.

    • A histogram is plotted to show the distribution of sample means. This histogram also includes vertical lines to indicate the actual computed mean and the theoretical mean of the distribution.

    • Overlaid on the histogram is the theoretical normal distribution (based on the Central Limit Theorem) with the mean (Simmean) and standard deviation adjusted for the sample size.

The methodology provides a comprehensive simulation-based exploration into the behavior of sample means derived from an exponential distribution. The comparison across different numbers of simulations demonstrates the convergence of the sample mean to the theoretical mean, reinforcing the Central Limit Theorem's implications in practice.


Calling Libraries and Creating Distribution and Sample Histogram Graphics

library(ggplot2)

lambda <- 0.3
Simmean <- 1 / lambda   # theoretical mean of the exponential distribution
sigma <- 1 / lambda     # theoretical standard deviation

set.seed(500)
par(mfrow = c(1, 2))
hist(rexp(1000, lambda),
     main = "Distribution of \n1000 \nrandom exponentials", col = "lightblue")

n <- 50     # sample size
m <- NULL   # vector of sample means
for (i in 1:1000) {
  m <- c(m, mean(rexp(n, lambda)))
}

hist(m, xlab = "Sample mean",
     main = paste("Distribution of SAMPLE MEAN \n(samples of 50 random exps)", "\n1000 simulations"),
     col = "lightblue")
Histogram of Distribution and Sample

Creating Histograms for 50, 100, 500, and 1000 Simulations

par(mfrow = c(2, 2))
mm <- NULL    # mean of means
varm <- NULL  # variance of means

# No = number of simulations, increased across panels:
simulations <- c(50, 100, 500, 1000)

for (No in simulations) {
  m <- NULL
  for (i in 1:No) {
    m <- c(m, mean(rexp(n, lambda)))   # mean of each sample of size n = 50
  }

  mm0 <- round(mean(m), 3)    # mean of the sample means
  mm <- c(mm, mm0)
  varm0 <- round(var(m), 3)   # variance of the sample means
  varm <- c(varm, varm0)

  hist(m, xlab = "Sample mean",
       main = paste(No, "simulations\n", "mean=", mm0, " var=", varm0),
       prob = TRUE, col = "lightblue")
  abline(v = mm0, col = "darkblue", lwd = 4)

  ## Theoretical normal distribution of the sample mean (CLT):
  x <- seq(min(m), max(m), length = 100)
  y <- dnorm(x, mean = Simmean, sd = sigma / sqrt(n))
  lines(x, y, pch = 22, col = "red", lwd = 2)
  abline(v = Simmean, col = "red", lwd = 2)
}
Histograms for 50, 100, 500, and 1000 Simulations

Deciphering the Law of Large Numbers: Expected Values, Convergence, and Practical Applications in R Simulations


The R code above simulates the LLN for a population of exponential random variables. The code first generates 1000 random exponential variables with a rate of lambda = 0.3. The mean of this population is 1/lambda ≈ 3.33. The code then draws a random sample of size 50 from this distribution and calculates its mean; this is repeated up to 1000 times. The results show that as the number of simulations increases, the distribution of the sample means becomes more and more concentrated around the population mean of 3.33. This is a visual representation of the LLN.


The code also overlays the theoretical distribution of the sample means. By the Central Limit Theorem, this is a normal distribution with a mean of 1/lambda ≈ 3.33 and a standard deviation of (1/lambda)/sqrt(50) ≈ 0.47. The plots show that the distribution of the sample means converges toward this theoretical distribution as the number of simulations increases.
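
A quick check of those theoretical parameters (this snippet simply recomputes the values used above):

lambda <- 0.3
n <- 50
Simmean <- 1 / lambda     # theoretical mean, about 3.33
Simmean / sqrt(n)         # theoretical sd of the sample mean, about 0.47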

The LLN can be summarized under the following points:

  • The expected value is the average of all possible outcomes of an experiment.

  • The variance is a measure of how spread out the possible outcomes of an experiment are.

  • The standard deviation is the square root of the variance.

The weak law of large numbers states that the average of the results of an experiment will get closer and closer to the expected value as the number of trials approaches infinity.

The strong law of large numbers states that the average of the results of an experiment will converge to the expected value with probability 1 as the number of trials approaches infinity.
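
The convergence described by both statements can be visualized with a running (cumulative) mean; the sketch below (separate from the post's main code) plots the cumulative average of exponential draws against the expected value 1/λ ≈ 3.33:

set.seed(500)
lambda <- 0.3
x <- rexp(10000, lambda)
running_mean <- cumsum(x) / seq_along(x)
plot(running_mean, type = "l", xlab = "Number of trials", ylab = "Running mean")
abline(h = 1 / lambda, col = "red", lwd = 2)   # expected value, about 3.33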

  • The LLN is based on the idea of convergence. This means that as the number of trials increases, the average of the results will get closer and closer to a specific value, which is the expected value of the experiment.

  • The LLN has a strong form that holds with probability 1. In other words, if you repeat an experiment an infinite number of times, the average of the results will converge to the expected value with probability 1.

  • The LLN is not applicable to all experiments. For example, the LLN does not apply to experiments where the outcomes are not independent.

The LLN is used in a variety of applications, such as:

  • Estimating population parameters

  • Testing statistical hypotheses

  • Making predictions about future events

Conclusion

The results of the R code provide a visual representation of the law of large numbers. The code shows that as the number of simulations increases, the distribution of the sample means becomes more and more concentrated around the population mean. This is because the LLN states that the average of the results of an experiment will get closer and closer to the expected value as the number of trials approaches infinity.


The R code can be used to simulate the LLN for any population. This can be a useful tool for understanding how the LLN works and how it can be used to make inferences about populations. In this code example, we explored the Law of Large Numbers by generating random exponential distributions and examining the behavior of sample means for various sample sizes. The Law of Large Numbers states that as the sample size increases, the sample mean converges to the population mean. The histograms and theoretical distribution curves displayed in the code illustrate this principle, as the sample means become more concentrated around the true population mean as the number of simulations increases.


Understanding the Law of Large Numbers is crucial in statistical analysis and decision-making processes. It provides a solid foundation for making inferences about a population based on sample data. By recognizing the convergence of sample means to the population mean, we can draw reliable conclusions and make accurate predictions.