Intro to Statistics



Data Summaries

Conditional Probability:

Counting Rules:

Random Distributions:



Sampling Distribution:

Point Estimation:
Find θ-hat, the value of θ that best explains the sample. The likelihood is L(θ; y1, ..., yn) = P(Y1 = y1, Y2 = y2, ..., Yn = yn; θ)

Maximal Likelihood Estimate:

By extension, the Invariance Property states that if θ-hat is the MLE of θ, then under mild conditions g(θ-hat) is the MLE of g(θ) for every continuous function g

Interval Estimation:

Confidence Intervals:

Goodness of Fit Tests: X1...Xn ~ f(xi; θ's)

  1. Construct frequency tables for intervals
  2. Calculate the MLE for each of the θ's
  3. Compute an estimated probability for each range [a, b] in the frequency table, e.g. for an exponential model: Pi = integral from a to b of (1/θ-hat) • e^(-x/θ-hat) dx
  4. Calculate expected frequencies ei = n•Pi, where n here is the sample size
  5. Compute λ = 2 ∑ yi • log(yi/ei), where yi are the observed frequencies
  6. p-value = P(Λ ≥ λ), where Λ ~ χ²(n-k-1) approximately; n = number of categories, k = number of estimated parameters
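The steps above can be sketched in Python for an exponential model. This is a minimal sketch under stated assumptions: the interval edges, observed counts and the value of θ-hat are hypothetical (for an exponential model the MLE is the sample mean of the raw data, assumed here to be already computed), and `chi2_sf` and `exponential_gof` are illustrative helper names, not from the notes.

```python
import math

def chi2_sf(x, df):
    """P(W >= x) for W ~ chi-squared(df), using the power series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def exponential_gof(edges, observed, theta_hat, n_params=1):
    """Steps 3 to 6 for an exponential model with mean theta_hat."""
    n_obs = sum(observed)
    expected = []
    lam = 0.0
    for a, b, y in zip(edges[:-1], edges[1:], observed):
        # Step 3: P(a <= X < b) = e^(-a/theta) - e^(-b/theta)
        p = math.exp(-a / theta_hat) - math.exp(-b / theta_hat)
        e = n_obs * p                        # Step 4: expected frequency
        expected.append(e)
        if y > 0:                            # a zero count contributes nothing
            lam += 2 * y * math.log(y / e)   # Step 5
    df = len(observed) - n_params - 1        # Step 6: categories - parameters - 1
    return expected, lam, df, chi2_sf(lam, df)

# Hypothetical frequency table and a hypothetical MLE of 2.0
edges = [0, 1, 2, 4, math.inf]
observed = [40, 25, 22, 13]
expected, lam, df, p_value = exponential_gof(edges, observed, theta_hat=2.0)
print(expected, lam, df, p_value)
```

A large p-value here means the data give no evidence against the exponential model; a small one suggests the model does not fit.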


Central Tendency

the degree of clustering of values of a statistical distribution

Measure of Dispersion

Measure of Skewness

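A rough indicator of skewness is Mean - Median (positive when the right tail is long). The sample skewness is the ratio

SKEWNESS = [(1/n) ∑ (yi - ȳ)^3] / [(1/n) ∑ (yi - ȳ)^2]^(3/2)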
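A minimal sketch of the moment-based skewness, taken here to be the third central moment divided by the second central moment raised to the power 3/2. The function name and both data sets are made up for illustration.

```python
def sample_skewness(ys):
    """Third central moment over the second central moment to the power 3/2."""
    n = len(ys)
    mean = sum(ys) / n
    m2 = sum((y - mean) ** 2 for y in ys) / n
    m3 = sum((y - mean) ** 3 for y in ys) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(sample_skewness(symmetric))     # zero for symmetric data
print(sample_skewness(right_skewed))  # positive for a long right tail
```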

Measure of Association

Graphical Summaries of Data Sets

identifies distribution from which your sample is drawn, so you apply the right model

Statistical Inference

Theory of Probability

Rules of Probability:

Counting Rules

Random Variable Distribution

A random variable Y is a function that assigns a number to each outcome of a random experiment

Cumulative Distribution Function

Discrete Distributions

Continuous Distributions

X is said to be a continuous random variable if X takes values in an interval [a,b], a ≠ b

Normal Distribution

A random variable Z is said to follow a standard normal distribution if

  1. Z takes values in (-infinity, +infinity)
  2. f(z) = (1/√(2π)) • e^(-z^2/2) (the bell curve)
  3. Z ~ N(0,1), where the parameters are (mean, variance); the density has its maximum at zero and is symmetric around 0

Theorem: Any normal problem can be converted to a Z-problem. If X ~ Normal with mean µ and variance σ^2, then (X - µ)/σ = Z ~ N(0,1), where σ is the standard deviation, not the variance. These are just transformations on the normal distribution.

Some applications:

  1. Given µ, σ and a value A, find the probability P(X ≤ A) = ?
  2. Given µ, σ and a probability, find the value: P(X ≤ A) = 0.9, find A
  3. Given probabilities and values, find µ and σ (using standardization)
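The three applications can be sketched with Python's standard-library `statistics.NormalDist`. The values µ = 80, σ = 8 and A = 90 are illustrative, and `solve_mu_sigma` is a hypothetical helper name.

```python
from statistics import NormalDist

mu, sigma = 80, 8          # illustrative values; sigma is the standard deviation
X = NormalDist(mu, sigma)
Z = NormalDist()           # standard normal N(0, 1)

# 1. Given mu, sigma and A, find P(X <= A) by standardizing.
A = 90
p = Z.cdf((A - mu) / sigma)

# 2. Given mu, sigma and a probability, find A with P(X <= A) = 0.9.
A_90 = mu + sigma * Z.inv_cdf(0.9)

# 3. Given two (value, probability) pairs, solve for mu and sigma.
#    P(X <= a1) = p1 and P(X <= a2) = p2 give two linear equations:
#    a1 = mu + sigma*z1 and a2 = mu + sigma*z2.
def solve_mu_sigma(a1, p1, a2, p2):
    z1, z2 = Z.inv_cdf(p1), Z.inv_cdf(p2)
    sigma_hat = (a2 - a1) / (z2 - z1)
    mu_hat = a1 - sigma_hat * z1
    return mu_hat, sigma_hat

print(p, A_90, solve_mu_sigma(70, X.cdf(70), 95, X.cdf(95)))
```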

Distribution of the Sample Mean

Theorem: If x1, x2, ..., xn are independent normally distributed random variables with mean µ and variance σ^2, then the sample mean x-bar = ( x1 + x2 + ... + xn ) / n ~ N(µ, σ^2/n)

Theorem: If X ~ N(µ1, σ1^2) and Y ~ N(µ2, σ2^2) are independent, then aX + bY ~ N(a•µ1 + b•µ2, a^2•σ1^2 + b^2•σ2^2)

X ~ N(80, 64) is the weight of Canadians. 20 Canadians get on an elevator; what is the probability it will break:
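The elevator's weight limit is not given in these notes, so the sketch below assumes a hypothetical capacity of 1700. It uses the fact that the total weight of 20 independent N(80, 64) riders is N(20•80, 20•64).

```python
from math import sqrt
from statistics import NormalDist

n, mu, var = 20, 80, 64
capacity = 1700            # hypothetical limit, NOT given in the notes

# Sum of n independent N(mu, var) variables is N(n*mu, n*var);
# NormalDist takes the standard deviation, so pass sqrt(n*var).
total = NormalDist(n * mu, sqrt(n * var))
p_break = 1 - total.cdf(capacity)
print(p_break)
```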

Central Limit Theorem: Suppose x1, ..., xn are independent random variables with mean µ and variance σ^2 (not necessarily normal), and let x-bar be their sample mean. Then if n is large, x-bar ~ approx. N(µ, σ^2/n)
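A small simulation can illustrate the theorem. This sketch draws from an exponential distribution (clearly not normal) with an assumed mean of 2, so the variance is 4, and checks that the sample means of n = 50 draws have mean close to 2 and variance close to 4/50 = 0.08. The seed and all sizes are arbitrary choices.

```python
import random

random.seed(0)
mu, n, reps = 2.0, 50, 5000

# random.expovariate takes the rate 1/mu, so each draw has mean mu and variance mu^2
sample_means = [
    sum(random.expovariate(1 / mu) for _ in range(n)) / n
    for _ in range(reps)
]

grand_mean = sum(sample_means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in sample_means) / reps

print(grand_mean)      # should be near mu
print(var_of_means)    # should be near mu^2 / n
```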


X is said to follow an exponential distribution with mean µ, X ~ Exp(µ), if X takes values between 0 and infinity and the density function of X is f(x) = (1/µ)•e^(-x/µ), x > 0

Statistical Inference

Toss coin 100 times

There are two general methods of interval estimation:

θ is unknown parameter of interest for a population.

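For finding the MLE of θ for a Uniform Distribution, with Y on [0, θ], double the sample mean is the intuitive guess (it is the method-of-moments estimate), but the MLE is actually the sample maximum max(y1, ..., yn): L(θ) = 1/θ^n for θ ≥ max(yi) and 0 otherwise, so L is largest at the smallest allowed θ.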

We can think of the likelihood function simply as the probability that a given sample happens, plotted as a function of some unknown parameter θ on which the distribution depends. It tells us how likely the observed sample is under each candidate value of θ.
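As an illustration, here is a minimal sketch of the likelihood for the coin-toss setting, assuming a hypothetical outcome of 60 heads in 100 tosses. The binomial coefficient is omitted because it does not depend on θ, and the maximum is found by a simple grid search rather than calculus.

```python
def likelihood(theta, y, n):
    """Binomial likelihood up to a constant: theta^y * (1 - theta)^(n - y)."""
    return theta ** y * (1 - theta) ** (n - y)

y, n = 60, 100                             # hypothetical: 60 heads in 100 tosses
grid = [i / 1000 for i in range(1, 1000)]  # candidate theta values in (0, 1)
theta_hat = max(grid, key=lambda t: likelihood(t, y, n))

print(theta_hat)   # the grid maximizer agrees with y / n
```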

Invariance Property Theorem: If θ-hat is the MLE for θ, then for non-extreme values, g(θ-hat) is the MLE for g(θ) for any continuous function g

Interval Estimation (What values of θ are reasonable?)

Same problem setup as finding the MLE, given a sample from a large population, find an interval [l, u] which contains θ with "high probability"

For the MLE, we find one number which represents our best guess θ-hat. We can then create an interval around θ-hat of values that we deem plausible in some sense.

For interval estimation, we can use either the relative likelihood function R(θ) = L(θ)/L(θ-hat), which gives a likelihood interval, or a confidence interval built from a sampling distribution
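A minimal sketch of a likelihood interval for a binomial parameter, taking the relative likelihood to be R(θ) = L(θ)/L(θ-hat) and keeping every θ with R(θ) ≥ 0.1. The data (60 successes in 100 trials), the 10% cutoff and the grid search are all assumptions made for illustration; the code needs 0 < y < n.

```python
import math

def log_relative_likelihood(theta, y, n):
    """log R(theta) for a binomial sample with y successes out of n."""
    theta_hat = y / n
    return (y * math.log(theta / theta_hat)
            + (n - y) * math.log((1 - theta) / (1 - theta_hat)))

def likelihood_interval(y, n, level=0.1, steps=10000):
    """Smallest and largest grid values of theta with R(theta) >= level."""
    grid = [i / steps for i in range(1, steps)]
    inside = [t for t in grid
              if log_relative_likelihood(t, y, n) >= math.log(level)]
    return inside[0], inside[-1]

lo, hi = likelihood_interval(60, 100)   # hypothetical data
print(lo, hi)
```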

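By extension, we can use a continuous region of θ values chosen to capture θ at a specified confidence level. (The likelihood function is not itself a probability density in θ; the level comes from the relative likelihood cutoff or from a sampling distribution.)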

Estimates can be thought of as outcomes of some r.v. that needs to be identified to construct intervals. We can use a sample {y1...yn} to estimate the interval [L,U] s.t. the interval [L,U] contains θ with high probability

To calculate the confidence interval,

Chi-Squared Distribution

A random variable W follows a Chi-Squared Distribution with k degrees of freedom, where k is a positive integer, W ~ χ²(k), if W = Z1^2 + ... + Zk^2 where the Zi ~ N(0,1) are independent r.v.s

Student's T-Distribution

Can be used to estimate the mean of a normal distribution where the standard deviation is unknown.

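Def: Let T be a r.v. T is said to follow a Student's T-distribution with k degrees of freedom, T ~ T(k), if T is a ratio built from two indep. r.v.s: T = Z/(W/k)^(1/2), where Z ~ N(0,1) and W follows a chi-squared distribution with k degrees of freedom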

Prediction Intervals (Constructing Confidence Interval for next element)

Also called time-series analysis/forecasting. We have n historical observations Y1, ..., Yn from a Normal population with mean μ and s.d. σ. We want to find an interval [a,b], based on the data sample, that contains Yn+1 with a high degree of confidence, i.e. predict the next element

Model: Y1,...Yn ~ N(μ, σ^2). Yn+1 = ?
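A minimal sketch of a prediction interval, assuming the standard pivotal quantity (Yn+1 - Y-bar)/(s•(1 + 1/n)^(1/2)) ~ T(n-1) is the intended one. The sample is hypothetical, and the T quantile is passed in from a table (2.262 is the 0.975 quantile of T(9)) because the Python standard library has no T-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def prediction_interval(sample, t_quantile):
    """Interval [a, b] for the next observation from a normal population."""
    n = len(sample)
    y_bar, s = mean(sample), stdev(sample)   # stdev uses the n - 1 divisor
    half_width = t_quantile * s * sqrt(1 + 1 / n)
    return y_bar - half_width, y_bar + half_width

sample = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]  # hypothetical
a, b = prediction_interval(sample, t_quantile=2.262)  # 95% interval, n - 1 = 9
print(a, b)
```

The factor (1 + 1/n) makes this wider than a confidence interval for μ, since it accounts for the variability of the new observation as well as the uncertainty in Y-bar.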

Hypothesis Testing (Is θ reasonable?)

Def: A hypothesis is a statement made about attributes of a population. The null hypothesis H0 is the current belief. The alternate hypothesis H1 is the competing claim used when measuring evidence for or against H0

Hypothesis Testing Steps:

  1. Construct the discrepancy measure D. D is a random variable that measures how much the data disagrees with H0
  2. Calculate the value d (the outcome from our sample)
  3. Calculate the p-value: P(D ≥ d), computed assuming H0 is true. The p-value measures how unusual the sample is under the null hypothesis. If the p-value is 1%, then either we have observed a discrepancy that only 1% of samples would match or exceed, or, more plausibly, the null hypothesis is wrong.

Suppose we want to determine if a coin is fair. H0: θ = 0.5, H1: θ ≠ 0.5. Then D = |Y - 50| if Y is the number of successes out of 100 trials. d, for some sample, is computed and then we solve for the p-value P(D ≥ d).
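The coin example can be computed exactly from the binomial distribution. In this sketch the observed count of 60 heads is hypothetical and `coin_p_value` is an illustrative name.

```python
from math import comb

def coin_p_value(y, n=100, theta0=0.5):
    """Exact p-value P(D >= d) with D = |Y - n*theta0|, Y ~ Binomial(n, theta0)."""
    d = abs(y - n * theta0)
    return sum(
        comb(n, k) * theta0 ** k * (1 - theta0) ** (n - k)
        for k in range(n + 1)
        if abs(k - n * theta0) >= d
    )

print(coin_p_value(60))   # hypothetical sample: 60 heads, so d = 10
print(coin_p_value(50))   # d = 0, so every outcome counts and the p-value is 1
```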

We are heavily in favour of the null hypothesis. So our conventions state:

Type I error: convicting the innocent. Type II error: acquitting the guilty.

Hypothesis Testing for different distributions

Concentrate on two sided test

yi ~ f(yi, θ), i = 1...n, yi's independent r.v, θ is unknown parameter, {y1...yn} is sample

We want to determine whether H0: θ = θ0 or H1: θ ≠ θ0

For the normal problem, recall (Y-bar - μ)/(s/n^(1/2)) ~ T(n-1)

For Binomial Problem,

For Poisson,

Relationship between CI and hypothesis testing: If θ0 belongs to the 100q% C.I., the p-value of the test (H0: θ = θ0, H1: θ ≠ θ0) will be greater than 1 - q

For large samples, we can construct tests for parameters using the Likelihood Ratio Test

  1. Λ(θ0) ~ χ²(1) approximately. Λ is a special case of D for hypothesis testing; it applies to all distributions and can be thought of as the general case
  2. Calculate λ(θ0) = -2 log(L(θ0)/L(θ-hat))
  3. Calculate p-value = P(Λ ≥ λ) = P(W ≥ λ) = P(Z^2 ≥ λ), where W ~ χ²(1) and Z ~ N(0,1)
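A minimal sketch of these steps for a binomial parameter, using P(Z^2 ≥ λ) = 2•(1 - Φ(λ^(1/2))) for the last step. The data (60 successes in 100 trials) and the function name are assumptions; the code needs 0 < y < n.

```python
import math
from statistics import NormalDist

def binomial_lrt(y, n, theta0):
    """Likelihood ratio statistic and approximate p-value for H0: theta = theta0."""
    theta_hat = y / n

    def log_lik(t):
        return y * math.log(t) + (n - y) * math.log(1 - t)

    # lambda = -2 log(L(theta0) / L(theta_hat)); clamp tiny negative rounding error
    lam = max(-2 * (log_lik(theta0) - log_lik(theta_hat)), 0.0)
    # P(chi-squared(1) >= lam) = P(Z^2 >= lam) = 2 * (1 - Phi(sqrt(lam)))
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lam)))
    return lam, p_value

lam, p_value = binomial_lrt(60, 100, 0.5)   # hypothetical data
print(lam, p_value)
```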

Equality of two means:

Simple Linear Regression Model

We make some assumptions:

  1. Given our x, Y's are normally distributed
  2. E(Yi) = α + β•Xi (the average of the Yi's are a linear function of the X's)
  3. V(Yi) = σ^2 is independent of the value of X (doesn't change with respect to X)

Regression is represented with a deterministic part and random part.

X = given constants, X is the regressor which helps explain Y

  1. Y = r.v., mean depends on x in a linear function E(Y) = α + βx, unknown constants
  2. Y's are all normally distributed (Gaussian Regression Model)
  3. Var(Yi) = σ^2, independent of X -> Homoscedasticity

The population means are a linear function of X

Yi ~ N(α + βxi, σ^2), i = 1...n and Yi's independent

Method of Least Squares

Method of fitting an estimated regression line to the data

We get some equations to help us find α-hat, β-hat, σ-hat and s.

Sxx is the sum of squared deviations of x about its mean, Sxx = ∑ (xi - x̄)^2; likewise Syy = ∑ (yi - ȳ)^2 and Sxy = ∑ (xi - x̄)•(yi - ȳ)

We get s, the residual standard error of the regression model, which measures the amount of variability of Y that cannot be explained by X.

Now we can do hypothesis testing for α and β. Take β for example: a hypothesis test for β with H0: β = 0 tests for the existence of a linear relationship between Y and X. In general, H0: β = β0

Residuals: ri-hat = yi - α-hat - β-hat•xi
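The least squares quantities can be sketched as follows, assuming the usual formulas β-hat = Sxy/Sxx, α-hat = ȳ - β-hat•x̄ and s^2 = (sum of squared residuals)/(n - 2), with Sxx and Sxy taken as sums of squares and cross-products about the means. The data are hypothetical.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return alpha_hat, beta_hat, residual standard error s, and residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))   # n - 2 degrees of freedom
    return alpha_hat, beta_hat, s, residuals

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical data
alpha_hat, beta_hat, s, residuals = least_squares(xs, ys)
print(alpha_hat, beta_hat, s)
```

The residuals always sum to zero (up to rounding) when the line includes an intercept, which is a quick sanity check on the fit.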

Goodness of fit tests:
Given X1...Xn, check if Xi ~ f(xi; θ)

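Test variable independence: construct observed and expected tables. Under independence, each expected count is (row total • column total) / grand total.

Observed                      Expected
     S     NS    Total             S     NS    Total
L    40    60    100          L    45    55    100
R    50    50    100          R    45    55    100
     90    110   200               90    110   200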
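A minimal sketch of the independence test for this table. Expected counts are row total • column total / grand total, the statistic is λ = 2 ∑ y • log(y/e) as in the goodness of fit steps, and the degrees of freedom are assumed to be (rows - 1)•(columns - 1) = 1, so the p-value can be computed as P(Z^2 ≥ λ).

```python
import math
from statistics import NormalDist

observed = [[40, 60],   # row L: S, NS
            [50, 50]]   # row R: S, NS

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

lam = 2 * sum(
    y * math.log(y / e)
    for obs_row, exp_row in zip(observed, expected)
    for y, e in zip(obs_row, exp_row)
)

# One degree of freedom, so P(chi-squared(1) >= lam) = 2 * (1 - Phi(sqrt(lam)))
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lam)))
print(expected, lam, p_value)
```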

Two Population Problem

Two different populations, check if their means are equal


Y = α + βx + R, R ~ N(0, σ^2)

  1. α-hat, β-hat, σ-hat, Se

You can apply the regression method to unmatched samples by setting x to 1 for one population and 0 for the other
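A minimal sketch of this idea, assuming the 0/1 coding is applied to the regressor x (group membership) and using hypothetical samples. With that coding, the least squares slope equals the difference between the two sample means, so testing β = 0 is a test of equal means.

```python
group0 = [5.1, 4.8, 5.5, 5.0]        # hypothetical sample, coded x = 0
group1 = [6.2, 5.9, 6.4, 6.1, 6.0]   # hypothetical sample, coded x = 1

xs = [0] * len(group0) + [1] * len(group1)
ys = group0 + group1

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

beta_hat = sxy / sxx                  # difference of the two group means
alpha_hat = y_bar - beta_hat * x_bar  # mean of the group coded 0

mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)
print(alpha_hat, beta_hat, mean0, mean1 - mean0)
```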


  1. (n-1)S^2/σ^2 ~ χ²(n-1)

  2. use pivotal quantity for normal, isolate y-bar, sigma is known