Intro to Statistics
Summary:
Data Summaries
 Discrete, continuous, categorical, ordinal
 units, variables, attributes, summarizing data
 center, variability, symmetry, skewness, measuring association between two variables
 summarizing algebraically and graphically
 sample mean, mode, median & variability (5 num summary)
 IQR = Q3 - Q1, Range = Max - min, s^2 = 1/(n-1) ∑(yi - ybar)^2 sample variance
 kurtosis = 3 for normal
 measure of Association, relative risk
 sample correlation coefficient r_{xy} = ∑(xi - xbar)(yi - ybar)/(∑(xi - xbar)^2 • ∑(yi - ybar)^2)^(1/2)
 histogram, box and whisker, whiskers reach at most Q3 + 1.5•IQR above and Q1 - 1.5•IQR below
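A minimal sketch of the summaries above, using only the standard library. The data are made up for illustration, and the quartiles use `statistics.quantiles` with `method="exclusive"`, which places them at position (n+1)•p and interpolates; that can differ slightly from averaging neighbours by hand.

```python
import math
import statistics

# Hypothetical paired sample, made up purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def five_number_summary(data):
    # Quartiles at position (n+1)*p, interpolating between neighbours.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)

def sample_variance(data):
    # s^2 = 1/(n-1) * sum (yi - ybar)^2
    ybar = sum(data) / len(data)
    return sum((v - ybar) ** 2 for v in data) / (len(data) - 1)

def sample_correlation(xs, ys):
    # r = Sxy / sqrt(Sxx * Syy)
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    syy = sum((b - ybar) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

summary = five_number_summary(y)
iqr = summary[3] - summary[1]
s2 = sample_variance(y)
r = sample_correlation(x, y)
print(summary, iqr, s2, round(r, 4))
```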
Conditional Probability:
 Bayes Theorem:
P(Bi|A) = P(A|Bi)*P(Bi)/∑(P(A|Bj)*P(Bj))
Counting Rules:
 Permutations:
nPr = n!/(n-r)!
 Combinations:
nCr = nPr/r!
; the permutation formula divided by r!, since each set of r values can be ordered in r! ways
Random Distributions:
Discrete:
 Binomial: P of r successes in n Bernoulli trials:
P(X = r) = nCr • p^r • (1-p)^(n-r)
; there are nCr combinations of exactly r successes in n trials
 Y ~ Bin(n, θ)
 E(Y) = nθ
 V(Y) = nθ(1-θ)
 HyperGeometric: P of r successes without replacement: P(X = r) = ^{R}C_{r} • ^{N-R}C_{n-r} / ^{N}C_{n}; we calculate what percent of all outcomes have r successes and n-r failures, for n draws from a population of N that contains R successes
 Geometric: probability of k-1 failures before a success, for Bernoulli trials:
P(X = k) = (1-p)^(k-1)•p
 Negative Binomial: P of k failures before x successes
 Poisson: models number of event occurrences in a time interval:
P(X=k) = λ^k•e^(-λ)/k!
; λ is both E(X) and Var(X)
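A quick sketch of the discrete probability functions above, in case it helps to check hand calculations. The function names are my own; the hypergeometric one assumes the parametrization with N items, R of them successes, and n draws.

```python
import math

def binomial_pmf(r, n, p):
    # nCr ways to place r successes among n trials
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def geometric_pmf(k, p):
    # k-1 failures, then the first success on trial k
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

def hypergeometric_pmf(r, n, R, N):
    # r successes in n draws without replacement; N items, R successes
    return math.comb(R, r) * math.comb(N - R, n - r) / math.comb(N, n)

# The binomial mean should come out to n * theta.
n, theta = 10, 0.3
binomial_mean = sum(r * binomial_pmf(r, n, theta) for r in range(n + 1))
print(binomial_pmf(2, 4, 0.5), binomial_mean)
```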
Continuous:
 Normal
 Z ~ N(0,1)
 Y ~ N(μ, σ^2)
 if Y1...Yn ~ N(μ, σ^2) independently, then Ybar ~ N(μ, σ^2/n)
 Exponential: the time before an occurrence of an event (time between events in Poisson)
 Chi Square: distribution for the sum of a set of squared independent standard normal r.v.s. It is a function of degrees of freedom v, and always has a mean of v and variance of 2v
 if df =1, X^2 = Z^2
 if df = 2, X^2 ~ Exp(2)
 if df = k ≥ 50, X^2 ~ N(k, 2k) approximately
 Student's T Distribution is built from two independent r.v.s, T = Z/(W/k)^(1/2); Z is standard normal and W is Chi-Square with k degrees of freedom. T has k degrees of freedom, and as k → infinity, T → Z
 Symmetric, and if df is large, T ~ Z
Sampling Distribution:
 samples are a subset of the population, used to infer properties of the population. xbar is a specific sample mean, s^2 is a sample's variance, while Xbar and S^2 (note the capitals) are the mean and variance of a generalized random sample
 since a statistic is a random variable that depends only on the observed sample, it has a distribution, the sampling distribution (but remember the population's mean and variance are constant and unknown)
 sampling distributions of Xbar and S^2 enable us to make inferences about the population parameters μ and σ^2.
 Central Limit Theorem: For Z = (Xbar - μ)/(σ/n^{1/2}), as n → infinity the distribution of Z approaches the standard normal N(0,1). In practice the approximation is used for large n (n ≥ 30), or for smaller samples whose original distribution is close to normal
 if a population is normally distributed, the sample variance S^2 satisfies (n-1)•S^2/σ^2 ~ Chi^2, with degrees of freedom v = n-1
Point Estimation:
Find θhat. L(θ, y1...yn) = P(Y1 = y1, Y2 = y2, ... Yn = yn)
Maximal Likelihood Estimate:
 purpose is to estimate θ for an instance of a distribution
 take a sample of population for a known probability distribution
 construct a likelihood function: L(θ) = Πf(yi)
 use the log likelihood function and find its max for θhat
 θhat is the MLE if θhat maximizes L(θ) or l(θ)
 Binomial θhat = y/n, y is num of successes
 Poisson = ybar
 exponential = ybar
 normal μhat = ybar, and σhat = (1/n ∑(yi - ybar)^2)^(1/2)
By extension, the Invariance Property states that if θhat is the MLE of θ then, under mild conditions, g(θhat) is the MLE of g(θ) for every continuous function g
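To sanity-check the MLE recipe numerically, here is a rough sketch for the Poisson case: build the log likelihood, maximize it over a grid, and compare with ybar. The sample and the grid are made up; a grid search is just the simplest stand-in for solving dl/dθ = 0.

```python
import math

def poisson_log_likelihood(theta, sample):
    # l(theta) = sum [ yi*log(theta) - theta - log(yi!) ]
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in sample)

sample = [2, 3, 1, 4, 0, 2]                 # hypothetical counts
grid = [i / 1000 for i in range(1, 10001)]  # candidate theta values in (0, 10]

# theta_hat is the grid point with the largest log likelihood
theta_hat = max(grid, key=lambda t: poisson_log_likelihood(t, sample))
ybar = sum(sample) / len(sample)
print(theta_hat, ybar)
```

Both numbers should agree up to the grid spacing, matching the Poisson entry in the list above.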
Interval Estimation:
 using relative likelihood, the 100p% likelihood interval for θ (0 ≤ p ≤ 1) is the set of all θ whose likelihood is at least p•L(θhat). For example, the 10% likelihood interval (p = 0.1) contains every θ with L(θ) at least 10% of L(θhat)
 the x% confidence interval says that x% of all samples we could choose will produce a confidence interval that contains the actual unknown parameter.
 with known σ, by the Central Limit Theorem Z = (Xbar - μ)/(σ/n^{1/2}), with μ_{Xbar} = μ and σ_{Xbar} = σ/n^(1/2). We can construct a confidence interval for μ by using the z-table to find the bounds on Z for the desired probability, then isolating μ to find the interval.
 with unknown population σ, a random sample from a normal distribution has T = (Xbar - μ)/(S/n^(1/2)), with n-1 degrees of freedom. S is the sample standard deviation. Now we use the t-table to find the bounds of the interval given a desired probability of confidence, then isolate for the parameter.
 by convention, for any population distribution and a sample of size n ≥ 30, we can replace σ with s and use the confidence interval xbar ± z*•s/n^(1/2), where z* is the z-table value that gives us the desired confidence probability
 the x% confidence interval says with x% confidence that the difference between the sample mean and the actual mean will not exceed the interval bounds z_{a/2}•σ/n^{1/2}
Confidence Intervals:
 Binomial: CI: θhat ± z*•(θhat(1-θhat)/n)^(1/2); for margin of error l (MOE), n ≥ (z*/l)^2 • 1/4, since θ(1-θ) ≤ 1/4
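 Normal: CI for μ: ybar ± z*•σ/n^(1/2) if σ is known; ybar ± t*•s/n^(1/2) where df = n-1 for σ unknown
 Normal: CI for σ^2: [(n-1)s^2/b, (n-1)s^2/a], where a and b are X^2_{n-1} table values with P(a ≤ X^2_{n-1} ≤ b) equal to the desired confidence
 Poisson: ybar ± z*•(ybar/n)^(1/2)
 Relationship between LI and CI: -2 log(L(θ)/L(θhat)) ~ X^2_{1} approximately
 an approximate 95% CI is the 100•e^(-z*^2/2)% LI with z* = 1.96, roughly the 15% LI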
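A small sketch of the z-based intervals above. It uses `statistics.NormalDist` in place of the z-table; the function names and example numbers are my own, and the binomial and Poisson versions are the large-sample approximations.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    # z-table value with the given central probability, e.g. 1.96 for 0.95
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def normal_mean_ci(ybar, sigma, n, confidence=0.95):
    # known sigma: ybar +/- z* * sigma / sqrt(n)
    half = z_star(confidence) * sigma / math.sqrt(n)
    return ybar - half, ybar + half

def binomial_ci(y, n, confidence=0.95):
    # theta_hat +/- z* * sqrt(theta_hat * (1 - theta_hat) / n)
    theta_hat = y / n
    half = z_star(confidence) * math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - half, theta_hat + half

def poisson_ci(ybar, n, confidence=0.95):
    # ybar +/- z* * sqrt(ybar / n)
    half = z_star(confidence) * math.sqrt(ybar / n)
    return ybar - half, ybar + half

print(binomial_ci(60, 100))          # hypothetical: 60 successes in 100 trials
print(normal_mean_ci(50, 8, 64))     # hypothetical: ybar = 50, sigma = 8, n = 64
```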
Goodness of Fit Tests: X1...Xn ~ f(xi; θ's)
 construct frequency tables for intervals
 calculate the MLE for each of the θ's
 compute an estimated probability for each range in the frequency table, e.g. for an exponential model: Pi = integral from a to b of (1/θhat)•e^(-x/θhat) dx
 Calculate expected frequencies ei = n•Pi
 Compute λ = 2 ∑ yi • log(yi/ei)
 p-value = P(Λ ≥ λ), where Λ ~ X^{2}_{n-k-1} approximately: n = categories, k = estimated parameters
Notes
Central Tendency
the degree of clustering of values of a statistical distribution
 arithmetic mean is
(1/n)∑(yi)
for 1 ≤ i ≤ n
 geometric mean is
(y1•y2•...•yn)^(1/n)
 harmonic mean is the reciprocal of the arithmetic mean of the reciprocals
 For any two positive #s, AM ≥ GM ≥ HM
 median is the middle element of a set
 Quartiles split the data into four equal parts
 Percentiles break the data into 100 parts
 Mode is the element of maximum frequency
Measure of Dispersion
 Range = Max - min
 InterQuartile Range (IQR) = Q3 - Q1 (where Q1=25%, Q2=50%, Q3=75%)
 Variance =
σ^2
, the average of the squared deviations from the mean for mean ȳ,
σ^2 = (1/(n-1)) ∑ (yi - ȳ)^2
 Standard deviation
σ
= positive square root of the variance
 for some transformation of the form
yi = a + bxi
, variance = b^2 • var(x)
, sd(y) = |b|•sd(x)
Measure of Skewness
SKEWNESS: the sign of Mean - Median is a rough indicator; the sample skewness is
[(1/n) ∑ (yi - ȳ)^3] / [(1/n) ∑ (yi - ȳ)^2]^(3/2)

Kurtosis is the measure of frequency of extreme observations, compared to a normal distribution

K = [(1/n) ∑ (yi - ȳ)^4] / [(1/n) ∑ (yi - ȳ)^2]^2
, the fourth standardized moment
 ASIDE, a standardized moment of a probability distribution is a measure of the shape of a set of points
 the nth moment of a continuous function ƒ(x) about a value
c
is µn = ∫(x - c)^n • ƒ(x) dx
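A minimal sketch of skewness and kurtosis as standardized sample moments. This assumes the 1/n (moment) versions; some textbooks and packages apply small-sample corrections, so results can differ slightly.

```python
def central_moment(data, k):
    # (1/n) * sum (yi - ybar)^k
    ybar = sum(data) / len(data)
    return sum((y - ybar) ** k for y in data) / len(data)

def skewness(data):
    # third standardized moment
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # fourth standardized moment (3 for a normal distribution)
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # made-up data
right_skewed = [1, 1, 1, 10]      # made-up data with a long right tail
print(skewness(symmetric), kurtosis(symmetric), skewness(right_skewed))
```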
Measure of Association
 bivariate data is data that is dependent on two variables
 sample correlation coefficient quantifies the degree to which two variables are linearly correlated
 r ≈ 0 shows no linear relationship, while |r| ≈ 1 shows a strong linear relationship
Graphical Summaries of Data Sets
identifies distribution from which your sample is drawn, so you apply the right model
 Relative Frequency Histogram
 Density Histogram, the height is chosen s.t. the relative frequency of the bin = area of the corresponding rectangle in the histogram (accounts well for variable bin widths)
 relative frequency is
frequency of each bin/total sample size
 Empirical cumulative distribution function, F(y) =
(# of observations ≤ y)/(Total # of observations)
 plots an upwards staircase
 Box and Whisker Plot plots quartiles
 five num summary = {min, Q1, Q2, Q3, max}
 quartiles are calculated with
(n+1) * percentile
, averaging its neighbours for non integer values
 outliers are numbers outside of
Q3 + 1.5 * IQR
and Q1 - 1.5 * IQR
 left skewed if the distribution has a long left tail (and vice versa for right skewed)
Statistical Inference
 Descriptive Statistics describe the properties of the data
 Statistical inference uses the data to draw conclusions about the properties of the population
 Three major types of inference problems
 Estimation: using sample of a population to estimate unknowns
 Hypothesis Testing: drawing samples from a population to test if a hypothesis is reasonable
 Prediction Problems: predicting future value based on past data (time series analysis)
 statistical modelling is the identification of the distribution that relates your data set to the unknowns
Theory of Probability
 Random experiment, whose outcomes are uncertain
 S or Omega = Sample Space, the set of all possible outcomes (equally likely)
 event A is a subset of S
 simple event is one with only one outcome
Rules of Probability:
 ∅ (phi) is the empty set, P(Omega) = 1, P(∅) = 0
 if A is a subset of B, P(A) <= P(B)
 A and B are mutually exclusive if both events cannot occur simultaneously (intersection of A and B = ∅)
 A and B are EXHAUSTIVE events if at least one of them has to happen (P(A union B) = P(Omega))
 complement: P(Ac) = 1 - P(A)
P(A u B) = P(A) + P(B) - P(A intersection B)
 Bayes Theorem:
P(Bi|A) = P(A|Bi)*P(Bi)/∑(P(A|Bj)*P(Bj))
 denominator is the sum of intersections of A with all events in the event space. This works because B1...Bk are exhaustive AND mutually exclusive
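A short sketch of Bayes Theorem as arithmetic. The numbers are hypothetical (a condition with 1% prevalence and a test that flags 95% of true cases and 5% of non-cases); B1...Bk must be exhaustive and mutually exclusive for the denominator to equal P(A).

```python
def bayes_posterior(priors, likelihoods):
    # priors[i] = P(Bi), likelihoods[i] = P(A | Bi)
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A and Bi)
    total = sum(joint)                                    # P(A)
    return [j / total for j in joint]                     # P(Bi | A)

# Hypothetical example: B1 = has condition, B2 = does not; A = test is positive.
posterior = bayes_posterior([0.01, 0.99], [0.95, 0.05])
print(posterior)
```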
Counting Rules
Random Variable Distribution
A random variable Y is a function that assigns a number to each outcome of a random experiment

Y : Omega → Reals
is a map from the sample space to the real line; e.g. for two coin flips with Y = number of heads, {HH, HT, TH, TT} → {2, 1, 1, 0}
 Random Variables: Y, X. Outcomes: y, x. Then X is said to be a discrete random variable if x takes countably many values (e.g. integer values only), or continuous if x ranges over an interval of the Reals
 the distribution function of x:
f(x) = P(X=x)
, the probability that r.v. X takes on the value x. The probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value
 a distribution table for the random variable x is the set of all possible values of x, with their probabilities (charting the distribution function)

E(x) = μ
, mean. Var(x) = σ^2
variance
 variance is the average of the squared deviations from the mean, or
E[(Y - E(Y))^2] = ∑ (yi - μ)^2 * f(yi)
 a discrete random variable can take infinitely many values (e.g. the number of coin flips until heads)
 linearity of expectation is the property that the expected value of the sum of random variables is equal to the sum of individual expected values, regardless of whether they are independent (think weighted average)
 Y = a + b * X, then E(Y) = a + b * E(X) and V(Y) = b^2 * Var(X)
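A small sketch of E and Var computed from a distribution table, using the two-coin-flip example (Y = number of heads). The constants a and b are arbitrary picks to check the linear transformation rule.

```python
def expectation(dist):
    # dist maps each value y to f(y) = P(Y = y)
    return sum(y * p for y, p in dist.items())

def variance(dist):
    mu = expectation(dist)
    return sum((y - mu) ** 2 * p for y, p in dist.items())

heads = {0: 0.25, 1: 0.5, 2: 0.25}   # two fair coin flips

a, b = 3, 2                          # arbitrary constants for Y = a + b*X
transformed = {a + b * y: p for y, p in heads.items()}

print(expectation(heads), variance(heads))
print(expectation(transformed), variance(transformed))
```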
Cumulative Distribution Function
 let Y be a r.v. The C.D.F of Y is given by
F(y) = P(Y <= y) forall y
 remember the CDF is a step function for discrete distributions, a piecewise defined function
 probability/distribution function and CDF are two ways to represent some distribution of data
f(y) = F(y) - F(y-1) for integer-valued Y
Discrete Distributions
 Uniform Discrete Distribution

Omega = {1,2,...,n}
, f(y) = P(Y=y) = 1/n forall y
. That is, all yi in Sample Space have the same probabilities
 Hypergeometric Distribution, describes the probability of r successes in n draws without replacement from a finite population size N that contains exactly R successes, where each draw is either a success or failure
P(X=r) = (R choose r) * ((N-R) choose (n-r)) / (N choose n)
 Binomial Distribution is same, but with replacement. The probability is the same for each trial
 Geometric distribution is # of failures before the 1st success; Negative Binomial generalizes this to # of failures before a fixed number of successes
 X ~ Poi(μ), Poisson distribution: X follows a Poisson distribution with mean μ, taking values x = 0,1,2,3,....
 P(X = x) = f(x) = e^(-μ) * μ^x / x!
 approximation to binomial probabilities: the limit of a binomial distribution as n → infinity and p → 0 with np = μ held fixed
 Binomial(n,p) → Poi(μ)
Continuous Distributions
X is said to be a continuous random variable if X takes values in an interval [a,b], a ≠ b
 for any continuous distribution X, P(X=c) = 0 forall c
 F is called the cumulative distribution of x if F(x) = P(X ≤ x) forall x
 probability density function f: P(a ≤ x ≤ b) = ∫_{a}^{b} f(x)dx (the area under the curve is the probability of the region)
 it's just calculus. Now we're dealing with regions instead of discrete values
 ∫f(x)dx = 1 over the whole range, total area is 1
 uniform distribution: X ~ Uni[a,b] if f(x) = { 1/(b-a) if a ≤ x ≤ b else 0 }
 constant function on the interval [a,b]
Normal Distribution
A random variable Z is said to follow the standard normal distribution if
 Z takes values in (-infinity, +infinity)
 f(z) = (1/(2π)^(1/2)) • e^(-z^2/2), the bell curve
 Z ~ N(0,1) where (mean, variance); maximum at zero, and symmetric around 0
 P(1 ≤ z ≤ 2) = P(-2 ≤ z ≤ -1), property of symmetry
 for the standard normal distribution, F(x) = P(Z ≤ x) is the probability the z-table gives for the z-score x
 100th percentile is all data, 1st is lowest 1%
Theorem: Any normal problem, you can convert to a Z-problem; If X ~ Normal with mean µ and variance σ^2, then (X - µ)/σ = Z ~ N(0,1)
, where sigma is standard deviation, not variance; These are just transformations on the normal distribution
Some applications:
 given μ, σ and a value A, find the probability P(x ≤ A) = ?
 given μ, σ and a probability, find the value: P(x ≤ A) = 0.9, find A
 given probabilities and values, we are asked to find μ and σ (using standardization)
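The three applications can be sketched with `statistics.NormalDist` standing in for the z-table. The parameters μ = 80, σ = 8 and the cut-offs are just illustrative picks, and `solve_mu_sigma` is a helper name of my own.

```python
from statistics import NormalDist

X = NormalDist(mu=80, sigma=8)   # illustrative parameters

# 1) given mu, sigma and A, find P(X <= A)
p = X.cdf(90)

# 2) given mu, sigma and a probability, find A with P(X <= A) = 0.9
A = X.inv_cdf(0.9)

# 3) given two (value, probability) pairs, recover mu and sigma by
#    standardizing: (a - mu)/sigma = z for each pair, then solve.
def solve_mu_sigma(a1, p1, a2, p2):
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    sigma = (a2 - a1) / (z2 - z1)
    mu = a1 - z1 * sigma
    return mu, sigma

mu, sigma = solve_mu_sigma(X.inv_cdf(0.1), 0.1, X.inv_cdf(0.9), 0.9)
print(round(p, 4), round(A, 2), round(mu, 2), round(sigma, 2))
```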
Distribution of the Sample Mean
Theorem: If x1, x2, ..., xn are independent normally distributed random variables with mean μ and variance σ^2, then the sample mean xbar = ( x1 + x2 + ... + xn ) / n ~ N(μ, σ^2/n)
Theorem: If X ~ N(µ1, sig1^2) and Y ~ N(µ2, sig2^2), then aX + bY ~ N(a•µ1 + b•µ2, a^2•sig1^2 + b^2•sig2^2)
X ~ N(80, 64) is the weight of Canadians. 20 Canadians get on an elevator; what's the probability it will break (total weight ≥ 1000)?
 Find P((x1 + x2 + ... + x20)/20 ≥ 1000/20) = P(x bar ≥ 50)
 x bar ~ N(80, 64/20)
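Plugging the elevator numbers in, as a sketch with `statistics.NormalDist` in place of the z-table. With the figures as written in these notes, the threshold 50 sits far below the mean of 80, so the probability comes out essentially 1.

```python
import math
from statistics import NormalDist

mu, var, n = 80, 64, 20     # X ~ N(80, 64), 20 passengers
limit = 1000                # total weight at which the elevator breaks

# x bar ~ N(mu, var/n); NormalDist takes the standard deviation
xbar_dist = NormalDist(mu, math.sqrt(var / n))
z = (limit / n - mu) / math.sqrt(var / n)
p_break = 1 - xbar_dist.cdf(limit / n)   # P(x bar >= 50)
print(round(z, 2), p_break)
```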
Central Limit Theorem: Suppose x1 ... xn are independent random variables with mean µ and variance σ^2 (not necessarily normal). Then if n is large, the sample mean x bar ~ approx. N(µ, σ^2/n)
 if we have a large enough sample size, the sample means are approximately normally distributed regardless of the original distribution
 arbitrarily, we'll call n ≥ 30 LARGE
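A rough simulation sketch of the CLT, drawing sample means from an exponential (clearly non-normal) population. The mean 2.0, the sample size 30, the 5000 repetitions, and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)          # fixed seed so the run is repeatable
mu = 2.0                # Exp(mu) has mean mu and variance mu^2
n, reps = 30, 5000

means = []
for _ in range(reps):
    sample = [random.expovariate(1 / mu) for _ in range(n)]
    means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(means)    # should be near mu
var_of_means = statistics.variance(means)  # should be near mu^2 / n
print(round(mean_of_means, 3), round(var_of_means, 3), mu**2 / n)
```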
Exponential
X is said to follow an exponential distribution with mean µ, X ~ Exp(µ)
if X takes values between 0 and infinity and the density function of X is f(x) = (1/µ)•e^(-x/µ)
, X > 0
 looks right skewed
 CDF of the exponential
F(x) = Pr(X≤x) = 1 - e^(-x/µ)
 since the CDF is the integral of the density function from the minimum (here 0) up to x; the integral over the whole range, from the minimum to infinity, is 1 for every density f(x)
Statistical Inference
Toss coin 100 times
There are two general methods of interval estimation:
 relative likelihood function R(theta) (Likelihood interval)
 R(theta) = L(theta)/L(thetahat) , thetahat is the MLE
 LOG RELATIVE LIKELIHOOD FUNCTION: gamma(theta) = log R(theta) = log L(theta) - log L(thetahat)
 sampling distributions (confidence interval)
θ is unknown parameter of interest for a population.

{y1, ..., yn}
is independently drawn sample from this population
 θhat is most likely value of theta that generated the sample
 def: let Y1, ... Yn be independent random variables with distribution function f(y, theta)
 L(θ) = product of the distribution function evaluated at all sample points Π f(yi, θ) for 1 ≤ i ≤ n
 the MLE (maximal likelihood estimate) is simply the max of the likelihood function L
 we can use the log-likelihood for computational convenience. The log of a product becomes a summation.
 recall for two-variable functions, the first-order condition for extrema is when
∂ℓ/∂x = 0 and ∂ℓ/∂y = 0
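For finding the MLE of θ for a Uniform Distribution, for Y from [0, θ], an intuitive guess is double the mean of the sample, but the likelihood below gives a different answer.
 density function f = 1/(b-a) = 1/θ
 distribution function f(y) = 1/θ if 0 ≤ y ≤ θ else 0
 we define L(θ) to be the product of f(yi) at each sample point, so L(θ) = 1/θ^n if all yi are in [0, θ], and 0 otherwise. Since 1/θ^n decreases in θ, L is maximized at the smallest allowed θ, so θhat = max(yi)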
We can think of the likelihood function simply as the probability that a given sample happens, plotted as a function of some unknown parameter, θ which the distribution function depends on. It tells us how likely a sample is for a given distribution.
Invariance Property Theorem: If θhat is the MLE for θ, then for nonextreme values, g(θhat) is the MLE for g(θ) for any continuous function g
Interval Estimation (What values of θ are reasonable?)
Same problem setup as finding the MLE, given a sample from a large population, find an interval [l, u] which contains θ with "high probability"
For the MLE, find one # which represents our best guess θhat. We can create an interval near θhat of values that we deem plausible in some sense.
For interval estimation, we can use the relative likelihood function R(θ) (likelihood interval) or sampling distributions (confidence interval)
 def: R(θ) = L(θ)/L(θhat)
 def: Log relative likelihood function γ(θ) = log R(θ) = log L(θ) - log L(θhat)
 def: 100•p% likelihood interval for θ = {θ : R(θ) ≥ p }, p on interval (0,1)
 the higher the value, the narrower the interval because 100% likelihood is essentially a point
 by our convention, θ in the 50% LI = very plausible, θ in the 10%-50% range = plausible, θ in the 1%-10% range = implausible, θ not in the 1% LI = very implausible
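A sketch of finding a likelihood interval numerically for a binomial sample: evaluate R(θ) on a grid and keep the θ with R(θ) ≥ p. The data (60 successes in 100 trials) and the grid resolution are made up, and the code assumes 0 < y < n so the logs are defined.

```python
import math

def binomial_relative_likelihood(theta, y, n):
    # R(theta) = L(theta) / L(theta_hat), computed on the log scale
    theta_hat = y / n
    log_r = (y * math.log(theta / theta_hat)
             + (n - y) * math.log((1 - theta) / (1 - theta_hat)))
    return math.exp(log_r)

def likelihood_interval(y, n, p, steps=10000):
    # all grid points theta in (0, 1) with R(theta) >= p
    grid = [i / steps for i in range(1, steps)]
    inside = [t for t in grid if binomial_relative_likelihood(t, y, n) >= p]
    return min(inside), max(inside)

li_10 = likelihood_interval(60, 100, 0.10)
li_50 = likelihood_interval(60, 100, 0.50)
print(li_10, li_50)
```

The 50% interval sits inside the 10% interval, which matches the note that a higher p gives a narrower interval.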
By extension, we can use a continuous region to find θ for a specified percent certainty (since the likelihood function is a probability density function)
Estimates can be thought of as outcomes of some r.v. that needs to be identified to construct intervals. We can use a sample {y1...yn} to estimate the interval [L,U] s.t. the interval [L,U] contains θ with high probability
 [L,U] with random variables is the coverage interval
 [l,u] as functions of the sample is the confidence interval (by subbing the mean in place of r.v. in the coverage interval). This gives us an interval for which θ has a 95% likelihood of being in (or whatever % you want to calculate)
To calculate the confidence interval,
 find the pivotal quantity, with the pivotal distribution.
 Use the pivotal distribution to find end points of the interval.
 construct the coverage interval
 use the data to construct the confidence interval
Chi-Squared Distribution
A random variable W follows Chi-Squared Distribution with degrees of freedom k
where k
is a positive integer W ~ Χ_{k}^2 if W = z_{1}^{2} + ... + z_{k}^{2} where z_{i} ~ N(0,1) for independent r.v.
 if W ~ X_{k}^{2}, it takes values between 0 and infinity
 E(W) = k; V(W) = 2k
 For k > 2, Chi-Squared distribution is not symmetric but right skewed
 if W_{1} ~ X_{k1}^2 and W_{2} ~ X_{k2}^2 and W_{1} and W_{2} are independent, then W_{1} + W_{2} ~ X_{k1 + k2}^2
Student's T-Distribution
Can be used to estimate the mean of a normal distribution where the standard deviation is unknown.
Def: Let T be a r.v. T is said to follow a student's T-distribution with k degrees of freedom, T ~ T(k), if T is built from two indep r.v.s as T = Z/(W/k)^(1/2), where Z ~ N(0,1) and W ~ X_{k}^2
 T-distribution looks like Normal but with more extreme obs
Prediction Intervals (Constructing Confidence Interval for next element)
Also called time-series analysis/forecasting. We have n historical observations Y1, ..., Yn from a Normal population with mean μ and s.d. σ. We want to find an interval [a,b] based on the data sample that contains Y_{n+1} with a high degree of confidence, i.e. predict the next element
 ex: predict the quality of future job candidates
 we can only do this if the yi are independent random variables; it only works for complete independence, so not for stocks. Forecasting often requires model fitting.
Model: Y1,...Yn ~ N(μ, σ^2). Y_{n+1} = ?
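 Y_{n+1} ~ N(μ, σ^2)
 Ybar ~ N(μ, σ^2/n) from properties of normal
 recall linear combinations of independent normal r.v.s are normal, so Y_{n+1} - Ybar ~ N(0, σ^2•(1 + 1/n))
 anytime you replace σ by its estimator s, replace the Z distribution with T, giving the interval ybar ± t*•s•(1 + 1/n)^(1/2) with df = n-1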
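A sketch of a prediction interval for the next observation, using ybar ± t*•s•(1 + 1/n)^(1/2), which follows from Y_{n+1} - Ybar ~ N(0, σ^2(1 + 1/n)) with σ replaced by s. The sample is made up, and t* = 2.262 is the t-table value for 95% confidence with df = 9, typed in by hand since the standard library has no t-distribution.

```python
import math
import statistics

def prediction_interval(sample, t_star):
    # ybar +/- t* * s * sqrt(1 + 1/n), df = n - 1
    n = len(sample)
    ybar = statistics.fmean(sample)
    s = statistics.stdev(sample)   # sample s.d. with n-1 in the denominator
    half = t_star * s * math.sqrt(1 + 1 / n)
    return ybar - half, ybar + half

sample = [48, 52, 50, 49, 51, 47, 53, 50, 50, 50]   # hypothetical observations
t_star = 2.262                                       # t-table: 95%, df = 9
lo, hi = prediction_interval(sample, t_star)
print(round(lo, 2), round(hi, 2))
```

The interval is wider than the confidence interval for μ because it also has to cover the variability of the new observation itself.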
Hypothesis Testing (Is θ reasonable?)
Def: A hypothesis is a statement made about attributes of a population. The Null Hypothesis H_{o} is the current belief. The Alternate hypothesis H1 is what we compare against when measuring evidence for/against H_{o}
 Setup: Ho: θ = θo; two-sided alternate hypothesis H1: θ ≠ θo
 objective: draw sample from population and calculate if H0 or H1 is "reasonable"
Hypothesis Testing Steps:
 construct the discrepancy measure D. D is a random variable that measures how much the data disagrees with H_{o}
 D ≥ 0, where D = 0 is the best evidence to support the null hypothesis H_{o}
 D's distribution is known
 calculate the value d (outcome from our sample)
 The p-value: P(D ≥ d); the p-value measures how unusual the sample is, assuming the null hypothesis is true. If the p-value is 1%, then only 1% of samples would disagree with H_{o} at least this much, so either we drew a rare sample or, more likely, the null hypothesis is wrong.
Suppose we want to determine if a coin is fair. Ho: θ = 0.5, H1: θ ≠ 0.5. Then D = |Y - 50| if Y is the number of successes out of 100 trials. d, for some sample, is computed and then we solve for the p-value P(D ≥ d).
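A sketch of the coin example computed exactly from the binomial distribution, taking the discrepancy as the distance of Y from its expected value n•θ0 (50 here). The observed count of 60 heads is a made-up outcome.

```python
import math

def coin_p_value(y, n=100, theta0=0.5):
    # d = observed discrepancy |y - n*theta0|
    d = abs(y - n * theta0)
    # p-value = P(D >= d) under H0: add up P(Y = k) over every k at
    # least as far from n*theta0 as the observed y
    return sum(
        math.comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
        for k in range(n + 1)
        if abs(k - n * theta0) >= d
    )

p = coin_p_value(60)   # hypothetical sample: 60 heads in 100 tosses
print(round(p, 4))
```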
We are heavily in favour of the null hypothesis. So our conventions state:
 p-value ≥ 10% => no evidence against null hypothesis
 5% ≤ p-value < 10% => weak evidence against H0
 1% ≤ p-value < 5% => evidence against H0
 p-value < 1% => strong evidence against H0
Type I error: convicting the innocent (rejecting a true H0), Type II: acquitting the guilty (failing to reject a false H0)
Hypothesis Testing for different distributions
Concentrate on two sided test
yi ~ f(yi, θ), i = 1...n, yi's independent r.v, θ is unknown parameter, {y1...yn} is sample
We want to determine whether H0: θ = θ0 or H1: θ ≠ θ0
For normal problem, recall (Ybar - μ)/(S/n^(1/2)) ~ T_{n-1}
 construct the test statistic, D = |Ybar - μ0|/(S/n^(1/2)) (e.g. μ0 = 48); under H0 the quantity (Ybar - μ0)/(S/n^(1/2)) ~ T_{n-1}, so the p-value is P(|T_{n-1}| ≥ d)
For Binomial Problem,
 D = |θ~ - θ0|/(θ0(1-θ0)/n)^(1/2)
For Poisson,
 D = |Ybar - μ0|/(μ0/n)^(1/2) (→ Z)
Relationship between CI and Hypothesis testing: If θ_{o} belongs to the 100q% C.I, the p-value of the test (H0: θ = θ0, H1: θ ≠ θ0) will be greater than 1 - q
For large samples, we can construct tests for parameters using the Likelihood Ratio Test
 Λ(θ0) ~ X_{1}^{2} approximately under H0; Λ is a special case of D for hypothesis testing. It applies to all distributions and can be thought of as the general case
 Calculate λ(θ0) = -2 log(L(θ0)/L(θhat))
 Calculate p-value P(Λ ≥ λ) = P(X_{1}^{2} ≥ λ) = P(Z^2 ≥ λ)
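A sketch of the likelihood ratio test for a binomial parameter, using the identity that a chi-squared r.v. with 1 degree of freedom is Z^2 so the p-value can come from the normal CDF. The data (60 successes in 100 trials, H0: θ = 0.5) are made up, and the code assumes 0 < y < n.

```python
import math
from statistics import NormalDist

def binomial_lrt(y, n, theta0):
    theta_hat = y / n
    # log( L(theta0) / L(theta_hat) ); the nCy factor cancels
    log_ratio = (y * math.log(theta0 / theta_hat)
                 + (n - y) * math.log((1 - theta0) / (1 - theta_hat)))
    lam = max(-2 * log_ratio, 0.0)   # guard against -0.0 from rounding
    # P(chi-squared_1 >= lam) = P(|Z| >= sqrt(lam))
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lam)))
    return lam, p_value

lam, p_value = binomial_lrt(60, 100, 0.5)
print(round(lam, 3), round(p_value, 4))
```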
Equality of two means:
matched pair → look at differences, test to see if the mean difference is 0
 unmatched → Y and X, check H0: β = 0
Simple Linear Regression Model
We make some assumptions:
 Given our x, Y's are normally distributed
 E(Yi) = α + β•Xi (the average of the Yi's are a linear function of the X's)
 V(Yi) = σ^2 is independent of the value of X (doesn't change with respect to X)
Regression is represented with a deterministic part and random part.
 Yi ~ N(α + β • xi, σ^2) then
 Yi = α + β•xi + Ri, Ri ~ N(0, σ^2)
 and so Yi = μ + Ri
 df = n - # of unknowns in the deterministic part of the model
 now given a data set of tupled (x,y)s,
 we know the distribution for Yi is normal, so find the MLEs using multivariate calc on Yi (these final equations are on the formula sheet)
X = given constants, X is the regressor which helps explain Y
 Y = r.v., mean depends on x in a linear function E(Y) = α + βx, unknown constants
 Y's are all normally distributed (Gaussian Regression Model)
 Var(Yi) = σ^2, independent of X → Homoscedasticity
The population means are a linear function of X
Yi ~ N(α + βxi, σ^2), i = 1...n and Yi's independent
Method of Least Squares
Method of fitting an estimated regression line to the data
 a residual is essentially an error in the fit of the model yhat = b0 + b1•x
 ei = yi - yhat_i is the residual for i = 1...n
 the least squares method minimizes the sum of the squares of the residuals
We get some equations to help us find αhat, βhat, σhat and s.
Sxx = ∑(xi - xbar)^2 is the sum of squared deviations for x, Syy = ∑(yi - ybar)^2 for y, and Sxy = ∑(xi - xbar)(yi - ybar) for x•y
we get s = residual standard error of the regression model, measuring the amount of variability of Y that cannot be explained by X.
Now we can do hypothesis testing for α and β. Take β for example: if we want to HT for β = 0, we're testing for the existence of a linear relationship between Y and X. In general H0: β = β0
Residuals r_{i}hat = yi - αhat - βhat•xi
 if our model is correct, r_{i}hat's act like N(0, σ^2)
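A sketch of the least squares estimates, assuming the usual closed forms βhat = Sxy/Sxx, αhat = ybar - βhat•xbar and s^2 = (Syy - βhat•Sxy)/(n-2); these are the standard results, but check them against the formula sheet. The (x, y) pairs are made up.

```python
import math

def least_squares(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    # residual standard error, df = n - 2 (two unknowns: alpha and beta)
    s = math.sqrt((syy - beta_hat * sxy) / (n - 2))
    residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]
    return alpha_hat, beta_hat, s, residuals

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
alpha_hat, beta_hat, s, residuals = least_squares(xs, ys)
print(alpha_hat, beta_hat, round(s, 4), residuals)
```

The residuals sum to zero (up to rounding), which is a quick check that the fitted line passes through (xbar, ybar).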
Goodness of fit tests:
Given X1...Xn, check if Xi ~ f(xi; θ)
 assume distribution, calculate λ
 divide data into groups, find expected frequency of groups given observed frequencies
Test variable independence, construct observed and expected tables
        S    NS   Total
L       40   60   100
R       50   50   100
Total   90   110  200
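A sketch of the independence test on the table above: expected counts come from (row total)•(column total)/n, and the statistic is λ = 2∑ yi•log(yi/ei). For a 2x2 table df = (2-1)(2-1) = 1, so the p-value can use X^2_1 = Z^2 and the normal CDF; a larger table would need a chi-squared table instead.

```python
import math
from statistics import NormalDist

observed = [[40, 60],   # row L
            [50, 50]]   # row R

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# expected frequency under independence: row total * column total / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

lam = 2 * sum(
    o * math.log(o / e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = 1, so P(chi-squared_1 >= lam) = P(|Z| >= sqrt(lam))
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lam)))
print(expected, round(lam, 3), round(p_value, 3))
```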
Two Population Problem
Two different populations, check if their means are equal
 we have matched with tuples for samples, and unmatched for two sets
 for matched, H0: μ1 = μ2, and yi = bi - ai
Regression
Y = α + βx + R, R ~ N(0, σ^2)
 αhat, βhat, σhat, Se
 see picture for solving these
you can apply the regression method for unmatched by setting one outcome to 1 and other to 0
QUIZ
(n-1)S^2/σ^2 ~ X^2_{n-1}
use pivotal quantity for normal, isolate ybar, sigma is known