# Dimensionality Reduction

MNIST is a classic computer vision dataset of handwritten digits

• each digit is a 28×28 pixel image, so it can be represented as a point in 784-dimensional space
• MNIST digits intuitively live in a subspace of much lower dimension than 784
• the goal of dimensionality reduction is to find ways to encode the information in fewer dimensions, losing only irrelevant information

### Principal Axis Theorem

Motivation:

• x^2/9 + y^2/25 = 1 defines an ellipse
• x^2/9 - y^2/25 = 1 defines a hyperbola
• with a cross term, as in 5x^2 + 8xy + 5y^2 = 1, it is not obvious whether the curve is an ellipse or a hyperbola
• completing the square, a special case of matrix diagonalization, removes the cross term and shows which one it is
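
As a minimal sketch of the idea, the quadratic form a x^2 + b xy + c y^2 can be written with the symmetric matrix [[a, b/2], [b/2, c]], and the signs of its eigenvalues decide the type of curve. The function names `principal_axes_2x2` and `classify_conic` are made up for these notes, and the closed-form eigenvalue formula only covers the 2×2 case.

```python
import math

def principal_axes_2x2(a, b, c):
    """Eigenvalues (smallest first) of [[a, b/2], [b/2, c]],
    the symmetric matrix of the form a*x^2 + b*x*y + c*y^2."""
    h = b / 2.0                            # off-diagonal entry
    mean = (a + c) / 2.0                   # midpoint of the two eigenvalues
    radius = math.hypot((a - c) / 2.0, h)  # half the gap between them
    return mean - radius, mean + radius

def classify_conic(a, b, c):
    """Classify a*x^2 + b*x*y + c*y^2 = 1 by the eigenvalue signs."""
    lo, hi = principal_axes_2x2(a, b, c)
    if lo > 0:
        return "ellipse"        # both eigenvalues positive
    if lo < 0 < hi:
        return "hyperbola"      # eigenvalues of opposite sign
    return "degenerate or empty"

eigs = principal_axes_2x2(5, 8, 5)
kind = classify_conic(5, 8, 5)
print(eigs, kind)
```

For 5x^2 + 8xy + 5y^2 = 1 the eigenvalues are 1 and 9, both positive, so the curve is an ellipse. In the rotated coordinates u = (x + y)/√2 and v = (x - y)/√2 the equation becomes 9u^2 + v^2 = 1, with no cross term.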

### PCA, principal component analysis

PCA applies an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated variables, called principal components

• orthogonal transformations preserve the Euclidean distance between points; in 2- and 3-dimensional Euclidean space they are rotations, reflections, or improper rotations (a rotation combined with a reflection)
• the number of principal components is less than or equal to the number of original variables
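
One common way to carry this out, sketched here under assumptions, is to center the data, take the eigenvectors of its covariance matrix, and project onto them. The function name `pca` and the synthetic two-column dataset are illustrative only; the NumPy calls (`np.cov`, `np.linalg.eigh`) are standard, but this is not meant as a production implementation.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns the projected data, the component directions (as columns),
    and all eigenvalues of the covariance matrix, largest first.
    """
    X_centered = X - X.mean(axis=0)            # remove the mean of each variable
    cov = np.cov(X_centered, rowvar=False)     # columns are variables
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # sort by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # orthonormal directions
    return X_centered @ components, components, eigvals

# Synthetic, strongly correlated data (an assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

scores, components, eigvals = pca(X, n_components=2)
score_cov = np.cov(scores, rowvar=False)       # covariance of the new variables
```

Here `score_cov` should be diagonal up to rounding error, which is what "linearly uncorrelated" means, and keeping only the first column of `scores` reduces the data from two dimensions to one.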

Covariance is a measure of joint variability of two random variables

• covariance is positive when the two variables tend to move in the same direction, and negative when they tend to move in opposite directions
• variance is the average squared deviation of one variable
• covariance is the average product of deviations in two variables
• cov(X, Y) = σ_XY = E[(X - μ_X)(Y - μ_Y)]
• covariance is in units obtained by multiplying the units of X and Y
• correlation is normalized, dimensionless version of covariance
• the covariance matrix holds the covariance for every pair of variables, with the variances on its diagonal
• a joint probability distribution describes several random variables together; it is called bivariate for two variables and multivariate for more
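
The covariance definitions above can be checked with a few lines of plain Python. This is a sketch using the population form (dividing by n, matching the "average product of deviations" wording); the helper names and the height and weight numbers are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: the average product of deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by both standard deviations (dimensionless)."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

def covariance_matrix(columns):
    """Covariance of every pair of variables; variances on the diagonal."""
    return [[covariance(a, b) for b in columns] for a in columns]

# Hypothetical measurements, chosen so the arithmetic is easy to follow.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 58.0, 66.0, 74.0, 82.0]
heights_m = [h / 100.0 for h in heights_cm]

cov_cm = covariance(heights_cm, weights_kg)   # units: cm * kg
cov_m = covariance(heights_m, weights_kg)     # units: m * kg
corr_cm = correlation(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
matrix = covariance_matrix([heights_cm, weights_kg])
```

Switching from centimetres to metres divides the covariance by 100, while the correlation stays the same, which illustrates why correlation is the dimensionless version.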