Foundations of Data Science


Background

2: High-Dimensional Space

Starting off with some simple high-dimensional intuition:

Properties of the Unit Ball

For the unit ball in d dimensions, the volume -> 0 as d -> infinity.
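This can be checked numerically with the closed-form volume pi^(d/2) / Gamma(d/2 + 1); a minimal sketch (the function name `unit_ball_volume` is just for illustration):

```python
import math

def unit_ball_volume(d):
    # Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1).
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

# Volume rises until about d = 5, then decays rapidly towards 0.
for d in [1, 2, 3, 5, 20, 100]:
    print(d, unit_ball_volume(d))
```

For d = 2 and d = 3 this recovers the familiar values pi and 4pi/3, while by d = 100 the volume is astronomically small.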

Random Projection and Johnson Lindenstrauss Lemma

The Nearest Neighbour Problem is an example of a problem that benefits from dimensionality reduction via a projection f: R^d -> R^k with k << d.
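The Johnson–Lindenstrauss idea can be sketched with a random Gaussian projection: scaling by 1/sqrt(k) preserves squared lengths in expectation, so pairwise distances are distorted by only a small factor. A minimal illustration (the dimensions and helper `pairwise_dists` are arbitrary choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 1000, 100, 20            # original dim, target dim, number of points
X = rng.normal(size=(n, d))

# Random Gaussian projection, scaled by 1/sqrt(k) so that
# E[||f(x)||^2] = ||x||^2 (the Johnson-Lindenstrauss setup).
P = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ P

def pairwise_dists(A):
    # All pairwise Euclidean distances between the rows of A.
    diffs = A[:, None, :] - A[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1))

iu = np.triu_indices(n, 1)
ratios = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]
print(ratios.min(), ratios.max())   # concentrated near 1
```

With k = 100 the relative distortion is on the order of 1/sqrt(k), i.e. roughly 10%, even though the dimension dropped by a factor of 10.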

Eigendecomposition of a Matrix

Recall that an eigenvector of a linear mapping is a non-zero vector whose direction is unchanged when the mapping is applied; it is only scaled by the corresponding eigenvalue.
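A quick numerical check of this definition, using NumPy's `eigh` on a small symmetric matrix (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eigh(A)     # symmetric matrix: real eigenpairs

# Each column v of `vecs` satisfies A v = lambda v:
# the direction is unchanged, only the length is scaled.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)
print(vals)
```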

3: Best Fit Subspaces & SVD

The Singular Value Decomposition (SVD) of a matrix finds, for each natural number k, the best-fitting k-dimensional subspace for the rows of the matrix.

The best fitting subspace algorithm:

  1. Start with the special case: the 1-dimensional line of best fit, a line through the origin.
  2. Perform k applications of the best-fit line algorithm, where in the i-th iteration we find the best-fit line perpendicular to each of the i-1 lines found so far.
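The greedy procedure above can be sketched in NumPy. Here the best-fit line is taken from the top right singular vector, and orthogonality to earlier lines is enforced by deflation (subtracting the component along each chosen line); this is an illustrative sketch, and with generic data the greedy lines match the top-k right singular vectors of the full SVD:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))       # 50 points in 5 dimensions

def best_fit_line(B):
    # Unit vector v through the origin maximizing ||B v||.
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[0]

# Greedy: repeatedly find the best-fit line in the subspace
# perpendicular to the lines found so far.
k = 3
lines = []
B = A.copy()
for _ in range(k):
    v = best_fit_line(B)
    lines.append(v)
    B = B - np.outer(B @ v, v)     # deflate: remove the component along v

V_greedy = np.array(lines)
_, _, Vt = np.linalg.svd(A)
# Up to sign, the greedy lines agree with the top-k right singular vectors.
print(np.abs(V_greedy @ Vt[:k].T))
```

This agreement (for distinct singular values) is exactly the "greedy is optimal" fact that makes the SVD compute best-fit subspaces.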