Kernel PCA

Linear separation of nonlinear data via transforming the space to be linearly separable via kernels.

\[\phi : \mathbb{R}^2 \xrightarrow{} \mathbb{R}^3\]

[1]:

from utils import disp
disp('kernelpca.drawio.png')

\[\begin{split}K = \begin{bmatrix} K(x_1, x_1), k(x_1, x_2), ..., k(x_1, x_n)\\ K(x_2, x_1), k(x_2, x_2), ..., k(x_2, x_n)\\ \vdots\\ K(x_m, x_1), k(x_m, x_2), ..., k(x_m, x_n)\\ \end{bmatrix}\end{split}\]

where a kernel could be (for instance) \(K(x_i, x_j) = \frac{1}{2} \exp ( -h ||x_i - x_j||^2 )\) where \(h\) is a scaling factor, and the samples \(x_i, x_j \in D\) are in the data. A kernel is a pairwise distance function. Therefore it is symmetric and invertable (so we can map between our space and back to original with no loss).

The kernel function can be broken down into a matrix multiplication \(K = \phi(x) \phi(x)^T\) where (as stated above) \(\phi\) is the model.

Note: \(K\) must be centered.

\[\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{d} \sum_{k=d}^l (x_l)\]

\[\tilde{K}(x_j) = \tilde{\phi}(x_i)^T \tilde{\phi}(x_i)\]

\[= \phi(x_i) - \frac{1}{d} \sum_{k=1}^d \phi(x_k)^T \phi(x_j) - \frac{1}{d} \sum_{i=1}^d \phi(x_k)\]

\[= K(x_i, x_j) - \frac{1}{d} \sum_{i=1}^d k(x_i, x_j) - \frac{1}{d} \sum_{k=1}^d K(x_i, x_k) + \frac{1}{d^2} \sum_{k=1}^d K(x_d, x_k)\]

Algorithm

Pick kernel \(K(x_i, x_j)\)
Calculate kernel matrix \(K\)
Center \(K\)
Find eigenvectors \(Kv = \lambda v\)
Project data onto new dimensions \(z_{new} = \phi^T K(x^.)\)

\(K(x^.)\) project each data point into original space to new data point in kernel space.