Author: kevinzakka

Description:
A PyTorch implementation of Neighbourhood Components Analysis.
Language: Python
Repository: git://github.com/kevinzakka/torchnca.git
Created: 2020-01-25T21:43:11Z
Project page: https://github.com/kevinzakka/torchnca

License: MIT License



torchnca

A PyTorch implementation of Neighbourhood Components Analysis by J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov. NCA is a metric learning technique that learns a linear transformation of the dataset such that the expected leave-one-out performance of kNN in the transformed space is maximized.

For a more detailed explanation of NCA, check out the accompanying blog post.
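
The objective described above can be sketched in a few lines of PyTorch. This is a minimal illustration of the NCA objective from the paper, not the code used inside torchnca; the function name `nca_loss` and the toy data are assumptions made for this example.

```python
import torch


def nca_loss(A, X, y):
    """Negative NCA objective for a linear map A of shape (dim, n_features).

    Hypothetical helper for illustration; not part of the torchnca API.
    """
    Z = X @ A.T                                   # project the data: (n, dim)
    diff = Z[:, None, :] - Z[None, :, :]          # pairwise differences
    dist = diff.pow(2).sum(dim=-1)                # squared Euclidean distances
    # A point never picks itself as a neighbour, so mask the diagonal.
    eye = torch.eye(len(X), dtype=torch.bool)
    dist = dist.masked_fill(eye, float("inf"))
    p = torch.softmax(-dist, dim=1)               # p[i, j]: i picks j as its neighbour
    same = (y[:, None] == y[None, :]).float()     # 1 where labels agree
    p_correct = (p * same).sum(dim=1)             # prob. that i is classified correctly
    return -p_correct.sum()                       # minimise the negative objective


# Toy data (made up): two well-separated classes on a line.
X = torch.tensor([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
y = torch.tensor([0, 0, 1, 1])
A = torch.eye(2, requires_grad=True)

loss = nca_loss(A, X, y)
loss.backward()                                   # gradients flow back to A
```

Maximizing the sum of `p_correct` is the soft, differentiable stand-in for leave-one-out kNN accuracy, which is why the sketch returns its negative for use with a gradient-based minimizer.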

Installation

You can install torchnca with pip:

    pip install torchnca

API

    from torchnca import NCA

    # X and y are assumed here to be torch tensors:
    # X holds the data (n_samples, n_features), y the class labels

    # instantiate an NCA object and initialize it with
    # an identity matrix
    nca = NCA(dim=2, init="identity")

    # fit the NCA model to a dataset,
    # normalizing the input data before
    # running the optimization
    nca.train(X, y, batch_size=64, normalize=True)

    # apply the learned linear map to the data
    X_nca = nca(X)
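
To make the two ends of that workflow concrete, here is a rough sketch of what an identity initialization and "applying the learned linear map" plausibly amount to. The helpers `identity_init` and `apply_map` are hypothetical names for this example, and the rectangular-identity shape `(dim, n_features)` is an assumption about the internals, not something the README states.

```python
import torch


def identity_init(dim, n_features):
    # Rectangular identity: initially keeps the first `dim` input features.
    # Hypothetical helper; assumed shape is (dim, n_features).
    return torch.eye(dim, n_features)


def apply_map(A, X):
    # Project each row of X (n_samples, n_features) down to (n_samples, dim).
    return X @ A.T


X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
A = identity_init(2, 3)
X_proj = apply_map(A, X)   # before any training, this just drops the third feature
```

Training would then adjust the entries of `A` away from this starting point.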

Dimensionality Reduction

We generate a 3-D dataset where the first 2 dimensions are concentric rings and the third dimension is Gaussian noise. We plot the result of PCA, LDA and NCA with 2 components.



Notice how PCA has failed to project out the noise, a result of a high noise variance in the third dimension. LDA also struggles to recover the concentric pattern since the classes themselves are not linearly separable.
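
A dataset of this kind, and the PCA behaviour noted above, can be reproduced with a short sketch. The ring radii, sample count and noise standard deviation below are assumptions chosen for illustration (the README does not give the values used for the figure), and `make_rings` and `pca_components` are hypothetical helper names.

```python
import math

import torch

torch.manual_seed(0)


def make_rings(n_per_ring=500, radii=(1.0, 2.0), noise_std=3.0):
    """Two concentric rings in dims 0-1, Gaussian noise in dim 2 (assumed parameters)."""
    X, y = [], []
    for label, r in enumerate(radii):
        theta = 2 * math.pi * torch.rand(n_per_ring)
        ring = torch.stack([r * torch.cos(theta), r * torch.sin(theta)], dim=1)
        noise = noise_std * torch.randn(n_per_ring, 1)
        X.append(torch.cat([ring, noise], dim=1))
        y.append(torch.full((n_per_ring,), label))
    return torch.cat(X), torch.cat(y)


def pca_components(X, k):
    """Top-k principal directions via eigendecomposition of the covariance."""
    Xc = X - X.mean(dim=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)     # eigenvalues in ascending order
    order = torch.argsort(eigvals, descending=True)[:k]
    return eigvecs[:, order].T                    # (k, n_features)


X, y = make_rings()
components = pca_components(X, k=2)
# PCA ranks directions by variance alone, so when the noise variance exceeds
# the ring variance, the first principal direction aligns with the noise axis.
noise_weight = components[0, 2].abs().item()
```

Because PCA never looks at the labels, a high-variance noise dimension is kept in the 2-D projection, whereas a supervised method such as NCA can learn to discard it.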

kNN on MNIST

We compute the classification error, computation time and storage cost of two algorithms:

  • kNN (k = 5) on the raw 784 dimensional MNIST dataset
  • kNN (k = 5) on a learned 32 dimensional NCA projection of the MNIST dataset

Method     NCA + kNN    Raw kNN
Time       2.37s        155.25s
Storage    6.40 MB      156.8 MB
Error      3.27%        2.82%
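
The storage figures are consistent with 50,000 training samples stored as 32-bit floats. That is an inference from the numbers, not something the README states. The sketch below checks the arithmetic and shows a brute-force kNN classifier of the kind being compared; `storage_mb` and `knn_predict` are hypothetical helpers, not part of torchnca.

```python
import torch


def storage_mb(n_samples, n_features, bytes_per_value=4):
    """Memory needed to store the training set, in megabytes (10^6 bytes)."""
    return n_samples * n_features * bytes_per_value / 1e6


def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote over the k nearest training points (Euclidean distance)."""
    dist = torch.cdist(X_test, X_train)                  # (n_test, n_train)
    idx = dist.topk(k, dim=1, largest=False).indices     # k nearest per test point
    votes = y_train[idx]                                 # (n_test, k) neighbour labels
    return torch.mode(votes, dim=1).values


raw_mb = storage_mb(50_000, 784)   # assumed: 50,000 float32 images, 784 pixels each
nca_mb = storage_mb(50_000, 32)    # the same samples after a 32-dimensional projection

# Tiny made-up example: two clusters on a line, classified with k = 3.
X_train = torch.tensor([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y_train = torch.tensor([0, 0, 0, 1, 1, 1])
X_test = torch.tensor([[0.05], [5.05]])
pred = knn_predict(X_train, y_train, X_test, k=3)
```

Both the stored training set and the distance computation scale with the number of features, which is why projecting from 784 to 32 dimensions cuts time and storage so sharply, at the cost of a slightly higher error in the table above.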