Project author: csirmaz

Project description: Train neural nets using smart random walk without backpropagation
Programming language: Python
Project address: git://github.com/csirmaz/nn_rnd_training.git
Created: 2019-04-08T20:16:38Z
Project community: https://github.com/csirmaz/nn_rnd_training

License: MIT License

nn_rnd_training

This module trains a completely free neural network (a single repeated fully connected layer)
using a smart random walk in the weight space to avoid the vanishing/exploding gradient problem.
This is largely equivalent to using a genetic algorithm (evolution via mutations) to optimise the weights, except that there is no gene exchange (crossover).

It uses TensorFlow.

A completely free network has a number of neurons with arbitrary
connections. This is modelled using a weight matrix that maps an
input+bias+activations vector to an activations vector,
and is applied a set number of times before the output is read.
This can model recurrence as well as multiple layers of a feedforward network.
The output is a (prefix) slice of the activations; that is, the activations vector
is separated into output+hidden states.

            From:
            bias+input   output+hidden
    To:    +-----------+--------------+
    out    |           |              |
    hid    |           |              |
           +-----------+--------------+
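For illustration, the repeated application of the single weight matrix can be sketched in plain NumPy as below. This is only a sketch of the idea, not the module's TensorFlow implementation; the tanh nonlinearity, the initialisation and the variable names are assumptions.

    import numpy as np

    num_input, num_output, num_hidden, timesteps = 3, 2, 30, 4
    state_size = num_output + num_hidden  # activations = output + hidden

    # One weight matrix mapping (bias + input + activations) to the next
    # activations, matching the layout of the diagram above
    W = np.random.normal(0.0, 0.1, (state_size, 1 + num_input + state_size))

    def forward(W, x):
        """Apply the single weight matrix `timesteps` times, then read the output."""
        a = np.zeros(state_size)
        for _ in range(timesteps):
            v = np.concatenate(([1.0], x, a))  # bias + input + current activations
            a = np.tanh(W @ v)                 # the nonlinearity here is an assumption
        return a[:num_output]                  # the output is a prefix slice of the activations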

While a completely free neural model
(i.e. a single repeated, fully interconnected layer) could be trained with backprop,
it would still suffer from vanishing/exploding gradients
because the same weight matrix is applied repeatedly.
This is true even if nonlinearities with derivatives close to 1, like ReLUs, are used:
see e.g. Rohan Saxena's answer at https://www.quora.com/What-is-the-vanishing-gradient-problem .
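As a toy numerical illustration (assuming, for simplicity, linear activations so that the per-step Jacobian is just the weight matrix itself), backpropagating a gradient through many applications of the same matrix shrinks or blows it up depending on the scale of the weights:

    import numpy as np

    np.random.seed(0)
    for scale in (0.1, 0.3):                # small weights vanish, larger ones explode
        W = np.random.randn(30, 30) * scale
        g = np.ones(30)                     # gradient arriving at the last timestep
        for _ in range(50):                 # backprop through 50 applications of W
            g = W.T @ g
        print(scale, np.linalg.norm(g))     # shrinks towards 0 for 0.1, blows up for 0.3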

To avoid this, we use a “smart” random walk to approximate the gradient in the weight space.
We run the graph for each datapoint in the batch,
and then run it with slightly adjusted weights on the same datapoint.
This adjustment is different for each datapoint in the batch.
Then we calculate the change in the loss
and multiply the adjustment by the negative of this change
(if the adjustment improves the loss, we want to move in its direction;
if it worsens it, in the opposite direction).
Finally, we update the weights using the average of these scaled adjustments across the batch.
A sketch of one such training step follows.
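Below is a rough NumPy sketch of one such step, reusing W and forward() from the model sketch above. The mean squared error loss, the learning rate and the function names are assumptions made for illustration, not the module's API.

    import numpy as np

    def train_step(W, batch_x, batch_y, test_stddev=0.01, lr=0.1):
        """One 'smart' random-walk step: probe each datapoint with its own small
        weight adjustment and average the resulting update directions."""
        updates = []
        for x, y in zip(batch_x, batch_y):
            delta = np.random.normal(0.0, test_stddev, W.shape)  # per-datapoint adjustment
            base = np.mean((forward(W, x) - y) ** 2)             # loss at the current weights
            probed = np.mean((forward(W + delta, x) - y) ** 2)   # loss at the adjusted weights
            change = probed - base
            updates.append(-change * delta)  # loss decreased: step along delta; increased: opposite
        return W + lr * np.mean(np.stack(updates), axis=0)       # average across the batch

    # Example: one update on a random batch
    batch_x = np.random.random((16, num_input))
    batch_y = np.random.random((16, num_output))
    W = train_step(W, batch_x, batch_y)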

Usage

    import nn_rnd_training
    import numpy as np
    from scipy.special import expit  # sigmoid

    config = {
        'num_input': 3,  # number of input datapoints
        'sequence_input': True,  # whether the input data contains data points for each timestep
        'net_input_add_onehot': False,  # whether to add a one_hot encoding layer to the input. Only if sequence_input is true
        'num_hidden': 30,  # number of hidden neurons, excluding output neurons
        'num_output': 2,  # number of output neurons
        'sequence_output': False,  # whether the output should contain the output at each timestep
        'timesteps': 4,  # number of iterations to do before reading the output
        'net_add_softmax': False,  # whether to add a softmax layer at the end
        'loss': 'mean_squared_error',  # the loss function
        'test_stddev': 0.01,  # stddev for weight adjustments for the random step
        'batch_size': 1000,  # this many points are tested around the current point in the weight space
        'epoch_steps': 400,  # steps in an epoch
        'epochs': 10000,  # number of epochs
        'debug': False  # whether to print tensors during runs
    }

    # Data generator
    # Generate random training data
    class DataIterator:
        """Data generator"""

        def __init__(self):
            pass

        def __next__(self):
            data = np.random.random((config['batch_size'], config['timesteps'], config['num_input'])) * 2. - 1.
            data[:,1,:] = data[:,0,:] + 1
            data[:,2,:] = data[:,0,:] + 2
            data[:,3,:] = data[:,0,:] + 3
            target = np.ones((config['batch_size'], config['num_output']))
            target[:,0] = data[:,0,0] + data[:,0,1]
            target[:,1] = data[:,0,0] + data[:,0,2]
            target = expit(target)
            return (data, target)

    nn_rnd_training.NNRndTraining(config).train(DataIterator())

Todo