Project author: MichiganCOG

Project description:
A one-stop shop for all of your activity recognition needs.
Primary language: Python
Repository: git://github.com/MichiganCOG/M-PACT.git
Created: 2018-04-16T21:28:10Z
Project page: https://github.com/MichiganCOG/M-PACT

License: MIT License



M-PACT: Michigan Platform for Activity Classification in Tensorflow

This Python framework provides modular access to common activity recognition models, so that custom models can be compared against current state-of-the-art baselines.

This README will walk you through the process of installing dependencies, downloading and formatting datasets, testing the framework, and expanding the framework to train your own models.

This repository holds the code and models for the paper

M-PACT: Michigan Platform for Activity Classification in Tensorflow, Eric Hofesmann, Madan Ravi Ganesh, and Jason J. Corso, arXiv, April 2018.

ATTENTION: Please cite the arXiv paper introducing this platform when releasing any work that used this code.

Link: https://arxiv.org/abs/1804.05879

Implemented Models' Classification Accuracy:

| Model Architecture | Dataset (Split 1) | M-PACT Accuracy (%) | Original Authors' Accuracy (%) |
|---|---|---|---|
| I3D | HMDB51 | 68.10 | 74.80* |
| C3D | HMDB51 | 51.90 | 50.30* |
| TSN | HMDB51 | 51.70 | 54.40 |
| ResNet50 + LSTM | HMDB51 | 43.86 | 43.90 |
| I3D | UCF101 | 92.55 | 95.60* |
| C3D | UCF101 | 93.66 | 82.30* |
| TSN | UCF101 | 85.25 | 85.50 |
| ResNet50 + LSTM | UCF101 | 80.20 | 84.30 |

(*) Indicates that results are shown across all three splits


Introduction and Setup

Common Datasets:

  • HMDB51
  • UCF101
  • Kinetics
  • Moments in Time

Requirements

Hardware and Software:

  • NVIDIA graphics card
  • Ubuntu 16.04
  • Python 2.7
  • CUDA
  • cuDNN
  • Gflags

Python Dependencies (All can be installed using pip):

Configuring Datasets

To use this framework, datasets must be downloaded and converted to TFRecords format; they are not included in the repository. Storing the dataset videos as TFRecords binary files allows TensorFlow to load and process the data efficiently.

Methods to import and configure datasets correctly can be found in the section Adding a Dataset.

Using the framework

Training and testing are run from the root directory through train.py and test.py.
The implemented models can be used once their weights have been acquired.
Download the weights and mean files by running the script sh scripts/shell/download_weights.sh.

NOTE: The download links may not work for users in China. Alternative downloads can be found here: http://academictorrents.com/details/dcea7fa53925b31215bd8437d2f0805253d6b00f
and https://app.nihaocloud.com/d/fb0c387c9a644f86b257/

Ex. Train ResNet50+LSTM on HMDB51 using 4 GPUs

    python train.py --model resnet --dataset HMDB51 --numGpus 4 --load 0 --size 224 --inputDims 50 --outputDims 51 --seqLength 50 --dropoutRate 0.5 --expName example_1 --numVids 3570 --lr 0.01 --nEpochs 30 --baseDataPath /data --fName trainlist --optChoice adam

The parameters to train are:

    python train.py \
        --model  The model architecture to be used (i3d, c3d, tsn, resnet) **REQUIRED**
        --dataset  The dataset to use for training (UCF101, HMDB51) **REQUIRED**
        --size  Size of the input frame into the network, sets both height and width (224 for ResNet, I3D, TSN and 112 for C3D) **REQUIRED**
        --inputDims  Input dimensions (number of frames to pass into the model) **REQUIRED**
        --outputDims  Output dimensions (number of classes in the dataset) **REQUIRED**
        --seqLength  Sequence length of the model output (50 for ResNet50, 250 for TSN, 1 for I3D and C3D) **REQUIRED**
        --expName  Experiment name **REQUIRED**
        --baseDataPath  The path to where all datasets are stored (Ex. For HMDB51, this directory should then contain tfrecords_HMDB51/Split1/trainlist/exampleVidName.tfrecords) **REQUIRED**
        --fName  Which dataset list to use (trainlist, testlist, vallist) **REQUIRED**
        --numGpus  Number of GPUs to train on over a single node (default 1)
        --train  1 or 0, whether to set up the model in training (1) or testing (0) format (default 1)
        --load  1 or 0, whether to load the trained checkpoints saved under the same expName or to train from randomly initialized weights
        --modelAlpha  Optional resampling factor, used for constant-value resampling or to initialize other resampling strategies, mainly during training
        --inputAlpha  Resampling factor for constant-value resampling of the input video, used mainly for testing models
        --dropoutRate  Probability of keeping the inputs of the model's dropout layers (default 0.5)
        --freeze  Freeze weights during training of any layers within the model that have the option manually set (default 0)
        --numVids  Number of videos to train on within the specified split
        --lr  Initial learning rate (default 0.001)
        --wd  Weight decay value for training layers (default 0.0)
        --lossType  String defining the loss type associated with the chosen model (multiple losses are optionally defined in the model)
        --nEpochs  Number of epochs to train over (default 1)
        --split  Dataset split to use (default 1)
        --saveFreq  Frequency in epochs to save model checkpoints (default 1, i.e. every epoch)
        --returnLayer  Which model layers are returned by the model's inference during testing ('logits' during training)
        --optChoice  String indicating the optimizer choice (default sgd)
        --gradClipValue  Value of the normalized gradient at which to clip (default 5.0)
        --clipLength  Length of clips to cut the video into (default -1 indicates using the entire video as one clip)
        --videoOffset  (none or random) indicating where to begin selecting video clips, assuming clipOffset is none
        --clipOffset  (none or random) indicating whether clips are selected sequentially or randomly
        --numClips  Number of clips to break the video into, -1 indicates breaking the video into the maximum number of clips based on clipLength, clipOverlap, and clipOffset
        --clipStride  Number of frames that overlap between clips, 0 indicates no overlap and -1 indicates clips are randomly selected and not sequential
        --batchSize  Number of clips to load into the model each step (default 1)
        --metricsDir  Name of the subdirectory within the experiment in which to store metrics. Unique directory names allow for parallel testing.
        --metricsMethod  Which method to use to calculate accuracy metrics. During training it is only used to set up the correct file structure. (avg_pooling, last_frame, svm, svm_train or extract_features)
        --preprocMethod  Which preprocessing method to use, allows for the use of multiple preprocessing files per model architecture
        --randomInit  Randomly initialize model weights, not loading from any files (default False)
        --shuffleSeed  Seed integer for the random shuffle of files in the load_dataset function
        --preprocDebugging  Boolean indicating whether to load videos and clips in a queue or to load them directly for debugging. Errors in preprocessing setup will not show up properly otherwise (default 0)
        --loadedCheckpoint  Step of the saved model checkpoint that will be loaded for testing. Defaults to the most recently saved checkpoint.
        --gpuList  List of GPU IDs to be used
        --lrboundary  List of boundary epochs at which lr will be updated
        --lrvalues  List of lr multiplier values, the length of the list must equal that of lrboundary
        --loadWeights  String which can be used to specify the default weights to load
        --verbose  Boolean switch to display all print statements or not
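
Because nine of the flags above are marked **REQUIRED**, it can help to assemble the command programmatically and fail early when one is missing. The following is a minimal sketch and not part of M-PACT: the helper name `build_train_command` and the constant `REQUIRED_TRAIN_FLAGS` are hypothetical, and the required set is simply copied from the parameter list above.

```python
# Flags marked **REQUIRED** in the train.py parameter list above.
REQUIRED_TRAIN_FLAGS = ('model', 'dataset', 'size', 'inputDims', 'outputDims',
                        'seqLength', 'expName', 'baseDataPath', 'fName')


def build_train_command(params):
    """Return an argv list for train.py, or raise if a required flag is missing.

    Hypothetical helper: `params` maps flag names (without '--') to values.
    """
    missing = [flag for flag in REQUIRED_TRAIN_FLAGS if flag not in params]
    if missing:
        raise ValueError('missing required flags: ' + ', '.join(missing))
    argv = ['python', 'train.py']
    # Sort the flags so that the generated command is reproducible.
    for flag in sorted(params):
        argv += ['--' + flag, str(params[flag])]
    return argv
```

The returned list can be passed to `subprocess.call` from the repository root; optional flags such as `numGpus` or `lr` are added to `params` in the same way.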

The parameters to test are:

    python test.py \
        --model  The model architecture to be used (i3d, c3d, tsn, resnet) **REQUIRED**
        --dataset  The dataset to use (UCF101, HMDB51) **REQUIRED**
        --size  Size of the input frame into the network, sets both height and width (224 for ResNet, I3D, TSN and 112 for C3D) **REQUIRED**
        --inputDims  Input dimensions (number of frames to pass into the model) **REQUIRED**
        --outputDims  Output dimensions (number of classes in the dataset) **REQUIRED**
        --seqLength  Sequence length of the model output (50 for ResNet50, 250 for TSN, 1 for I3D and C3D) **REQUIRED**
        --expName  Experiment name **REQUIRED**
        --numVids  Number of videos to test on within the split **REQUIRED**
        --fName  Which dataset list to use (trainlist, testlist, vallist) **REQUIRED**
        --loadedDataset  Dataset that the model was trained on. This is to be used when testing a model on a different dataset than it was trained on. **REQUIRED**
        --train  1 or 0, whether to set up the model in training (1) or testing (0) format (default 0)
        --load  1 or 0, whether to load the trained checkpoints saved under the same expName or to test from default weights (default 1)
        --modelAlpha  Optional resampling factor, used for constant-value resampling or to initialize other resampling strategies, mainly during training
        --inputAlpha  Resampling factor for constant-value resampling of the input video, used mainly for testing models
        --dropoutRate  Probability of keeping the inputs of the model's dropout layers (default 0.5)
        --freeze  Freeze weights during training of any layers within the model that have the option manually set (default 0)
        --split  Dataset split to use (default 1)
        --baseDataPath  The path to where all datasets are stored (Ex. For HMDB51, this directory should then contain tfrecords_HMDB51/Split1/testlist/exampleVidName.tfrecords)
        --returnLayer  String indicating which layer to apply 'metricsMethod' on (default ['logits'])
        --gpuList  List of GPU device IDs to be used; at most one GPU may be used for testing
        --clipLength  Length of clips to cut the video into, -1 indicates using the entire video as one clip
        --videoOffset  (none or random) indicating where to begin selecting video clips, assuming clipOffset is none
        --clipOffset  (none or random) indicating whether clips are selected sequentially or randomly
        --numClips  Number of clips to break the video into, -1 indicates breaking the video into the maximum number of clips based on clipLength, clipOverlap, and clipOffset
        --clipStride  Number of frames that overlap between clips, 0 indicates no overlap and -1 indicates clips are randomly selected and not sequential
        --metricsMethod  Which method to use to calculate accuracy metrics (avg_pooling, last_frame, svm, svm_train or extract_features)
        --preprocMethod  Which preprocessing method to use, allows for the use of multiple preprocessing files per model architecture
        --batchSize  Number of clips to load into the model each step (default 1)
        --metricsDir  Name of the subdirectory within the experiment in which to store metrics. Unique directory names allow for parallel testing.
        --loadedCheckpoint  Step of the saved model checkpoint that will be loaded for testing (defaults to the most recent checkpoint)
        --randomInit  Randomly initialize model weights, not loading from any files (default 0)
        --avgClips  Boolean indicating whether to average predictions across clips (default 0)
        --useSoftmax  Boolean indicating whether to apply softmax to the inference of the model (default 1)
        --preprocDebugging  Boolean indicating whether to load videos and clips in a queue or to load them directly for debugging. Errors in preprocessing setup will not show up properly otherwise (default 0)
        --loadWeights  String which can be used to specify the default weights to load
        --verbose  Boolean switch to display all print statements or not
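
The clip flags (clipLength, numClips, clipStride) interact, so a small worked example may help. The sketch below shows one plausible reading of the sequential case only. It is an assumption for illustration and not M-PACT's actual clip extraction code: the function name `sequential_clip_starts` and its `overlap` argument are hypothetical, with `overlap` standing in for the non-negative case described for --clipStride, and a video shorter than one clip is simply treated as a single clip.

```python
def sequential_clip_starts(num_frames, clip_length, overlap=0, num_clips=-1):
    """Start indices of sequential clips (an illustrative sketch only).

    clip_length == -1 means the whole video is one clip; num_clips == -1
    means as many full clips as fit into the video.
    """
    if clip_length == -1 or clip_length >= num_frames:
        return [0]
    step = clip_length - overlap  # frames to advance between clip starts
    if step <= 0:
        raise ValueError('overlap must be smaller than clip_length')
    starts = list(range(0, num_frames - clip_length + 1, step))
    if num_clips != -1:
        starts = starts[:num_clips]
    return starts
```

Under these assumptions, a 40-frame video with clip_length 16 and no overlap yields two full clips, and an overlap of 8 frames yields four.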

Ex. Test C3D on UCF101 split 1

    python test.py --model c3d --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 16 --outputDims 101 --seqLength 1 --size 112 --expName example_2 --numClips 1 --clipLength 16 --clipOffset random --numVids 3783 --split 1 --baseDataPath /data --fName testlist

Framework File Structure

    /tf-activity-recognition-framework
        train.py
        test.py
        create_model.py
        load_a_video.py
        /models
            /model_name
                modelname_model.py
                default_preprocessing.py
                model_weights.npy shortcut to ../weights/model_weights.npy (Optional)
            /weights
                model_weights.npy
        /results
            /model_name
                /dataset_name
                    /preprocessing_method
                        /experiment_name
                            /checkpoints
                                checkpoint
                                checkpoint-100.npy
                                checkpoint-100.dat
                            /metrics_method
                                testing_results.npy
        /logs
            /model_name
                /dataset_name
                    /preprocessing_method
                        /metrics_method
                            /experiment_name
                                tensorboard_log
        /scripts
            /shell
                download_weights.sh
        /utils
            generate_tfrecords_dataset.py
            convert_checkpoint.py
            checkpoint_utils.py
            layers_utils.py
            metrics_utils.py
            preprocessing_utils.py
            sys_utils.py
            logger.py

train.py - Train a model

test.py - Test a model

create_model.py - Create the model and preprocessing files for your custom model. The generated files include functions that need to be filled in, as described in Adding a Model

load_a_video.py - Load a video using the M-PACT input pipeline to ensure proper conversion of a dataset.

models - Includes the model class and video preprocessing required for that model

results - Saved model weights at specified checkpoints

logs - Tensorboard logs

scripts - Scripts to set up the platform. Ex: downloading necessary weights

utils - Python programs containing functions commonly used across other modules in this platform
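
Given the file structure listed above, the locations of checkpoints and logs for an experiment can be derived from the command-line values. The following is a minimal sketch, assuming the nesting order is the order in which the directories are listed (model, dataset, preprocessing method, experiment for results; model, dataset, preprocessing method, metrics method, experiment for logs). The helper names `checkpoint_dir` and `log_dir` are hypothetical and not part of M-PACT.

```python
import os


def checkpoint_dir(model, dataset, preproc_method, exp_name, root='results'):
    """Directory assumed to hold the saved checkpoints of one experiment."""
    return os.path.join(root, model, dataset, preproc_method, exp_name,
                        'checkpoints')


def log_dir(model, dataset, preproc_method, metrics_method, exp_name,
            root='logs'):
    """Directory assumed to hold the Tensorboard log of one experiment."""
    return os.path.join(root, model, dataset, preproc_method, metrics_method,
                        exp_name)
```

For instance, `checkpoint_dir('c3d', 'UCF101', 'default', 'example_2')` gives the folder to inspect after a training run; the preprocessing method name `'default'` here is only an illustrative value.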

Examples of Common Uses

Testing using existing models

Training models from scratch

Add Custom Components

Adding a model

Step 1: Create Model Directory Structure

Run the Python program create_model.py:

    python create_model.py --modelName MyModel

Step 2: Add Model Functions

Navigate to the model file:

    /models/mymodel/mymodel_model.py

Required functions to fill in:

inference():

    def inference(self, inputs, is_training, input_dims, output_dims, seq_length, batch_size, scope, dropout_rate=0.5, return_layer=['logits'], weight_decay=0.0):
        """
        Args:
            :inputs:       Input to model of shape [BatchSize x Frames x Height x Width x Channels]
            :is_training:  Boolean variable indicating phase (TRAIN OR TEST)
            :input_dims:   Length of input sequence
            :output_dims:  Integer indicating total number of classes in final prediction
            :seq_length:   Length of output sequence from LSTM
            :scope:        Scope name for current model instance
            :dropout_rate: Value indicating probability of keeping inputs
            :return_layer: List of strings matching names of layers in current model
            :weight_decay: Double value of weight decay
            :batch_size:   Number of videos or clips to process at a time

        Return:
            List of output tensors, one for each layer named in return_layer
        """

        ############################################################################
        #                    Add MODELNAME Network Layers HERE                     #
        ############################################################################

        if self.verbose:
            print('Generating MODELNAME network layers')

        # END IF

        with tf.name_scope(scope, 'MODELNAME', [inputs]):
            layers = {}

            ########################################################################################
            # TODO: Add any desired layers from layers_utils to this layers dictionary
            #
            # EX: layers['conv1'] = conv3d_layer(input_tensor=inputs,
            #                                    filter_dims=[dim1, dim2, dim3, dim4],
            #                                    name=NAME,
            #                                    weight_decay=wd)
            ########################################################################################

            ########################################################################################
            # TODO: Final Layer must be 'logits'
            #
            # EX: layers['logits'] = [fully_connected_layer(input_tensor=layers['previous'],
            #                                               out_dim=output_dims, non_linear_fn=None,
            #                                               name='out', weight_decay=weight_decay)]
            ########################################################################################

            layers['logits'] = # TODO Every model must return a layer named 'logits'

            layers['logits'] = tf.reshape(layers['logits'], [batch_size, seq_length, output_dims])

        # END WITH

        return [layers[x] for x in return_layer]

preprocess_tfrecords():

    def preprocess_tfrecords(self, input_data_tensor, frames, height, width, channel, input_dims, output_dims, seq_length, size, label, istraining, video_step):
        """
        Args:
            :input_data_tensor: Data loaded from tfrecords containing either video or clips
            :frames:            Number of frames in loaded video or clip
            :height:            Pixel height of loaded video or clip
            :width:             Pixel width of loaded video or clip
            :channel:           Number of channels in video or clip, usually 3 (RGB)
            :input_dims:        Number of frames used in input
            :output_dims:       Integer number of classes in current dataset
            :seq_length:        Length of output sequence
            :size:              List detailing values of height and width for final frames
            :label:             Label for loaded data
            :istraining:        Boolean value indicating phase (TRAIN OR TEST)
            :video_step:        Tensorflow variable indicating the total number of videos (not clips) that have been loaded
        """

        #####################################################
        # TODO: Add more preprocessing arguments if desired #
        #####################################################

        return preprocess(input_data_tensor, frames, height, width, channel, input_dims, output_dims, seq_length, size, label, istraining, video_step, self.input_alpha)

loss():

    """ Function to return loss calculated on given network """
    def loss(self, logits, labels, loss_type):
        """
        Args:
            :logits:    Unscaled logits returned from final layer in model
            :labels:    True labels corresponding to loaded data
            :loss_type: Allow for multiple losses that can be selected at run time. Implemented through if statements
        """

        ####################################################################################
        # TODO: ADD CUSTOM LOSS HERE, DEFAULT IS CROSS ENTROPY LOSS
        #
        # EX: labels = tf.cast(labels, tf.int64)
        #     cross_entropy_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels,
        #                                                                 logits=logits)
        #     return cross_entropy_loss
        ####################################################################################
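
To make the default loss in the template concrete, here is a minimal pure-Python sketch of what a sparse softmax cross entropy computes: for each example, the negative log of the softmax probability assigned to the true class, averaged over the batch. It is an illustration written without TensorFlow, not the library's implementation; the function name mirrors the TensorFlow call only for readability, and averaging over the batch is an assumption about the reduction in use.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross entropy over a batch.

    logits: list of per-example lists of unscaled scores.
    labels: list of integer class indices, one per example.
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        peak = max(scores)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # -log(softmax(scores)[label]) == log_sum_exp - scores[label]
        total += log_sum_exp - scores[label]
    return total / len(labels)
```

With two classes and equal logits the loss is log(2), the value expected from an uninformed prediction; a confident correct prediction drives the loss towards zero.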

(Optional) load_default_weights():

    def load_default_weights(self):
        """
        return: Numpy dictionary containing the names and values of the weight tensors used to initialize this model
        """

        ############################################################################
        # TODO: Add default model weights to models/weights/ and import them here
        #                          ( OPTIONAL )
        #
        # EX: return np.load('models/weights/model_weights.npy')
        #
        ############################################################################

Step 3: Add Model Preprocessing Steps

Navigate to the preprocessing file:

    /models/mymodel/default_preprocessing.py

Required functions to fill in:

preprocess():

    def preprocess(input_data_tensor, frames, height, width, channel, input_dims, output_dims, seq_length, size, label, istraining, video_step, input_alpha=1.0):
        """
        Preprocessing function corresponding to the chosen model
        Args:
            :input_data_tensor: Raw input data
            :frames:            Total number of frames
            :height:            Height of frame
            :width:             Width of frame
            :channel:           Total number of color channels
            :input_dims:        Number of frames to be provided as input to model
            :output_dims:       Total number of labels
            :seq_length:        Number of frames expected as output of model
            :size:              Output size of preprocessed frames
            :label:             Label of current sample
            :istraining:        Boolean indicating training or testing phase
            :video_step:        Tensorflow variable indicating the total number of videos (not clips) that have been loaded
            :input_alpha:       Resampling factor for constant value resampling of input video

        Return:
            Preprocessed input data tensor
        """

        # Allow for resampling of input during testing for evaluation of the model's stability over video speeds
        input_data_tensor = tf.cast(input_data_tensor, tf.float32)
        input_data_tensor = resample_input(input_data_tensor, frames, frames, input_alpha)

        # Apply preprocessing related to individual frames (cropping, flipping, resize, etc.)
        input_data_tensor = tf.map_fn(lambda img: preprocess_image(img, size[0], size[1], is_training=istraining, resize_side_min=size[0]), input_data_tensor)

        ##########################################################################################################################
        #
        # TODO: Add any video related preprocessing (looping, resampling, etc. Options found in utils/preprocessing_utils.py)
        #
        ##########################################################################################################################

        return input_data_tensor

preprocess_for_train():

    def preprocess_for_train(image, output_height, output_width, resize_side):
        """Preprocesses the given image for training.

        Args:
            image: A `Tensor` representing an image of arbitrary size.
            output_height: The height of the image after preprocessing.
            output_width: The width of the image after preprocessing.
            resize_side: The smallest side of the image for aspect-preserving resizing.

        Returns:
            A preprocessed image.
        """

        ############################################################################
        # TODO: Add preprocessing done during training phase
        #       Preprocessing option found in utils/preprocessing_utils.py
        #
        # EX: image = aspect_preserving_resize(image, resize_side)
        #     image = central_crop([image], output_height, output_width)[0]
        #     image.set_shape([output_height, output_width, 3])
        #     image = tf.to_float(image)
        #     return image
        ############################################################################

preprocess_for_eval():

    def preprocess_for_eval(image, output_height, output_width, resize_side):
        """Preprocesses the given image for evaluation.

        Args:
            image: A `Tensor` representing an image of arbitrary size.
            output_height: The height of the image after preprocessing.
            output_width: The width of the image after preprocessing.
            resize_side: The smallest side of the image for aspect-preserving resizing.

        Returns:
            A preprocessed image.
        """

        ############################################################################
        # TODO: Add preprocessing done during evaluation phase
        #       Preprocessing option found in utils/preprocessing_utils.py
        #
        # EX: image = aspect_preserving_resize(image, resize_side)
        #     image = central_crop([image], output_height, output_width)[0]
        #     image.set_shape([output_height, output_width, 3])
        #     image = tf.to_float(image)
        #     return image
        ############################################################################
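
The example in the templates above resizes a frame so that its shorter side equals resize_side and then takes a central crop. The geometry of those two steps can be sketched in plain Python. This is an illustration under the assumption that the utilities behave like a standard aspect-preserving resize and centered crop; the function names `aspect_preserving_size` and `central_crop_box` are hypothetical and are not the functions in utils/preprocessing_utils.py.

```python
def aspect_preserving_size(height, width, resize_side):
    """New (height, width) after scaling the shorter side to resize_side."""
    scale = float(resize_side) / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


def central_crop_box(height, width, crop_height, crop_width):
    """(top, left, bottom, right) of a centered crop window."""
    if crop_height > height or crop_width > width:
        raise ValueError('crop is larger than the image')
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, top + crop_height, left + crop_width
```

For a 240 x 320 frame resized with resize_side 256, the frame becomes 256 x 341, and a 224 x 224 central crop then covers rows 16 to 240 and columns 58 to 282.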

Adding a dataset

Adding a new dataset requires that the videos be converted to tfrecords and stored in a specific format. A tfrecord is simply a binary file that stores a video together with information about the video, and that is easily imported into TensorFlow graphs.

Each tfrecord contains a dictionary with the following information from the original video:

  • Label - Action class the video belongs to (type int64)
  • Data - RGB or optical flow values for the entire video (type bytes)
  • Frames - Total number of frames in the video (type int64)
  • Height - Frame height in pixels (type int64)
  • Width - Frame width in pixels (type int64)
  • Channels - Number of channels (3 for RGB) (type int64)
  • Name - Name of the video (type bytes)
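
Because Data is stored as a flat byte string, the Frames, Height, Width and Channels fields are what allow it to be reshaped back into a video. A quick consistency check can be sketched as follows. This is an illustration, not M-PACT code: the helper names are hypothetical, the record is represented as a plain dictionary, and one byte per value (uint8 pixels) is an assumption that depends on how the videos were written.

```python
def expected_data_bytes(frames, height, width, channels, bytes_per_value=1):
    """Number of bytes the Data field should contain for the given shape."""
    return frames * height * width * channels * bytes_per_value


def is_consistent(record, bytes_per_value=1):
    """Check that len(Data) matches Frames x Height x Width x Channels."""
    expected = expected_data_bytes(record['Frames'], record['Height'],
                                   record['Width'], record['Channels'],
                                   bytes_per_value)
    return len(record['Data']) == expected
```

A mismatch usually indicates that the data type used when writing the record differs from the one assumed when reading it.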

We provide a script that converts a dataset to tfrecords using OpenCV, as long as the dataset is being stored using the correct file structure.

    /dataset
        /action_class
            /video1.avi

An important note is that the TFRecords for each dataset must be stored in a specific file structure, HMDB51 for example:

    /tfrecords_HMDB51
        /Split1
            /trainlist
                vidName1.tfrecords
                vidName2.tfrecords
            /testlist
            /vallist
        /Split2
        /Split3

This means that the videos must be arranged into this file structure, either before or after they are converted.
A vallist is not required; only a trainlist and a testlist stored inside the folder ‘Split1’ are needed.
Additionally, if only one split is desired, it must still be named ‘Split1’.
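
The required layout can be checked before starting a run. The following is a minimal sketch, not part of M-PACT: the helper names `tfrecords_dir` and `missing_lists` are hypothetical, and the path pattern is taken from the --baseDataPath example (tfrecords_HMDB51/Split1/trainlist/exampleVidName.tfrecords).

```python
import os


def tfrecords_dir(base_data_path, dataset, split, f_name):
    """Build the directory that should hold the .tfrecords files.

    Mirrors the layout <baseDataPath>/tfrecords_<dataset>/Split<split>/<fName>.
    """
    return os.path.join(base_data_path,
                        'tfrecords_' + dataset,
                        'Split' + str(split),
                        f_name)


def missing_lists(base_data_path, dataset, split=1,
                  required=('trainlist', 'testlist')):
    """Return the required list folders that do not exist on disk."""
    return [name for name in required
            if not os.path.isdir(
                tfrecords_dir(base_data_path, dataset, split, name))]
```

An empty list from `missing_lists('/data', 'HMDB51')` means that both the trainlist and testlist folders of Split1 are in place; vallist is left out of the check because it is optional.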

You can also manually convert your dataset to tfrecords if need be.
The following code snippet is an example of how to convert a single video to tfrecords, given the video data in the form of a numpy array.

    import os

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API (tf.python_io)


    def _int64(value):
        # Wrap a single integer as an int64 feature
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


    def _bytes(value):
        # Wrap a single byte string as a bytes feature
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


    def save_tfrecords(data, label, vidname, save_dir):
        # data is expected to have shape [frames, height, width, channels]
        data = np.asarray(data)  # also accept nested Python lists
        filename = os.path.join(save_dir, vidname+'.tfrecords')
        writer = tf.python_io.TFRecordWriter(filename)

        features = {}
        features['Label'] = _int64(label)
        features['Data'] = _bytes(data.tobytes())  # tostring() is a deprecated alias of tobytes()
        features['Frames'] = _int64(data.shape[0])
        features['Height'] = _int64(data.shape[1])
        features['Width'] = _int64(data.shape[2])
        features['Channels'] = _int64(data.shape[3])
        features['Name'] = _bytes(str(vidname))  # Python 2.7: str is a byte string

        example = tf.train.Example(features=tf.train.Features(feature=features))
        writer.write(example.SerializeToString())
        writer.close()

A prerequisite is that the video is passed in as a numpy array or Python list of floats or ints. The video can be loaded in a number of ways, for example using OpenCV, matplotlib, or any other desired method.

Expected Results

Accuracies of Models

The installation of this framework can be verified by comparing its output with the expected testing results below for the various models trained on the datasets.

| Model Architecture | Dataset (Split 1) | M-PACT Accuracy (%) | Original Authors' Accuracy (%) |
|---|---|---|---|
| I3D | HMDB51 | 68.10 | 74.80* |
| C3D | HMDB51 | 51.90 | 50.30* |
| TSN | HMDB51 | 51.70 | 54.40 |
| ResNet50 + LSTM | HMDB51 | 43.86 | 43.90 |
| I3D | UCF101 | 92.55 | 95.60* |
| C3D | UCF101 | 93.66 | 82.30* |
| TSN | UCF101 | 85.25 | 85.50 |
| ResNet50 + LSTM | UCF101 | 80.20 | 84.30 |

(*) Indicates that results are shown across all three splits
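
One way to carry out this comparison is to keep the M-PACT accuracies from the table in a small lookup and check a measured accuracy against them. The sketch below is only an illustration: the function name `matches_expected` is hypothetical, and the tolerance of one percentage point is an arbitrary assumption, since the README does not state how much run-to-run variation to expect.

```python
# M-PACT accuracies (%) on split 1, copied from the table above.
EXPECTED_ACCURACY = {
    ('I3D', 'HMDB51'): 68.10,
    ('C3D', 'HMDB51'): 51.90,
    ('TSN', 'HMDB51'): 51.70,
    ('ResNet50 + LSTM', 'HMDB51'): 43.86,
    ('I3D', 'UCF101'): 92.55,
    ('C3D', 'UCF101'): 93.66,
    ('TSN', 'UCF101'): 85.25,
    ('ResNet50 + LSTM', 'UCF101'): 80.20,
}


def matches_expected(model, dataset, measured, tolerance=1.0):
    """True if `measured` (%) is within `tolerance` points of the table value."""
    return abs(measured - EXPECTED_ACCURACY[(model, dataset)]) <= tolerance
```

A call such as `matches_expected('C3D', 'HMDB51', 51.5)` then reports whether a local test run reproduces the listed result within the chosen tolerance.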

Commands to Execute Model Training and Testing

ResNet50 + LSTM Training (HMDB51)

    python train.py --model resnet --inputDims 50 --outputDims 51 --dataset HMDB51 --load 0 --fName trainlist --seqLength 50 --size 224 --baseDataPath /z/dat --train 1 --numGpus 4 --expName resnet_half_loss_HMDB51 --numVids 3570 --split 1 --wd 0.0 --lr 0.001 --nEpochs 30 --saveFreq 1 --dropoutRate 0.5 --freeze 1 --lossType half_loss

ResNet50 + LSTM Testing (HMDB51)

    python test.py --model resnet --dataset HMDB51 --loadedDataset HMDB51 --train 0 --load 1 --inputDims 50 --outputDims 51 --seqLength 50 --size 224 --expName resnet_half_loss_HMDB51 --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --freeze 1

ResNet50 + LSTM Training (UCF101)

    python train.py --model resnet --inputDims 50 --outputDims 101 --dataset UCF101 --load 0 --fName trainlist --seqLength 50 --size 224 --baseDataPath /z/dat --train 1 --numGpus 4 --expName resnet_half_loss_UCF101 --numVids 9537 --split 1 --wd 0.0 --lr 0.001 --nEpochs 11 --saveFreq 1 --dropoutRate 0.5 --freeze 1 --lossType half_loss

ResNet50 + LSTM Testing (UCF101)

    python test.py --model resnet --dataset UCF101 --loadedDataset UCF101 --train 0 --load 1 --inputDims 50 --outputDims 101 --seqLength 50 --size 224 --expName resnet_half_loss_UCF101 --numVids 3783 --split 1 --baseDataPath /z/dat --metricsMethod last_frame --fName testlist --freeze 1

I3D Training (HMDB51)

    python train.py --model i3d --inputDims 64 --outputDims 51 --dataset HMDB51 --load 0 --expName i3d_HMDB51 --numVids 3570 --fName trainlist --seqLength 1 --size 224 --numGpus 4 --train 1 --split 1 --wd 0.0 --lr 0.01 --nEpochs 30 --baseDataPath /z/dat --saveFreq 1 --dropoutRate 0.5 --gradClipValue 100.0 --optChoice adam --batchSize 16

I3D Testing (HMDB51)

    python test.py --model i3d --numGpus 1 --dataset HMDB51 --loadedDataset HMDB51 --train 0 --load 1 --inputDims 250 --outputDims 51 --seqLength 1 --size 224 --expName i3d_0_5_crop_0_5_HMDB51 --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1 --loadedCheckpoint 837 --metricsDir checkpoint_837

  • Currently best performing checkpoint - 837

I3D Training (UCF101)

    python train.py --model i3d --inputDims 64 --outputDims 101 --dataset UCF101 --load 0 --expName i3d_UCF101 --numVids 9537 --fName trainlist --seqLength 1 --size 224 --numGpus 4 --train 1 --split 1 --wd 0.0 --lr 0.01 --nEpochs 11 --baseDataPath /z/dat --saveFreq 1 --dropoutRate 0.5 --gradClipValue 100.0 --optChoice adam --batchSize 10

I3D Testing (UCF101)

    python test.py --model i3d --numGpus 1 --dataset UCF101 --loadedDataset UCF101 --train 0 --load 1 --inputDims 250 --outputDims 101 --seqLength 1 --size 224 --expName i3d_UCF101 --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1 --loadedCheckpoint 2146 --metricsDir checkpoint_2146

  • Currently best performing checkpoint - 2146

C3D Training (HMDB51)

    python train.py --model c3d --numGpus 4 --dataset HMDB51 --load 0 --inputDims 16 --outputDims 51 --seqLength 1 --size 112 --expName c3d_HMDB51 --numClips 5 --clipLength 16 --clipOffset random --numVids 3570 --split 1 --wd 0.0005 --lr 0.0001 --nEpochs 41 --baseDataPath /z/dat --fName trainlist --saveFreq 1 --verbose 1 --batchSize 10

C3D Testing (HMDB51)

    python test.py --model c3d --dataset HMDB51 --loadedDataset HMDB51 --load 1 --inputDims 16 --outputDims 51 --seqLength 1 --size 112 --expName c3d_HMDB51 --numClips 1 --clipLength 16 --clipOffset random --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1

C3D Training (UCF101)

(NOTE: Results not shown)

    python train.py --model c3d --numGpus 4 --dataset UCF101 --load 0 --inputDims 16 --outputDims 101 --seqLength 1 --size 112 --expName c3d_sports1m_UCF101 --numClips 5 --clipLength 16 --clipOffset random --numVids 9537 --split 1 --wd 0.0005 --lr 0.0001 --nEpochs 10 --baseDataPath /z/dat --fName trainlist --saveFreq 1 --verbose 1 --batchSize 10

C3D Testing (UCF101)

(NOTE: Results not shown)

    python test.py --model c3d --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 16 --outputDims 101 --seqLength 1 --size 112 --expName c3d_sports1m_UCF101 --numClips 1 --clipLength 16 --clipOffset random --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1

C3D Fine-tuning (UCF101)

(NOTE: Best results are shown, 93.55% when fine-tuning on the model “C3D UCF101 split1” at https://github.com/hx173149/C3D-tensorflow)

    python train.py --model c3d --numGpus 4 --dataset UCF101 --train 1 --load 0 --inputDims 16 --outputDims 101 --seqLength 1 --size 112 --expName c3d_finetune_UCF101 --numClips 5 --clipLength 16 --clipOffset random --numVids 9537 --split 1 --wd 0.0005 --lr 0.0001 --nEpochs 10 --baseDataPath /z/dat --fName trainlist --saveFreq 1 --verbose 1 --batchSize 10 --loadWeights Sports1M_finetune_UCF101

C3D Fine-tuned Testing (UCF101)

(NOTE: Best results are shown, 93.55% when fine-tuning on the model “C3D UCF101 split1” at https://github.com/hx173149/C3D-tensorflow)

    python test.py --model c3d --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 16 --outputDims 101 --seqLength 1 --size 112 --expName c3d_finetune_UCF101 --numClips 1 --clipLength 16 --clipOffset random --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --verbose 1

TSN Training (HMDB51)

  1. python train.py --model tsn --dataset HMDB51 --loadedDataset HMDB51 --numGpus 4 --load 0 --inputDims 3 --outputDims 51 --batchSize 56 --seqLength 3 --size 224 --expName tsn_HMDB51 --numVids 3570 --lr 0.001 --wd 0.0005 --nEpochs 40 --split 1 --baseDataPath /z/dat --fName trainlist --gradClipVal 20 --optChoice momentum

TSN Testing (HMDB51)

(NOTE: Results do not use our trained model; they use weights from the original authors.)

  1. python test.py --model tsn --dataset HMDB51 --loadedDataset HMDB51 --load 1 --inputDims 250 --outputDims 51 --seqLength 250 --size 224 --expName tsn_HMDB51 --numVids 1530 --split 1 --baseDataPath /z/dat --fName testlist --loadWeights pretrained_HMDB51

TSN Training (UCF101)

  1. python train.py --model tsn --dataset UCF101 --loadedDataset UCF101 --numGpus 4 --load 0 --inputDims 3 --outputDims 101 --batchSize 56 --seqLength 3 --size 224 --expName tsn_UCF101 --numVids 9537 --lr 0.001 --wd 0.0005 --nEpochs 80 --split 1 --baseDataPath /z/dat --fName trainlist --gradClipVal 20 --optChoice momentum

TSN Testing (UCF101)

(NOTE: Results do not use our trained model; they use weights from the original authors.)

  1. python test.py --model tsn --dataset UCF101 --loadedDataset UCF101 --load 1 --inputDims 250 --outputDims 101 --seqLength 250 --size 224 --expName tsn_UCF101 --numVids 3783 --split 1 --baseDataPath /z/dat --fName testlist --loadWeights pretrained_UCF101
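The two TSN testing commands above differ in only a handful of flags, so when running several datasets it can be convenient to assemble them programmatically. The following is a minimal sketch and not part of M-PACT: the helper `build_command` and the `TSN_TEST_COMMON` list are hypothetical names introduced here, and only flags and values that appear in the commands above are used.

```python
from __future__ import print_function


# Flags shared by the two TSN testing commands shown above.
TSN_TEST_COMMON = [
    ("model", "tsn"),
    ("load", 1),
    ("inputDims", 250),
    ("seqLength", 250),
    ("size", 224),
    ("split", 1),
    ("baseDataPath", "/z/dat"),
    ("fName", "testlist"),
]


def build_command(script, common, overrides):
    """Return an argv-style list: ['python', script, '--flag', 'value', ...]."""
    argv = ["python", script]
    for name, value in list(common) + list(overrides):
        argv.extend(["--" + name, str(value)])
    return argv


# Per-dataset values copied from the TSN Testing (HMDB51) command above.
hmdb51_test = build_command("test.py", TSN_TEST_COMMON, [
    ("dataset", "HMDB51"),
    ("loadedDataset", "HMDB51"),
    ("outputDims", 51),
    ("expName", "tsn_HMDB51"),
    ("numVids", 1530),
    ("loadWeights", "pretrained_HMDB51"),
])

print(" ".join(hmdb51_test))
```

The resulting list can be handed to `subprocess.call` from the repository root. The flags come out in a different order than in the handwritten command; this sketch assumes the scripts parse flags by name, so order should not matter.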

Version History

Current Version: 3.0

Version 3.0 (GitHub Release)

Automated the generation of model and preprocessing files, as well as the importing of models. Weights and mean files are now available for download. Matched the original authors' performance for most models (C3D, TSN, ResNet50+LSTM, I3D) on the UCF101 and HMDB51 datasets.

Version 2.0

Implemented TFRecords-based data loading to replace HDF5 files for increased performance. Training has been updated to allow models to be trained on multiple GPUs concurrently. Parallel data loading has been incorporated using TFRecords queues to maximize use of the available GPUs. The TensorFlow saver checkpoints have been replaced with a custom version that reads and writes model weights directly as numpy arrays, which allows existing model weights from other sources to be imported into this framework more easily. Validation is not yet compatible with the TFRecords framework.
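To illustrate the idea of keeping weights as numpy arrays rather than TensorFlow checkpoints, here is a minimal sketch. It is not M-PACT's actual checkpoint code: the function names, the `.npz` layout, and the layer names are assumptions chosen purely for illustration.

```python
import os
import tempfile

import numpy as np


def save_weights(path, weights):
    """Write a {variable_name: ndarray} dict to a single .npz archive."""
    np.savez(path, **weights)


def load_weights(path):
    """Read the archive back into a plain {variable_name: ndarray} dict."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}


# Hypothetical layer names and shapes, used only to exercise the round trip.
weights = {
    "conv1_kernel": np.random.randn(3, 3, 3, 8).astype(np.float32),
    "conv1_bias": np.zeros(8, dtype=np.float32),
}

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.npz")
save_weights(checkpoint_path, weights)
restored = load_weights(checkpoint_path)
```

Because such a file holds plain arrays keyed by name, weights exported from another framework could be converted offline and then assigned to the matching variables.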

Version 1.0

Initial release. Using pre-generated HDF5 files, tests the LRCN model on the UCF101 dataset and trains ResNet and VGG16 models on the HMDB51 dataset. Tensorboard is supported. Single-processor, single-GPU implementation with the ability to cancel and resume training every 50 steps. Documentation includes a basic overview and examples of training and testing commands.

Future features:

  • Include validation during training
  • Add training and testing on optical flow

Acknowledgements

Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00341. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.

This work was also partially supported by NIST 60NANB17D191 and ARO W911NF-15-1-0354.

Code Acknowledgements

We would like to thank the following contributors for helping shape our platform and their invaluable input in achieving current levels of performance,

References

[1] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, ICCV 2015

[2] J. Carreira, A. Zisserman, Quo vadis, action recognition? A new model and the Kinetics dataset, CVPR 2017

[3] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, L. Van Gool, Temporal segment networks: Towards good practices for deep action recognition, ECCV 2016

[4] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, Long-term recurrent convolutional networks for visual recognition and description, CVPR 2015

[5] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, CVPR 2016