- Here is an example of recognising a person playing a guitar.
Most HAR models are too heavy to deploy on low-power hardware such as the Raspberry Pi or Jetson Nano. Even on laptops, inference time is high and causes noticeable lag. This model solves that problem efficiently:
A binary HAR classifier that can be trained and deployed in fewer than 10 lines of code.
As this is a time-series problem, an LSTM was a natural choice.
The LSTM has to learn the relative motion of body joints for a given action.
- Dataset used: Kinetics-400
In preprocessing, PoseNet (a TF-Lite model) is run using `tf.lite.Interpreter()`.
- Posenet returns a HeatMap and an OffsetMap.
- Using these, we extract the locations of the 17 keypoints/body joints that PoseNet detects.
- If a certain joint is not in frame, it is assigned [0, 0].
- From every video we sample 9 equally spaced frames.
- Each of the 9 frames in a video then contains 17 [x, y] pairs.
- Flattened, this gives 34 values per frame.
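The decoding step above could be sketched as follows. This is a hedged re-implementation, not the repo's code: the shapes and output stride follow the common 257x257 TF-Lite PoseNet (9x9x17 heatmap, 9x9x34 offsets, stride 32), and the score threshold is an assumption.

```python
import numpy as np

OUTPUT_STRIDE = 32  # assumption: standard PoseNet output stride

def decode_keypoints(heatmap, offsets, score_threshold=0.5):
    """heatmap: (H, W, 17); offsets: (H, W, 34). Returns 17 [x, y] pairs."""
    keypoints = []
    num_joints = heatmap.shape[-1]
    for k in range(num_joints):
        # Most likely grid cell for joint k.
        y, x = np.unravel_index(np.argmax(heatmap[..., k]),
                                heatmap[..., k].shape)
        score = 1.0 / (1.0 + np.exp(-heatmap[y, x, k]))  # sigmoid confidence
        if score < score_threshold:
            keypoints.append([0, 0])  # joint not in frame
            continue
        # Refine the coarse grid position with the offset vectors:
        # channels 0..16 hold y-offsets, channels 17..33 hold x-offsets.
        ky = y * OUTPUT_STRIDE + offsets[y, x, k]
        kx = x * OUTPUT_STRIDE + offsets[y, x, k + num_joints]
        keypoints.append([kx, ky])
    return keypoints
```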
- Using the path to the dataset, we generate a .csv file, which is essentially our training data.
- There are 'x' videos, each with 9 frames of 34 values.
- So the input shape for the first LSTM becomes (9, 34).
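The sampling and flattening described above could be sketched as follows (the helper names are illustrative, not the repo's):

```python
import numpy as np

FRAMES_PER_VIDEO = 9
NUM_JOINTS = 17

def sample_indices(total_frames, n=FRAMES_PER_VIDEO):
    # n equally spaced frame indices across the whole video.
    return np.linspace(0, total_frames - 1, n).astype(int)

def to_model_input(keypoints_per_frame):
    """keypoints_per_frame: 9 frames, each a list of 17 [x, y] pairs.
    Flattens each frame's 17 pairs into 34 values -> shape (9, 34)."""
    arr = np.asarray(keypoints_per_frame, dtype=np.float32)  # (9, 17, 2)
    return arr.reshape(FRAMES_PER_VIDEO, NUM_JOINTS * 2)
```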
- The model is a 3-layer LSTM with 128 nodes per layer, along with dropouts of 0.2 and 0.1 and a batch normalization layer.
- This is followed by a dense layer of 32 nodes and an output layer of 1 node.
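A Keras sketch consistent with that description; the exact placement of the dropout and batch-normalization layers, the activations, and the optimizer are assumptions that may differ from the repo's `train.py`:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_lstm(input_shape=(9, 34)):
    # 3 stacked LSTMs (128 units each) with dropouts of 0.2 and 0.1 and
    # a batch norm, then a 32-node dense layer and a single sigmoid
    # output node for the binary classification.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(0.2),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(0.1),
        layers.BatchNormalization(),
        layers.LSTM(128),
        layers.Dense(32, activation='relu'),
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```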
- During inference we use PoseNet again for preprocessing, and pandas takes care of the rest.
- Load the model and pass the preprocessed data to it.
- The model makes a binary classification between the 2 labels it was trained on.
- Use the script `testing_model.py`.
- There are 2 models in the repo: `mwrestling_vs_guitar.model` and `guitar_vs_yoga.model`.
- In the line below, insert the name of the model you want to run inference with.
>> model = tf.keras.models.load_model('<model_name>')
- Once you have the video offline (or a stack of 9 frames during live inference), pass it to the function `preprocess_video`. Have the video ready in mp4 format.
>> X_test = preprocess_video('< Path to Video >')
- Once these 2 lines of code are edited, you can run the script and obtain the prediction.
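Since the output layer is a single sigmoid node, the prediction is a probability that can be thresholded into one of the two labels. A minimal sketch; the helper, the label names, and which action is class 1 are assumptions that depend on how the model was trained:

```python
def to_label(prob, labels=('playing_guitar', 'wrestling'), threshold=0.5):
    # Hypothetical helper: maps the model's sigmoid output to a label.
    # Which action maps to class 1 depends on the training label order.
    return labels[int(prob > threshold)]
```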
- Use the script `train.py`.
- NOTE: Ensure all training examples are 10 seconds long, i.e., use the Kinetics-400 dataset.
- Now, using the function `generate_training_data`, we can make our training data. The function takes 2 parameters: the path to the videos and the name of the .csv you want to generate.
>> generate_training_data('<Path_to_folder_containing_videos>', '<name_of_csv>')
- Now that both .csv files are generated, use the `preprocessing_csv` function to get your training arrays. The function takes 2 parameters: the path to the CSV and the number of samples in the validation set.
>> X_action2_train, X_action2_test = preprocessing_csv('<name_of_csv>', no_of_elements_in_validation_set)
- This returns the training and test split for that action.
- Now use the `get_final_data_for_model` function to combine the data for both actions.
>> X_train, X_test, Y_train, Y_test = get_final_data_for_model(X_action2_train, X_action2_test, X_action1_train, X_action1_test)
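A numpy sketch of what a `get_final_data_for_model`-style merge could look like: stack both actions' arrays and attach binary labels. Labeling action 1 as class 0 and action 2 as class 1 is an assumption; the repo's convention may differ.

```python
import numpy as np

def combine_actions(a2_train, a2_test, a1_train, a1_test):
    # Hypothetical re-implementation: concatenate both actions and
    # label action-1 samples 0 and action-2 samples 1.
    X_train = np.concatenate([a1_train, a2_train])
    X_test = np.concatenate([a1_test, a2_test])
    Y_train = np.concatenate([np.zeros(len(a1_train)),
                              np.ones(len(a2_train))])
    Y_test = np.concatenate([np.zeros(len(a1_test)),
                             np.ones(len(a2_test))])
    return X_train, X_test, Y_train, Y_test
```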
- Use the `shuffler` function to shuffle the data.
>> X_train, Y_train = shuffler(X_train, Y_train)
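Shuffling matters here because the combined arrays hold all of one action's samples followed by all of the other's; X and Y must be permuted in unison so each sample keeps its label. A sketch of what a `shuffler`-style helper could do (this is not the repo's implementation):

```python
import numpy as np

def shuffler(X, Y, seed=None):
    # Apply one random permutation to both arrays so every sample
    # stays paired with its label.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    return X[perm], Y[perm]
```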
- All the work is done. Time to train the model.
>> train_LSTM(X_train, Y_train, X_test, Y_test, name_of_the_model)
Here we pass in the X's and Y's along with the NAME under which you want the model to be saved.
- You can check out how to run inference in the previous section.