One-shot skeleton-based action recognition on strength and conditioning exercises

Can AI models learn to classify strength and conditioning exercises after being given only one example of each exercise? This is the question that Michael Deyzel and Dr Rensu Theart, both of the Department of Electrical and Electronic Engineering at Stellenbosch University, set out to answer in their latest research paper, which they presented at the IEEE/Computer Vision Foundation CVPR Workshop in Vancouver, Canada.

The sports and fitness industry needs a practical system that can identify and understand human physical activity. Imagine a system that could watch someone exercise, understand their movements, and then provide intelligent workout feedback or even virtual coaching.

This is where SU-EMD, a new dataset of motion sequences covering seven common strength and conditioning exercises, comes in. The sequences were captured by both markerless and marker-based motion capture systems. The dataset supports work on what’s known as the one-shot skeleton action recognition problem.

Background to the Research 

What makes the problem challenging is that such a system must recognise an athlete’s actions from just a few examples, because it is not practical to gather a large quantity of human data for every single action.

In simple terms, the authors use a state-of-the-art model called a graph convolutional network (GCN) to differentiate between actions. The GCN acts like a sorting system: it learns to pull similar actions closer together and push dissimilar actions apart. After training, the GCN correctly identified new actions 87.4% of the time when given just one example of each action.
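To make the "sorting system" idea concrete, here is a minimal sketch of how a one-shot classifier could use such an embedding space: each new exercise is enrolled with a single reference embedding, and a query is assigned to the class whose reference it most resembles. The exercise labels, the three-dimensional vectors, the function names, and the choice of cosine similarity are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_one_shot(query, references):
    """Return the label whose single reference embedding is most similar to the query.

    `references` maps each class label to exactly one embedding (the "one shot").
    """
    return max(references, key=lambda label: cosine_similarity(query, references[label]))


# Hypothetical embeddings: a trained encoder would produce much longer vectors.
references = {
    "squat": [0.9, 0.1, 0.0],
    "deadlift": [0.1, 0.9, 0.1],
    "lunge": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of an unseen sequence (made up)

predicted = classify_one_shot(query, references)
print(predicted)
```

In this toy example the query lies closest to the "squat" reference, so that label is returned. Enrolling a new exercise only requires adding one more entry to `references`; the encoder itself does not need retraining.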

The research also considered the impact of various factors through an ablation study. The bottom line is that this one-shot metric learning method could be a game-changer for classifying sports actions in a virtual coaching system, especially when users can only provide a few expert examples for the enrolment of new actions.

Approach to the Research

The authors suggest training a cutting-edge graph convolutional architecture as an encoder model on the large-scale NTU RGB+D skeleton data. This GCN model is designed to extract spatial-temporal features directly from skeleton sequence data, projecting them into an embedding space that clusters similar actions.


The authors have made significant strides toward creating a practical system that can recognise strength and conditioning movements in sports. They’ve introduced a unique dataset of skeleton sequences of strength and conditioning exercises. Its 840 samples across seven exercise action classes are now available to researchers studying action recognition in the fields of sports, health, and fitness.

To test the approach, they used their dataset as a one-shot test set, having trained the model on a separate large-scale dataset. They aimed to explore the potential of transferring spatial-temporal features using the top-tier ST-GCN architecture. They discovered that skeleton augmentations (such as random moving, rotation, frame dropping, and pivoting) combined with the multi-similarity loss gave them the best results on their validation set.
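As a rough illustration of what skeleton augmentation involves, the sketch below applies two of the named transformations, rotation and frame dropping, to a skeleton sequence stored as a list of frames, each holding (x, y, z) joint coordinates. The data layout, the function names, the assumption that y is the vertical axis, and the parameter values are illustrative only and are not the authors' implementation. Random moving and pivoting are not shown.

```python
import math
import random


def rotate_about_vertical(sequence, angle_rad):
    """Rotate every joint about the y axis (assumed here to be vertical)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        [(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame]
        for frame in sequence
    ]


def drop_frames(sequence, drop_prob, rng):
    """Discard each frame independently with probability `drop_prob`."""
    kept = [frame for frame in sequence if rng.random() >= drop_prob]
    return kept or [sequence[0]]  # never return an empty sequence


def augment(sequence, rng, max_angle_rad=math.pi / 6, drop_prob=0.1):
    """Apply a random rotation followed by random frame dropping."""
    angle = rng.uniform(-max_angle_rad, max_angle_rad)
    return drop_frames(rotate_about_vertical(sequence, angle), drop_prob, rng)


rng = random.Random(0)  # seeded for repeatability
# Toy sequence: 20 identical frames with two joints each (made-up coordinates).
sequence = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)] for _ in range(20)]
augmented = augment(sequence, rng)
print(len(sequence), "frames in,", len(augmented), "frames out")
```

Each call produces a slightly different view of the same movement, so the encoder sees more variety during training without any new motion capture sessions.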

While their findings revealed that a standard ST-GCN trained as a feature embedding in a metric learning paradigm could compete with, but not surpass, the current best methods, they made an exciting new discovery. For the first time, they demonstrated that spatial-temporal features could be easily learned and transferred to classify entirely new classes of exercises with impressive accuracy.

This method could potentially be employed to classify sports actions in virtual coaching systems, which is particularly useful when users can’t provide many expert examples for the enrolment of new actions.

You can read the complete research paper here.