Max Słota, AGH University of Kraków, Poland
Piotr Błędowski, AGH University of Kraków, Poland
Adam Stajek, AGH University of Kraków, Poland
Sign Language Recognition (SLR) is a crucial step toward inclusive human-computer interaction, yet progress in this field is often constrained by the scarcity of annotated datasets. This work investigates isolated Polish Sign Language recognition using lightweight sequence models trained on pose landmarks extracted from RGB video. By leveraging MediaPipe's landmark detection, we substantially reduce the dimensionality of the visual input and bypass the need for expensive video preprocessing. The dataset comprises roughly 2,000 samples spanning 110 gesture classes, with some classes represented by as few as 5 examples, which creates a challenging low-resource scenario. To address this, we focus on compact architectures that balance accuracy and efficiency. In particular, we evaluate Long Short-Term Memory (LSTM) networks as well as transformer-based models. Despite the limited dataset size, optimized sequence models with fewer than 100K parameters achieved recognition accuracies exceeding 80%. These results demonstrate that dimensionality reduction combined with carefully designed lightweight networks can yield strong performance under extreme data scarcity, paving the way for practical gesture recognition systems in low-resource settings.
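To give a sense of scale for the dimensionality reduction and the sub-100K parameter budget mentioned above, the following back-of-the-envelope sketch counts per-frame features and the parameters of a single-layer LSTM classifier. It is an illustration under stated assumptions, not the configuration used in this work: the landmark subset (33 MediaPipe Pose landmarks plus 21 landmarks per hand, three coordinates each), the hidden size of 64, and the 224×224 RGB frame used for comparison are all hypothetical choices. Only the 110-class output comes from the text.

```python
# Rough size estimate for a landmark-based LSTM classifier.
# Every value below except NUM_CLASSES is an assumption made for illustration.

POSE_LANDMARKS = 33   # landmarks produced by MediaPipe Pose
HAND_LANDMARKS = 21   # landmarks produced by MediaPipe Hands, per hand
COORDS = 3            # assumed x, y, z per landmark
NUM_CLASSES = 110     # gesture classes, as stated in the abstract


def landmark_features(use_hands=True):
    """Per-frame feature count for the assumed landmark subset."""
    n = POSE_LANDMARKS + (2 * HAND_LANDMARKS if use_hands else 0)
    return n * COORDS


def lstm_params(input_size, hidden_size):
    """Parameters of one LSTM layer in the textbook formulation:
    four gates, each with input weights, recurrent weights, and one bias.
    (Frameworks that keep two bias vectors per gate add 4 * hidden_size more.)
    """
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def classifier_params(hidden_size, num_classes):
    """Parameters of a linear layer mapping the last hidden state to classes."""
    return hidden_size * num_classes + num_classes


def total_params(input_size, hidden_size, num_classes=NUM_CLASSES):
    return lstm_params(input_size, hidden_size) + classifier_params(
        hidden_size, num_classes
    )


if __name__ == "__main__":
    features = landmark_features()      # values per frame from landmarks
    raw_pixels = 224 * 224 * 3          # hypothetical RGB frame, for comparison
    hidden = 64                         # hypothetical hidden size
    print("features per frame:", features)
    print("raw values per frame:", raw_pixels)
    print("total parameters:", total_params(features, hidden))
```

Under these assumptions each frame is described by 225 values instead of roughly 150,000 pixel values, and the model stays comfortably below 100K parameters. Different landmark subsets or hidden sizes shift both numbers.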