
Online sign language interpreter


AI algorithm that converts video of a person using sign language into a text transcript

Recent advances in computer vision make it possible for people with hearing or speech disabilities to communicate seamlessly and share knowledge.

Sign language words are recognized by the following algorithm:
Extracting the spatial position of body parts with MediaPipe Holistic models. The model receives an image as input, detects the people in it, and builds a skeleton of keypoints in three-dimensional space.

Feature preparation. The raw keypoints are enough to distinguish words, but the enormous variability of movements in three-dimensional space limits this to a very small vocabulary. Transformations applied to the raw keypoints therefore produce, for each video frame, a vector that packs the maximum amount of useful information into minimal space.
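The transformation step can be sketched as follows. This is a minimal illustration, not the project's actual transform: the normalization scheme (centering on a reference keypoint and scaling by shoulder distance) and the keypoint indices are assumptions, chosen to match the MediaPipe pose layout.

```python
import math

def frame_to_feature_vector(landmarks, ref=0, left_shoulder=11, right_shoulder=12):
    """Turn the raw 3-D keypoints of one frame into a compact, comparable vector.

    `landmarks` is a list of (x, y, z) tuples, e.g. MediaPipe Holistic pose
    keypoints. The normalization below is an illustrative assumption.
    """
    rx, ry, rz = landmarks[ref]
    # Center every point on the reference keypoint so the vector is
    # invariant to where the person stands in the frame.
    centered = [(x - rx, y - ry, z - rz) for x, y, z in landmarks]
    # Scale by the shoulder distance so the vector is invariant to
    # body size and distance from the camera.
    scale = math.dist(landmarks[left_shoulder], landmarks[right_shoulder]) or 1.0
    # Flatten into one vector: three numbers per keypoint.
    return [c / scale for point in centered for c in point]
```

One such vector is produced per frame, so a video becomes a time series of equal-length vectors regardless of the signer's position or size.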

Sign language words are then recognized from a sequence of such vectors, where each vector describes the position of the body parts in one video frame.

Extended with text-to-speech and speech-to-text methods, the algorithm can translate:

  • speech to text;
  • text to speech;
  • sign language to text;
  • sign language to speech.
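The four directions above can be viewed as chains of three modules. The sketch below uses stand-in stubs (the function names and return values are illustrative, not the project's API) to show how sign-to-speech composes recognition with text-to-speech:

```python
# Stubs standing in for the real models; names and outputs are illustrative.
def sign_to_text(video_frames):
    return "hello"            # recognized word sequence

def text_to_speech(text):
    return f"<audio:{text}>"  # synthesized audio placeholder

def speech_to_text(audio):
    return "hello"            # transcribed text placeholder

# Each supported translation direction is a chain of the modules above.
PIPELINES = {
    ("speech", "text"): [speech_to_text],
    ("text", "speech"): [text_to_speech],
    ("sign", "text"): [sign_to_text],
    ("sign", "speech"): [sign_to_text, text_to_speech],
}

def translate(source, target, payload):
    for stage in PIPELINES[(source, target)]:
        payload = stage(payload)
    return payload
```

Sign-to-speech thus needs no dedicated model: it reuses the recognizer's text output as input to the speech synthesizer.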

Sign language has no punctuation marks or other markers indicating where one gesture ends and the next begins. To recognize words, we therefore had to locate the beginnings and ends of words in a continuous time series. For this, a sliding-window algorithm (windowing method) was developed which, given the sequence of previously recognized words, predicts the next one.
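The windowing half of that idea can be sketched as follows: slide a fixed-length window over the stream of per-frame feature vectors and hand each window to the recognizer as a candidate word. The window length and stride here are illustrative, not the project's tuned values, and the word-prediction step is omitted.

```python
from collections import deque

def sliding_windows(frame_vectors, window=30, stride=10):
    """Yield fixed-length windows over a continuous stream of per-frame
    feature vectors. Each window is a candidate word segment for the
    recognizer; `window` and `stride` are illustrative defaults.
    """
    buffer = deque(maxlen=window)
    for i, vec in enumerate(frame_vectors):
        buffer.append(vec)
        # Emit a window once the buffer is full, then every `stride` frames.
        if len(buffer) == window and (i - window + 1) % stride == 0:
            yield list(buffer)
```

Because consecutive windows overlap, a word boundary that falls mid-window in one position falls cleanly inside a neighboring window, which is what lets the recognizer segment a continuous signing stream without explicit separators.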

The algorithm is implemented for two languages: Russian and English.

Development time
50 weeks, 10 developers