Many sign languages are bona fide natural languages with their own grammatical rules and lexicons, and hence can benefit from neural machine translation methods. As significant advances are made in natural language processing (specifically neural machine translation) and in computer vision (specifically image and video captioning), related methods can be further researched to advance automated sign language understanding. This is an especially challenging AI research area because it involves a continuous visual-spatial modality in which meaning is often derived from context. To this end, this thesis focuses on the study and development of new computational methods and training mechanisms to enhance sign language translation in two directions: signs to text and text to signs. This work introduces a new, realistic phrase-level American Sign Language dataset (ASL/ASLing) and investigates the role of different types of visual features (CNN embeddings, human body keypoints, and optical flow vectors) in translating ASL to spoken American English. Additionally, the research considers the role of multiple features for improved translation via various fusion architectures. As an added benefit, because continuous sign language is challenging to segment, this work also explores the use of overlapping scaled visual segments across the video for simultaneously segmenting and translating signs. Finally, a quintessential interpreting agent not only understands sign language and translates it to text, but also understands text and translates it to signs. Hence, to facilitate two-way sign language communication, i.e., visual sign to spoken language translation and spoken to visual sign language translation, a dual neural machine translation model, SignNet, is presented. Various training paradigms are investigated for improved translation using SignNet.
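One way to picture the overlapping scaled visual segments mentioned above is as sliding windows of several lengths laid across the frame sequence. The sketch below is only an illustration under stated assumptions: the function name `overlapping_segments`, the window sizes, and the 50% overlap are hypothetical choices, not values taken from the thesis.

```python
from typing import List, Tuple


def overlapping_segments(
    num_frames: int,
    window_sizes: List[int],
    overlap: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for each window size.

    Hypothetical helper: the window sizes and overlap ratio are assumptions
    made for illustration, not parameters reported in the thesis.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    segments: List[Tuple[int, int]] = []
    for size in window_sizes:
        if size <= 0:
            raise ValueError("window sizes must be positive")
        # Stride shrinks as overlap grows; it is never allowed to reach zero.
        stride = max(1, int(size * (1.0 - overlap)))
        start = 0
        while start < num_frames:
            end = min(start + size, num_frames)
            segments.append((start, end))
            if end == num_frames:
                break  # the last window already reaches the end of the video
            start += stride
    return segments


if __name__ == "__main__":
    # A 10-frame clip covered by windows of 4 and 8 frames, each 50% overlapped.
    print(overlapping_segments(10, [4, 8]))
```

Each range could then be encoded and translated on its own, so that segmentation and translation are handled together rather than as separate stages.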
By exploiting the notion of similarity (and dissimilarity) of visual signs, a metric embedding learning process proved most useful in training SignNet. The resulting processes outperformed their state-of-the-art counterparts, showing noteworthy improvements in BLEU-1 through BLEU-4 scores.
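The abstract does not say which metric learning objective was used. As a hedged illustration of learning from similarity and dissimilarity, the sketch below uses a triplet margin loss, a common choice assumed here and not confirmed as the thesis's method. The names `euclidean` and `triplet_loss` and the margin of 1.0 are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Triplet margin loss (an assumed objective, not the thesis's stated one).

    The loss is zero once the similar sign (positive) is closer to the anchor
    than the dissimilar sign (negative) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    similar = [0.0, 1.0]       # embedding of a sign similar to the anchor
    dissimilar = [3.0, 4.0]    # embedding of a dissimilar sign
    print(triplet_loss(anchor, similar, dissimilar))
```

Minimizing a loss of this kind pulls embeddings of similar signs together and pushes dissimilar ones apart, which is the general idea behind metric embedding learning.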
Department, Program, or Center
Computer Science (GCCIS)
Ananthanarayana, Tejaswini, "A Comprehensive Approach to Automated Sign Language Translation" (2021). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus