Live speech transcription and captioning are important for the accessibility of deaf and hard-of-hearing individuals, especially in situations where no ASL interpreter is present. When live captioning is available at all, it is typically rendered in the style of closed captions on a display such as a phone screen or TV, away from the conversation itself. This can divide the viewer's focus and detract from the experience. This paper proposes an investigation into an alternative, Augmented Reality-driven approach to displaying these captions, using deep neural networks to compute, track, and associate deep visual and speech descriptors in order to maintain captions as "speech bubbles" above the speaker.
Library of Congress Subject Headings
Real-time closed captioning--Technological innovations; Augmented reality; Neural networks (Computer science)
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Bowald, Dylan, "AR Comic Chat" (2020). Thesis. Rochester Institute of Technology.
RIT – Main Campus