Live speech transcription and captioning are important to the accessibility of deaf and hard-of-hearing individuals, especially in situations where no ASL interpreter is present. When live captioning is available at all, it is typically rendered in the style of closed captions on a display such as a phone screen or TV, away from the actual conversation. This can divide the viewer's focus and detract from the experience. This paper proposes an investigation into an alternative, Augmented Reality-driven approach to displaying these captions, using deep neural networks to compute, track, and associate deep visual and speech descriptors in order to maintain captions as "speech bubbles" above the speaker.
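The association step mentioned above can be sketched as a nearest-embedding match between a speech descriptor and the visual descriptors of tracked faces. The embedding sizes, the cosine-similarity metric, and the threshold below are illustrative assumptions, not the thesis's actual model:

```python
import numpy as np

def associate_speaker(speech_emb, face_embs, threshold=0.5):
    """Match one speech descriptor against the descriptors of all
    tracked faces; return the index of the best match, or None if
    no face is similar enough (e.g. the speaker is off-screen)."""
    # Normalize so the dot product becomes cosine similarity.
    speech = speech_emb / np.linalg.norm(speech_emb)
    faces = face_embs / np.linalg.norm(face_embs, axis=1, keepdims=True)
    sims = faces @ speech  # one similarity score per tracked face
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None
```

A caption renderer would then anchor the "speech bubble" to the bounding box of the returned face index, refreshing the match as new descriptors arrive.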

Library of Congress Subject Headings

Real-time closed captioning--Technological innovations; Augmented reality; Neural networks (Computer science)

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor
Joe Geigel

Advisor/Committee Member

Zack Butler

Advisor/Committee Member

Thomas Kinsman

Campus
RIT – Main Campus