The continuous growth of video technology has resulted in increased research into the semantic analysis of video. The multimodal nature of video makes this task complex. The objective of this thesis was to research, implement, and examine the underlying methods and concepts of semantic video analysis, and to improve on the state of the art in automated emotion recognition by using semantic knowledge in the form of Bayesian inference. The main domain of analysis is facial emotion recognition from video, covering both the visual and vocal aspects of facial gestures. The goal is to determine whether the expression on a person's face in a sequence of video frames is happy, sad, angry, fearful, or disgusted. A Bayesian network classification algorithm was designed and used to identify and understand facial expressions in video. The Bayesian network is an attractive choice because it provides a probabilistic framework and quantifies the uncertainty arising from knowledge about the domain. This research contributes to current knowledge in two ways: it provides a novel algorithm that uses edge differences to extract keyframes from video and facial features from each keyframe, and it tests the hypothesis that combining two modalities (vision and speech) yields a better classification result (a lower false-positive rate and a higher true-positive rate) than either modality used alone.
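The abstract's first contribution is keyframe extraction based on edge differences. The thesis text is not reproduced here, so the following is only a minimal sketch of the general idea, not the author's actual algorithm: compute an edge map per frame (here a simple gradient-magnitude threshold stands in for a proper edge detector), measure the fraction of edge pixels that change relative to the last keyframe, and declare a new keyframe when that fraction exceeds a threshold. All function names and thresholds are illustrative assumptions.

```python
import numpy as np

def edge_map(frame, rel_thresh=0.2):
    """Binary edge map: gradient magnitude above a fraction of its maximum.
    (Stand-in for a real edge detector such as Canny; illustrative only.)"""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    return mag > rel_thresh * mag.max()

def extract_keyframes(frames, diff_thresh=0.1):
    """Select frame indices whose edge map differs from the previous
    keyframe's by more than diff_thresh (fraction of pixels that changed)."""
    keyframes = [0]
    prev = edge_map(frames[0])
    for i in range(1, len(frames)):
        cur = edge_map(frames[i])
        # XOR counts pixels that are an edge in one map but not the other.
        if np.mean(prev ^ cur) > diff_thresh:
            keyframes.append(i)
            prev = cur
    return keyframes

# Synthetic demo: five frames with a bright line at column 4,
# then five frames with the line moved to column 12.
a = np.zeros((16, 16)); a[:, 4] = 1.0
b = np.zeros((16, 16)); b[:, 12] = 1.0
frames = [a] * 5 + [b] * 5
```

On this synthetic clip the selector keeps the first frame and the frame where the scene changes, and skips the duplicates in between.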
Library of Congress Subject Headings
Human face recognition (Computer science); Computer vision; Pattern recognition systems; Facial expression; Bayesian statistical decision theory
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Vashi, Gati, "Semantic Analysis of Facial Gestures from Video Using a Bayesian Framework" (2011). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus