Induced affect is the emotional effect of an object on an individual. It can be quantified through two metrics: valence and arousal. Valence quantifies how positive or negative something is, while arousal quantifies its intensity, from calm to exciting. These metrics enable researchers to study how people feel about various topics. Affective content analysis of visual media is a challenging problem because perceived reactions differ between viewers. Industry-standard machine learning classifiers such as Support Vector Machines can be used to help determine user affect. The best affect-annotated video datasets are often analyzed by feeding large amounts of visual and audio features through machine-learning algorithms. The goal is to maximize accuracy, in the hope that each feature will contribute useful information.
We depart from this approach to quantify how different modalities, such as visual, audio, and text description information, can aid in understanding affect. To that end, we train independent models for the visual, audio, and text description modalities. Each is a convolutional neural network paired with a support vector machine to classify valence and arousal. We also train various ensemble models that combine multimodal information, in the hope that the independent modalities benefit one another.
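The abstract does not state how the ensemble models combine the modalities. As an illustration only, the sketch below assumes one common option, late fusion, in which each modality's classifier outputs a probability for the "high" class and the ensemble takes a weighted average. The function name `late_fusion`, the modality keys, the weights, and the example scores are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Optional, Tuple


def late_fusion(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> Tuple[float, str]:
    """Fuse per-modality probabilities of the 'high' class.

    Assumption: each modality model (e.g. a CNN feature extractor feeding
    an SVM) has already produced a probability in [0, 1]. This is a generic
    late-fusion sketch, not the method used in the thesis.
    """
    if not scores:
        raise ValueError("need at least one modality score")
    if weights is None:
        # Default: every modality contributes equally.
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the per-modality probabilities.
    fused = sum(weights[name] * scores[name] for name in scores) / total
    label = "high" if fused >= threshold else "low"
    return fused, label


# Hypothetical valence probabilities for a single video clip.
valence_scores = {"visual": 0.8, "audio": 0.4, "text": 0.6}
fused_valence, valence_label = late_fusion(valence_scores)

# Hypothetical weighting that trusts the visual model more.
weighted_valence, _ = late_fusion(
    valence_scores, weights={"visual": 2.0, "audio": 1.0, "text": 1.0}
)
```

Averaging probabilities keeps each modality's model independent, so a single modality can be dropped or reweighted without retraining the others. Fusing features before classification is an equally plausible design that this sketch does not cover.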
We find that our visual network alone achieves state-of-the-art valence classification accuracy and that our audio network, when paired with our visual network, achieves competitive results on arousal classification. Each network is much stronger on one metric than the other. This finding may lead to more sophisticated multimodal approaches to accurately identifying affect in video data. This work also contributes to induced emotion classification by augmenting existing sizable media datasets and providing a robust framework for classifying them.
Library of Congress Subject Headings
Video recordings--Psychological aspects; Video recordings--Data processing; Phenomenology--Research; Support vector machines; Machine learning
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Thomas, Titus Pallithottathu, "The Emotional Impact of Audio - Visual Stimuli" (2017). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus