Automating the segmentation of anomalous activities within long video sequences is complicated by ambiguity in how such events are defined. This thesis approaches the problem by learning generative models that can identify meaningful sequences in videos using limited supervision. We propose two types of end-to-end trainable Convolutional Long Short-Term Memory (Conv-LSTM) networks that predict the subsequent video sequence from a given input. The first is an encoder-decoder-based model that learns spatio-temporal features from stacked non-overlapping image patches, and the second is an autoencoder-based model that uses max-pooling layers to learn an abstraction of the entire image. The networks learn to model “normal” activities from usual events. Regularity scores are derived from the reconstruction errors of a set of predictions; abnormal video sequences yield lower regularity scores because their predictions diverge further from the actual sequence over time. The models use a composite structure, and we examine the effects of “conditioning” on learning more meaningful representations. The best model is chosen based on its reconstruction and prediction accuracies. The Conv-LSTM models are evaluated both qualitatively and quantitatively, demonstrating competitive results on multiple anomaly detection datasets. Conv-LSTM units are shown to provide competitive results for modeling and predicting learned events when compared to state-of-the-art methods.
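As an illustration of how a regularity score could be derived from reconstruction errors, the following is a minimal sketch and not the thesis's own formulation. It assumes frames are given as flattened lists of pixel intensities, uses a sum of squared differences as the per-frame error, and applies min-max normalization over the sequence; the function names `frame_errors` and `regularity_scores` are hypothetical.

```python
from typing import List, Sequence


def frame_errors(actual: Sequence[Sequence[float]],
                 predicted: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame error: sum of squared pixel differences (an assumed metric)."""
    errors = []
    for a_frame, p_frame in zip(actual, predicted):
        errors.append(sum((a - p) ** 2 for a, p in zip(a_frame, p_frame)))
    return errors


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Map errors to [0, 1] so that a larger error gives a lower score.

    Min-max normalization over the sequence is an assumption made for
    this sketch; other normalizations are possible.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames are predicted equally well; treat them all as regular.
        return [1.0] * len(errors)
    return [1.0 - (e - lo) / (hi - lo) for e in errors]


# Toy example: the prediction drifts further from the actual frames over time.
actual = [[0, 0], [0, 0], [0, 0]]
predicted = [[0, 0], [1, 1], [2, 2]]
scores = regularity_scores(frame_errors(actual, predicted))
```

In this toy input the prediction drifts away from the actual frames, so the score falls with each frame, matching the behavior the abstract describes for abnormal sequences.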
Library of Congress Subject Headings
Image processing--Digital techniques; Machine learning; Optical pattern recognition
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Medel, Jefferson Ryan, "Anomaly Detection Using Predictive Convolutional Long Short-Term Memory Units" (2016). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus