We present a neural network based approach to key frame extraction in the compressed domain. The proposed method is an amalgamation of both the MPEG-7 descriptors namely motion intensity descriptor and spatial activity descriptor. Shot boundary detection and block motion estimation techniques are employed prior to the extraction of the descriptors. The motion intensity (“pace of action”) is obtained using a fuzzy system that classifies the motion intensity into five categories proportional to the intensity. The spatial activity matrix determines the spatial distribution of activity (“active regions”) in a frame. A neural network is used to pick those frames as key frames which have high intensity and maximum spatial activity at the center of the frame. Results are compared against two well-known key frame extraction techniques to demonstrate the advantage and robustness of the proposed approach. Results show that the neural network approach performs much better than selecting first frame of the shot as a key frame and selecting middle frame of the shot as a key frame methods.
Department, Program, or Center
Chester F. Carlson Center for Imaging Science (COS)
Storage and Retrieval Methods and Applications for Multimedia 2004 5307 (2003) 437-447
RIT – Main Campus