We present a neural-network-based approach to key frame extraction in the compressed domain. The proposed method combines two MPEG-7 descriptors: the motion intensity descriptor and the spatial activity descriptor. Shot boundary detection and block motion estimation are performed before the descriptors are extracted. The motion intensity (“pace of action”) is obtained using a fuzzy system that classifies it into five categories graded by intensity. The spatial activity matrix describes the spatial distribution of activity (“active regions”) in a frame. A neural network selects as key frames those frames that have high motion intensity and maximum spatial activity at the center of the frame. Results are compared against two well-known key frame extraction techniques to demonstrate the advantage and robustness of the proposed approach. The neural network approach performs much better than both baselines: selecting the first frame of the shot as the key frame, and selecting the middle frame of the shot as the key frame.
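The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the use of the standard deviation of motion-vector magnitudes as the intensity measure, the triangular membership functions, the `max_std` scale, the "central third" definition of the frame center, and all function names. A plain product score also stands in for the neural network that the paper uses for the final selection.

```python
import math

def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def motion_intensity_level(motion_vectors, max_std=16.0):
    """Map a frame's block motion vectors to one of five levels (0 = lowest).

    Assumption: intensity is measured by the standard deviation of the
    motion-vector magnitudes, rescaled to [0, 4] using a hypothetical max_std.
    """
    mags = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    if not mags:
        return 0
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    x = min(std, max_std) / max_std * 4.0
    # One triangular membership function per category, peaking at 0, 1, 2, 3, 4.
    memberships = [triangular(x, k - 1, k, k + 1) for k in range(5)]
    return max(range(5), key=lambda k: memberships[k])

def center_activity(activity):
    """Fraction of total activity in the central third of the activity matrix."""
    rows, cols = len(activity), len(activity[0])
    r0, r1 = rows // 3, rows - rows // 3
    c0, c1 = cols // 3, cols - cols // 3
    total = sum(sum(row) for row in activity)
    if total == 0:
        return 0.0
    center = sum(activity[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return center / total

def pick_key_frame(frames):
    """Return the index of the frame with the highest combined score.

    frames: list of (motion_vectors, activity_matrix) pairs for one shot.
    The product score is a simple stand-in for the paper's neural network.
    """
    def score(frame):
        vectors, activity = frame
        return (motion_intensity_level(vectors) / 4.0) * center_activity(activity)
    return max(range(len(frames)), key=lambda i: score(frames[i]))
```

Under these assumptions, a frame scores highly only when its motion intensity level is high and most of its activity lies near the center, which mirrors the selection criterion stated in the abstract.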

Publication Date



"A neural network approach to key frame extraction," Proceedings of SPIE, Storage and Retrieval Methods and Applications for Multimedia 2004. The International Society for Optical Engineering. Held December 2003. Copyright 2003 The Society of Photo-Optical Instrumentation Engineers. This paper is made available as an electronic reprint with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited. ISSN:0277-786X Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in February 2014.

Document Type


Department, Program, or Center

Chester F. Carlson Center for Imaging Science (COS)


RIT – Main Campus