We address the problem of key frame extraction in the compressed domain, which is of great importance in content-based system applications. We discuss a novel MPEG-7 motion activity descriptor that combines temporal and spatial descriptors. These descriptors represent both the temporal motion intensity and the spatial distribution of motion activity. A priori information about the shot boundaries is assumed to be available. The temporal descriptors are obtained by classifying the shots into five intensity levels based on fuzzy membership functions; a high intensity value indicates high activity and a low intensity value indicates low activity. The spatial descriptors are obtained from motion vectors: individual frames are characterized by spatial regions according to the change in motion activity between successive frames. The main motivation behind this approach is to select as key frames those frames that have maximum centralized spatial activity and high motion intensity. The motion intensity and spatial distribution are fed to a neural network, which selects the key frames based on maximum temporal activity and centralized spatial distribution. Results on a wide range of video sequences illustrate that, once the network is trained, the proposed approach is computationally inexpensive and performs much better than selecting the first frame or the middle frame of the shot as the key frame.
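As an illustration of the fuzzy classification step, the following minimal Python sketch assigns a normalized motion intensity to one of five levels. The abstract does not specify the membership functions, so the triangular shape, the level names, the centres, and the width used here are all assumptions, as is the use of mean motion vector magnitude as the intensity measure. It should be read as a sketch of the general technique, not as the method used in the paper.

```python
import math

# Hypothetical level names; the abstract only states that there are five levels.
LEVELS = ["very low", "low", "medium", "high", "very high"]
# Assumed centres of the membership functions on a normalized [0, 1] scale.
CENTRES = [0.0, 0.25, 0.5, 0.75, 1.0]
# Assumed half-width of each triangular membership function.
WIDTH = 0.25


def motion_intensity(motion_vectors):
    """Mean magnitude of (dx, dy) motion vectors (an assumed intensity measure)."""
    if not motion_vectors:
        return 0.0
    total = sum(math.hypot(dx, dy) for dx, dy in motion_vectors)
    return total / len(motion_vectors)


def memberships(x):
    """Degree of membership of a normalized intensity x in each of the five levels."""
    x = min(max(x, 0.0), 1.0)  # clamp to the assumed [0, 1] range
    return [max(0.0, 1.0 - abs(x - c) / WIDTH) for c in CENTRES]


def classify(x):
    """Return the level with the largest membership degree."""
    degrees = memberships(x)
    return LEVELS[degrees.index(max(degrees))]
```

With these assumed parameters, an intensity of 0.8 belongs mostly to "high" and partly to "very high", so `classify(0.8)` returns "high". In a pipeline like the one described, the membership degrees (rather than only the winning level) could be passed on as inputs to the neural network, though the abstract does not say which form is used.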
Department, Program, or Center
Chester F. Carlson Center for Imaging Science (COS)
Signals, Systems and Computers 2 (2003) 1575-1579
RIT – Main Campus