Claims
- 1. A method for creating a multimedia presentation having video frames and video segments with synchronized audio, said method comprising the steps of:
(a) receiving audio-video data; (b) separating said audio-video data into an audio stream and a video sequence; (c) dividing said video sequence into video segments, each of said video segments comprising a group of frames; (d) for each said video segment
(d1) calculating an audio significance measure using said audio stream related to said video segment; (d2) using at least said audio significance measure, selecting either said video segment in its entirety or extracting at least one slide frame from said corresponding group of frames; (e) synchronizing said audio stream and said selected video segment and slide frames; and (f) synchronously reproducing selected video segments and slide frames and said audio stream as said multimedia presentation.
- 2. The method according to claim 1, wherein, in step (d1), said audio significance measure is calculated from a measure of activity within said audio stream.
- 3. The method according to claim 2, wherein said measure of activity is determined from a frequency domain representation of said audio stream.
- 4. The method according to claim 2, wherein said measure of activity is calculated from one or more of power, peak-frequency and frequency spread.
- 5. The method according to claim 4, wherein said measure of activity is calculated as power+peak frequency+spread.
- 6. The method according to claim 4, wherein said measure of activity is calculated as 2×power+peak frequency+0.5×spread.
- 7. The method of claim 1, wherein, in step (d2), said selection is based upon a comparison of said audio significance measure with an activity threshold, such that if said threshold is exceeded, then said video segment is selected, else if not exceeded, said at least one slide frame is selected.
- 8. A method for creating a multimedia presentation having video frames and video segments with synchronized audio, said method comprising the steps of:
(a) receiving audio-video data; (b) separating said audio-video data into an audio stream and a video sequence; (c) dividing said video sequence into video segments, each of said video segments comprising a group of frames; (d) for each said video segment
(d1) extracting at least one representative frame from the corresponding said group of frames; (d2) calculating a video significance measure using said frames; (d3) calculating an audio significance measure using said audio stream related to said video segment; (d4) using said video and audio significance measures, selecting either said video segment in its entirety or extracting at least one slide frame from said group of frames; (e) synchronizing said audio stream and said selected video segments and slide frames; and (f) synchronously reproducing said segments and slide frames and said audio stream.
- 9. The method according to claim 8, wherein, in step (d3), said audio significance measure is calculated from a measure of activity within said audio stream.
- 10. The method according to claim 9, wherein said measure of activity is determined from a frequency domain representation of said audio stream.
- 11. The method according to claim 9, wherein said measure of activity is calculated from one or more of power, peak-frequency and frequency spread.
- 12. The method according to claim 11, wherein said measure of activity is calculated as power+peak frequency+spread.
- 13. The method according to claim 11, wherein said measure of activity is calculated as 2×power+peak frequency+0.5×spread.
- 14. The method according to claim 8, wherein said video significance measure is determined from a level of relative movement between said frames.
- 15. The method according to claim 14, wherein said frames comprise objects and said level of relative movement is determined from a direction and magnitude of motion (ui,vi) of each object in said frames to derive an activity value of the frame.
- 16. The method according to claim 15, wherein said activity value is determined from the standard deviation of the direction and magnitude (ui,vi) of each object.
- 17. The method of claim 16, wherein said activity value is determined according to the expression:
- 18. The method of claim 8, wherein, in step (d4), said selection is based upon a comparison of said audio significance measure with an activity threshold, such that if said threshold is exceeded, then said video segment is selected, else if not exceeded, said at least one slide frame is selected.
- 19. The method of claim 18, wherein said combined significance measure is equal to one of the audio significance measure, the video significance measure, or an average of said audio and video significance measures depending upon the level of audio and video activity.
- 20. Apparatus for creating a multimedia presentation having video frames and video segments with synchronized audio, said apparatus comprising:
input means for receiving audio-video data; means for separating said audio-video data into an audio stream and a video sequence; means for dividing said video sequence into video segments, each of said video segments comprising a group of frames; processor means which, for each said video segment, calculates an audio significance measure using said audio stream related to said video segment, and, using at least said audio significance measure, selects either said video segment in its entirety or extracts at least one slide frame from said corresponding group of frames; means for synchronizing said audio stream and said selected video segment and slide frames; and means for synchronously reproducing selected video segments and slide frames and said audio stream as said multimedia presentation.
- 21. Apparatus for creating a multimedia presentation having video frames and video segments with synchronized audio, comprising:
means for receiving audio-video data; means for separating said audio-video data into an audio stream and a video sequence; means for dividing said video sequence into video segments, each of said video segments comprising a group of frames; processor means which, for each said video segment, extracts at least one representative frame from the corresponding said group of frames, calculates a video significance measure using said frames, calculates an audio significance measure using said audio stream related to said video segment, and, using said video and audio significance measures, selects either said video segment in its entirety or extracts at least one slide frame from said group of frames; means for synchronizing said audio stream and said selected video segments and slide frames; and means for synchronously reproducing said segments and slide frames and said audio stream.
- 22. A computer program product including a computer readable medium incorporating a computer program for creating a multimedia presentation having video frames and video segments with synchronized audio, said computer program having:
input code means for receiving audio-video data; code means for separating said audio-video data into an audio stream and a video sequence; code means for dividing said video sequence into video segments, each of said video segments comprising a group of frames; processing code means which, for each said video segment, calculates an audio significance measure using said audio stream related to said video segment, and, using at least said audio significance measure, selects either said video segment in its entirety or extracts at least one slide frame from said corresponding group of frames; code means for synchronizing said audio stream and said selected video segment and slide frames; and code means for synchronously reproducing selected video segments and slide frames and said audio stream as said multimedia presentation.
- 23. A computer program product including a computer readable medium incorporating a computer program for creating a multimedia presentation having video frames and video segments with synchronized audio, said computer program having:
code means for receiving audio-video data; code means for separating said audio-video data into an audio stream and a video sequence; code means for dividing said video sequence into video segments, each of said video segments comprising a group of frames; processing code means which, for each said video segment, extracts at least one representative frame from the corresponding said group of frames, calculates a video significance measure using said frames, calculates an audio significance measure using said audio stream related to said video segment, and, using said video and audio significance measures, selects either said video segment in its entirety or extracts at least one slide frame from said group of frames; code means for synchronizing said audio stream and said selected video segments and slide frames; and code means for synchronously reproducing said segments and slide frames and said audio stream.
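By way of illustration only, the per-segment decision recited in the claims above (an audio activity measure formed from power, peak frequency and frequency spread, compared against an activity threshold) may be sketched as follows. The function names, the normalization of peak frequency and spread by the Nyquist frequency, the use of the spectral centroid to define spread, and the combination of the two standard deviations in `motion_activity` are all assumptions of this sketch and are not taken from the claims; in particular, `motion_activity` is not the expression referred to in claim 17.

```python
import cmath
import math
from statistics import pstdev


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short windows)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def audio_activity(samples, sample_rate, w_power=1.0, w_peak=1.0, w_spread=1.0):
    """Weighted sum of power, peak frequency and frequency spread.

    Assumptions (not from the claims): samples lie in [-1, 1]; peak
    frequency and spread are divided by the Nyquist frequency so that
    the three terms are on comparable scales.
    """
    n = len(samples)
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent segment: no activity
    nyquist = sample_rate / 2
    freqs = [k * sample_rate / n for k in range(len(mags))]
    power = sum(x * x for x in samples) / n  # mean-square amplitude
    peak = freqs[max(range(len(mags)), key=mags.__getitem__)] / nyquist
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = math.sqrt(
        sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total
    ) / nyquist
    return w_power * power + w_peak * peak + w_spread * spread


def motion_activity(vectors):
    """Hypothetical video activity: spread of object motion vectors (u_i, v_i).

    Combines the population standard deviations of the u and v
    components; this combination is an assumption of the sketch.
    """
    if not vectors:
        return 0.0
    us = [u for u, _ in vectors]
    vs = [v for _, v in vectors]
    return math.sqrt(pstdev(us) ** 2 + pstdev(vs) ** 2)


def select_mode(significance, threshold):
    """Keep the whole video segment if the threshold is exceeded, else a slide frame."""
    return "video" if significance > threshold else "slide"
```

With the default weights the measure corresponds to power+peak frequency+spread; passing `w_power=2.0, w_spread=0.5` corresponds to 2×power+peak frequency+0.5×spread. The threshold value itself is application dependent and is left to the caller.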
RELATED APPLICATION
[0001] This is a continuation-in-part of application Ser. No. 09/215,004, the contents of which are incorporated herein by cross-reference.
Continuation in Parts (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09215004 | Dec 1998 | US |
| Child | 10051631 | Jan 2002 | US |