The present invention relates to a recording/reproduction apparatus, a recording/reproduction method, and a recording medium storing a recording/reproduction program. More particularly, the present invention relates to a recording/reproduction apparatus and a recording/reproduction method having a variable-speed reproduction function, a recording medium storing a recording/reproduction program, and an integrated circuit for use in the recording/reproduction apparatus.
Conventionally, there is a recording/reproduction apparatus which reproduces AV data at a predetermined speed (fast or slow playback) without changing the pitch. In such a recording/reproduction apparatus, for example, by invariably reproducing audio and video at the same reproduction speed, a transition to fast or slow playback can be performed without a sense of discomfort. On the other hand, there is an increasing demand for higher reproduction speeds for efficient viewing. However, when the reproduction speed is excessively increased, the degree of understanding of sentences is conversely reduced (the reproduced speech becomes so fast that it is difficult to comprehend). Regarding this problem, a variable-speed reproduction method is known which changes the reproduction speed as appropriate, depending on whether a portion is speech or non-speech. For example, by reproducing speech portions at a lower speed and non-speech portions at a higher speed, high-speed reproduction can be achieved without reducing the understanding of sentences (e.g., Patent Document 1).
Even when real-time reproduction is required (e.g., television broadcasting), variable-speed reproduction is used effectively. Specifically, in order to improve the degree of understanding of sentences in television-broadcast speech, mainly for elderly viewers, a method is known in which speech portions are reproduced at a reduced speed (slow playback) and non-speech portions are reproduced at an increased speed (fast playback), so that both real-time reproduction and clear reproduction can be achieved in a television which does not have a large-capacity storage device. In such a method, the speed of the video signal may be either a fixed speed or a speed that follows the change in the speed of the speech (e.g., Patent Document 2).
The configuration of a conventional recording/reproduction apparatus as described above will be described with reference to
In recording/reproduction apparatuses having configurations as described above, audio is analyzed and the reproduction speed is changed depending on whether a portion is speech or non-speech (e.g., reproduction is performed at a lower speed in speech portions and at a higher speed in non-speech portions). Thus, fast playback can be performed without reducing the degree of understanding of sentences, resulting in efficient viewing.
Patent Document 1: Japanese Patent Laid-Open Publication No. 2001-290500
Patent Document 2: Japanese Patent Laid-Open Publication No. 2001-298710
However, in recording/reproduction apparatuses as described above, the video reproduction speed is inevitably either made to follow the audio reproduction speed or fixed at a predetermined speed. Therefore, for example, in the former case, the video reproduction speed changes frequently as the audio reproduction speed varies between speech and non-speech sections. As a result, motion becomes less smooth in scenes including significant motion (sports, etc.), scenes which pan across a landscape, and the like, giving the user a sense of discomfort.
Therefore, an object of the present invention is to provide a recording/reproduction apparatus and a recording/reproduction method for recording and reproducing television broadcast programs, captured moving images, and the like, in which the reproduction speeds of audio and image are both controlled so that variable-speed reproduction can be achieved without a sense of discomfort.
To achieve the above objects, the present invention has the following aspects.
A first aspect of the present invention is directed to a recording/reproduction apparatus comprising an AV data accumulation unit for accumulating an audio signal and a video signal, an AV data analyzing unit for analyzing feature amounts of the audio signal and the video signal accumulated in the AV data accumulation unit, a speed determining unit for determining reproduction speeds of the audio signal and the video signal based on the feature amounts of the audio signal and the video signal analyzed by the AV data analyzing unit, an audio reproduction speed converting unit for changing the reproduction speed of the audio signal based on the audio reproduction speed determined by the speed determining unit, and an image reproduction speed converting unit for changing the reproduction speed of the video signal based on the video reproduction speed determined by the speed determining unit.
In a second aspect based on the first aspect, the AV data analyzing unit performs the analysis when the audio signal and the video signal are accumulated into the AV data accumulation unit, and saves a result of the analysis in association with the audio signal and the video signal, and the speed determining unit determines the reproduction speeds of the audio signal and the video signal based on the result of the analysis.
In a third aspect based on the first aspect, the recording/reproduction apparatus further comprises a time difference calculating unit for calculating a time difference between the audio signal and the video signal being reproduced. The speed determining unit determines a reproduction speed of one of the audio signal and the video signal, depending on the time difference calculated by the time difference calculating unit.
In a fourth aspect based on the second aspect, the recording/reproduction apparatus further comprises an AV synchronizing unit for generating synchronization information for synchronizing the audio signal with the video signal, based on the feature amounts of the audio signal and the video signal analyzed by the AV data analyzing unit. The audio reproduction speed converting unit and the video reproduction speed converting unit synchronize the audio signal with the video signal based on the synchronization information.
In a fifth aspect based on the first aspect, the AV data analyzing unit has a face image detecting unit for detecting a face image from the feature amount of the video signal. The speed determining unit determines the reproduction speed, depending on a result of the detection of the face image.
In a sixth aspect based on the first aspect, the AV data analyzing unit has a motion vector detecting unit for detecting a motion vector from the feature amount of the video signal. The speed determining unit determines the reproduction speed, depending on a result of the detection of the motion vector.
In a seventh aspect based on the first aspect, the speed determining unit outputs, to the image reproduction speed converting unit, a signal indicating an instruction to change a reproduction mode between a first reproduction mode in which the video signal is reproduced at a previously determined reproduction speed, and a second reproduction mode in which the video signal is reproduced at a reproduction speed which is caused to follow the reproduction speed of the audio signal, based on the feature amounts analyzed by the AV data analyzing unit, and the image reproduction speed converting unit changes the reproduction speed of the video signal based on a reproduction mode designated by the speed determining unit.
In an eighth aspect based on the seventh aspect, the AV data analyzing unit has a face image detecting unit for detecting a face image from a feature amount of a video signal. The speed determining unit outputs, to the image reproduction speed converting unit, a signal indicating an instruction to reproduce a video signal in which the face image has not been detected in the first reproduction mode, and reproduce a video signal in which the face image has been detected in the second reproduction mode.
In a ninth aspect based on the seventh aspect, the AV data analyzing unit has a motion vector detecting unit for detecting a motion vector of video from a feature amount of a video signal. The speed determining unit outputs, to the image reproduction speed converting unit, a signal indicating an instruction to reproduce a video signal in which the motion vector has a predetermined value or more in the first reproduction mode, and reproduce a video signal in which the motion vector has the predetermined value or less in the second reproduction mode.
A tenth aspect of the present invention is directed to a recording/reproduction method comprising an AV data accumulation step of accumulating an audio signal and a video signal, an AV data analyzing step of analyzing feature amounts of the audio signal and the video signal accumulated in the AV data accumulation step, a speed determining step of determining reproduction speeds of the audio signal and the video signal based on the feature amounts of the audio signal and the video signal analyzed in the AV data analyzing step, an audio reproduction speed converting step of changing the reproduction speed of the audio signal based on the audio reproduction speed determined in the speed determining step, and an image reproduction speed converting step of changing the reproduction speed of the video signal based on the video reproduction speed determined in the speed determining step.
An eleventh aspect of the present invention is directed to a recording medium storing a recording/reproduction program for causing a computer of a recording/reproduction apparatus comprising an AV data accumulating unit for accumulating an audio signal and a video signal, to execute an AV data accumulation step of accumulating an audio signal and a video signal, an AV data analyzing step of analyzing feature amounts of the audio signal and the video signal accumulated in the AV data accumulation step, a speed determining step of determining reproduction speeds of the audio signal and the video signal based on the feature amounts of the audio signal and the video signal analyzed in the AV data analyzing step, an audio reproduction speed converting step of changing the reproduction speed of the audio signal based on the audio reproduction speed determined in the speed determining step, and an image reproduction speed converting step of changing the reproduction speed of the video signal based on the video reproduction speed determined in the speed determining step.
A twelfth aspect of the present invention is directed to an integrated circuit for use in a recording/reproduction apparatus comprising an AV data accumulation unit for accumulating an audio signal and a video signal, the integrated circuit comprising an AV data analyzing unit for analyzing feature amounts of the audio signal and the video signal accumulated in the AV data accumulation unit, a speed determining unit for determining reproduction speeds of the audio signal and the video signal separately based on the feature amounts of the audio signal and the video signal analyzed by the AV data analyzing unit, an audio reproduction speed converting unit for changing the reproduction speed of the audio signal based on the audio reproduction speed determined by the speed determining unit, and an image reproduction speed converting unit for changing the reproduction speed of the video signal based on the video reproduction speed determined by the speed determining unit.
According to the first aspect, both audio and video can be analyzed to control reproduction speeds of both the audio and image separately, depending on scenes.
According to the second aspect, the analysis is performed when audio and video are accumulated, thereby reducing process load during reproduction as compared to when the analysis is performed during reproduction.
According to the third aspect, a difference in reproduction time between speed-converted audio and speed-converted video is measured as appropriate, and the reproduction speed of audio or video is controlled at any time during reproduction so as to prevent the time difference from increasing, thereby making it possible to achieve variable-speed reproduction in which audio and image are prevented from being deviated from each other.
According to the fourth aspect, for example, synchronization information for synchronizing video with audio at a point where scenes are changed, is previously generated prior to reproduction based on the analysis result. By performing reproduction based on the synchronization information, reproduction can be performed while further reducing a sense of discomfort caused by a deviation of synchronization.
According to the fifth aspect, a reproduction speed can be changed, depending on the presence or absence of a face image in a scene. Therefore, for example, in a scene in which a human is speaking, a reproduction speed is slowed, and in the other scenes, a reproduction speed is increased. Thus, a reproduction speed can be adjusted, depending on scenes.
According to the sixth aspect, a reproduction speed can be changed, depending on the significance of a motion in a scene. Therefore, for example, in a scene having a significant motion, a reproduction speed is slowed, and in the other scenes, a reproduction speed is increased. Thus, a reproduction speed can be adjusted, depending on scenes.
According to the seventh aspect, an effect similar to the above-described first aspect is obtained.
According to the eighth aspect, an effect similar to the above-described fifth aspect is obtained.
According to the ninth aspect, an effect similar to the above-described sixth aspect is obtained.
Also, according to the recording/reproduction method, the recording medium storing the recording/reproduction program, and the integrated circuit for use in the recording/reproduction apparatus, of the present invention, an effect similar to the above-described first aspect can be obtained.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. Note that the present invention is not limited to the examples.
The accumulation unit 11 is implemented on the secondary memory unit 4. The accumulation unit 11 stores video data and audio data (hereinafter referred to as AV data) which are obtained by encoding a video signal and an audio signal (e.g., a recorded program, a captured moving image, etc.) in the form of MPEG or the like.
The AV analyzing unit 12 analyzes video and audio signals accumulated in the accumulation unit 11. The analysis includes detection of a speech section and a non-speech section, and the like in the case of audio signals; and detection of a face, detection of a luminance, detection of a motion vector, and the like in the case of video signals. Also, the AV analyzing unit 12 outputs a result of the analysis to the speed determining unit 13.
The speed determining unit 13 determines reproduction speeds of the video and audio signals based on the result of the analysis by the AV analyzing unit 12. Also, the speed determining unit 13 notifies the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 of the determined reproduction speeds. Also, the speed determining unit 13 outputs, to the video reproduction speed converting unit 15, an instruction to perform reproduction with frame dropping (described below). The audio reproduction speed converting unit 14 converts the audio signal accumulated in the accumulation unit 11 based on the reproduction speed notified of by the speed determining unit 13, and outputs the result to the I/O interface unit 7. Similarly, the video reproduction speed converting unit 15 converts the video signal accumulated in the accumulation unit 11 based on the reproduction speed notified of by the speed determining unit 13, and outputs the result to the I/O interface unit 7. Also, the video reproduction speed converting unit 15 receives the instruction from the speed determining unit 13 and controls the frame dropping reproduction (described below).
Note that the AV analyzing unit 12, the speed determining unit 13, the audio reproduction speed converting unit 14, and the video reproduction speed converting unit 15 of
Here, an operation of the recording/reproduction apparatus 10 of the first embodiment will be roughly described. Initially, a broadcast program which is received by a reception unit (not shown) is recorded into the accumulation unit 11 in accordance with a timer recording or the like set by a user. Next, the user provides an instruction to perform fast playback of the program. The AV analyzing unit 12 analyzes feature amounts of video and audio signals of the recorded program. Next, the speed determining unit 13 determines reproduction speeds of the video and audio signals constituting the program based on a result of the analysis. Based on the reproduction speeds thus determined, the video signal and the audio signal are reproduced.
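As an illustration only, the rough flow described above can be sketched as follows. This is a minimal, hypothetical sketch in Python; the data structure and the callables (Features, analyze, determine_speeds, and so on) are assumptions introduced for explanation and are not part of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Features:
    """Per-segment analysis result (hypothetical structure)."""
    timestamp: float      # time stamp of the segment, in seconds
    is_speech: bool       # audio: speech or non-speech section
    has_face: bool        # video: a face image was detected
    motion: float         # video: magnitude of the motion vector
    scene_change: bool    # video: scene change point

def reproduce_program(segments: list,
                      analyze: Callable[[list], List[Features]],
                      determine_speeds: Callable[[Features], Tuple[float, float]],
                      convert_audio: Callable[[object, float], None],
                      convert_video: Callable[[object, float], None]) -> None:
    """Rough flow of the first embodiment: analyze the recorded AV data,
    determine per-segment reproduction speeds, then speed-convert and output."""
    features = analyze(segments)                            # AV analyzing unit 12
    for seg, feat in zip(segments, features):
        audio_speed, video_speed = determine_speeds(feat)   # speed determining unit 13
        convert_audio(seg, audio_speed)                     # audio reproduction speed converting unit 14
        convert_video(seg, video_speed)                     # video reproduction speed converting unit 15
```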
Hereinafter, a detailed operation of a recording/reproduction process performed by the recording/reproduction apparatus 10 will be described with reference to FIGS. 3 to 7.
Next, the AV analyzing unit 12 analyzes the program video and audio data read out from the accumulation unit 11, and outputs a result of the analysis to the speed determining unit 13 (step S3). The process of step S3 will be more specifically described. The AV analyzing unit 12 analyzes each image frame constituting the video data of the read-out video and audio data, with respect to image feature amounts such as image luminance, the presence or absence of a face image, a motion vector, a scene change point, and the like. Similarly, the audio data is analyzed with respect to audio feature amounts, such as whether the audio is speech or non-speech, the S/N ratio (S: speech, N: other sounds), and the like. Thereafter, the AV analyzing unit 12 associates the time stamps of the analyzed image and audio data with the information about each image and audio feature amount, and outputs the associated data as a result of the analysis to the speed determining unit 13.
Here, the luminance detection in the video feature amount analysis refers to detection of the luminance of an image frame at some time. For example, the luminance information itself of each image frame may be detected, or the presence or absence of a scene in which there is a significant change in luminance between image frames within a predetermined time may be detected using a threshold value. The face image detection refers to detection of a human face from an image frame at some time. This is achieved based on color difference information, the roundness of a contour, or the like, in addition to the image frame luminance. Note that, regarding the face image detection, the result of the analysis may be the “presence/absence” of a face image, determined based on the proportion of an image frame occupied by a face, or it may be the “possibility” of a face image (e.g., a probability, such as 75%, that a face image is present). The motion vector detection refers to detection of a motion vector indicating the significance of motion in the video. Also, a scene change point refers to a point at which scenes change in a program (e.g., in a news program, a scene on location changes to a close-up scene of an announcer); it may be estimated that scenes have changed when the amount of change in luminance over time is significant.
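As a concrete but simplified illustration of this analysis, the sketch below computes per-frame luminance, an approximate motion measure, and a scene change flag, plus an energy-based speech/non-speech decision for audio. The thresholds, the use of a frame difference in place of true motion vector detection, and the omission of face detection are assumptions made only for this sketch.

```python
import numpy as np

# Illustrative thresholds; the embodiment does not specify concrete values.
LUMA_CHANGE_TH = 40.0    # scene change if the mean luminance jumps by more than this
MOTION_TH = 8.0          # "significant motion" if the mean frame difference exceeds this
SPEECH_ENERGY_TH = 1e-3  # audio window treated as speech if its energy exceeds this

def analyze_frame(prev_frame: np.ndarray, frame: np.ndarray) -> dict:
    """Simplified video feature extraction for one image frame.
    A real implementation would also run face detection and block-matching
    motion vector estimation."""
    luma = float(frame.mean())
    motion = float(np.abs(frame.astype(float) - prev_frame.astype(float)).mean())
    scene_change = abs(luma - float(prev_frame.mean())) > LUMA_CHANGE_TH
    return {"luminance": luma,
            "motion": motion,
            "significant_motion": motion > MOTION_TH,
            "scene_change": scene_change}

def analyze_audio_window(samples: np.ndarray) -> dict:
    """Simplified audio feature extraction: energy-based speech/non-speech decision."""
    energy = float(np.mean(samples.astype(float) ** 2))
    return {"is_speech": energy > SPEECH_ENERGY_TH, "energy": energy}
```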
Next, the speed determining unit 13 determines a reproduction speed of each of the audio signal and the video signal based on the analysis result output by the AV analyzing unit 12, and notifies the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 of the determined audio reproduction speed and video reproduction speed (step S4). The process of step S4 will be more specifically described with reference to FIGS. 4 to 6.
Referring back to
Note that video and audio are asynchronously reproduced in methods in which video is reproduced at a fixed speed as described above. In some cases, however, such asynchronous reproduction is difficult to perform. For example, with a moving image compression method such as MPEG, video and audio streams in the vicinity of the same time are multiplexed, and the resulting single stream is recorded onto a medium. Therefore, for reproduction, after the data is read out from the medium, one of the video and audio streams needs to be buffered in a memory or the like and decoded and reproduced with appropriate timing. In some apparatuses, a large-capacity memory cannot be provided due to cost, or asynchronous reproduction itself cannot be designed due to reproduction time management, and so on. In these cases, a frame dropping reproduction method may be used instead of the above-described asynchronous reproduction. The frame dropping reproduction method is a method of displaying, during moving image reproduction, only frames or fields selected at arbitrary intervals while skipping the rest. In this case, the timing with which the video to be displayed is output may be the same as that of the audio (i.e., synchronous output reproduction). Thus, if the frame dropping reproduction method is used, although smooth video is not obtained, the video and audio remain synchronized to some extent, so that reproduction is less unnatural than the above-described asynchronous reproduction, thereby making it possible to reduce the viewer's sense of discomfort.
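A minimal sketch of frame-dropping reproduction follows, assuming a simple model in which, at an N-times speed, roughly every N-th source frame is kept and the rest are dropped; the function name and interface are hypothetical.

```python
def frames_to_display(total_frames: int, speed: float) -> list:
    """Frame-dropping playback sketch: keep roughly every 'speed'-th source frame
    so that the displayed frames stay in step with the sped-up audio.
    Returned values are indices into the original frame sequence."""
    shown = []
    pos = 0.0
    while pos < total_frames:
        shown.append(int(pos))  # display this frame; the frames in between are dropped
        pos += speed            # advance by 'speed' source frames per displayed frame
    return shown

# Example: at 2x speed over 10 source frames, frames 0, 2, 4, 6 and 8 are displayed.
print(frames_to_display(10, 2.0))
```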
Also, a scene change point may be used to adjust the video reproduction speed so that video and audio are synchronized with each other. For example, in a case where the above-described asynchronous reproduction is applied to a scene having significant motion, if a scene on location changes to a close-up scene of an announcer in a studio while reproduction of the video lags behind the speech, the video of the scene on location is still being presented while the speech of the announcer reading the news is already being output, so that the user feels a sense of discomfort. Therefore, in such a case, an adjustment process of synchronizing video with audio may be performed, for example, by skipping the video of the lagging scene on location at the timing at which the scenes change (the scene change point) so that the video of the announcer is output.
On the other hand, as a result of the determination in step S31, if the scene to be processed is neither pan video nor a scene having significant motion (NO in step S31), the speed determining unit 13 determines, based on the analysis result, whether or not a face image is present (step S35). When a face image is not present (NO in step S35), the flow goes to the process of step S32. On the other hand, when a face image is present (YES in step S35), it is determined whether or not the scene is at a scene change point (step S36). As a result of the determination in step S36, when it is not a scene change point (NO in step S36), the image speed is set to a speed which follows the audio speed (step S39). On the other hand, when it is a scene change point (YES in step S36), it is determined whether or not the time difference between the audio and video to be reproduced is larger than or equal to a predetermined value (step S37). As a result of the determination, if the time difference is larger than or equal to the predetermined value (YES in step S37), the reproduction speed of the image is adjusted so as to be synchronized with the audio. For example, the image speed is set so that, when reproduction of the video lags, the lagging video portion is skipped, or when the video reproduction has gotten ahead, the video reproduction is temporarily stopped until the audio catches up with the video (step S38). On the other hand, if the time difference is less than the predetermined value (NO in step S37), the speed determining unit 13 goes to the process of the above-described step S39. Thus, the video speed determining process is completed.
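The decision flow of steps S31 to S39 can be summarized as in the sketch below. The fixed speed, the time-difference threshold, and the dictionary keys are assumptions for illustration; the actual values would be chosen per apparatus.

```python
FIXED_VIDEO_SPEED = 2.0  # assumed fixed speed used for pan video / significant motion
SYNC_TIME_DIFF_TH = 1.0  # assumed tolerable audio/video time difference, in seconds

def decide_video_speed(feat: dict, audio_speed: float, time_diff: float):
    """Sketch of the video speed determining process (steps S31 to S39).
    Returns a (mode, speed) pair."""
    if feat["is_pan"] or feat["significant_motion"]:   # S31: YES
        return ("fixed", FIXED_VIDEO_SPEED)            # S32: fixed-speed reproduction
    if not feat["has_face"]:                           # S35: NO
        return ("fixed", FIXED_VIDEO_SPEED)            # back to S32
    if feat["scene_change"] and abs(time_diff) >= SYNC_TIME_DIFF_TH:  # S36, S37: YES
        # S38: resynchronize -- skip the lagging video, or pause the video
        # until the audio catches up
        return ("resync", audio_speed)
    return ("follow_audio", audio_speed)               # S39: follow the audio speed
```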
Referring back to
Referring back to
Thus, in the first embodiment, video and audio of each program recorded in the accumulation unit 11 are analyzed to determine reproduction speeds, so that the reproduction speeds of both the audio and image are adaptively controlled, depending on scenes, thereby achieving variable-speed reproduction without a sense of discomfort.
Note that the determination threshold value for the face image detection performed by the AV analyzing unit 12 need not be fixed throughout the program being reproduced; it may instead be a variable threshold value which can be changed at any time during reproduction. For example, the significance of motion is determined based on the detected motion vector; for a scene having significant motion (where it is considered less likely that a person is speaking), a threshold value with which a face image is not likely to be detected may be set, and for a scene which does not have significant motion (where it is considered more likely that a person is speaking), a threshold value with which a face image is likely to be detected may be set. Also, for example, in the case of video which pans across a landscape (hereinafter referred to as pan video), if face images are detected indiscriminately, the video reproduction speed is alternately quickened and slowed over a succession of pan video, so that the reproduction speed is changed more frequently than necessary. Therefore, in the case of pan video, a threshold value with which a face is not likely to be detected may be set. Also, for example, when the proportion of speech sections within a predetermined time is large and the S/N ratio of speech to non-speech is large, it is considered highly possible that one or more speakers appear in the screen. Therefore, in such a case as well, a threshold value with which a face image is likely to be detected may be set. Also, the genre or the like of a program to be recorded may be checked in advance using an electronic program guide or the like, and the analysis may be performed taking the genre of the program into consideration. For example, in news programs and the like, the proportion of the image area occupied by the face of an announcer does not vary much; therefore, the determination threshold value used for the above-described face detection may be a fixed value.
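A possible form of such a variable threshold is sketched below; the numeric values and the decision order are assumptions, and the returned value is interpreted as a threshold on the analyzed face “possibility”.

```python
def face_detection_threshold(significant_motion: bool, is_pan: bool,
                             speech_ratio: float, sn_ratio_db: float) -> float:
    """Scene-adaptive threshold on the face-image 'possibility'
    (higher value = a face is less likely to be declared). Illustrative values only."""
    if significant_motion or is_pan:
        return 0.8   # suppress detection: a speaker is unlikely in such scenes
    if speech_ratio > 0.6 and sn_ratio_db > 10.0:
        return 0.3   # encourage detection: a speaker is likely to appear on screen
    return 0.5       # neutral base threshold

# A frame whose analyzed face "possibility" exceeds the returned threshold
# would be treated as containing a face image.
```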
Also, in the audio feature amount analysis performed in the above-described step S3, non-speech thus determined may be categorized into clap, acclamation, noise, or the like using an audio signal estimating method based on a HMM (Hidden Markov Model) or a GMM (Gaussian Mixture Model), and in step S4, based on the resultant category, the audio reproduction speed may be determined.
Also, the audio reproduction speed determined in the above-described step S22 or S23 is not limited to 1.3 times or 4 times, and may be set adaptively, depending on the situation at that time. For example, when the S/N ratio is poor, the speech quality is generally lower and the speech is harder to recognize. In such a case, the reproduction speed of speech sections may be set lower so that the speech remains recognizable. Also, the durations of speech and non-speech may be accumulated in advance, and based on, for example, the ratio of the speech time to the non-speech time obtained before determining a reproduction speed, the speed may be adaptively determined so that the whole program can be reproduced within a previously set target time. For example, assume that reproduction is started in accordance with an instruction to reproduce a 60-minute program in 30 minutes. When the elapsed time in the program reaches 20 minutes, it is calculated, based on the reproduction speeds at that time and the ratio of speech sections to non-speech sections over those 20 minutes (e.g., if speech sections occupy 15 of the 20 minutes, it is highly likely that speech sections will also occupy a large share of the remaining time), whether or not the program will be completely reproduced within 30 minutes at the current speed settings. If the program cannot be completely reproduced within 30 minutes, the audio reproduction speed in non-speech sections may be further increased, for example.
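The target-time calculation in the example above can be made concrete as follows, using the 1.3x and 4x speeds from the text as the current settings; the function and its interface are illustrative.

```python
def projected_playback_minutes(program_min: float, elapsed_program_min: float,
                               elapsed_speech_min: float,
                               speech_speed: float, nonspeech_speed: float) -> float:
    """Estimate total playback time, assuming the speech/non-speech ratio observed
    so far continues for the rest of the program. All durations are program minutes."""
    speech_ratio = elapsed_speech_min / elapsed_program_min
    remaining = program_min - elapsed_program_min
    rem_speech = remaining * speech_ratio
    rem_nonspeech = remaining - rem_speech
    played = (elapsed_speech_min / speech_speed
              + (elapsed_program_min - elapsed_speech_min) / nonspeech_speed)
    to_play = rem_speech / speech_speed + rem_nonspeech / nonspeech_speed
    return played + to_play

# Worked example from the text: a 60-minute program with a 30-minute target,
# 20 program minutes elapsed of which 15 were speech, at 1.3x (speech) / 4x (non-speech).
# The projection is about 38.4 minutes, which exceeds the 30-minute target, so the
# speeds (at least in non-speech sections) would have to be raised.
print(round(projected_playback_minutes(60, 20, 15, 1.3, 4.0), 1))  # -> 38.4
```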
Regarding the determination of the video reproduction speed in the above-described step S12, an electronic program guide or the like may be used to check the genre of the recorded program in advance, and the video reproduction speed may be determined taking the genre of the program into consideration. For example, for news programs, the video reproduction speed may in principle be set to follow the audio reproduction speed, while for sports programs, variety programs, and the like, the video reproduction speed may in principle be fixed to, for example, the 2-times speed. Also, the “possibility” of a face image output as a result of the above-described analysis may be used. For example, if the “possibility” of a face image is 80% or more and the scene has significant motion, the speed may be determined to be 1.5 times; if the “possibility” of a face image is 30% and the scene has significant motion, the speed may be determined to be 3 times.
Further, regarding the determination of the video reproduction speed, the recording/reproduction apparatus 10 may be provided with two video reproduction modes: a first reproduction mode in which video is reproduced at a previously designated speed; and a second reproduction mode in which video is reproduced at a speed which follows the audio reproduction speed. In this case, the speed determining unit 13 instructs the video reproduction speed converting unit 15 to change the two video reproduction modes based on the above-described analysis result. Thereafter, based on the instruction, the video reproduction speed converting unit 15 may change the video reproduction modes at any time and output video.
Next, a second embodiment of the present invention will be described with reference to
Hereinafter, a detailed operation of the recording/reproduction process of the second embodiment of the present invention will be described with reference to
Next, the CPU 2 receives an instruction to reproduce a recorded program from the user, and causes the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 to read out AV data of the designated program from the accumulation unit. Further, the CPU 2 causes the speed determining unit 13 to perform a reproduction speed determining process (step S42).
Next, the speed determining unit 13 reads out the analysis result of the program to be reproduced, from the accumulation unit 11, and based on the analysis result, determines a reproduction speed of each of an audio signal and a video signal, and notifies the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 of the determined audio reproduction speed and video reproduction speed (step S43). Note that the specific contents of the reproduction speed determining process are similar to those of the reproduction speed determining process of the above-described first embodiment (see
After the process of step S43 is finished, the CPU 2 causes the audio reproduction speed converting unit 14 to perform an audio signal speed converting and outputting process. In addition, the CPU 2 causes the video reproduction speed converting unit 15 to perform a video signal speed converting and outputting process (step S44). The operation of step S44 is similar to that of step S5 which has been described in the first embodiment with reference to
Thus, in the second embodiment, audio and video are analyzed when a program or the like is recorded, and the analysis result is saved in association with recorded data. Thereby, the analysis process does not need to be performed every time reproduction is performed, thereby making it possible to reduce the process load of the recording/reproduction apparatus when reproduction is performed.
Note that the analysis process is performed when a program is recorded in this embodiment. The analysis process may be performed during an idle time (e.g., a midnight time zone, etc.) in which the recording/reproduction apparatus does not perform a recording/reproduction process, to save the analysis result.
Further, not only the analysis process but also the speed determining process may be performed, and in addition, an instruction of a reproduction speed may be saved into the accumulation unit 11. Thereby, when reproduction is performed, the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 may read out only the instruction of a reproduction speed from the accumulation unit 11, and may adjust and output the reproduction speeds of audio and video in accordance with the instruction.
Next, a third embodiment of the present invention will be described with reference to FIGS. 10 to 12. In the third embodiment, a time difference between reproduction times of video and audio after speed conversion is detected. Thereafter, when the detected time difference is larger than or equal to a predetermined value, for example, a reproduction speed in non-speech sections is further increased, or a reproduction speed in speech sections is further slowed, thereby reducing a deviation in reproduction time between video and audio. Note that the recording/reproduction apparatus 30 of this embodiment is similar to the recording/reproduction apparatus 10 which has been described in the first embodiment with reference to
Hereinafter, a detailed operation of the recording/reproduction process of the third embodiment of the present invention will be described with reference to
Following step S55, the time difference measuring unit 21 measures the time difference between reproduction times of the speed-converted audio output from the audio reproduction speed converting unit 14 and the speed-converted image output from the video reproduction speed converting unit 15, and outputs the time difference as time difference information to the speed determining unit 13 (step S56). For example, the time difference is calculated based on time stamp information which is assigned to each of audio data and image data.
The speed determining unit 13 adjusts the reproduction speeds using the time difference information in a reproduction speed determining process (see
On the other hand, as a result of the determination in step S61, when the speed determining unit 13 determines that the section to be processed is a non-speech section (NO in step S61), the speed determining unit 13 references the time difference information to determine whether or not the time difference is larger than or equal to a predetermined value (step S66). When the time difference is less than the predetermined value (NO in step S66), the speed determining unit 13 determines 4 times as the audio reproduction speed (step S70). On the other hand, when the time difference is larger than or equal to the predetermined value (YES in step S66), the speed determining unit 13 determines whether or not reproduction of audio has preceded more than reproduction of video (step S67). As a result of the determination, when the video reproduction has preceded more than the audio reproduction (NO in step S67), the speed determining unit 13 determines 6 times as the audio reproduction speed (step S68). In other words, the audio reproduction speed is increased so as to catch up with the video reproduction. On the other hand, as a result of the determination, when the audio reproduction has preceded more than the video reproduction (YES in step S67), the audio reproduction speed is decreased to two times so as to cause the video reproduction to catch up with the audio reproduction (step S69). Thus, the audio speed determining process is completed.
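The audio-side decision of steps S61 to S70 maps directly to the sketch below. The time-difference threshold is an assumption; the 1.3x value for speech sections is taken from the speed example given earlier, and its time-difference handling is omitted here.

```python
TIME_DIFF_TH = 0.5  # assumed threshold for the audio/video time difference, in seconds

def decide_audio_speed(is_speech: bool, time_diff: float) -> float:
    """Sketch of the audio speed determining process of the third embodiment
    (steps S61 to S70). time_diff > 0 means audio reproduction is ahead of video."""
    if is_speech:                      # S61: YES (speech section)
        return 1.3                     # base speech-section speed (illustrative)
    if abs(time_diff) < TIME_DIFF_TH:  # S66: NO
        return 4.0                     # S70: normal non-speech fast playback
    if time_diff < 0:                  # S67: NO -- video reproduction has preceded
        return 6.0                     # S68: speed the audio up to catch the video
    return 2.0                         # S69: slow the audio so the video catches up
```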
Thus, in the third embodiment, when a speed is temporarily changed to perform fast playback, the time difference between video and audio is measured and corrected. Thereby, it is possible to prevent the time difference between audio and image from being increased during reproduction and prevent the audio and image from being reproduced while presenting unmatched contents.
Next, a fourth embodiment of the present invention will be described with reference to FIGS. 13 to 15. In the fourth embodiment, a synchronization point is set in an arbitrary place in a program based on the analysis result saved in the accumulation unit in the above-described second embodiment, and video and audio are synchronized with each other at the synchronization point during reproduction.
Hereinafter, a detailed operation of the recording/reproduction process of the fourth embodiment of the present invention will be described with reference to
In
Following the above-described step S62, after the processes of steps S63 and S64, in step S65 the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 perform a process similar to that of step S43 of the above-described second embodiment, to reproduce video and audio. In this case, the audio reproduction speed converting unit 14 and the video reproduction speed converting unit 15 reference the above-described synchronization information and, if the video or audio being reproduced reaches a synchronization point, perform reproduction while skipping video or audio based on the skip information associated with that synchronization point.
Thus, in the fourth embodiment, a synchronization point can be previously set based on the analysis result. A deviation between video and audio of a reproduced program is corrected at the synchronization point. Thereby, it is possible to finely correct the deviation between video and audio.
Note that, regarding the setting of a synchronization point, a tolerable range for a deviation between audio and video may be provided, and only when a point exceeds the tolerable range, the point may be set as a synchronization point. Further, the tolerable range may be changed as appropriate, depending on scenes, based on the analysis result obtained from the AV analyzing unit 12. For example, the following control may be performed: in the case of a scene having a significant motion (sports, etc.) or a scene having pan video (a landscape, etc.), the tolerable range for synchronization deviation is set to be as large as several seconds, and conversely, in the case of a scene in which an announcer speaks, the tolerable range for synchronization deviation is set to be as small as several tens of milliseconds to several hundreds of milliseconds. Also, for example, the number of synchronization points (frequency of synchronization) or the tolerable range for synchronization deviation, may be set in advance by the user.
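The scene-dependent tolerable range and the selection of synchronization points might be sketched as below; the concrete numbers follow the ranges suggested in the text (several seconds versus tens to hundreds of milliseconds), but the exact values and the data layout are assumptions.

```python
def sync_tolerance_seconds(scene: dict) -> float:
    """Scene-dependent tolerable range for the audio/video deviation."""
    if scene["significant_motion"] or scene["is_pan"]:
        return 3.0   # sports, landscape pans: a deviation of a few seconds is acceptable
    if scene["has_face"]:
        return 0.1   # an announcer speaking: keep lip-sync tight (about 100 ms)
    return 1.0       # other scenes: intermediate tolerance

def select_sync_points(candidates: list) -> list:
    """Keep only those candidate points (e.g., scene change points) at which the
    predicted audio/video deviation exceeds the scene's tolerable range."""
    return [c for c in candidates
            if abs(c["predicted_deviation"]) > sync_tolerance_seconds(c["scene"])]
```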
The recording/reproduction method, the recording/reproduction apparatus, and the recording medium storing the recording/reproduction program, of the present invention, can analyze both audio and image to control reproduction speeds of both the audio and image, depending on scenes, and are useful for applications, such as an accumulation type video recorder, personal computer software for editing, and the like.
Priority application: Japanese Patent Application No. 2005-027315, filed February 2005 (JP, national).
International application: PCT/JP06/01468, filed Jan. 30, 2006 (WO); 371(c) date: Oct. 6, 2006.