This application relates to and claims priority from Japanese Patent Application No. 2009-092009 filed on Apr. 6, 2009, the entire disclosure of which is incorporated herein by reference.
(1) Field of the Invention
The present invention relates to a device and a method of encoding a moving image, and in particular, to a device and a method of encoding a moving image by which a delay or an underflow of data may be prevented when used in a real-time video and audio communication system, such as a TV (television) phone and teleconference.
(2) Description of the Related Art
In recent years, video and audio communication devices, such as TV phones and teleconference, have become popular with the development of the video encoding technique and communication lines. In addition, a function has been mounted on mobile products, such as mobile phones, whereby real-time video and audio communication may be accomplished.
On the other hand, with the development of the imaging technique and the encoding technique, cameras by which HD (High Definition) video may be shot have been put on the market, and a real-time video and audio communication system with HD video quality is also expected. However, a real-time video and audio communication system using HD video has a problem in that the delay between the two places increases because of the increase in data amount, so that communication between both sides cannot be continued smoothly.
As a device for encoding a moving image by which the aforementioned delay in the video and audio communication may be reduced, the devices disclosed in Japanese Patent Application Laid-Open H7 (1995)-193821 and Japanese Patent Application Laid-Open 2006-80788 may be cited.
In the device disclosed in the former Patent Document, a delay is reduced by clearing the data in a transmission buffer when the transmission side receives a screen update request, which is issued when a decoding error is detected on the reception side. In addition, the input moving image immediately after the aforementioned processing is intra-encoded by intra-frame processing, which prevents a decoding error from occurring again on the reception side.
In the device disclosed in the latter Patent Document, a delay is reduced by only the transmission side with the use of an easier logic. That is, data in a transmission buffer, which is being monitored, are cleared after an input moving image is intra-encoded when a certain amount or more of data are accumulated in the transmission buffer. Thereby, a delay may be reduced and a decoding error may be prevented from occurring on the reception side.
A delay may be reduced and a decoding error may be eliminated on the reception side by the techniques disclosed in the aforementioned Patent Documents.
However, in the former Patent Document, minimum encoded data, necessary for transmission, have to be stored in the transmission buffer after predicting the period required from when the encoded data are cleared to when the encoded data of the next moving image is completed. In this case, because accurate prediction of the timing when the encoded data in the transmission buffer will be transmitted is difficult, there is a possibility that an underflow in the transmission buffer may occur due to the difference between the optimal value and the real value of the remaining amount of the encoded data in the transmission buffer.
Also, in the latter Patent Document the timing of reducing a delay is sometimes delayed because the data to be transmitted are cleared after the intra-encoding is completed.
In addition, when video and audio communication is generally performed, video and audio are multiplexed into the formats, such as the TS (Transport Stream) format and the PS (Program Stream) format. In both the aforementioned Patent Documents, the boundary between picture units (I-picture; Intra Picture, P-picture; Predictive Picture, B-picture; Bi-directionally Predictive Picture) is taken into consideration; however, the data in the transmission buffer are cleared without consideration of the packet boundary between the MPEG (Moving Picture Experts Group) system layers, such as TS and PS. Therefore, the packet boundary between receiving streams may be shifted depending on a decoding device on the reception side, thereby possibly causing a decoding error.
In view of the aforementioned circumstances, an object of the present invention is to provide a device and a method of encoding a moving image by which a delay may be reduced and an underflow of data may be prevented. Another object of the invention is to provide a device and a method of encoding a moving image by which a decoding error is made less likely to occur in accordance with the constraint of the decodable input format in a moving image decoding device on the reception side.
In order to achieve these objects, the present invention relates to a device for encoding a moving image. The device for encoding a moving image comprises: an imaging unit that generates the moving image by shooting a subject; an encoder circuit that encodes data of the moving image; a stream buffer that accumulates the encoded data of the moving image that is supplied from the encoder circuit; a network circuit that transmits the encoded data of the moving image, which has been accumulated in the stream buffer, to a network; and a system controller that operatively controls the device for encoding a moving image, wherein, when an accumulated amount of the encoded data, which has been accumulated in the stream buffer, is greater than or equal to a predetermined threshold value, the system controller controls the device for encoding a moving image such that a read position at which the encoded data is to be read from the stream buffer is advanced forward on the time axis.
Further, the present invention relates to a method of encoding a moving image. The method of encoding a moving image comprises: an imaging step that creates the moving image by shooting a subject; an encoding step that encodes data of the moving image; an accumulation step that accumulates the encoded data of the moving image that has been created in the encoding step; a communication step that transmits the encoded data of the moving image, which has been accumulated in the accumulation step, to a network; and a system control step that controls the encoding step, the accumulation step, and the communication step, wherein the system control step includes an accumulated amount determination step that determines whether an accumulated amount of the encoded data of the moving image, which has been accumulated in the accumulation step, is greater than or equal to a predetermined threshold value, and wherein, when it is determined that an accumulated amount of the encoded data of the moving image, which has been accumulated in the accumulation step, is greater than or equal to a predetermined threshold value as a result of the determination in the accumulated amount determination step, the system control step controls the accumulation step such that a read position at which the encoded data is to be read from the accumulation step is advanced forward on the time axis.
According to the present invention, a device and a method of encoding a moving image by which a delay is reduced may be provided. In an embodiment of the invention, a device and a method of encoding a moving image by which an underflow of data is prevented may be provided. In another embodiment of the invention, a device and a method of encoding a moving image by which a decoding error is made less likely to occur in a moving image decoding device on the reception side may be provided. There is an effect in any case that a TV phone system or a TV conference system may be improved in terms of usability.
These and other features, objects and advantages of the present invention will become more apparent from the following description when taken in conjunction with the accompanying drawings wherein:
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying
In the real-time video and audio communication, a start request is issued by a decoding device, which is located remotely, and an encoding device initiates the processing after receiving the request. In the encoding device 100 that transmits video, an optical signal, which has been inputted from the lens 101, is at first converted into an electrical signal by the imaging element 102 and then the analog electrical signal is converted into a digital signal. The camera DSP 103 converts a video signal, which has been inputted from the imaging element 102, into a format that may be inputted to the video encoder circuit 105. An audio input signal from the microphone 104 is inputted to the audio encoder circuit 106. The video elementary stream encoded by the video encoder circuit 105 and the audio elementary stream encoded by the audio encoder circuit 106 are packetized by the video-audio multiplexing circuit 107 into a format, such as the TS format or the PS format, and then accumulated in the stream buffer 108. The network circuit 109 is one for communicating with an external apparatus, and transmits the contents accumulated in the stream buffer 108 to a network. Communication with an external apparatus may be performed through the I/O terminal 110 and a network such as the Internet. The system controller 111 controls the whole system of the encoding device 100. Specifically, by controlling the camera DSP 103, the video encoder circuit 105, the audio encoder circuit 106, the video-audio multiplexing circuit 107, the stream buffer 108, and the network circuit 109, the processing on the transmission side of a real-time video and audio communication system is executed.
The stream buffer 108 accumulates the packetized stream by using a RAM (Random Access Memory). The network circuit 109 may be one for wireless communication or wired communication; in the case of wireless communication, the communication I/O terminal 110 may be omitted. The data to be transmitted or received are mainly video-audio streams, but various other commands, such as those of a file transfer protocol, may also be transmitted or received. The system controller 111 is structured mainly with a CPU (Central Processing Unit) and a flash memory, and a program stored beforehand in the flash memory is loaded and executed by the CPU.
An embodiment of the present invention will be described with reference to
Subsequently, the input moving image and the input audio, which have been created in the imaging step S201, are encoded in the encoding step of Step S202. As a type of encoding video, MPEG-2 (ISO/IEC 13818), MPEG-4 AVC/H.264 (ISO/IEC 14496-10), etc., are used. As a type of encoding audio, AAC (Advanced Audio Coding), AC-3 (Dolby Digital Audio Code number 3), etc., are used. However, an encoding method other than the aforementioned methods may be used if the encoding method is supported by the decoding device on the side that has issued the request. Specifically, the video data encoded by the video encoder circuit 105 and the audio data encoded by the audio encoder circuit 106 are accumulated in the stream buffer 108 after being multiplexed by the video-audio multiplexing circuit 107.
When the encoding step S202 has ended, the network transmission step of Step S203 is executed. In the network transmission step S203, the encoded stream, which has been accumulated in the stream buffer 108, is transmitted to the decoding device (not illustrated) by the network circuit 109.
When the execution of the network transmission process S203 has ended, it is determined in Step S204 whether a stop request for the transmission, which has been issued by the decoding device, has been received. When no request has been received (“No” in the drawing), the processing will be repeated from the imaging step of Step S201. When a stop request for the transmission has been received (“Yes” in the drawing), the processing will end.
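The overall loop of Steps S201 through S204 can be sketched as follows. This is only a minimal illustration: the function `run_session` and its four callback parameters are hypothetical names introduced here, not part of the disclosed device.

```python
from typing import Callable


def run_session(capture: Callable[[], bytes],
                encode: Callable[[bytes], bytes],
                transmit: Callable[[bytes], None],
                stop_requested: Callable[[], bool]) -> int:
    """Repeat imaging, encoding and transmission until a stop request arrives.

    All four callbacks are hypothetical stand-ins for the imaging unit,
    the encoder circuits, the network circuit and the stop-request check.
    Returns the number of completed iterations.
    """
    iterations = 0
    while True:
        raw = capture()        # imaging step (S201)
        stream = encode(raw)   # encoding step (S202)
        transmit(stream)       # network transmission step (S203)
        iterations += 1
        if stop_requested():   # stop request received? (S204)
            return iterations
```

The stop request is checked only after a transmission has completed, which mirrors the order of Steps S203 and S204 described above.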
In Step S300, the input video signal and the input audio signal, which have been created in the imaging step S201, are encoded by the video encoder circuit 105 and the audio encoder circuit 106. The encoding is performed in a picture unit. The system controller 111 determines in Step S301 whether the type of the picture, which has been encoded immediately before, is an I-picture. The determination of whether the type of the picture is the I-picture may be made based on the data indicating the type of the picture in the encoded stream, or may be made by counting the number of the pictures because the I-picture is located at the head of a GOP (Group Of Pictures). When it is determined that the I-picture is encoded (“Yes” in the drawing), the system controller 111 memorizes in Step S302 the head position of the I-picture and makes a transition to Step S303.
When it is determined in Step S301 that the type of the picture, which has been encoded immediately before, is not an I-picture (“No” in the drawing), the system controller 111 makes a transition to the processing of Step S303. The system controller 111 determines in Step S303 whether the stream data size whose encoding has ended exceeds the transmission size that has been requested by the decoding device. When the data size exceeds the requested transmission size (“Yes” in the drawing), the system controller 111 ends the encoding step of Step S202 and makes a transition to the network transmission step of Step S203. When the data size does not exceed the requested transmission size (“No” in the drawing), the system controller 111 makes a transition to Step S300 and repeats the encoding processing of the next picture. In other words, the network transmission step of Step S203 is not executed until the data size exceeds the requested transmission size.
As stated above, the data size is compared with the transmission size that is requested by a decoding device, and transmission begins only after more data than the requested size has been accumulated in the stream buffer 108; an underflow of the data in the stream buffer 108 may thereby be reliably prevented. The transmission sizes that are requested by decoding devices are mostly within the range of approximately ten Kbytes to approximately several tens of Kbytes.
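The encoding step (Steps S300 through S303) can be sketched as below. This is a sketch under stated assumptions: the name `encoding_step`, the representation of a picture as a `(type, bytes)` pair, and the reading of "stream data size whose encoding has ended" as the number of untransmitted bytes in the buffer are all illustrative choices, not taken from the disclosure.

```python
from typing import Iterator, List, Tuple


def encoding_step(pictures: Iterator[Tuple[str, bytes]],
                  buffer: bytearray,
                  i_heads: List[int],
                  read_pos: int,
                  requested_size: int) -> int:
    """Accumulate encoded pictures until more than requested_size is buffered.

    pictures yields hypothetical (picture_type, encoded_bytes) pairs,
    i_heads collects the buffer offsets of I-picture heads, and read_pos
    is the offset from which the next transmission will read.
    Returns the number of pictures consumed.
    """
    consumed = 0
    for picture_type, payload in pictures:
        head = len(buffer)            # offset where this picture starts
        buffer += payload             # S300: picture encoded and accumulated
        consumed += 1
        if picture_type == "I":       # S301: was the last picture an I-picture?
            i_heads.append(head)      # S302: memorize its head position
        if len(buffer) - read_pos > requested_size:   # S303
            break                     # enough data: go on to transmission
    return consumed
```

Because the loop exits only once the untransmitted data exceeds the requested size, a transmission of the requested size can always be served from the buffer, which is the underflow guarantee described above.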
The system controller 111 determines in Step S400 whether the stream data size, which has been accumulated in the stream buffer 108, exceeds a predetermined threshold value. The threshold value may be set to be an optimal value by a system designer or a user. The bit rate of the TS stream with HD video quality is generally within the range of 10 Mbps to 25 Mbps, and an encoding amount of approximately 640 Kbytes to approximately 1.6 Mbytes is created per GOP. Because one GOP is equal to 0.5 seconds, an encoding amount has to be set to be approximately 500 Kbytes if it is necessary that a delay in an encoding device is suppressed to be less than or equal to 0.5 seconds.
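The figures above can be checked with a short calculation. The helper names `bytes_per_gop` and `buffered_delay_seconds` are hypothetical, and the sketch assumes a constant bit rate and decimal units (1 Kbyte = 1000 bytes).

```python
def bytes_per_gop(bit_rate_bps: float, gop_seconds: float = 0.5) -> float:
    """Encoded bytes produced per GOP at a constant bit rate."""
    return bit_rate_bps * gop_seconds / 8


def buffered_delay_seconds(accumulated_bytes: float, bit_rate_bps: float) -> float:
    """Playback time represented by the data waiting in the stream buffer."""
    return accumulated_bytes * 8 / bit_rate_bps


low = bytes_per_gop(10_000_000)    # 625,000 bytes, roughly 640 Kbytes
high = bytes_per_gop(25_000_000)   # 1,562,500 bytes, roughly 1.6 Mbytes

# A 500 Kbyte threshold corresponds to 0.4 s of video at 10 Mbps,
# which stays within the 0.5 s delay target mentioned above.
delay_at_threshold = buffered_delay_seconds(500_000, 10_000_000)
```

At higher bit rates the same threshold represents a shorter delay, so a threshold chosen for the lowest expected bit rate is the conservative one.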
When it is determined in Step S400 that the data size does not exceed the predetermined threshold value (“No” in the drawing), the communication is continuing smoothly and a delay is unlikely to occur. Accordingly, the system controller 111 makes a transition to Step S403 such that the network circuit 109 sequentially transmits the accumulated data to the network. When it is determined that the data size exceeds the threshold value (“Yes” in the drawing), a delay is highly likely to occur. Accordingly, the system controller 111 makes a transition to Step S401 to determine whether the I-picture head position, which has been memorized in Step S302 in
When a plurality of the I-pictures are located in the accumulated data in the stream buffer 108, it is satisfactory that the I-picture, which has been lastly accumulated, that is, which is located at the last position on the time axis, is set to be the next read position. Thereby, the effect of reducing a delay may be made significant.
As stated above, when a delay becomes a problem because the accumulated data in the stream buffer 108 exceeds a predetermined threshold value, the delay is reduced by having the network circuit 109 skip the read position to the head of an I-picture before transmitting the accumulated data.
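The selection of the next read position can be sketched as follows. The function name `next_read_position` and the use of plain byte offsets are hypothetical. The sketch also assumes that, when no memorized I-picture head lies in the untransmitted area, transmission simply continues sequentially.

```python
from typing import Sequence


def next_read_position(read_pos: int,
                       write_pos: int,
                       i_heads: Sequence[int],
                       threshold: int) -> int:
    """Return the buffer offset from which the next transmission should read.

    read_pos and write_pos delimit the untransmitted data, and i_heads
    holds the memorized I-picture head offsets.
    """
    accumulated = write_pos - read_pos
    if accumulated <= threshold:          # S400 "No": no delay expected
        return read_pos                   # transmit sequentially
    # S401: keep only the I-picture heads inside the untransmitted area.
    candidates = [p for p in i_heads if read_pos < p < write_pos]
    if not candidates:
        return read_pos                   # nothing to skip to (assumed fallback)
    # The last I-picture on the time axis gives the largest delay reduction.
    return max(candidates)
```

For example, with 1000 untransmitted bytes, a threshold of 500 and I-picture heads at offsets 100 and 600, the read position moves to 600.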
In the present embodiment, when the encoding step S202 illustrated in
An example of reducing a delay in the real-time video and audio communication has been described above with reference to
Subsequently, another method of reducing a delay in the network transmission step of S203, which is different from that in Embodiment 1, will be described with reference to
In TV phone devices or TV conference systems, an elementary stream of each of video and audio is generally transmitted to a network after being multiplexed into the TS or the PS by a multiplexing unit, such as the video-audio multiplexing circuit 107. The TS and the PS are each composed of consecutive packets of a fixed size (TS: 188 bytes; PS: 2048 bytes), and the packet sizes are unrelated to the frame sizes of video and audio.
On the decoding device side, the stream that has been packetized into the TS format or the PS format is generally separated into elementary streams of video and audio and then decoded. Accordingly, if a normal stream is not inputted to a video-audio separation circuit, normal separation processing cannot be performed depending on a decoding device, thereby possibly causing a breakdown. That is, there is a possibility that, if a stream in which the cycle of the packet boundary is not constant is inputted, the video/audio separation cannot be performed normally depending on a decoding device.
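The following sketch illustrates what a constant cycle of the packet boundary means for a receiver. It assumes the standard MPEG-2 TS layout in which each 188-byte packet begins with the sync byte 0x47; the function name `boundary_cycle_is_constant` is hypothetical, and the packet size is left as a parameter.

```python
TS_SYNC_BYTE = 0x47  # first byte of every MPEG-2 TS packet


def boundary_cycle_is_constant(stream: bytes, packet_size: int = 188) -> bool:
    """Check that a sync byte appears at every multiple of packet_size."""
    return all(stream[i] == TS_SYNC_BYTE
               for i in range(0, len(stream), packet_size))


packet = bytes([TS_SYNC_BYTE]) + bytes(187)

# Three whole packets: the boundary cycle is constant.
aligned = packet * 3

# Dropping the first 100 bytes of the second packet shifts every later boundary.
shifted = packet + packet[100:] + packet
```

Here `boundary_cycle_is_constant(aligned)` holds while `boundary_cycle_is_constant(shifted)` does not, which is the situation a strict video-audio separation circuit may fail on.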
In order to make the method of reducing a delay according to Embodiment 1 effective irrespective of a decoding device, it is necessary to skip the read position in accordance with the cycle of the packet boundary.
However, when the data size in the stream buffer 108 is greater than a predetermined threshold value and a delay becomes a problem, the following problem arises if the transmission is initiated with the next I-picture head position being a starting position: It is assumed that the next I-picture is located at the position denoted with “I-picture head position” in
The position at which the previous transmission ended may be found from both the transmission size that is requested by a decoding device and the packet size. By determining the difference between the position thus found and the next packet boundary, the illustrated correction size (obliquely striped area) is determined. When the position ahead of the I-picture head position by the correction size is taken to be the I-picture head correction position, and when the I-picture head correction position is taken to be the starting position of the next transmission, the cycle of the packet boundary does not vary. Therefore, the aforementioned problem of a decoding error occurring may be solved. In this case, the read position at which the encoded data is read from the stream buffer 108 is shifted by an integral multiple of the packet size.
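The correction described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that buffer offsets are counted from a packet boundary and that the I-picture head itself lies on a packet boundary. "Ahead" is read as "earlier in the buffer", since that is the direction that makes the skip an integral multiple of the packet size.

```python
def correction_size(prev_end: int, packet_size: int) -> int:
    """Bytes between the end of the previous transmission and the next packet boundary."""
    return (-prev_end) % packet_size


def corrected_i_picture_head(i_head: int, prev_end: int, packet_size: int) -> int:
    """Starting position of the next transmission that keeps the boundary cycle."""
    if i_head % packet_size != 0:
        raise ValueError("this sketch assumes a packet-aligned I-picture head")
    return i_head - correction_size(prev_end, packet_size)


# Example: 188-byte packets, previous transmission ended at offset 10000,
# next I-picture head at the start of packet number 100.
start = corrected_i_picture_head(18800, 10000, 188)   # 18648
skipped = start - 10000                               # 8648 = 46 * 188
```

The receiver has already taken the first 36 bytes of a packet (10000 mod 188); starting at the corrected position supplies the remaining 152 bytes of a packet's length before the I-picture head, so every later boundary stays on the same cycle.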
When the transmission size that is requested by a decoding device is always an integral multiple of the packet size, the correction size always becomes 0. However, when the transmission size that is requested by a decoding device is not an integral multiple of the packet size, the packet boundary is shifted. Accordingly, the system controller 111 calculates, in Step S500 in
The example in which the skip of read position, which is intended to reduce a delay, is performed taking into consideration the packet boundary, has been described above with reference to
Subsequently, still another embodiment in which a decoding error may be prevented in a decoding device will be described with reference to
In the video encoding standards, such as MPEG-2 and MPEG-4 AVC/H.264, there are two GOP structures: an Open GOP and a Closed GOP.
In general, division editing of a stream is mostly performed in GOP units. In the case of an Open GOP, a decoding error may occur, depending on the decoding device, due to the aforementioned B-picture influence when the stream after the GOP boundary is decoded from the head, and hence a broken link flag is to be set. The broken link flag is a flag which directs that the B-pictures referring to the previous GOP be neglected in an Open GOP. Accordingly, a stream in which a broken link flag is set can communicate to a decoding device that it is not necessary for the target pictures to be decoded. In the case of MPEG-2, a broken link flag may be set in the GOP header of the MPEG video layer. In the case of MPEG-4 AVC/H.264, a broken link flag may be set in the SEI (Supplemental Enhancement Information) in the NAL (Network Abstraction Layer) unit.
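For the MPEG-2 case, setting the flag in a GOP header can be sketched as below. The sketch relies on the MPEG-2 video GOP header layout (a 32-bit start code 0x000001B8, a 25-bit time code, then the closed_gop and broken_link bits). It assumes a raw video elementary stream in which the header is contiguous; in a TS-multiplexed buffer the header may be split across packets, which this sketch does not handle. The function name `set_broken_link` is hypothetical.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"   # MPEG-2 video group_start_code
CLOSED_GOP_MASK = 0x40                 # closed_gop bit in the 4th byte after the start code
BROKEN_LINK_MASK = 0x20                # broken_link bit in the same byte


def set_broken_link(es: bytearray, search_from: int = 0) -> int:
    """Set broken_link in the first Open GOP header at or after search_from.

    Returns the offset of the GOP start code, or -1 if no complete
    header is found. A Closed GOP header is left unchanged.
    """
    pos = es.find(GOP_START_CODE, search_from)
    if pos < 0 or pos + 8 > len(es):
        return -1
    flags_index = pos + 7              # byte holding closed_gop and broken_link
    if not es[flags_index] & CLOSED_GOP_MASK:
        es[flags_index] |= BROKEN_LINK_MASK
    return pos
```

Only one bit of the already encoded stream is modified, so the edit does not change the stream length or any packet boundary.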
On the other hand, a closed GOP has the structure of the bit stream denoted with 702 in
As stated above, at the position where a read position is skipped and at the position where the packet boundary is skipped, which have been illustrated in Embodiments 1 and 2, there is a possibility that a decoding error may occur on a decoding device side when the GOP, which is to be read immediately after the skip, is an Open GOP. A way to prevent the problem will be described with reference to the flow chart in
The method of setting a broken link flag according to Embodiment 3 has been described above with reference to
Subsequently, another embodiment in which a decoding error may be prevented in a decoding device will be described with reference to
In Embodiment 3, a decoding error after the read position has been skipped may be prevented by setting a broken link flag; however, it is necessary to edit the encoded stream immediately before the transmission, and hence the burden of the encoding device becomes slightly large. Accordingly, in Embodiment 4, when a picture whose decoding is unnecessary is present after the read position has been skipped to the I-picture head position (or the I-picture head correction position), the read position will be further skipped.
When the I-picture head position (or the I-picture head correction position) is not located in the untransmitted data area in Step S1001 (“No” in the drawing), the system controller 111 makes a transition to Step S1006 such that the network circuit 109 sequentially transmits the accumulated data in the stream buffer 108 to the network. When the head P-picture position in the GOP (or the head correction position of the head P-picture in the GOP) is not located in the untransmitted data area in Step S1004 (“No” in the drawing), the system controller 111 makes a transition to Step S1006 such that the network circuit 109 sequentially transmits the accumulated data in the stream buffer 108 to the network.
The method of preventing a decoding error after skipping to an I-picture in Embodiment 4 has been described above with reference to
The system configurations and process procedures described in Embodiments 1 through 4 are intended only to be illustrative of the present invention, and those skilled in the art will recognize that different configurations, different process procedures, and combination of each Embodiment may be made in the various features and elements of the invention without departing from the scope of the invention.
While we have shown and described several embodiments in accordance with our invention, it should be understood that disclosed embodiments are susceptible of changes and modifications without departing from the scope of the invention. Therefore, we do not intend to be bound by the details shown and described herein but intend to cover all such changes and modifications that fall within the ambit of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2009-092009 | Apr 2009 | JP | national |