Specific embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Best modes for implementing the present invention will be described below.
The configuration of the media stream relay device according to the first implementing mode of the present invention will be described with reference to
An embodiment of the present implementing mode will be described next as the first embodiment of the present invention, with reference to the drawings. As shown in
The following description relates to the processing of media streams by media stream relay device 1 from packet switching network 3 to circuit switching network 2. Circuit switching network terminating unit 10 in media stream relay device 1 functions to terminate circuit switching network 2, and to receive multiplexed data from multiplexed data generating unit 17 (described later) and output the multiplexed data to circuit switching network 2. Note that circuit switching network terminating unit 10 also functions to receive data from circuit switching network 2, as discussed below.
On the other hand, for terminating packet switching network 3, media stream relay device 1 includes control packet control unit 11, video packet control unit 12 and audio packet control unit 13, which rearrange packets arriving from packet switching network 3 (here, the RTP protocol is assumed) based on sequence numbers, timestamps, or the like.
Control packet control unit 11 receives control packets from packet switching network 3, extracts an encoded control stream after rearranging the control packets, and outputs the encoded control stream to control stream processing unit 14. Similarly, video packet control unit 12 receives video packets from packet switching network 3, extracts an encoded video stream after rearranging the video packets, and outputs the encoded video stream to video stream processing unit 15.
Note that audio packet control unit 13 has a buffer for accumulating a fixed amount of audio packet data in order to absorb fluctuation on packet switching network 3. During the rearranging process, encoded audio data is extracted from the received audio packets and stored in the buffer together with the header information assigned to it, and the target encoded audio data is then output in response to requests from stream control unit 16.
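The buffering and rearranging performed by audio packet control unit 13 can be pictured with the following minimal Python sketch. The class name `AudioJitterBuffer`, the `prefill` frame count, and the decision to ignore 16-bit RTP sequence-number wraparound are assumptions made for illustration only; they are not taken from the embodiment.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class BufferedFrame:
    seq: int                                  # RTP sequence number (ordering key)
    timestamp: int = field(compare=False)     # RTP timestamp kept with the payload
    payload: bytes = field(compare=False)     # encoded audio data


class AudioJitterBuffer:
    """Hypothetical reorder buffer; sequence-number wraparound is ignored."""

    def __init__(self, prefill: int) -> None:
        self.prefill = prefill                # frames to accumulate before output
        self._heap: list[BufferedFrame] = []
        self._accumulating = True

    def push(self, seq: int, timestamp: int, payload: bytes) -> None:
        # The heap keeps frames ordered by sequence number, whatever the arrival order.
        heapq.heappush(self._heap, BufferedFrame(seq, timestamp, payload))
        if self._accumulating and len(self._heap) >= self.prefill:
            self._accumulating = False

    def head(self) -> Optional[BufferedFrame]:
        # None while the buffer is still accumulating or is empty ("not acquirable").
        if self._accumulating or not self._heap:
            return None
        return self._heap[0]

    def pop(self) -> BufferedFrame:
        frame = heapq.heappop(self._heap)
        if not self._heap:
            self._accumulating = True         # ran empty: accumulate again
        return frame
```

Using a heap keyed on the sequence number means out-of-order arrivals are corrected on insertion, and the head of the buffer is always the next frame due for output.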
Control stream processing unit 14 analyzes the encoded control stream, acquires the call control information of packet switching network 3, and outputs an encoded stream that is based on the call control information required in call connection with circuit switching network 2 to multiplexed data generating unit 17. Video stream processing unit 15 analyzes the encoded video stream, converts the encoded video stream to the video encoding system of circuit switching network 2 if necessary, and outputs the encoded video stream to multiplexed data generating unit 17.
Note that the present invention is not particularly limited in relation to the type of call control system or whether the conversion process using the video encoding system is implemented, provided the system is able to terminate the video encoding system and the call connection with both packet switching network 3 and circuit switching network 2. While the present embodiment adopts a configuration that also includes video processing, the present invention is not particularly limited to this, and may adopt a configuration that does not include video, that is, a configuration that does not include video packet control unit 12 or video stream processing unit 15.
Stream control unit 16 makes inquiries to determining unit 18 as to whether an encoded audio stream is acquirable from audio packet control unit 13, based on periodic request instructions from multiplexed data generating unit 17.
As described above, received audio packets are stored in the buffer of audio packet control unit 13 after having been reordered in accordance with the header information (here, an RTP header is assumed, which includes information such as sequence numbers and timestamps). Determining unit 18 checks the data at the head of the buffer in response to an inquiry from stream control unit 16, and returns a determination result as to whether acquisition is possible. Where it is determined that an encoded audio stream is acquirable, stream control unit 16 acquires the encoded audio stream from audio packet control unit 13 and outputs the acquired stream to multiplexed data generating unit 17. Where it is determined that acquisition is not possible, stream control unit 16 does not acquire the target encoded audio stream but instead generates an encoded audio stream representing silence or noise, and outputs the generated stream to multiplexed data generating unit 17.
Note that where packet switching network 3 and circuit switching network 2 use the same audio encoding system, the encoded audio stream representing silence or noise that stream control unit 16 generates may be either the silence or noise stream defined by that audio encoding system or a device-specific encoded stream. Where the two networks use different audio encoding systems, the encoded audio stream output to circuit switching network 2 is generated by converting the input encoded audio stream, specifically by decoding it and re-encoding the decoded output, although the present invention is not particularly limited in this respect.
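A minimal sketch of how stream control unit 16 might answer each periodic request from multiplexed data generating unit 17 is shown below. The function names and the placeholder `SILENCE_FRAME` bytes are hypothetical; an actual device would substitute the silence or noise frame of the audio encoding system in use, or a device-specific encoded stream, as described above.

```python
from collections import deque

# Hypothetical placeholder for an encoded silence (or noise) frame.
SILENCE_FRAME = b"SIL"


def is_acquirable(buffer: deque, accumulating: bool) -> bool:
    # Determining unit 18 (sketch): not acquirable while the buffer is still
    # accumulating, or when nothing is at its head.
    return not accumulating and len(buffer) > 0


def on_periodic_request(buffer: deque, accumulating: bool) -> bytes:
    # Stream control unit 16 (sketch): invoked once per output period.
    if is_acquirable(buffer, accumulating):
        return buffer.popleft()
    return SILENCE_FRAME
```

Because a frame is returned on every request, the circuit-switched side always receives a continuous audio stream, with silence filling any period in which buffered data is unavailable.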
Examples of cases in which determining unit 18 determines that acquisition is not possible are given below.
(1) The buffer in audio packet control unit 13 is in the process of accumulating a fixed amount of data to absorb fluctuation on the packet switching network. This also includes the case where the buffer runs empty during communication.
(3) No relevant encoded audio stream exists after the received audio packets have been rearranged or discarded in accordance with the RTP header information (M bits, sequence numbers, timestamps, etc.), because audio packets representing silence are never received: the transmission specification of media communication terminal 5, which is the source device in this case, conforms to intermittent transmission, in which only sound (and also noise, depending on the audio encoding system) is transmitted.
Note that in (1) above, during the period after the start of communication and before media stream relay device 1 receives its first audio packets, stream control unit 16 may output an encoded stream representing silence to circuit switching network 2, or may withhold output until the first audio packets are received.
Multiplexed data generating unit 17 multiplexes encoded control, video and audio streams acquired respectively from control stream processing unit 14, video stream processing unit 15 and stream control unit 16, and outputs the multiplexed data to circuit switching network terminating unit 10. Note that multiplexing is possible even if not all of the encoded streams can be acquired, and that, if the output bandwidth is not filled, predetermined unique data is added before output.
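The multiplexing behaviour described above (multiplexing whatever streams are available and filling any unused bandwidth with predetermined data) is sketched below. The one-byte channel-id/length framing and the fill byte `0x7E` are purely hypothetical and do not represent the multiplexing protocol actually used on circuit switching network 2.

```python
from __future__ import annotations

FILL_BYTE = b"\x7e"  # hypothetical stand-in for the "predetermined unique data"


def build_mux_unit(streams: dict[int, bytes], unit_size: int) -> bytes:
    """Multiplex the streams acquired this period into one fixed-size unit.

    `streams` maps a hypothetical channel id (e.g. 0=control, 1=video, 2=audio)
    to the bytes acquired for that channel; empty entries are skipped.
    """
    out = bytearray()
    for channel, data in sorted(streams.items()):
        if not data:
            continue  # stream not acquired: multiplex the others anyway
        if len(data) > 255:
            raise ValueError("this sketch's framing allows at most 255 bytes per stream")
        out += bytes([channel, len(data)]) + data
    if len(out) > unit_size:
        raise ValueError("acquired streams exceed the output unit size")
    # Pad with fill data when the output bandwidth is not satisfied.
    out += FILL_BYTE * (unit_size - len(out))
    return bytes(out)
```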
The block configuration of media stream relay device 1 according to the second implementing mode of the present invention is the same as the block configuration according to the first implementing mode of the present invention shown in
Media stream relay device 1 according to the second implementing mode of the present invention includes: audio packet control unit 13, which has a buffer for absorbing fluctuation on packet switching network 3, receives audio packets, extracts an encoded audio stream after rearranging the received audio packets based on their header information, and stores the extracted encoded audio stream in the buffer; determining unit 20, which determines whether an encoded audio stream is acquirable from the buffer and outputs a determination result; stream control unit 19, which adjusts the output timing by generating and outputting an encoded audio stream representing one of silence and noise information based on the determination result together with the header information originally assigned to previous encoded audio streams and the header information assigned to the target encoded audio stream, and/or based on the determination result together with frame information for previous encoded audio streams and frame information for the target encoded audio stream; and multiplexed data generating unit 17, which multiplexes the encoded audio stream and outputs multiplexed data to circuit switching network 2.
An embodiment of the present implementing mode will be described next as the second embodiment of the present invention with reference to the drawings, focusing only on the differences from the first embodiment of the present invention.
The present embodiment includes stream control unit 19 and determining unit 20 in place of stream control unit 16 and determining unit 18 in
Frame information here indicates information that is distinguishable into at least the two types of sound and silence, and possibly noise as a third type depending on the encoding system. Note that the frame information may be included as identification information in the encoded audio stream, or refer to the data size of an encoded audio stream or the output result of a determination process that enables an equivalent distinction to be made. The present invention is not particularly limited in this respect.
Note that it is envisaged that the buffer of audio packet control unit 13 will frequently run empty in the case of intermittent transmission in which media communication terminal 5 on packet switching network 3 transmits only audio packets that contain an encoded audio stream representing sound or noise (this depends also on the buffer size setting in media stream relay device 1). The method described in the present embodiment is primarily designed as an output timing adjustment method for dealing with such cases.
The determination method implemented by determining unit 20 uses at least one of RTP header information and frame information for encoded audio streams. While audio packets still fail to arrive after the buffer in audio packet control unit 13 has run empty, stream control unit 19 generates and outputs encoded audio data representing silence, as described above. When an audio packet then arrives, determining unit 20 additionally uses at least one of the RTP header information of that first arriving audio packet and the frame information of the encoded audio stream it contains as judgment information for determining whether an encoded audio stream is acquirable.
Where it is determined that acquisition is possible (to not inhibit output), stream control unit 19 acquires the encoded audio stream from audio packet control unit 13 and outputs the acquired stream to multiplexed data generating unit 17. On the other hand, where it is determined that acquisition is not possible (to inhibit output), stream control unit 19 does not acquire the encoded audio stream but instead generates an encoded audio stream representing silence or noise and outputs the generated stream to multiplexed data generating unit 17.
The processing flow based on the RTP header information judged by determining unit 20 will be described next with reference to
S1: Determining unit 20 judges whether the buffer was empty last time. If the buffer was empty, processing proceeds to S2, and if the buffer was not empty, processing proceeds to S7.
S2: Determining unit 20 checks the marker bit (M bit) of the RTP header information. If it is 1, processing proceeds to S3; otherwise, processing proceeds to S4. Note that while the processing flow in the present embodiment includes this determination, it may be omitted, in which case processing proceeds from S1 (YES) to S4.
S3: Determining unit 20 returns a result that acquisition is not possible. In view of this, processing moves to a buffer accumulation process for absorbing fluctuation, whereby output from audio packet control unit 13 is inhibited until a preset fixed amount of data accumulates in the buffer or a fixed time period elapses. This processing is also implemented when the determination process using frame information (S11 in
S4: Determining unit 20 calculates the difference between the sequence numbers of encoded audio streams previously output by stream control unit 19 and the sequence number of the target encoded audio stream. Here, the target encoded audio stream is the encoded audio stream that will be output if output is determined to be possible. If the absolute value of the difference exceeds threshold X1, processing proceeds to S3; otherwise, processing proceeds to S5.
S5: Determining unit 20 calculates the difference between the timestamps of encoded audio streams previously output by stream control unit 19 and the timestamp of the target encoded audio stream. If the absolute value of the difference exceeds threshold X2, processing proceeds to S3; otherwise, processing proceeds to S6.
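Steps S1 to S5 can be summarized as a single decision function. This is a sketch under stated assumptions: `RtpHeader` is a hypothetical container, "previously output" is taken to mean the most recently output stream, sequence-number and timestamp wraparound are ignored, and the return value simply names the step that processing moves to next.

```python
from dataclasses import dataclass


@dataclass
class RtpHeader:
    marker: bool      # RTP M bit
    seq: int          # sequence number
    timestamp: int    # RTP timestamp


def rtp_header_check(buffer_was_empty: bool, last_output: RtpHeader,
                     target: RtpHeader, x1: int, x2: int,
                     check_marker: bool = True) -> str:
    """Return the step that follows S1-S5: "S3", "S6" or "S7"."""
    if not buffer_was_empty:                                # S1
        return "S7"
    if check_marker and target.marker:                      # S2 (optional)
        return "S3"
    if abs(target.seq - last_output.seq) > x1:              # S4
        return "S3"
    if abs(target.timestamp - last_output.timestamp) > x2:  # S5
        return "S3"
    return "S6"
```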
S6: Determining unit 20 converts the timestamp difference calculated in S5 into an equivalent number of frames, in the processing units of the audio encoding system being used, and subtracts from this converted value the number of times stream control unit 19 has already generated an encoded audio stream representing silence or noise. If the resulting value is positive, determining unit 20 returns a result that acquisition is not possible in response to that many acquisition requests from stream control unit 19, including the current one, and output of the encoded audio stream in the buffer to multiplexed data generating unit 17 is inhibited. During this interval, stream control unit 19 generates an encoded audio stream representing one of silence and noise and outputs the generated stream to multiplexed data generating unit 17.
On the other hand, if the value is negative, determining unit 20 returns a result that acquisition is not possible, and in view of this, processing moves to the buffer accumulation process for absorbing fluctuation, whereby output from audio packet control unit 13 is inhibited until a preset fixed amount of data accumulates in the buffer or a fixed time period elapses. During this interval, stream control unit 19 instead generates an encoded audio stream representing one of silence and noise information, and outputs the generated stream to multiplexed data generating unit 17. Where the determination process using frame information (described later) is included, processing is performed in accordance with a final result obtained by further implementing the determination process based on frame information (S11 in
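The arithmetic of S6 can be written out as follows. In this sketch the conversion into frames uses floor division, the caller supplies the number of timestamp units per frame (for example, 160 units for 20 ms frames on an 8 kHz RTP clock), and the handling of a value of exactly zero (output immediately) is an assumption, since the text covers only the positive and negative cases.

```python
def s6_decision(ts_diff: int, units_per_frame: int,
                silence_already_generated: int) -> tuple[str, int]:
    """Return ("inhibit", n), ("accumulate", 0) or ("output", 0)."""
    # Convert the S5 timestamp difference into whole frames (floor division).
    frames = ts_diff // units_per_frame
    value = frames - silence_already_generated
    if value > 0:
        # Answer "not acquirable" for n requests, including the current one.
        return ("inhibit", value)
    if value < 0:
        # Move to the buffer accumulation process.
        return ("accumulate", 0)
    return ("output", 0)  # value == 0: assumption, not specified in the text
```

For example, a timestamp difference of 800 units at 160 units per frame converts to 5 frames; if 2 silence frames have already been generated, output is inhibited for 3 more requests.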
S7: Determining unit 20 checks whether the target buffer size exceeds X3. If more than X3, determining unit 20 returns a result that acquisition is possible after having reduced the buffer size (S8), and the encoded audio stream positioned at the head of the adjusted buffer is output from audio packet control unit 13 to multiplexed data generating unit 17 via stream control unit 19. If the target buffer size does not exceed X3, determining unit 20 returns a result that acquisition is possible (S7), and the encoded audio stream is output from audio packet control unit 13 to multiplexed data generating unit 17 via stream control unit 19.
Note that the order in which the reduction adjustment and the output from audio packet control unit 13 are implemented may be reversed, and that the reduction adjustment may be implemented by discarding encoded streams in order from the oldest, or by preferentially discarding encoded streams representing at least one of silence and noise information. Note also that thresholds X1 and X2 in the flowchart may be updated sequentially according to the arrival interval between successive packets, and that threshold X3 may similarly vary dynamically according to the arrival interval between successive packets, as well as the packet loss rate, fluctuation, or the like on the packet switching network.
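The reduction adjustment of S8 could be implemented roughly as follows. Each buffer entry is assumed to be a hypothetical `(is_silence_or_noise, payload)` pair, and the buffer is assumed to be reduced to exactly X3 entries; the text does not specify the target size. With `prefer_silence=True`, silence or noise entries are discarded first (oldest first), after which the oldest remaining entries are discarded.

```python
from collections import deque


def reduce_buffer(buffer: deque, x3: int, prefer_silence: bool = True) -> None:
    """S8 sketch: shrink `buffer` in place to at most x3 entries.

    Each entry is a hypothetical (is_silence_or_noise, payload) tuple,
    ordered oldest first.
    """
    excess = len(buffer) - x3
    if excess <= 0:
        return
    if prefer_silence:
        kept = deque()
        for entry in buffer:              # oldest first
            if excess > 0 and entry[0]:
                excess -= 1               # discard a silence/noise entry
                continue
            kept.append(entry)
        buffer.clear()
        buffer.extend(kept)
    for _ in range(excess):               # then discard the oldest entries
        buffer.popleft()
```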
As a result of the above series of judgments, determining unit 20 returns a judgment result to stream control unit 19, which adjusts the output timing of packets received after the buffer ran empty based on this judgment result. If the buffered stream is not to be output, stream control unit 19 generates an encoded audio stream representing silence and outputs the generated stream to multiplexed data generating unit 17.
The processing flow based on frame information further judged by determining unit 20 will be described next with reference to
S10: Determining unit 20 judges whether the buffer was empty last time. If the buffer was empty, processing proceeds to S11, and if the buffer was not empty, processing proceeds to the aforementioned S7. Note that where the processing flow shown in
S11: Determining unit 20 checks the frame information of the encoded audio stream. If sound, processing proceeds to S13, and if silence, determining unit 20 returns a result that acquisition is possible to stream control unit 19, which immediately acquires the encoded audio stream from audio packet control unit 13, and outputs the acquired stream to multiplexed data generating unit 17. Depending on the encoding system, there may also be frame information for noise information, in which case processing proceeds to S12.
S12: Determining unit 20 calculates the temporal difference from the last encoded audio stream representing noise information previously output by stream control unit 19. If the difference exceeds threshold Y1, determining unit 20 returns a result that acquisition is possible to stream control unit 19, which immediately acquires the encoded audio stream from audio packet control unit 13, and outputs the acquired stream to multiplexed data generating unit 17. If the difference does not exceed threshold Y1, processing proceeds to S14.
S13: Determining unit 20 calculates the temporal difference from the last encoded audio stream representing sound previously output by stream control unit 19. If the difference does not exceed threshold Y2, determining unit 20 returns a result that acquisition is possible to stream control unit 19, which immediately acquires the encoded audio stream from audio packet control unit 13, and outputs the acquired stream to multiplexed data generating unit 17. If the difference does exceed threshold Y2, processing proceeds to S15.
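Steps S11 to S13 can be condensed into the sketch below. The `FrameType` enum, the use of millisecond times, and the string results are illustrative assumptions; "output" stands for returning "acquisition possible" so that stream control unit 19 immediately outputs the stream.

```python
from enum import Enum


class FrameType(Enum):
    SOUND = "sound"
    SILENCE = "silence"
    NOISE = "noise"


def frame_info_check(kind: FrameType, now_ms: int, last_noise_out_ms: int,
                     last_sound_out_ms: int, y1_ms: int, y2_ms: int) -> str:
    """Return "output" or the next step ("S14" or "S15") after S11-S13."""
    if kind is FrameType.SILENCE:                     # S11: silence frame
        return "output"
    if kind is FrameType.NOISE:                       # S12
        return "output" if now_ms - last_noise_out_ms > y1_ms else "S14"
    # S13: sound frame
    return "output" if now_ms - last_sound_out_ms <= y2_ms else "S15"
```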
S14: Determining unit 20 checks how much time has passed since an encoded audio stream representing noise was last output, and calculates the time difference between this elapsed time and the transmission cycle of encoded noise information streams in the audio encoding system being used. Determining unit 20 then calculates a divided time difference by dividing the calculated time difference by the processing unit of the audio encoding system being used. If the divided time difference is positive, output of the encoded audio stream is inhibited for a number of times equal to that value. In the interval during which output is inhibited, stream control unit 19 generates an encoded audio stream representing one of silence and noise, and outputs the generated stream to multiplexed data generating unit 17.
On the other hand, if the divided time difference is negative, determining unit 20 returns a result that acquisition is not possible to stream control unit 19, and processing moves to the buffer accumulation process for absorbing fluctuation, whereby output from audio packet control unit 13 is inhibited until a preset fixed amount of data accumulates in the buffer or a fixed time period elapses. During this interval, stream control unit 19 instead generates an encoded audio stream representing one of silence and noise information, and outputs the generated stream to multiplexed data generating unit 17.
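A sketch of the S14 calculation follows. The sign convention (transmission cycle minus elapsed time, so that a positive value means the next noise frame is not yet due) and the use of floor division are assumptions chosen to match the positive and negative cases in the text; a result of zero is treated here as "output now", which the text does not specify.

```python
def s14_decision(elapsed_since_noise_ms: int, noise_cycle_ms: int,
                 frame_ms: int) -> tuple[str, int]:
    """Return ("inhibit", n), ("accumulate", 0) or ("output", 0)."""
    # Assumed sign: time remaining until the next noise frame is due.
    time_diff = noise_cycle_ms - elapsed_since_noise_ms
    divided = time_diff // frame_ms     # floor division into processing units
    if divided > 0:
        return ("inhibit", divided)     # generate silence/noise n more times
    if divided < 0:
        return ("accumulate", 0)        # buffer accumulation process
    return ("output", 0)                # zero: assumption, not in the text
```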
S15: Processing moves to the buffer accumulation process for absorbing fluctuation, whereby output from audio packet control unit 13 is inhibited until a preset fixed amount of data accumulates in the buffer or a fixed time period elapses. During this interval, stream control unit 19 instead generates an encoded audio stream representing one of silence and noise information, and outputs the generated stream to multiplexed data generating unit 17.
Note that in S14 and S15 above, instead of immediately outputting the target encoded audio stream when notified that acquisition is possible, stream control unit 19 may discard the target encoded audio stream and output the following encoded audio stream.
Also, threshold Y1 in the above processing flow is, as a general rule, preferably determined based on the specifications of the audio encoding system used, although this threshold may be updated sequentially according to the arrival interval between successive packets. Threshold Y2 preferably is not set to an excessively large value, and may be set based also on the RTP header information.
According to the method described above, better-quality media communication can be expected by appropriately generating and outputting encoded audio streams representing silence and adjusting the output timing, in the case of intermittent transmission in which only encoded audio streams representing sound, or possibly noise depending on the encoding system, are transmitted from source media communication terminal 5 on packet switching network 3.
The configuration of media stream relay device 1 according to the third implementing mode of the present invention will be described with reference to
An embodiment of the present implementing mode will be described next as the third embodiment of the present invention, with reference to the drawings. In the present embodiment, circuit switching network terminating unit 10 functions to terminate circuit switching network 2, and outputs multiplexed data sent from circuit switching network 2 to multiplexed data separating unit 21. On the other hand, the function of terminating packet switching network 3 with respect to the encoded control, video and audio streams is fulfilled respectively by control stream packetizing unit 22, video stream packetizing unit 23, and audio stream packetizing unit 24, which each function to packetize respective encoded streams and send the generated packets to packet switching network 3.
Multiplexed data separating unit 21 receives and separates the multiplexed data into respective encoded control, video and audio streams, and outputs the separated streams respectively to control stream processing unit 25, video stream processing unit 26 and audio stream packetizing unit 24. Control stream processing unit 25 analyzes the encoded control stream input from multiplexed data separating unit 21, and acquires call control information. Control stream processing unit 25 then generates an encoded control stream for establishing call connection with packet switching network 3, and outputs the generated stream to control stream packetizing unit 22. Video stream processing unit 26 analyzes the encoded video stream input from multiplexed data separating unit 21, converts it into an encoded video stream for packet switching network 3 based on the acquired call control information, and outputs the generated stream to video stream packetizing unit 23.
Note that the present embodiment, similarly to the first embodiment, is not particularly limited in relation to the type of call control system or whether the conversion process using the video encoding system is implemented, provided the system is able to terminate the video encoding system and the call connection with both packet switching network 3 and circuit switching network 2. Further, while the present embodiment adopts a configuration that also includes video processing, the present invention is not particularly limited to this, and may adopt a configuration that does not include video, that is, a configuration that does not include video stream processing unit 26 or video stream packetizing unit 23.
In the present embodiment, encoded audio streams included in the multiplexed data of circuit switching network 2 are assumed to have been encoded based on sound, silence, and possibly noise information depending on the encoding system, and are described as containing uniquely associated information called frame information as an identifier of each encoded audio stream.
Audio stream packetizing unit 24 checks frame information corresponding to encoded audio streams input from multiplexed data separating unit 21, and packetizes only encoded audio streams having frame information that represents sound or noise information for transmission. Where encoded audio streams have frame information representing silence, audio stream packetizing unit 24 merely updates the assigned RTP header information, and does not packetize these encoded audio streams for transmission.
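The intermittent transmission performed by audio stream packetizing unit 24 might look like the following sketch. It assumes, in line with RFC 3550, that the RTP timestamp advances for every frame whether or not it is sent while the sequence number advances only for packets actually sent, and it sets the marker bit on the first packet after an untransmitted interval, as RFC 3551 recommends for audio. The class `DtxPacketizer` and its fields are hypothetical; the text itself only states that the RTP header information is updated.

```python
from dataclasses import dataclass, field


@dataclass
class DtxPacketizer:
    samples_per_frame: int          # RTP timestamp increment per frame
    seq: int = 0                    # next RTP sequence number
    timestamp: int = 0              # RTP timestamp of the next frame
    after_gap: bool = True          # assumption: stream start counts as a talkspurt start
    sent: list = field(default_factory=list)  # (seq, timestamp, marker, payload)

    def feed(self, frame_type: str, payload: bytes) -> None:
        if frame_type in ("sound", "noise"):
            # Packetize and "send" this frame.
            self.sent.append((self.seq, self.timestamp, self.after_gap, payload))
            self.seq = (self.seq + 1) & 0xFFFF
            self.after_gap = False
        else:
            # Silence: no packet, only the header state is updated.
            self.after_gap = True
        self.timestamp = (self.timestamp + self.samples_per_frame) & 0xFFFFFFFF
```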
The configuration of media stream relay device 1 according to the fourth implementing mode of the present invention will be described with reference to
An embodiment of the present implementing mode will be described next as the fourth embodiment of the present invention with reference to the drawings, focusing only on the differences from the third embodiment. In the fourth embodiment, it is presumed that multiplexed data transmitted from circuit switching network 2 contains encoded audio streams encoded at the same compression rate without distinguishing between sound, silence, and also noise information depending on the encoding system.
In the present embodiment, the output destination of encoded audio streams separated by multiplexed data separating unit 21 is audio decoding unit 30. Audio decoding unit 30 decodes encoded audio streams input from multiplexed data separating unit 21, and outputs the resultant audio data to sound determining unit 31. Sound determining unit 31 divides the input audio data into intervals of a predetermined length, and outputs to audio encoding unit 32, together with the audio data, determination results as to whether the individual intervals contain sound. Audio encoding unit 32 checks the plurality of sound determination results corresponding to the length of one encoding unit, and performs encoding at different compression rates: if at least one of the intervals is determined to contain sound, it encodes the audio data as sound; if none of the intervals is determined to contain sound, it judges that the audio data contains silence and encodes it so as to greatly reduce the encoded data size.
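The sound determination and encoding decision of units 31 and 32 can be sketched as below. The mean-energy threshold used as the sound detector is a hypothetical stand-in (the text does not specify a detection method); what the sketch does reflect from the text is the rule that an encoding unit is treated as sound if at least one of its intervals contains sound, and as silence otherwise.

```python
def interval_has_sound(samples: list[int], threshold: float) -> bool:
    # Hypothetical detector: mean energy above a fixed threshold.
    return sum(s * s for s in samples) / len(samples) > threshold


def classify_encoding_unit(samples: list[int], interval_len: int,
                           threshold: float) -> str:
    """Classify one encoding unit as "sound" or "silence"."""
    results = [interval_has_sound(samples[i:i + interval_len], threshold)
               for i in range(0, len(samples), interval_len)]
    # Sound if at least one interval contains sound; otherwise silence,
    # which would be encoded at a much higher compression rate.
    return "sound" if any(results) else "silence"
```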
Audio encoding unit 32 then outputs the variable compression rate encoded audio stream obtained from this encoding process to audio stream packetizing unit 33. Note that audio encoding unit 32 is a means for differentiating encoded data sizes between sound and silence, and operates in accordance with the results of the analysis by control stream processing unit 25 and by the corresponding processing unit for the opposite direction (control stream processing unit 14 in the first embodiment). The same audio encoding system as that of audio decoding unit 30 may be used, or a completely different audio encoding system may be used.
Audio stream packetizing unit 33 packetizes only variable compression rate encoded audio streams that include frame information representing sound or noise for transmission, based on the frame information in the variable compression rate encoded audio streams input from audio encoding unit 32. Where a variable compression rate encoded audio stream includes frame information representing silence, audio stream packetizing unit 33 merely updates the assigned RTP header information, and does not transmit packets corresponding to the variable compression rate encoded audio stream.
As a result of the processing described above in the third and fourth embodiments, the bandwidth used and the number of packets transmitted from media stream relay device 1 to packet switching network 3 can be expected to be limited, by performing intermittent transmission according to the type of frame information (sound, silence, and possibly noise information depending on the encoding system) in encoded audio streams received from circuit switching network 2.
The fifth implementing mode of the present invention is a computer program that, when installed on a general-purpose information processing apparatus, causes the apparatus to realize functions corresponding to media stream relay device 1 of the above embodiments. The program may be installed on the information processing apparatus either from a recording medium on which it has been recorded or via a communication line.
The present invention makes it possible to realize media communication after having limited the number of audio packets for transmission as much as possible at a media stream relay device interposed between a circuit switching network and a packet switching network. It is thereby possible to contribute to improving service quality and enhancing convenience for both network providers and network users.
Number | Date | Country | Kind |
---|---|---|---|
2006-111027 | Apr 2006 | JP | national |