The present invention relates to mute processing apparatuses and methods, and particularly to a mute processing apparatus and method for automatically sending mute frames during a multi-person communication over a network.
With the development of communication networks and their associated services, multi-person communication has become a typical service on communication networks such as the Public Switched Telephone Network (PSTN) and Voice over Internet Protocol (VoIP) networks. Multi-person communication is also widely used in network telephone conversations and network TV conversations. It allows multiple persons to communicate simultaneously and delivers speech data from the broadcaster(s) to the listeners.
To provide a friendly communication environment, it is critical for multi-person communication to use system resources so as to reduce delay, namely to deliver the speech data to each person as soon as possible. Controlling the amount of speech data transferred is an effective way to address the delay problem. However, current multi-person communication transfers all speech data over the communication network, regardless of whether it is sound data or soundless data. In other words, the soundless data is sent over the communication network just as the sound data is. As a result, the soundless data increases the load on the communication network and the amount of speech data transferred. Delay may therefore arise from the unnecessary soundless data, and the service quality of the multi-person communication system may be weakened by that delay.
What is needed, therefore, is a mute processing apparatus and method for use in a multi-person communication system that automatically sends mute frames, rather than unnecessary soundless data, when there is no sound input from a terminal of the multi-person communication system, thereby reducing the amount of speech data transferred, because a mute frame is much smaller than the soundless data it replaces.
A mute processing apparatus is provided. The apparatus is capable of automatically sending mute frames when persons do not talk or remain silent during a multi-person communication over a network. The apparatus mainly includes a sampling unit, an energy calculating unit, a coding unit, a processing unit, and an output unit. The sampling unit is for collecting input signals from a microphone. The energy calculating unit is for calculating an energy level of the input signals within a time span. The coding unit is for coding the input signals within the time span. The processing unit is for sending a mute frame within the time span if the energy levels of the input signals within previous continuous time spans are less than a predetermined energy level and the energy level of the input signals within the time span is still less than the predetermined energy level, and otherwise for controlling the coding unit to code the input signals within the time span. The output unit is for outputting the mute frame from the processing unit or the coded signals from the coding unit.
A mute processing method is also provided. The method includes the steps of: (a) collecting input signals from a microphone; (b) calculating an energy level of input signals within a time span; (c) sending a mute frame if the energy levels of the input signals within previous continuous time spans are less than a predetermined energy level and the energy level of the input signals within the time span is still less than the predetermined energy level, otherwise, coding the input signals within the time span; and (d) outputting the mute frame or coded signals.
Other advantages and novel features will be drawn from the following detailed description with reference to the attached drawing, in which:
The sampling unit 10 is for collecting input signals from a microphone (not shown) connected to the apparatus.
The energy calculating unit 11 is for calculating an energy level of the input signals within a time span. That is, the energy calculating unit 11 regards the input signals within one time span as a sound unit, and calculates the energy level of each sound unit.
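The grouping of input signals into sound units and the per-unit energy calculation can be illustrated with a minimal sketch. The description does not specify how the energy level is computed, so the mean squared amplitude used here is an assumption, and the names frame_energy and split_into_sound_units are hypothetical.

```python
from typing import List, Sequence


def split_into_sound_units(signal: Sequence[float], unit_len: int) -> List[List[float]]:
    """Group samples into consecutive sound units of unit_len samples each.

    The last unit may be shorter if the signal length is not a multiple
    of unit_len.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    return [list(signal[i:i + unit_len]) for i in range(0, len(signal), unit_len)]


def frame_energy(samples: Sequence[float]) -> float:
    """Energy level of one sound unit.

    Assumption: mean squared amplitude. The source text only says that an
    energy level is calculated per time span, not which formula is used.
    """
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    units = split_into_sound_units([1.0, -1.0, 0.0, 0.0, 0.5], unit_len=2)
    for unit in units:
        print(unit, frame_energy(unit))
```

Each energy value produced this way would then be compared with the predetermined energy level, as described in the following paragraphs.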
If the energy level of the sound unit is equal to or greater than a predetermined energy level, the terminal of the multi-person communication is inputting sound within the time span. A speech coding operation is therefore required on the input signals of the sound unit to obtain coded signals to be sent out over the network.
If the energy levels of several continuous sound units are a mixture of levels greater than or equal to the predetermined energy level and levels less than it, the terminal of the multi-person communication may have one or more input pauses over those continuous time spans. The speech coding operation is therefore required on the input signals of each sound unit over those time spans to obtain coded signals to be sent out over the network.
If the energy levels of several previous continuous sound units are less than the predetermined energy level, and the energy level of the following sound unit is still less than the predetermined energy level, there is no sound input from the terminal of the multi-person communication within the several previous continuous time spans or within the time span of the following sound unit. A mute frame is therefore sent within the time span of the following sound unit instead of performing a speech coding operation on its input signals.
The coding unit 12 is for performing the speech coding operation on the input signals within the time span, that is, for coding the input signals of the sound unit. The counter 13 provides a value indicating a count of continuous sound units whose energy level is less than the predetermined energy level. The initial value of the counter 13 is zero. The output unit 14 is for outputting the coded signals from the coding unit 12 or the mute frame.
The processing unit 15 is for controlling the components of the apparatus, i.e., the sampling unit 10, the energy calculating unit 11, the coding unit 12, the counter 13, the output unit 14, the volatile storage unit 16, and the non-volatile storage unit 17.
The processing unit 15 resets the value of the counter 13 to the initial value and simultaneously signals the coding unit 12 to perform the speech coding operation on the input signals of the sound unit within the time span, if the energy level of the sound unit is equal to or greater than the predetermined energy level.
The processing unit 15 also signals the coding unit 12 to perform the speech coding operation on the input signals of each sound unit within the several continuous time spans, if the energy levels of those sound units are a mixture of levels greater than or equal to the predetermined energy level and levels less than it, or if the energy levels are all less than the predetermined energy level but the value of the counter 13 has not yet reached the predetermined value. For each sound unit whose energy level is less than the predetermined energy level and which is nevertheless coded, the processing unit 15 further increases the value of the counter 13 by one while the speech coding operation is performed.
Moreover, the processing unit 15 sends the mute frame, instead of performing the speech coding operation on the input signals of the sound unit and increasing the value of the counter 13, if the energy levels of several previous continuous sound units are less than the predetermined energy level and the energy level of the following sound unit is still less than the predetermined energy level. That is, if the value of the counter 13 is equal to a predetermined value and the energy level of the following sound unit is less than the predetermined energy level, the processing unit 15 signals the output unit 14 to output the mute frame.
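The three behaviours of the processing unit 15 described above (reset and code, code and count, or send a mute frame) can be summarised as a single per-unit decision. The following is a minimal sketch under stated assumptions: the function name decide, the action labels "code" and "mute", and the example threshold and limit values are hypothetical and do not come from the description.

```python
from typing import List, Tuple


def decide(energy: float, counter: int, threshold: float, limit: int) -> Tuple[str, int]:
    """Return (action, new_counter) for one sound unit.

    threshold stands for the predetermined energy level and limit for the
    predetermined value of the counter; both names are illustrative.
    """
    if energy >= threshold:
        # Sound is being input: reset the counter and code the unit.
        return "code", 0
    if counter < limit:
        # Quiet unit, but the quiet run is still short: code it and count it.
        return "code", counter + 1
    # The quiet run has reached the limit: send a mute frame, counter unchanged.
    return "mute", counter


def run(energies: List[float], threshold: float, limit: int) -> List[str]:
    """Apply decide() to a sequence of energy levels, starting from counter 0."""
    counter = 0
    actions = []
    for energy in energies:
        action, counter = decide(energy, counter, threshold, limit)
        actions.append(action)
    return actions


if __name__ == "__main__":
    print(run([5, 1, 1, 1, 1, 6, 1], threshold=2, limit=2))
    # prints ['code', 'code', 'code', 'mute', 'mute', 'code', 'code']
```

In this example the first two quiet units are still coded, mute frames begin only once the counter has reached the limit, and a single loud unit resets the counter.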
Therefore, when the value of the counter 13 equals the predetermined value and the energy levels of the following continuous sound units remain less than the predetermined energy level, the apparatus simply sends the mute frame within each corresponding time span of those sound units. Otherwise, the apparatus sends coded signals within each corresponding time span. Because the mute frame is much smaller than the coded signals of a sound unit, the amount of data transferred for a mute frame is correspondingly less than that transferred for the coded signals of a sound unit. As a result, the load on the network can be reduced, thereby alleviating delay.
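The effect on the amount of data transferred can be estimated with simple arithmetic. The description gives no frame sizes, so the 20-byte coded unit and 1-byte mute frame below are purely hypothetical figures chosen for illustration, as are the function names.

```python
def payload_bytes(n_coded: int, n_mute: int, coded_size: int, mute_size: int) -> int:
    """Total bytes sent when n_coded units are coded and n_mute units are mute frames."""
    return n_coded * coded_size + n_mute * mute_size


def saving_ratio(n_coded: int, n_mute: int, coded_size: int, mute_size: int) -> float:
    """Fraction of bytes saved compared with coding every sound unit."""
    baseline = (n_coded + n_mute) * coded_size
    if baseline == 0:
        return 0.0
    return 1.0 - payload_bytes(n_coded, n_mute, coded_size, mute_size) / baseline


if __name__ == "__main__":
    # Hypothetical figures: 100 sound units, 60 of them replaced by mute frames.
    sent = payload_bytes(n_coded=40, n_mute=60, coded_size=20, mute_size=1)
    print(sent)  # prints 860, versus 2000 bytes if every unit were coded
    print(round(saving_ratio(40, 60, 20, 1), 2))  # prints 0.57
```

The actual saving depends on the codec, the mute frame format, and how much of the conversation is silent, none of which is fixed by the description.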
Additionally, the volatile storage unit 16 is for storing the input signals of each sound unit and the energy level of the input signals of each sound unit. The non-volatile storage unit 17 is for storing the predetermined energy level and the predetermined value.
If the energy level calculated is not less than the predetermined energy level, in step S203, the processing unit 15 resets the value of the counter 13. In step S204, the coding unit 12 performs the speech coding operation on the current sound unit input signals, and the output unit 14 sends out the coded signals from the coding unit 12 over the network. In step S205, the energy calculating unit 11 calculates the energy level of the input signals within a following time span, and the procedure goes to step S202 described above.
If the energy level calculated is less than the predetermined energy level, in step S206, the processing unit 15 determines whether the value of the counter 13 is less than the predetermined value.
If the value of the counter 13 is less than the predetermined value, in step S207, the coding unit 12 performs the speech coding operation on the current sound unit input signals, and the output unit 14 sends out the coded signals from the coding unit 12 over the network. In step S208, the processing unit 15 increases the value of the counter 13 by one. In step S209, the energy calculating unit 11 calculates the energy level of the input signals within a following time span, and the procedure goes to step S202 described above.
If the value of the counter 13 is not less than the predetermined value, there is no sound input from the terminal of the multi-person communication within the current time span. In step S210, the processing unit 15 signals the output unit 14 to output the mute frame over the network, thereby reducing the load on the network. In step S211, the energy calculating unit 11 calculates the energy level of the input signals within a following time span. In step S212, the processing unit 15 determines whether the energy level calculated is less than the predetermined energy level. If so, there is still no sound input from the terminal of the multi-person communication, and the procedure goes to step S210 described above to output the mute frame again over the network. If not, the terminal of the multi-person communication is inputting sound, and the procedure goes to step S203 described above.
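The procedure of steps S203 to S212 can be sketched end to end as a loop over sound units. This is an illustrative sketch, not the claimed implementation: the sum-of-squares energy measure, the one-byte MUTE_FRAME marker, and the stand-in encoder fake_encode are all assumptions, and the step labels in the comments map the code onto the steps named in the text.

```python
from typing import Callable, Iterable, Iterator, Sequence

MUTE_FRAME = b"\x00"  # hypothetical mute frame; the real format is not specified


def energy_of(unit: Sequence[float]) -> float:
    """Assumed energy measure: sum of squared samples in the sound unit."""
    return sum(s * s for s in unit)


def fake_encode(unit: Sequence[float]) -> bytes:
    """Stand-in for the speech coding operation (not a real codec)."""
    return b"C" + bytes([len(unit) % 256])


def mute_process(
    units: Iterable[Sequence[float]],
    threshold: float,
    limit: int,
    encode: Callable[[Sequence[float]], bytes] = fake_encode,
) -> Iterator[bytes]:
    """Yield one output per sound unit: coded signals or a mute frame."""
    counter = 0  # initial value of the counter is zero
    for unit in units:
        energy = energy_of(unit)      # energy of the next time span (S205/S209/S211)
        if energy >= threshold:       # energy comparison (S202/S212)
            counter = 0               # S203: reset the counter
            yield encode(unit)        # S204: code and send
        elif counter < limit:         # S206: counter below the predetermined value
            yield encode(unit)        # S207: code and send
            counter += 1              # S208: count the quiet unit
        else:
            yield MUTE_FRAME          # S210: send the mute frame


if __name__ == "__main__":
    loud, quiet = [3.0, 3.0], [0.0, 0.0]
    out = list(mute_process([loud, quiet, quiet, quiet, quiet, loud], threshold=1.0, limit=2))
    print([frame == MUTE_FRAME for frame in out])
    # prints [False, False, False, True, True, False]
```

Because the counter stays at the predetermined value while mute frames are being sent, the final branch is taken repeatedly until a unit reaches the threshold, which matches the S210 to S212 loop in the text.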
Although the present invention has been specifically described on the basis of a preferred embodiment and preferred method thereof, the invention is not to be construed as being limited thereto. Various changes or modifications may be made to the embodiment and method without departing from the scope and spirit of the invention.
Number | Date | Country | Kind |
---|---|---|---|
200510101219.1 | Nov 2005 | CN | national |