1. Technical Field
The present disclosure relates to speech control systems, and more particularly to a speech recognition system and method for an electronic device.
2. Description of Related Art
Voice control technology can be used with a variety of electronic devices, such as robots, electronic toys, telephones, and home appliances. Behaviors of the electronic devices can be controlled by voice commands from users. For example, a robot may turn left or turn right upon receiving a corresponding voice command from a user. However, speech recognition technology is not yet perfected, and electronic devices may not recognize commands correctly and thus may perform wrong actions.
Referring to the drawings, the sampling module 11 samples a plurality of speech signals which are repeatedly produced by an acoustic sound source.
The spectrum converting module 12 obtains a frequency spectrum image of each of the plurality of speech signals by examining the frequency spectral composition of each signal. The spectrum modifying module 13 adjusts the frequency spectrum images of the plurality of speech signals so that they are all the same width, to obtain a plurality of training objects, as detailed below.
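The disclosure does not fix a particular implementation for these two modules. The following is a minimal sketch, assuming 16 kHz mono audio held in a NumPy array; the names `TARGET_WIDTH` and `to_training_object` are illustrative only, not from the disclosure.

```python
# A minimal sketch of the spectrum converting and modifying steps,
# assuming 16 kHz mono audio in a NumPy array. TARGET_WIDTH and
# to_training_object are illustrative names, not from the disclosure.
import numpy as np
from scipy.signal import spectrogram

TARGET_WIDTH = 64  # predetermined common width for all training objects

def to_training_object(samples: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Convert one speech signal into a fixed-width frequency spectrum image."""
    # Frequency spectrum image: power spectrogram of the signal.
    _, _, sxx = spectrogram(samples, fs=fs, nperseg=256, noverlap=128)
    # Stretch or compress the time axis so every image has the same width,
    # interpolating each frequency bin independently.
    src = np.linspace(0.0, 1.0, sxx.shape[1])
    dst = np.linspace(0.0, 1.0, TARGET_WIDTH)
    return np.stack([np.interp(dst, src, row) for row in sxx])
```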
Referring again to the drawings, the training module 14 obtains specific data of the plurality of speech signals by analyzing the training objects of the plurality of speech signals. In this embodiment, the specific data includes a set of probability values. Each probability value is obtained by overlapping the training objects and represents a probability that the training objects appear at a point in an image area. The image area is formed by lines that enclose the overlapped training objects. In the illustrated embodiment, a rectangular image area is formed by four lines L that enclose each of the training objects 30. The rectangular image areas have the same length and width. The rectangular image areas with the training objects 30 may be overlapped to allow the training module 14 to calculate a set of probability values, which represent probabilities that the training objects 30 appear at different points in the overlapped rectangular image areas. For example, there may be a 90% chance that the training objects 30 appear at one point in the overlapped rectangular image areas, and no chance that they appear at another point. Each point at which the training objects 30 appear with a probability of 90% (or another predetermined value) may be considered a high coincidence point.
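One way to realize this overlap statistic, sketched below under the assumption that the training objects are the fixed-width images produced by `to_training_object` above, is to mark the points where each image has appreciable energy and average the marks across images; the energy-based binarization rule and the 0.9 cutoff are assumptions for illustration.

```python
# A sketch of the training module's overlap statistic, assuming the
# fixed-width images produced by to_training_object above. The
# energy-based binarization and the 0.9 cutoff are illustrative choices.
import numpy as np

HIGH_COINCIDENCE = 0.9  # predetermined probability for a high coincidence point

def train(training_objects: list[np.ndarray]) -> np.ndarray:
    """Return, per point, the probability that the training objects appear there."""
    stack = np.stack(training_objects)  # shape: (count, freq, TARGET_WIDTH)
    # A point "appears" in an image when its energy exceeds that image's mean.
    present = stack > stack.mean(axis=(1, 2), keepdims=True)
    return present.mean(axis=0)

def high_coincidence_points(probabilities: np.ndarray) -> np.ndarray:
    """Boolean mask of points where the objects appear at least 90% of the time."""
    return probabilities >= HIGH_COINCIDENCE
```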
The linking module 15 links the specific data and the meaning of the speech signals together. The meaning of the speech signals may be preprogrammed in code, which can be executed by the processor 20 to control an electronic device to do something, such as turn left. The specific data and the linked meaning are stored in the storing module 16. In this embodiment, the storing module 16 can store specific data of a plurality of speech signals having different meanings.
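A minimal sketch of the linking and storing modules follows, assuming an in-memory dictionary; the `Callable` action stands in for the preprogrammed code that the processor 20 would execute, and both names are illustrative.

```python
# A sketch of the linking and storing modules, assuming an in-memory
# store. The Callable stands in for the preprogrammed code executed by
# the processor (e.g. a "turn left" routine); both names are illustrative.
from typing import Callable
import numpy as np

# meaning -> (specific data, preprogrammed action)
store: dict[str, tuple[np.ndarray, Callable[[], None]]] = {}

def link_and_store(meaning: str, specific_data: np.ndarray,
                   action: Callable[[], None]) -> None:
    """Link the specific data of a speech signal with its meaning and keep both."""
    store[meaning] = (specific_data, action)
```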
Therefore, when a speech command is voiced near the electronic device, the speech command is first sampled by the sampling module 11. A frequency spectrum image of the speech command is obtained and modified to be the same width as the training objects. The modified frequency spectrum image of the speech command is then received by the comparing module 17.
The comparing module 17 compares the modified frequency spectrum image of the speech command with the specific data of the plurality of speech signals stored in the storing module 16, to determine a meaning of the speech command. In this embodiment, the comparing module 17 may find the speech signal that is the nearest match to the speech command through a rough comparison of the modified frequency spectrum image of the speech command with the specific data stored in the storing module 16. A similarity degree between the speech command and the speech signal is then determined according to the specific data of the speech signal, by overlapping the modified frequency spectrum image of the speech command onto the image area. For example, the similarity degree may be 85% in response to the frequency spectrum image of the speech command being superposed on 85% of the high coincidence points of the overlapped rectangular image areas. If the similarity degree is equal to or greater than a predetermined value, the speech command is accepted as a match.
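Continuing the sketch, and assuming the store and helpers above, the comparison might be realized as follows. Scoring every stored entry and keeping the best is an assumption, since the disclosure leaves the exact form of the rough comparison open.

```python
# A sketch of the comparing module, assuming the store and helpers above.
# Scoring every stored entry and keeping the best is an assumption; the
# disclosure leaves the exact form of the rough comparison open.
import numpy as np

MATCH_THRESHOLD = 0.85  # predetermined similarity value (illustrative)

def recognize(command_image: np.ndarray) -> str | None:
    """Return the meaning of a speech command, or None if nothing matches."""
    present = command_image > command_image.mean()
    best_meaning, best_degree = None, 0.0
    for meaning, (probabilities, _) in store.items():
        mask = high_coincidence_points(probabilities)
        if not mask.any():
            continue
        # Similarity degree: fraction of the high coincidence points that
        # the command's frequency spectrum image also covers.
        degree = float(present[mask].mean())
        if degree > best_degree:
            best_meaning, best_degree = meaning, degree
    return best_meaning if best_degree >= MATCH_THRESHOLD else None
```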
Referring to the drawings, an exemplary embodiment of a speech recognition method includes the following steps.
In step S1, the sampling module 11 samples a plurality of speech signals which have the same meaning and similar lengths.
In step S2, the spectrum converting module 12 obtains a frequency spectrum image of each of the plurality of speech signals by examining the frequency spectral composition of each signal. The plurality of speech signals may differ, for example in duration and loudness; therefore, the frequency spectrum images may differ in size.
In step S3, the spectrum modifying module 13 modifies the frequency spectrum images of the plurality of speech signals to be the same width, to obtain a plurality of training objects. In this embodiment, the frequency spectrum images are modified by labeling a start point S and an end point E of each frequency spectrum image, and adjusting the start point S and the end point E of each frequency spectrum image to have a predetermined distance therebetween.
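The disclosure does not say how the start and end points are located. A common choice, sketched here as an assumption, is a short-time energy threshold, after which the trimmed signal can be rescaled to the predetermined width (as in `to_training_object` above); the frame size and ratio are illustrative.

```python
# A sketch of the start/end labeling in step S3, assuming a short-time
# energy threshold locates the voiced region; frame size and ratio are
# illustrative, and a non-silent signal is assumed.
import numpy as np

def label_endpoints(samples: np.ndarray, frame: int = 256,
                    ratio: float = 0.1) -> tuple[int, int]:
    """Return sample indices of the start point S and the end point E."""
    frames = samples[: len(samples) // frame * frame].reshape(-1, frame)
    energy = (frames.astype(float) ** 2).sum(axis=1)
    active = np.flatnonzero(energy > ratio * energy.max())
    return active[0] * frame, (active[-1] + 1) * frame
```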
In step S4, the training module 14 obtains specific data of the plurality of speech signals by analyzing the corresponding training objects. The specific data may represent a probability that the training objects appear at each point in the image area. The image area may include a plurality of high coincidence points.
In step S5, the linking module 15 links the specific data with the meaning of the speech signals. For example, the specific data may be linked with a meaning “dance.”
In step S6, the storing module 16 stores the specific data and the linked meaning. A plurality of speech signals having different meanings may be sampled in step S1, and their specific data are therefore stored in the storing module 16 with the corresponding meanings.
In step S7, the comparing module 17 determines a meaning of a speech command according to the specific data and the linked meanings stored in the storing module 16. The comparing module 17 may find the speech signal that is the nearest match to the speech command through a rough comparison of a modified frequency spectrum image of the speech command with the specific data stored in the storing module 16. A similarity degree between the speech command and the speech signal can be calculated as the percentage of high coincidence points in the image area of the speech signal at which the modified frequency spectrum image of the speech command appears. It can be determined that the speech command has the same meaning as the speech signal in response to the similarity degree being equal to or greater than the predetermined value.
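Tying the sketches above together, a hypothetical end-to-end pass over one sampled command might read as follows; the random array merely stands in for real 16 kHz audio.

```python
# Hypothetical end-to-end use of the sketches above on one sampled
# command; the random array stands in for real 16 kHz audio.
import numpy as np

fs = 16000
command = np.random.randn(fs)                 # stand-in for a 1-second recording
s, e = label_endpoints(command)               # step S3: start and end points
image = to_training_object(command[s:e], fs)  # steps S2-S3: fixed-width image
meaning = recognize(image)                    # step S7: compare against the store
if meaning is not None:
    _, action = store[meaning]
    action()                                  # execute the preprogrammed code
```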
In other embodiments, the system and method can alternatively be used in other acoustic recognition systems.
The foregoing description of the exemplary embodiments of the disclosure has been presented only for the purposes of illustration and description and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to explain the principles of the disclosure and their practical application so as to enable others of ordinary skill in the art to utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present disclosure pertains without departing from its spirit and scope. Accordingly, the scope of the present disclosure is defined by the appended claims rather than by the foregoing description and the exemplary embodiments described therein.
Number | Date | Country | Kind
--- | --- | --- | ---
200910302276.4 | May 2009 | CN | national