1. Technical Field
The present invention relates in general to a system and method for using semantic analysis to configure a voice reader. More particularly, the present invention relates to a system and method for dynamically selecting voice attributes that correspond to a text block's semantic content and using the voice attributes to convert the text block into synthesized speech.
2. Description of the Related Art
Voice readers are used to convert a text file into synthesized speech. The text file may be received from an external source, such as a web page, or the text file may be received from a local source, such as a compact disc. For example, a user with impaired vision may use a voice reader that receives a web page from a server through a computer network (e.g., the Internet) and converts the web page text into synthesized speech for the user to hear. In another example, a young child may use a voice reader that retrieves a children's book text file from a compact disc and converts the children's book text file into synthesized speech for the child to hear.
A challenge found with voice readers, however, is that the speech that a voice reader generates is not dynamically configurable. For example, a voice reader may be pre-configured to read text using a female voice at slow speed. In this example, the pre-configured voice is suitable while converting children's book text for a child to hear but may not be suitable while converting a financial article for an adult to hear.
Furthermore, voice readers are not configurable to convert particular sections of a text file based upon a user's interest. For example, a user may be interested in “summary” sections included in a particular technical document. In this example, the voice reader converts the text file using pre-configured voice attributes for each section and generates synthesized speech for each section, regardless of the section's content.
What is needed, therefore, is a system and method for dynamically configuring voice reader attributes such that the voice reader attributes correspond with the semantic content of the text that the voice reader is converting.
It has been discovered that the aforementioned challenges are resolved by performing semantic analysis on a text block and using voice attributes that correspond to the semantic analysis result for dynamically configuring a voice reader. A client receives a text file and segments the text file into a plurality of text blocks. In one embodiment, the client receives the text file from a web page server through a computer network, such as the Internet. In another embodiment, the client receives the text file from a storage device, such as a compact disc. The client sends a text block to a semantic analyzer.
The semantic analyzer performs semantic analysis on the text block by matching semantic identifiers located in a look-up table with the text block using standard semantic analysis techniques. For example, the semantic analyzer may use semantic analysis techniques such as symbolic machine learning, graph-based clustering and classification, statistics-based multivariate analyses, artificial neural network-based computing, or evolution-based programming. The semantic analyzer matches a semantic identifier with the text block based upon the semantic analysis results, and retrieves voice attributes corresponding to the matched semantic identifier from the look-up table.
The semantic identifier may be a subject matter semantic identifier or a user interest semantic identifier. A subject matter semantic identifier corresponds to particular subject matter, such as a children's book or a financial article. A user interest semantic identifier corresponds to particular areas of interest, such as a summary, detail, or section headings of a text file. For example, the semantic analyzer identifies that a text block is a paragraph corresponding to financial information and associates a “Business Journal” semantic identifier with the text block. In this example, the semantic analyzer retrieves voice attributes corresponding to the “Business Journal” semantic identifier from the look-up table.
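As a rough illustration of the matching and retrieval described above, the following Python sketch substitutes a simple keyword-overlap score for the semantic analysis techniques listed. The keyword sets, the `VoiceAttributes` record, and the function names are assumptions made for illustration only and are not part of the disclosure; the attribute values follow the look-up table examples given in this description.

```python
from typing import Dict, NamedTuple, Set


class VoiceAttributes(NamedTuple):
    pitch: str
    loudness: str
    pace: str


# Look-up table keyed by semantic identifier. The attribute values mirror
# the examples in the description.
LOOKUP_TABLE: Dict[str, VoiceAttributes] = {
    "Children's Book": VoiceAttributes("Female-High", "Medium", "Slow"),
    "Business Journal": VoiceAttributes("Male-Low", "Medium", "Slow"),
}

# Hypothetical keyword sets; a real analyzer would use one of the
# semantic analysis techniques named in the text instead.
KEYWORDS: Dict[str, Set[str]] = {
    "Children's Book": {"once", "bunny", "forest", "princess"},
    "Business Journal": {"revenue", "earnings", "quarter", "dividend"},
}


def match_identifier(text_block: str) -> str:
    """Return the semantic identifier whose keywords overlap the block most."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text_block.split()}
    return max(KEYWORDS, key=lambda ident: len(words & KEYWORDS[ident]))


def voice_attributes_for(text_block: str) -> VoiceAttributes:
    """Match a semantic identifier, then retrieve its voice attributes."""
    return LOOKUP_TABLE[match_identifier(text_block)]
```

Keeping the identifier match separate from the table retrieval means the stand-in scoring function could be swapped for a stronger analyzer without touching the look-up table.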
The semantic analyzer provides the voice attributes to a voice reader. The voice attributes include attributes such as a pitch value, a loudness value, and a pace value. In one embodiment, the voice attributes are provided to the voice reader through an Application Program Interface (API). The voice reader inputs the voice attributes into a voice synthesizer whereby the voice synthesizer converts the text block into synthesized speech for a user to hear.
In one embodiment, the text file includes semantic tags that correspond to the semantic content of particular text blocks. In this embodiment, the semantic analyzer performs latent semantic indexing on the semantic tags in order to match a semantic identifier with a semantic tag. Latent semantic indexing organizes text objects into a semantic structure by using implicit higher-order approaches to associate text objects, such as singular-value decomposition. For example, a server may have previously analyzed a text block and the server inserted semantic tags into the text block that correspond to the semantic content of the text block.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
The following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention which is defined in the claims following the description.
Client 100 receives web page 130 and displays the web page on display 145. Using the example described above, client 100 displays the financial article on display 145 for a user to read. Client 100 includes voice reader 150 which is able to convert text into a synthesized voice signal, such as synthesized voice 195 (see
Voice reader 150 sends text block 160 to semantic analyzer 170. Text block 160 is a section of text that is included in web page 130, such as a paragraph. Semantic analyzer 170 performs semantic analysis on text block 160 by matching semantic identifiers located in table store 180 with the text block using standard semantic analysis techniques. For example, semantic analyzer 170 may use semantic analysis techniques such as symbolic machine learning, graph-based clustering and classification, statistics-based multivariate analyses, artificial neural network-based computing, or evolution-based programming.
Semantic analyzer 170 matches a semantic identifier with the text block based upon the semantic analysis, and retrieves voice attributes corresponding to the matched semantic identifier from a look-up table located in table store 180. Using the example described above, semantic analyzer 170 identifies that text block 160 is a paragraph corresponding to financial information and selects a “Business Journal” semantic identifier to correspond with text block 160. In this example, semantic analyzer 170 retrieves voice attributes corresponding to the “Business Journal” semantic identifier from a look-up table (see
Semantic analyzer 170 provides the retrieved voice attributes (e.g. voice attributes 190) to voice reader 150. Voice attributes 190 include attributes such as a pitch value, a loudness value, and a pace value. In one embodiment, voice attributes 190 are provided to voice reader 150 through an Application Program Interface (API) (see
Semantic analyzer 210 provides the matched tags to server 110, which inserts the tags into the requested web page. Server 110 then sends web page with tags 230 to client 100. Client 100 receives web page 230 whereby voice reader 150 identifies a first text block and sends text block with tags 240 to semantic analyzer 170. Semantic analyzer 170 performs latent semantic indexing on the tag content, and associates a semantic identifier with the tag based upon the semantic analysis. Latent semantic indexing organizes text objects into a semantic structure by using implicit higher-order approaches to associate text objects, such as singular-value decomposition. For example, a tag may be “cash flow” and semantic analyzer 170 may associate a semantic identifier “financial” with the semantic tag.
Semantic analyzer 170 retrieves voice attributes corresponding to the associated semantic identifier from table store 180 and sends voice attributes 190 to voice reader 150. Voice reader 150 inputs voice attributes 190 into a voice synthesizer. The voice synthesizer converts the text block into synthesized voice 195 for a user to hear.
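The association of a tag such as “cash flow” with an identifier such as “financial” can be illustrated with a much simpler stand-in than latent semantic indexing. The sketch below is not latent semantic indexing: it compares plain bag-of-words vectors by cosine similarity, whereas latent semantic indexing would first project such vectors into a reduced space obtained by singular-value decomposition. The descriptor terms and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

# Hypothetical descriptor terms for each semantic identifier.
DESCRIPTORS: Dict[str, str] = {
    "financial": "cash flow revenue profit balance sheet earnings",
    "children": "story fairy tale bedtime picture rhyme",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def associate(tag: str) -> str:
    """Return the identifier whose descriptor terms are closest to the tag."""
    tag_vector = Counter(tag.lower().split())
    return max(
        DESCRIPTORS,
        key=lambda ident: cosine(tag_vector, Counter(DESCRIPTORS[ident].split())),
    )
```

Under these assumptions, `associate("cash flow")` selects “financial” because only that descriptor shares terms with the tag.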
Voice reader 150 retrieves a text file from text store 320 and sends a text block (e.g. text block 160) to semantic analyzer 170 for processing. As one skilled in the art can appreciate, the text file may include semantic tags whereby semantic analyzer 170 performs latent semantic indexing on the semantic tags (see
Semantic analyzer 430 performs semantic analysis on text block 425 and matches a semantic identifier to text block 425 based upon the semantic analysis (see
Semantic analyzer 430 performs semantic analysis on the received text block and retrieves voice attributes from voice attributes store 440 corresponding to the results of the semantic analysis. In turn, semantic analyzer 430 provides the voice attributes (i.e. pitch value, loudness value, and pace value) to voice reader 450 through API 425. Voice synthesizer 450 synthesizes the text block and creates synthesized voice 490 using the received voice attributes.
Table 500 includes columns 505, 510, 515, and 520. Column 505 includes a list of subject matter semantic identifiers. These semantic identifiers may be pre-selected or a user may select particular semantic identifiers for converting text blocks into synthesized speech. For example, a subject matter look-up table may include a “Children's Book” and a “Business Journal” semantic identifier as default semantic identifiers and a user may select other semantic identifiers to include in the subject matter look-up table (see
Column 510 includes a list of voice attribute “Pitch” values that correspond to semantic identifiers shown in column 505. Pitch values may be values such as female-high, female-medium, female-low, male-high, male-medium, male-low. A pitch value instructs a voice reader as to which voice type to use when converting a text block to synthesized speech. For example, row 525 includes a “Children's Book” semantic identifier and its corresponding pitch value is “Female-High”. In this example, the female-high pitch value instructs a voice reader to use a high pitch female voice when converting text blocks that are identified as “Children's Book” through semantic analysis.
Column 515 includes a list of voice attribute “Loudness” values that correspond to semantic identifiers shown in column 505. Loudness values may be values such as loud, medium, or soft. A loudness value instructs a voice reader as to how loud to generate speech when converting a text block. Using the example described above, row 525 includes a “Medium” loudness value which instructs a voice reader to generate speech at a medium volume level when converting text blocks that are identified as “Children's Book” using semantic analysis.
Column 520 includes a list of voice attribute “Pace” values that correspond to semantic identifiers shown in column 505. Pace values may be values such as “Slow”, “Medium”, or “Fast”. A pace value instructs a voice reader as to how fast to generate speech when converting a text block. Using the example described above, row 525 includes a “Slow” pace value which instructs a voice reader to generate speech at a slow pace when converting text blocks that are identified as “Children's Book”.
Row 530 includes a “Business Journal” semantic identifier with corresponding voice attributes “Male-Low”, “Medium”, and “Slow”. When a semantic analyzer associates a text block with the “Business Journal” semantic identifier, such as a financial statement, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a low pitch male voice at medium volume and slow pace.
Row 535 includes a “Male-Related” semantic identifier with corresponding voice attributes “Male-Medium”, “Medium”, and “Medium”. When a semantic analyzer associates a text block with the “Male-Related” semantic identifier, such as men's fitness information, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a medium pitch male voice at medium volume and medium pace.
Row 540 includes a “Female-Related” semantic identifier with corresponding voice attributes “Female-Medium”, “Medium”, and “Medium”. When a semantic analyzer associates a text block with the “Female-Related” semantic identifier, such as women's fitness information, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a medium pitch female voice at medium volume and medium pace.
Row 545 includes a “Teenager” semantic identifier with corresponding voice attributes “Female-High”, “Loud”, and “Fast”. When a semantic analyzer associates a text block with the “Teenager” semantic identifier, such as lyrics to a pop song, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a high pitch female voice at loud volume and fast pace.
A user may configure semantic identifier types other than subject matter semantic identifiers, such as user interest semantic identifiers, in order to customize a voice reader's text to speech conversion process (see
Table 550 includes columns 555, 560, 565, and 570. Column 555 includes a list of user interest semantic identifiers. Columns 560, 565, and 570 include a list of voice attribute types that are the same as columns 510, 515, and 520 as shown in
Row 575 includes a “Summary” semantic identifier with corresponding voice attributes “Male-Medium”, “Loud”, and “Medium”. When a semantic analyzer associates a text block with the “Summary” semantic identifier, such as an overview of a technical document, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a medium pitch male voice at loud volume and medium pace.
Row 580 includes a “Detail” semantic identifier with corresponding voice attributes “Male-High”, “Medium”, and “Slow”. When a semantic analyzer associates a text block with the “Detail” semantic identifier, such as a specification in a technical document, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a high pitch male voice at medium volume and slow pace.
Row 585 includes a “Conclusion” semantic identifier with corresponding voice attributes “Female-Medium”, “Soft”, and “Medium”. When a semantic analyzer associates a text block with the “Conclusion” semantic identifier, such as the results of an experiment, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a medium pitch female voice at soft volume and medium pace.
Row 590 includes a “Section Heading” semantic identifier with corresponding voice attributes “Female-High”, “Medium”, and “Fast”. When a semantic analyzer associates a text block with the “Section Heading” semantic identifier, such as a sub-title of a section, the semantic analyzer provides corresponding voice attributes to a voice reader. In turn, the voice reader converts the text block to speech using a high pitch female voice at medium volume and fast pace.
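The two look-up tables described above can be written down as plain data. The following Python sketch records the rows of table 500 and table 550 exactly as described and adds a retrieval helper; the tuple layout, the `is_valid` check, and the order in which `retrieve` consults the tables are illustrative assumptions rather than part of the disclosure.

```python
from typing import Dict, Tuple

Row = Tuple[str, str, str]  # (pitch, loudness, pace)

PITCH_VALUES = {"Female-High", "Female-Medium", "Female-Low",
                "Male-High", "Male-Medium", "Male-Low"}
LOUDNESS_VALUES = {"Loud", "Medium", "Soft"}
PACE_VALUES = {"Slow", "Medium", "Fast"}

# Table 500: subject matter semantic identifiers.
SUBJECT_MATTER_TABLE: Dict[str, Row] = {
    "Children's Book": ("Female-High", "Medium", "Slow"),      # row 525
    "Business Journal": ("Male-Low", "Medium", "Slow"),        # row 530
    "Male-Related": ("Male-Medium", "Medium", "Medium"),       # row 535
    "Female-Related": ("Female-Medium", "Medium", "Medium"),   # row 540
    "Teenager": ("Female-High", "Loud", "Fast"),               # row 545
}

# Table 550: user interest semantic identifiers.
USER_INTEREST_TABLE: Dict[str, Row] = {
    "Summary": ("Male-Medium", "Loud", "Medium"),              # row 575
    "Detail": ("Male-High", "Medium", "Slow"),                 # row 580
    "Conclusion": ("Female-Medium", "Soft", "Medium"),         # row 585
    "Section Heading": ("Female-High", "Medium", "Fast"),      # row 590
}


def is_valid(table: Dict[str, Row]) -> bool:
    """Check that every row uses only the attribute values listed above."""
    return all(
        pitch in PITCH_VALUES and loudness in LOUDNESS_VALUES and pace in PACE_VALUES
        for pitch, loudness, pace in table.values()
    )


def retrieve(identifier: str) -> Row:
    """Return the voice attributes for an identifier from either table."""
    for table in (USER_INTEREST_TABLE, SUBJECT_MATTER_TABLE):
        if identifier in table:
            return table[identifier]
    raise KeyError(identifier)
```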
A user selects a particular subject matter semantic identifier by using arrows 612 to scroll through a list of subject matter semantic identifiers until the user's desired subject matter semantic identifier is displayed in text box 610. For example, a list of subject matter semantic identifiers may be “Children's Book”, “Business Journal”, and “Teenager Related”. The example shown in
Once the user selects a subject matter semantic identifier, the user configures a pitch value, a loudness value, and a pace value to correspond with the subject matter semantic identifier. The user selects a particular pitch value by using arrows 617 to scroll through a list of pitch values until the user's desired pitch value is displayed in text box 615. For example, a list of pitch values may be “female-high”, “female-medium”, “female-low”, “male-high”, “male-medium”, “male-low”. The example shown in
The user selects a particular loudness value by using arrows 622 to scroll through a list of loudness values until the user's desired loudness value is displayed in text box 620. For example, a list of loudness values may be “Loud”, “Medium”, and “Soft”. The example shown in
The user selects a particular pace value by using arrows 627 to scroll through a list of pace values until the user's desired pace value is displayed in text box 625. For example, a list of pace values may be “Fast”, “Medium”, and “Slow”. The example shown in
Rows 630 through 634 are other rows that a user may use to select a subject matter semantic identifier and configure corresponding voice attributes. As one skilled in the art can appreciate, more or fewer subject matter semantic identifier choices may be available than those shown in
Area 640 includes user interest semantic identifiers that a user selects and for which the user configures corresponding voice attributes. A user selects a particular user interest semantic identifier by using arrows 662 to scroll through a list of user interest semantic identifiers until the user's desired user interest semantic identifier is displayed in text box 660. For example, a list of user interest semantic identifiers may be “Summary”, “Detail”, and “Section Heading”. The example shown in
Once the user selects a user interest semantic identifier, the user configures a pitch value, a loudness value, and a pace value to correspond with the user interest semantic identifier. The user selects a particular pitch value by using arrows 667 to scroll through a list of pitch values until the user's desired pitch value is displayed in text box 665. In addition, the user selects a particular loudness value by using arrows 672 to scroll through a list of loudness values until the user's desired loudness value is displayed in text box 670. Furthermore, the user selects a particular pace value by using arrows 677 to scroll through a list of pace values until the user's desired pace value is displayed in text box 675. Finally, the user selects box 650 in order to inform processing that the user wishes to hear text blocks corresponding to a particular semantic identifier.
Rows 680 through 690 are other rows that a user may use to select a user interest semantic identifier and configure corresponding voice attributes. As one skilled in the art can appreciate, more or fewer user interest semantic identifier choices may be available than those shown in
When the user is finished configuring semantic identifiers and corresponding voice attributes, the user selects command button 695 to save changes and exit window 600. If the user does not wish to save changes, the user selects command button 699 to exit window 600 without saving changes.
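The save and exit behavior of the configuration window can be modeled with a saved copy and a working copy of the rows. The Python sketch below is a minimal model under that assumption; the class and method names are hypothetical, and the comments map fields to the text boxes, check box, and command buttons described above.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ConfigRow:
    identifier: str          # e.g. text box 660
    pitch: str               # e.g. text box 665
    loudness: str            # e.g. text box 670
    pace: str                # e.g. text box 675
    selected: bool = False   # box 650: the user wishes to hear these blocks


class ConfigWindow:
    """Holds the saved rows plus a working copy that the user edits."""

    def __init__(self, saved_rows: List[ConfigRow]) -> None:
        self.saved_rows = list(saved_rows)
        self.working_rows = list(saved_rows)

    def edit(self, index: int, **changes) -> None:
        """Change fields of one row in the working copy only."""
        self.working_rows[index] = replace(self.working_rows[index], **changes)

    def save_and_exit(self) -> List[ConfigRow]:
        """Command button 695: commit the working copy."""
        self.saved_rows = list(self.working_rows)
        return self.saved_rows

    def exit_without_saving(self) -> List[ConfigRow]:
        """Command button 699: discard the working copy."""
        self.working_rows = list(self.saved_rows)
        return self.saved_rows


def identifiers_to_hear(rows: List[ConfigRow]) -> List[str]:
    """List the identifiers whose box is selected."""
    return [row.identifier for row in rows if row.selected]
```

Using immutable rows with a separate working list keeps a discarded edit from leaking into the saved configuration.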
Processing performs semantic analysis on the text block in order to match a semantic identifier to the text block (pre-defined process block 720, see
Processing retrieves the voice attributes that correspond to the identified semantic identifier from table store 735 (step 730). Table store 735 may be stored on a nonvolatile storage area, such as a computer hard drive. Processing provides the voice attributes to voice synthesizer 760 at step 740 using a direct connection or using an API (see
A determination is made as to whether there are more text blocks to process (decision 770). If there are more blocks to process, decision 770 branches to “Yes” branch 772 which loops back to retrieve (step 780) and process the next text block. This looping continues until there are no more text blocks to process, at which point decision 770 branches to “No” branch 778 whereupon processing ends at 790.
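The loop described in this flow can be sketched as follows. This is a minimal Python illustration, assuming that the semantic analysis and the voice synthesizer are supplied as callables; the function name `read_aloud` and its parameters are hypothetical, and the comments map each line to the step numbers used above.

```python
from typing import Callable, Dict, Iterable, Tuple

Attributes = Tuple[str, str, str]  # (pitch, loudness, pace)


def read_aloud(
    text_blocks: Iterable[str],
    analyze: Callable[[str], str],
    table_store: Dict[str, Attributes],
    synthesize: Callable[[str, Attributes], None],
) -> int:
    """Convert each text block to speech; return the number of blocks read."""
    count = 0
    for block in text_blocks:               # decision 770 / step 780
        identifier = analyze(block)         # pre-defined process block 720
        attributes = table_store[identifier]  # step 730, table store 735
        synthesize(block, attributes)       # step 740, voice synthesizer 760
        count += 1
    return count                            # processing ends at 790
```

Passing the analyzer and synthesizer in as parameters mirrors the description's note that the attributes may reach the synthesizer through a direct connection or through an API.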
A determination is made as to whether the semantic identifiers include one or more user interest semantic identifiers (decision 820). If the semantic identifiers include one or more user interest semantic identifiers, decision 820 branches to “Yes” branch 824 whereupon a determination is made as to whether the text block includes semantic tags (decision 850). For example, a server may have previously analyzed the text block whereby the server inserted semantic tags into the text block that correspond to the semantic content of the text block (see
If the text block includes semantic tags, decision 850 branches to “Yes” branch 854 whereupon processing performs latent semantic indexing on the semantic tags using the user interest semantic identifiers (step 865). Latent semantic indexing organizes text objects into a semantic structure by using implicit higher-order approaches to associate text objects, such as singular-value decomposition. For example, the semantic tag may be “Abstract” and the user interest semantic identifiers are “Summary”, “Detail”, and “Section Headings”. Processing selects a semantic identifier at step 870 based upon the semantic analysis performed at step 865. Using the example described above, processing selects the semantic identifier “Summary” since “Summary” is the closest semantic identifier to “Abstract”.
On the other hand, if the text block does not include semantic tags, decision 850 branches to “No” branch 852 whereupon processing performs semantic analysis on the text block using the user interest semantic identifiers (step 855). For example, the text block may include overview information for a particular document, such as a technical document, and the user interest semantic identifiers include “Summary”, “Detail”, and “Section Headings”. Processing selects a semantic identifier based upon the semantic analysis performed at step 855 (step 860). Using the example described above, processing selects the semantic identifier “Summary” since “Summary” is the closest match to an “overview”.
If the semantic identifiers do not include a user interest semantic identifier, decision 820 branches to “No” branch 822 whereupon a determination is made as to whether the text block includes semantic tags (decision 825). For example, a server may have previously analyzed the text block and the server inserted semantic tags into the text block that correspond to the semantic content of the text blocks (see
On the other hand, if the text block does not include semantic tags, decision 825 branches to “No” branch 827 whereupon processing performs semantic analysis on the text block using the subject matter semantic identifiers (step 830). For example, the text block may include a financial statement for a particular company and the subject matter semantic identifiers are “Children's Book”, “Business Journal”, and “Teen Related”. Processing selects a semantic identifier based upon the semantic analysis performed at step 830 (step 835). Using the example described above, processing selects the semantic identifier “Business Journal” since “Business Journal” is the closest match to financial statement information. Processing returns at 880.
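The branching among decisions 820, 850, and 825 can be summarized in a short Python sketch. The matcher below is a hypothetical word-overlap stand-in for both the latent semantic indexing and the semantic analysis steps, and the related-term lists are invented for illustration. Handling tagged blocks on the subject matter path in the same way as on the user interest path is an assumption made by symmetry.

```python
from typing import Callable, Dict, Sequence, Set

Matcher = Callable[[str, Sequence[str]], str]

# Hypothetical related-term lists used only by the stand-in matcher.
RELATED_TERMS: Dict[str, Set[str]] = {
    "Summary": {"abstract", "overview", "synopsis"},
    "Detail": {"specification", "parameter"},
    "Section Headings": {"heading", "title"},
    "Children's Book": {"story", "bunny"},
    "Business Journal": {"financial", "revenue", "earnings"},
    "Teen Related": {"pop", "lyrics"},
}


def overlap_matcher(text: str, candidates: Sequence[str]) -> str:
    """Pick the candidate identifier sharing the most terms with the text."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return max(candidates, key=lambda c: len(words & RELATED_TERMS.get(c, set())))


def select_identifier(
    text_block: str,
    tags: Sequence[str],
    subject_matter_ids: Sequence[str],
    user_interest_ids: Sequence[str],
    match_tags: Matcher = overlap_matcher,
    match_text: Matcher = overlap_matcher,
) -> str:
    # Decision 820: are any user interest semantic identifiers configured?
    candidates = user_interest_ids if user_interest_ids else subject_matter_ids
    # Decisions 850 / 825: does the text block include semantic tags?
    if tags:
        return match_tags(" ".join(tags), candidates)   # tag-based matching
    return match_text(text_block, candidates)           # text-based matching
```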
PCI bus 914 provides an interface for a variety of devices that are shared by host processor(s) 900 and Service Processor 916 including, for example, flash memory 918. PCI-to-ISA bridge 935 provides bus control to handle transfers between PCI bus 914 and ISA bus 940, universal serial bus (USB) functionality 945, power management functionality 955, and can include other functional elements not shown, such as a real-time clock (RTC), DMA control, interrupt support, and system management bus support. Nonvolatile RAM 920 is attached to ISA Bus 940. Service Processor 916 includes JTAG and I2C busses 922 for communication with processor(s) 900 during initialization steps. JTAG/I2C busses 922 are also coupled to L2 cache 904, Host-to-PCI bridge 906, and main memory 908 providing a communications path between the processor, the Service Processor, the L2 cache, the Host-to-PCI bridge, and the main memory. Service Processor 916 also has access to system power resources for powering down information handling device 901.
Peripheral devices and input/output (I/O) devices can be attached to various interfaces (e.g., parallel interface 962, serial interface 964, keyboard interface 968, and mouse interface 970) coupled to ISA bus 940. Alternatively, many I/O devices can be accommodated by a super I/O controller (not shown) attached to ISA bus 940.
In order to attach computer system 901 to another computer system to copy files over a network, LAN card 930 is coupled to PCI bus 910. Similarly, to connect computer system 901 to an ISP to connect to the Internet using a telephone line connection, modem 975 is connected to serial port 964 and PCI-to-ISA Bridge 935.
While the computer system described in
One of the preferred implementations of the invention is an application, namely, a set of instructions (program code) in a code module which may, for example, be resident in the random access memory of the computer. Until required by the computer, the set of instructions may be stored in another computer memory, for example, on a hard disk drive, or in removable storage such as an optical disk (for eventual use in a CD ROM) or floppy disk (for eventual use in a floppy disk drive), or downloaded via the Internet or other computer network. Thus, the present invention may be implemented as a computer program product for use in a computer. In addition, although the various methods described are conveniently implemented in a general purpose computer selectively activated or reconfigured by software, one of ordinary skill in the art would also recognize that such methods may be carried out in hardware, in firmware, or in more specialized apparatus constructed to perform the required method steps.
While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects and, therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. For a non-limiting example, as an aid to understanding, the following appended claims contain usage of the introductory phrases “at least one” and “one or more” to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”; the same holds true for the use in the claims of definite articles.
This application is a continuation of application Ser. No. 10/464,881 filed Jun. 19, 2003, titled “System and Method for Configuring Voice Readers Using Semantic Analysis,” and having the same inventors as the above-referenced application.
Relation | Number | Date | Country
---|---|---|---
Parent | 10464881 | Jun 2003 | US
Child | 11836890 | Aug 2007 | US