The present invention relates generally to a system and method for producing an optimal language model for performing speech recognition.
Today's speech recognition technology enables a computer to transcribe spoken words into computer-recognized text equivalents. Speech recognition is the process of converting an acoustic signal, captured by a transducer such as a microphone or a telephone, into a set of words. These words can be used for numerous applications including data entry and word processing. The development of speech recognition technology is primarily focused on accurate speech recognition, which is a formidable task due to the wide variety of pronunciations, accents, and speech characteristics of individual speakers. Speech recognition is also complicated by the highly technical and scientific vocabulary used in certain applications for speech recognition technology, such as in the medical profession.
The key to speech recognition technology is the language model. A language model describes the type of text the dictator will speak about. For example, speech recognition technology designed for the medical profession will utilize different language models for different specialties in medicine. In this example, a language model is created by collecting text from doctors in each specialty area, such as radiology, oncology, etc. The type of text collected would include language and words associated with that practice, such as diagnoses and prescriptions.
Today's state-of-the-art speech recognition tools utilize a factory (or out-of-the-box) language model and a separate customizable site-specific language model. A recognition server determines whether the site language model requires updating by monitoring the dates on which the factory language model and the customized site-specific language model were created, modified, or copied. The site language model is then updated via a process that adds words. Making updates to the factory language models requires the recognition server to run through and update all of the language models before they are ready to run recognition tasks.
Separate independent processes are used to (1) update the site language model, and (2) perform the batch speech recognition task using the updated site-specific language model. If the recognition server had not updated the site language model, the previous out-of-date site language model would be used in step 2. Once the speech recognition creates a transcribed output report using the language model, the transcribed report is then run through post-processing which applies formatting to the report.
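By way of non-limiting illustration, the date-based check described above may be sketched as follows. The function name site_model_is_stale and the example dates are hypothetical and are not taken from any existing recognition server; the sketch assumes only that each language model carries a last-modified timestamp.

```python
from datetime import datetime


def site_model_is_stale(factory_modified, site_modified):
    """Return True when the factory model changed after the site model was built.

    Assumption: the site model is derived from the factory model, so a later
    factory timestamp means the site model no longer reflects it.
    """
    return factory_modified > site_modified


# Illustrative timestamps only.
factory_modified = datetime(2003, 9, 30)
site_modified = datetime(2003, 9, 1)

needs_update = site_model_is_stale(factory_modified, site_modified)
```

If the update process and the recognition process run independently, nothing forces the recognition process to wait for a stale model to be rebuilt, which is the shortcoming noted above.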
What is needed is speech recognition technology that automatically updates the site language model and ensures that the most up-to-date site language model is used for recognition.
The present invention includes a system and method for modifying a language model and post-processor information by adding site specific words to a vocabulary and using preexisting corrected reports along with an optional accented word list to update the task language model.
In a first embodiment, the system may include the steps of language model identification, site-specific model creation, and language model adaptation. In language model identification, the method includes collecting a set of language models, collecting a set of corrected dictated reports, collecting site specific data from a user, matching each dictated report with each language model, determining a perplexity factor for each language model, and determining the language model having the lowest perplexity factor. Each corrected dictated report is matched with each language model and a perplexity factor is determined for each language model. The language model having the lowest perplexity factor is then selected. A site-specific language model is then created by adding the site specific data to the selected language model. In some embodiments, the site specific data includes add word lists and post processing information such as spacing and capitalization. In language model adaptation, a task language model is then created by adapting the site specific language model by updating a set of statistics for the language model, and adding an accent word list to the language model. In some embodiments there is the step of creating a site-specific language model by adding the site specific data to the language model having the lowest perplexity factor and adapting the site specific language model to create a task language model, and outputting the task language model. In some embodiments the adapting step may include updating a set of statistics for the language model and adding an accent word list to the language model to create a task language model. In some embodiments the set of statistics includes bigrams or trigrams. In some embodiments there is a step of adapting the language model with the lowest perplexity factor to create a task language model, and outputting the task language model. 
The step of adapting the language model with the lowest perplexity factor may include updating a set of statistics for the language model and adding an accent word list to the language model to create the task language model.
In a second embodiment the present invention includes a system and method for modifying a language model and post-processor information, including the steps of collecting the language model, collecting a set of corrected dictated reports, collecting site specific data from a user, determining whether the user has an accent, creating an accent specific list based on the user's accent, and creating a site-specific language model by adding the site specific data to the language model.
In a third embodiment the present invention includes a system and method for modifying a language model and post-processor information, including the steps of collecting the language model, collecting a set of corrected dictated reports, collecting site specific data from a user, determining whether the user has an accent, creating an accent specific list based on the user's accent, and adapting the language model to create a task language model, and outputting the task language model.
In a fourth embodiment the present invention includes an apparatus structured and arranged to modify a language model and post-processor information, including collecting the language model, collecting a set of corrected dictated reports, collecting site specific data from a user, determining whether the user has an accent, creating an accent specific list based on the user's accent, and adapting the language model to create a task language model, and outputting the task language model.
While the specification concludes with claims particularly pointing out and distinctly claiming the present invention, it is believed the same will be better understood from the following description taken in conjunction with the accompanying drawings, which illustrate, in a non-limiting fashion, the best mode presently contemplated for carrying out the present invention, and in which like reference numerals designate like parts throughout the Figures, wherein:
The present disclosure will now be described more fully with reference to the Figures in which an embodiment of the present disclosure is shown. The subject matter of this disclosure may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein.
Referring to
The process to automatically modify a language model runs in three stages. In stage 1 (110), language model identification chooses the best factory language model 106 from the collection of factory language models 100.
The collection of corrected reports 104, which consists of reports that a user has dictated for the same worktype, is used to choose the best factory language model 106. For example, in the medical profession, the collection of corrected text reports 104 may include the names of common diagnoses, drugs, and procedures that a doctor dictates when reporting in this worktype. In one embodiment of this step, each factory language model is matched and compared with the corrected text reports 104 in order to determine a perplexity factor for each language model. The language model with the lowest perplexity is selected as the best factory language model 106 for the present application.
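By way of non-limiting illustration, the selection of the lowest-perplexity model may be sketched as follows. The sketch makes several simplifying assumptions that are not part of the disclosure: each factory language model is represented as a table of unigram word counts, probabilities use add-one smoothing with a single extra slot reserved for unseen words, and perplexity is computed as the exponential of the negative average log-probability of the words in the corrected reports. The names perplexity and select_best_model and the sample data are hypothetical.

```python
import math
from collections import Counter


def perplexity(model_counts, reports):
    """Perplexity of a unigram count table over the words of the reports."""
    tokens = [w for report in reports for w in report.lower().split()]
    if not tokens:
        raise ValueError("at least one non-empty corrected report is required")
    total = sum(model_counts.values())
    vocab_size = len(model_counts) + 1  # one extra slot for unseen words
    log_prob = 0.0
    for w in tokens:
        # Add-one smoothing keeps unseen words from having zero probability.
        p = (model_counts[w] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def select_best_model(models, reports):
    """Return the name of the model with the lowest perplexity."""
    return min(models, key=lambda name: perplexity(models[name], reports))


# Toy factory models built from a few words each (illustrative only).
factory_models = {
    "radiology": Counter(
        "the chest film shows no acute infiltrate the lungs are clear".split()
    ),
    "ophthalmology": Counter(
        "the pupil is dilated the retina is flat the pupil reacts".split()
    ),
}
corrected_reports = ["the pupil is dilated", "the retina is flat"]

best = select_best_model(factory_models, corrected_reports)
```

A lower perplexity indicates that a model assigns higher probability to the words the user actually dictated, which is why the minimum is taken.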
In stage 2 (120), the collection of site-specific words 102 is added to the best factory language model 106 in order to create the site language model 108. For example, for speech recognition applications for the medical industry, site-specific words may include doctor names, place names, names of procedures, names of diagnoses, and drug names. If appropriate, an accented word list 114 may also be added at this stage to ensure a better match for users with specific accents.
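By way of non-limiting illustration, the addition of site-specific words may be sketched as follows. The sketch assumes, purely for illustration, that a language model is a table of word counts and that a newly added word receives a small floor count so that it becomes recognizable without outweighing words already observed. The name build_site_model, the floor_count parameter, and the sample words are hypothetical.

```python
from collections import Counter


def build_site_model(factory_counts, site_words, floor_count=1):
    """Return a copy of the factory counts extended with the site-specific words."""
    site_model = Counter(factory_counts)  # copy; the factory model is left unchanged
    for word in site_words:
        # Only raise counts for words that are missing or below the floor,
        # so existing statistics are never reduced.
        if site_model[word] < floor_count:
            site_model[word] = floor_count
    return site_model


factory_counts = Counter({"the": 5, "pupil": 2})
site_words = ["Nguyen", "pupil", "latanoprost"]  # e.g. a doctor name and a drug name

site_model = build_site_model(factory_counts, site_words)
```

Working on a copy means the same factory model can serve as the starting point for several sites.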
In stage 3 (130), language model adaptation applies the collection of corrected reports 104 to adapt the site language model 108 to the individual user's speaking style. This is accomplished with bigrams and trigrams, which statistically break down the pattern of words used by a specific individual and are used to predict the word most likely to follow a given word. For example, the next most likely word to follow a doctor's use of the word “dilated” might be “pupil.” The output from this stage is a task language model 112. If appropriate, an accented word list 114 may also be added at this stage to ensure a better match for users with specific accents. Furthermore, it is envisioned that postprocessor information may also be implemented to customize the task language model and thus the speech recognition output, enabling site-specific formatting such as capitalization and heading format.
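By way of non-limiting illustration, the bigram portion of this adaptation may be sketched as follows. The sketch assumes that the site language model exposes a table of bigram counts, that adaptation simply adds the bigram counts observed in the corrected reports, and that prediction returns the most frequent follower of a word. The names update_bigram_counts and most_likely_next and the sample data are hypothetical; trigrams would follow the same pattern with a two-word history.

```python
from collections import Counter, defaultdict


def update_bigram_counts(bigram_counts, reports):
    """Add the bigrams found in the reports to bigram_counts (modified in place)."""
    for report in reports:
        words = report.lower().split()
        for prev, nxt in zip(words, words[1:]):
            bigram_counts[prev][nxt] += 1
    return bigram_counts


def most_likely_next(bigram_counts, word):
    """Return the most frequent follower of word, or None if the word is unseen."""
    followers = bigram_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Illustrative statistic assumed to come from the site language model.
site_bigrams = defaultdict(Counter)
site_bigrams["dilated"]["cardiomyopathy"] = 1

# Corrected reports previously dictated by the same user (illustrative only).
corrected_reports = [
    "the dilated pupil was examined",
    "a dilated pupil was noted",
]

task_bigrams = update_bigram_counts(site_bigrams, corrected_reports)
next_word = most_likely_next(task_bigrams, "dilated")
```

In this toy example the user's own reports shift the prediction after “dilated” toward “pupil”, mirroring the example given above.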
The process described herein is implemented automatically using, for example, a computer program in C++ implementing a CDependency object. This CDependency object would be utilized in each of the three stages described with respect to
The flow chart in
Once data returns from the text system, a signal is sent to the recognition wrappers or agents 530, as indicated by line 525. The wrapper 530 may be a self-contained component that contains the speech server, or it may sit on a machine external to a speech server. Wrapper 530 is the entity that performs the speech recognition tasks. Line 525 represents the LMID 520 object submitting a request to the recognition wrapper 530 to perform the LMID task.
Returning to
It will be apparent to one of skill in the art that described herein is a novel system and method for automatically modifying a language model. While the invention has been described with reference to specific preferred embodiments, it is not limited to these embodiments. The invention may be modified or varied in many ways and such modifications and variations as would be obvious to one of skill in the art are within the scope and spirit of the invention and are included within the scope of the following claims.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 60/507,134, entitled “SYSTEM AND METHOD FOR MODIFYING A LANGUAGE MODEL AND POST-PROCESSOR INFORMATION,” filed Oct. 1, 2003, which is hereby incorporated by reference in its entirety. This application also relates to co-pending U.S. patent application Ser. No. 10/413,405, entitled, “INFORMATION CODING SYSTEM AND METHOD”, filed Apr. 15, 2003; co-pending U.S. patent application Ser. No. 10/447,290, entitled, “SYSTEM AND METHOD FOR UTILIZING NATURAL LANGUAGE PATIENT RECORDS”, filed on May 29, 2003; co-pending U.S. patent application Ser. No. 10/448,317, entitled, “METHOD, SYSTEM, AND APPARATUS FOR VALIDATION”, filed on May 30, 2003; co-pending U.S. patent application Ser. No. 10/448,325, entitled, “METHOD, SYSTEM, AND APPARATUS FOR VIEWING DATA”, filed on May 30, 2003; co-pending U.S. patent application Ser. No. 10/448,320, entitled, “METHOD, SYSTEM, AND APPARATUS FOR DATA REUSE”, filed on May 30, 2003; co-pending U.S. patent application Ser. No. 10/948,625, entitled “METHOD, SYSTEM, AND APPARATUS FOR ASSEMBLY, TRANSPORT AND DISPLAY OF CLINICAL DATA”, filed Sep. 23, 2004; co-pending U.S. Provisional Patent Application Ser. No. 60/507,136, entitled, “SYSTEM AND METHOD FOR DATA DOCUMENT SECTION SEGMENTATIONS”, filed on Oct. 1, 2003; co-pending U.S. Provisional Patent Application Ser. No. 60/507,135, entitled, “SYSTEM AND METHOD FOR POST PROCESSING SPEECH RECOGNITION OUTPUT”, filed on Oct. 1, 2003; co-pending U.S. patent application Ser. No. 10/951,291, entitled, “SYSTEM AND METHOD FOR CUSTOMIZING SPEECH RECOGNITION INPUT AND OUTPUT”, filed Sep. 27, 2004; co-pending U.S. Provisional Patent Application Ser. No. 60/533,217, entitled “SYSTEM AND METHOD FOR ACCENTED MODIFICATION OF A LANGUAGE MODEL” filed on Dec. 31, 2003; co-pending U.S. Provisional Patent Application Ser. No. 60/547,801, entitled, “SYSTEM AND METHOD FOR GENERATING A PHRASE PRONUNCIATION”, filed on Feb. 27, 2004; co-pending U.S. patent application Ser. No. 10/787,889 entitled, “METHOD AND APPARATUS FOR PREDICTION USING MINIMAL AFFIX PATTERNS”, filed on Feb. 27, 2004; co-pending U.S. Provisional Application Ser. No. 60/547,797, entitled “A SYSTEM AND METHOD FOR NORMALIZATION OF A STRING OF WORDS,” filed Feb. 27, 2004; and co-pending U.S. Provisional Application Ser. No. 60/505,428, entitled “CATEGORIZATION OF INFORMATION USING NATURAL LANGUAGE PROCESSING AND PREDEFINED TEMPLATES”, filed Mar. 31, 2004, all of which co-pending applications are hereby incorporated by reference in their entirety.
Number | Date | Country |
---|---|---|
20050165598 A1 | Jul 2005 | US |

Number | Date | Country |
---|---|---|
60507134 | Oct 2003 | US |