Claims
- 1. A method of extracting intelligent information from text sources in electronic format, comprising:
receiving unstructured text data from text sources in an electronic format; transforming the unstructured text data into unstructured neutral format data; converting the transformed unstructured neutral format data into structured data based on desired intelligent information; and extracting desired intelligent information from the converted structured data by using a multilayer self-organizing maps (MSOM) algorithm.
- 2. The method of claim 1, wherein the received unstructured data comprises data selected from the group consisting of text data and signal components.
- 3. The method of claim 2, wherein the text sources are selected from the group consisting of product manuals, sales manuals, maintenance manuals, service manuals, text documents, repair procedures, fault isolation procedures, troubleshooting manuals, wiring diagrams and legacy systems.
- 4. The method of claim 1, further comprising:
storing the converted structured data in a database.
- 5. The method of claim 1, wherein converting the transformed unstructured neutral format data into structured data based on desired intelligent information, comprises:
extracting multiple packets from the transformed unstructured neutral format data based on desired intelligent information, wherein each extracted packet includes multiple sentence structured text data that is translated to single sentence structured text data including information that facilitates in categorizing and classifying the extracted multiple packets; and forming multiple preprocessed packets by transforming each single sentence structured data into context-based data based on contextual information and occurrence frequency.
- 6. The method of claim 5, further comprising:
sequentially inputting the multiple preprocessed packets into the MSOM algorithm to extract the desired intelligent information from each preprocessed packet.
- 7. A method of extracting intelligent information from unstructured text data in an electronic data format, comprising:
transforming the unstructured text data into unstructured neutral format text data; converting the transformed unstructured neutral format text data into structured text data based on desired information by extracting multiple packets of finer resolution information from the transformed unstructured neutral format text data, wherein each extracted packet includes a single sentence structure data including fault information that facilitates in categorizing and classifying the extracted multiple packets; inputting the extracted multiple packets into a multi-layer self-organizing maps (MSOM) algorithm; extracting desired fault information from each inputted packet by using the MSOM algorithm; and categorizing and classifying each extracted packet based on the extracted desired fault information.
- 8. The method of claim 7, wherein inputting the converted neutral format text data into the MSOM algorithm comprises:
sequentially inputting the multiple extracted packets into the MSOM algorithm for extracting the desired information.
- 9. The method of claim 8, wherein sequentially inputting the one or more extracted packets into the MSOM algorithm for extracting the desired information comprises:
obtaining a predetermined number of packets from the multiple extracted packets; generating a template based on a two-dimensional structured document map using the obtained predetermined number of packets from the multiple extracted packets; and extracting the desired information from each remaining packet in the multiple extracted packets by projecting the remaining packet onto the generated template.
- 10. The method of claim 9, wherein generating the template based on a two-dimensional structured document map using the obtained predetermined number of packets, comprises:
extracting key phrases by sequentially inputting the obtained predetermined number of packets; generating a first layer contextual relation map by mapping each of the extracted key phrases to a two-dimensional map using a self-organizing map and a function approximation neighborhood technique; forming phrase clusters for the generated first layer contextual relation map; constructing a first key phrase frequency histogram consisting of the frequency of occurrences of key phrases from the generated first layer contextual relation map; and generating the template based on the two-dimensional structured document map of the predetermined number of packets from the constructed first key phrase frequency histogram and the generated first layer contextual relation map by using the function approximation neighborhood technique in the self-organizing map.
- 11. The method of claim 10, further comprising:
transforming each extracted key phrase into a unique numerical representation.
- 12. The method of claim 7, further comprising:
storing the multiple extracted packets in a database.
- 13. A method of text categorization, comprising:
receiving unstructured text data from text manuals in an electronic format; transforming the unstructured text data into unstructured neutral format text data; converting the transformed unstructured neutral format text data into structured text data based on desired category and classification of fault; and extracting the desired category and classification information by inputting the converted structured text data into a multilayer self-organizing maps (MSOM) algorithm.
- 14. The method of claim 13, wherein transforming unstructured text data into unstructured neutral format text data comprises:
extracting multiple packets from the transformed unstructured neutral format text data based on desired intelligent information, wherein each extracted packet includes multiple sentence structured text data that is translated to single sentence structured text data including information that facilitates in categorizing and classifying the extracted multiple packets.
- 15. The method of claim 14, wherein the fault information includes information selected from the group consisting of type of fault, fault category, location information, and rectification information that can facilitate in finding the needed information in the maintenance and service manuals.
- 16. A method of categorizing and classifying fault information from maintenance manuals, comprising:
receiving unstructured text data in the maintenance manuals in an electronic format; transforming the unstructured text data into unstructured neutral format text data; converting the transformed unstructured neutral format text data into structured text data based on desired category and classification of fault; and extracting the desired category and classification information by inputting the converted structured text data into a multilayer self-organizing maps (MSOM) algorithm.
- 17. The method of claim 16, wherein transforming unstructured text data into unstructured neutral format text data comprises:
extracting multiple packets from the transformed unstructured neutral format text data based on desired intelligent information, wherein each extracted packet includes multiple sentence structured text data that is translated to single sentence structured text data including information that facilitates in categorizing and classifying the extracted multiple packets.
- 18. The method of claim 17, wherein the fault information includes information selected from the group consisting of type of fault, fault category, and location information that can facilitate in finding the needed information in the maintenance and service manuals.
- 19. The method of claim 17, wherein inputting the converted neutral format text data into the MSOM algorithm comprises:
sequentially inputting the multiple extracted packets into the MSOM algorithm for extracting the category and classification of fault information.
- 20. The method of claim 17, wherein sequentially inputting the one or more extracted packets into the MSOM algorithm for extracting the desired intelligent information comprises:
obtaining a predetermined number of packets from the multiple extracted packets; generating a template based on a two-dimensional structured document map using the obtained predetermined number of packets from the multiple extracted packets; and extracting the desired category and classification information from each extracted packet by projecting the extracted packet onto the generated template.
- 21. The method of claim 20, wherein generating the template based on a two-dimensional structured document map using the obtained predetermined number of packets, comprises:
extracting key phrases by sequentially inputting the obtained predetermined number of packets; generating a first layer contextual relation map by mapping each of the extracted key phrases to a two-dimensional map using a self-organizing map and a function approximation neighborhood technique; forming phrase clusters for the generated first layer contextual relation map; constructing a first key phrase frequency histogram consisting of the frequency of occurrences of key phrases from the generated first layer contextual relation map; and generating the template based on the two-dimensional structured document map of the predetermined number of packets from the constructed first key phrase frequency histogram and the generated first layer contextual relation map by using the function approximation neighborhood technique in the self-organizing map.
- 22. The method of claim 21, further comprising:
transforming the extracted key phrases into a unique numerical representation.
- 23. The method of claim 21, wherein generating the first layer contextual relation map comprises:
generating the first layer contextual relation map by mapping each of the transformed multiple key phrases to a two-dimensional map using the associated self-organizing maps and the function approximation neighborhood technique.
- 24. The method of claim 17, further comprising:
storing the multiple extracted packets into a database.
- 25. The method of claim 17, further comprising:
categorizing each of the multiple extracted packets based on the extracted desired category and classification information from the multiple extracted packets.
- 26. An intelligent information-mining method to extract desired category and classification information from unstructured data in aircraft maintenance manuals, comprising:
receiving unstructured text data in the aircraft maintenance manuals in an electronic format; extracting multiple packets of finer resolution from the unstructured text data, wherein each extracted packet includes multiple sentence structured text data that is translated to single sentence structured text data, wherein each single sentence structured data includes aircraft fault information that can facilitate in categorizing and classifying the extracted multiple packets based on type of aircraft fault; preprocessing each packet including single sentence structured data to form a preprocessed packet including one or more sentences based on criteria selected from the group consisting of removing punctuation, removing all words comprised of three or fewer letters, filtering to remove general words, and filtering to remove rarely used words; obtaining a predetermined number of preprocessed packets from the multiple preprocessed packets; generating a template based on a two-dimensional structured document map using the obtained predetermined number of preprocessed packets from the multiple extracted preprocessed packets; and extracting the desired category and classification information from each extracted preprocessed packet by projecting the extracted preprocessed packet onto the generated template.
- 27. The method of claim 26, wherein generating the template based on a two-dimensional structured document map using the obtained predetermined number of preprocessed packets, comprises:
extracting key phrases by sequentially inputting the obtained predetermined number of preprocessed packets; generating a first layer contextual relation map by mapping each of the extracted key phrases to a two-dimensional map using a self-organizing map and a function approximation neighborhood technique; forming phrase clusters for the generated first layer contextual relation map; constructing a first key phrase frequency histogram consisting of the frequency of occurrences of key phrases from the generated first layer contextual relation map; and generating the template based on the two-dimensional structured document map of the predetermined number of packets from the constructed first key phrase frequency histogram and the generated first layer contextual relation map by using the function approximation neighborhood technique in the self-organizing map.
- 28. The method of claim 27, further comprising:
transforming the extracted key phrases into a unique numerical representation.
- 29. The method of claim 27, wherein generating the first layer contextual relation map comprises:
generating the first layer contextual relation map by mapping each of the transformed multiple key phrases to a two-dimensional map using the associated self-organizing maps and the function approximation neighborhood technique.
- 30. The method of claim 24, further comprising:
categorizing each of the multiple extracted preprocessed packets based on the extracted desired category and classification information from the multiple extracted preprocessed packets based on type of aircraft fault.
- 31. The method of claim 24, wherein the aircraft fault information includes information selected from the group consisting of type of aircraft fault, aircraft fault category, location information, and rectification information that can facilitate in finding the needed information in the aircraft maintenance and service manuals.
- 32. An intelligent information-mining system, comprising:
an input module to receive unstructured text data from text sources and output the unstructured text data in an electronic format; a computing platform coupled to the input module further comprises:
a document-to-knowledge (D2K) module coupled to the input module to receive the unstructured text data in electronic format and transform the unstructured text data into unstructured neutral format text data, wherein the D2K module extracts multiple packets of finer resolution from the transformed unstructured neutral format text data, and wherein each packet includes a single sentence structured data including information that facilitates in categorizing and classifying the converted multiple packets; a preprocessing module coupled to the D2K module to receive each extracted packet including the single sentence structured data, and to form a preprocessed packet including one or more sentences based on criteria that facilitates in extracting key phrases; an analyzer coupled to the preprocessing module to receive the preprocessed packets and to input the received preprocessed packets into a multilayer self-organizing maps (MSOM) algorithm to extract desired category and classification information; and an output module coupled to the computing platform to receive the extracted desired category and classification information and to categorize each extracted packet based on the received category and classification information.
- 33. The system of claim 32, wherein the text data comprises data selected from the group consisting of product manuals, sales manuals, maintenance manuals, service manuals, text documents, and legacy systems.
- 34. The system of claim 32, wherein the criteria to extract key phrases comprises criteria selected from the group consisting of removing punctuation, removing all words comprised of three or fewer letters, filtering to remove general words, and filtering to remove rarely used words.
- 35. The system of claim 34, wherein the preprocessing module transforms the extracted key phrases into a unique numerical representation.
- 36. The system of claim 34, wherein the analyzer sequentially inputs the received preprocessed packets into the MSOM algorithm to extract desired category and classification information.
- 37. The system of claim 36, wherein the analyzer obtains a predetermined number of preprocessed packets from the multiple extracted preprocessed packets, and generates a template based on a two-dimensional structured document map using the obtained predetermined number of preprocessed packets from the multiple extracted packets, and wherein the analyzer extracts the desired category and classification information from each extracted preprocessed packet by projecting the extracted preprocessed packet onto the generated template.
- 38. The system of claim 37, wherein the analyzer generates a first layer contextual relation map, wherein the analyzer constructs a first key phrase frequency histogram consisting of the frequency of occurrences of key phrases from the generated first layer contextual relation map, and wherein the analyzer generates the template based on the two-dimensional structured document map of the predetermined number of packets from the constructed first key phrase frequency histogram and the generated first layer contextual relation map by using the function approximation neighborhood technique in the self-organizing map.
- 39. The system of claim 32, wherein the computing platform further comprises memory coupled to the D2K module and the preprocessing module to receive extracted packets from the D2K module and to store the packets.
- 40. A computer-readable medium having computer-executable instructions for extracting desired intelligent information from unstructured data in aircraft maintenance manuals, comprising:
receiving unstructured text data in the aircraft maintenance manuals in an electronic format; extracting multiple packets of finer resolution from the unstructured text data, wherein each extracted packet includes multiple sentence structured text data that is translated to single sentence structured text data, wherein each single sentence structured text data includes aircraft fault information that can facilitate in categorizing and classifying the extracted multiple packets based on type of aircraft fault; obtaining a predetermined number of packets from the multiple extracted packets; generating a template based on a two-dimensional structured document map using the obtained predetermined number of packets from the multiple extracted packets; and extracting the desired category and classification information from each extracted packet by projecting the extracted packet onto the generated template.
- 41. The computer-readable medium of claim 40, wherein generating the template based on a two-dimensional structured document map using the obtained predetermined number of packets, comprises:
extracting key phrases by sequentially inputting the obtained predetermined number of packets; generating a first layer contextual relation map by mapping each of the extracted key phrases to a two-dimensional map using a self-organizing map and a function approximation neighborhood technique; forming phrase clusters for the generated first layer contextual relation map; constructing a first key phrase frequency histogram consisting of the frequency of occurrences of key phrases from the generated first layer contextual relation map; and generating the template based on the two-dimensional structured document map of the predetermined number of packets from the constructed first key phrase frequency histogram and the generated first layer contextual relation map by using the function approximation neighborhood technique in the self-organizing map.
- 42. The computer-readable medium of claim 40, further comprising:
transforming the extracted key phrases into a unique numerical representation.
- 43. The computer-readable medium of claim 40, wherein generating the first layer contextual relation map comprises:
generating the first layer contextual relation map by mapping each of the transformed multiple key phrases to a two-dimensional map using the associated self-organizing maps and the function approximation neighborhood technique.
- 44. A computer system to perform data transfer operations between a remotely located communication module and one or more security devices in a COM-based computer network system having multiple users, comprising:
a processor; an output device; and a storage device to store instructions that are executable by the processor for extracting desired intelligent information from unstructured data in aircraft maintenance manuals, comprising:
receiving unstructured text data in the aircraft maintenance manuals in an electronic format; extracting multiple packets of finer resolution from the unstructured text data, wherein each extracted packet includes multiple sentence structured text data that is translated to single sentence structured text data, wherein each single sentence structured text data includes aircraft fault information that can facilitate in categorizing and classifying the extracted multiple packets based on type of aircraft fault; obtaining a predetermined number of packets from the multiple extracted packets; generating a template based on a two-dimensional structured document map using the obtained predetermined number of packets from the multiple extracted packets; and extracting the desired category and classification information from each extracted packet by projecting the extracted packet onto the generated template.
- 45. The system of claim 44, wherein generating the template based on a two-dimensional structured document map using the obtained predetermined number of packets, comprises:
extracting key phrases by sequentially inputting the obtained predetermined number of packets; generating a first layer contextual relation map by mapping each of the extracted key phrases to a two-dimensional map using a self-organizing map and a function approximation neighborhood technique; forming phrase clusters for the generated first layer contextual relation map; constructing a first key phrase frequency histogram consisting of the frequency of occurrences of key phrases from the generated first layer contextual relation map; and generating the template based on the two-dimensional structured document map of the predetermined number of packets from the constructed first key phrase frequency histogram and the generated first layer contextual relation map by using the function approximation neighborhood technique in the self-organizing map.
- 46. The system of claim 45, further comprising:
transforming the extracted key phrases into a unique numerical representation.
- 47. The system of claim 45, wherein generating the first layer contextual relation map comprises:
generating the first layer contextual relation map by mapping each of the transformed multiple key phrases to a two-dimensional map using the associated self-organizing maps and the function approximation neighborhood technique.
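The preprocessing criteria and the template projection recited in the claims above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: it uses a plain single-layer self-organizing map with a Gaussian neighborhood in place of the multilayer arrangement and the function approximation neighborhood technique, it treats single words as key phrases, and every name in it (`GENERAL_WORDS`, `preprocess_packets`, `histogram`, `train_som`, `best_matching_unit`), along with the word list, grid size, and learning schedule, is hypothetical.

```python
import math
import random
import string
from collections import Counter

# Hypothetical list of "general words"; the claims do not enumerate one.
GENERAL_WORDS = {"that", "this", "with", "from", "have", "been", "when", "then"}
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess_packets(packets, min_count=2):
    """Apply the four filtering criteria to a list of single-sentence packets."""
    cleaned = []
    for text in packets:
        words = text.translate(_PUNCT_TO_SPACE).lower().split()  # remove punctuation
        words = [w for w in words if len(w) > 3]                 # drop words of three or fewer letters
        words = [w for w in words if w not in GENERAL_WORDS]     # drop general words
        cleaned.append(words)
    counts = Counter(w for words in cleaned for w in words)
    # Drop rarely used words: fewer than min_count occurrences across all packets.
    return [[w for w in words if counts[w] >= min_count] for words in cleaned]


def histogram(words, vocabulary):
    """Key phrase frequency vector of one packet over a fixed vocabulary."""
    counts = Counter(words)
    return [float(counts[term]) for term in vocabulary]


def best_matching_unit(weights, vector):
    """Project a vector onto the map: return the grid node with the closest weights."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows=3, cols=3, epochs=40, seed=0):
    """Train a small two-dimensional SOM (assumed Gaussian neighborhood, linear decay)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        rate = 0.5 * decay
        radius = 0.5 + (max(rows, cols) / 2.0) * decay
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                pull = rate * math.exp(-dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (vector[i] - w[i])  # move node toward the input
    return weights


if __name__ == "__main__":
    packets = ["The pump is leaking, pump failed!", "Pump leaking at the valve."]
    cleaned = preprocess_packets(packets)
    vocabulary = sorted({w for words in cleaned for w in words})
    vectors = [histogram(words, vocabulary) for words in cleaned]
    template = train_som(vectors, rows=2, cols=2)
    for text, vector in zip(packets, vectors):
        print(text, "->", best_matching_unit(template, vector))
```

In this sketch the trained map stands in for the template: a predetermined number of packets would be used for training, and each remaining packet would be categorized by the grid node returned from `best_matching_unit`.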
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application is related to the co-pending, commonly assigned U.S. patent application Ser. No. 09/825,577, filed May 10, 2001, entitled “INDEXING OF KNOWLEDGE BASE USING MULTILAYER SELF-ORGANIZING MAPS WITH HESSIAN AND PERTURBATION INDUCED FAST LEARNING” (Attorney Docket No.: 00256.094US1), which is hereby incorporated by reference in its entirety.