Claims
- 1. An information mining method comprising:
extracting key-phrases from unstructured text data; generating three dimensional template contextual self organized maps based on the extracted key-phrases; generating three dimensional template structured self organized maps based on the extracted key-phrases for the training samples; generating three dimensional dynamic information contextual self organized maps for information to be classified; generating three dimensional dynamic information structured self organized maps for information to be classified; and identifying desired information from a comparison of the three dimensional template maps with the dynamic information maps.
- 2. The method of claim 1 further comprising:
forming phrase clusters for the three dimensional template and dynamic information contextual self organized maps; and constructing phrase frequency histograms from the three dimensional template and dynamic information contextual self organized maps.
- 3. The method of claim 2 wherein the comparison is a function of the phrase clusters and phrase frequency histograms.
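The comparison in claims 2 and 3 can be illustrated with a minimal sketch. It assumes that each phrase has already been assigned to a phrase cluster by a trained template map; the `phrase_to_cluster` table, the sample phrases, and the choice of cosine similarity as the comparison function are all hypothetical, since the claims do not fix a particular measure.

```python
from collections import Counter
from math import sqrt

def phrase_histogram(phrases, phrase_to_cluster, n_clusters):
    """Count how many of the given phrases fall into each phrase cluster."""
    counts = Counter(phrase_to_cluster[p] for p in phrases if p in phrase_to_cluster)
    return [counts.get(c, 0) for c in range(n_clusters)]

def cosine_similarity(a, b):
    """Assumed comparison function: cosine of the angle between two histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical cluster assignments, standing in for a trained template map.
phrase_to_cluster = {"oil pump": 0, "pump seal": 0, "fuel filter": 1, "power supply": 2}

template = phrase_histogram(["oil pump", "pump seal", "fuel filter"], phrase_to_cluster, 3)
query = phrase_histogram(["oil pump", "oil pump", "unknown phrase"], phrase_to_cluster, 3)
score = cosine_similarity(template, query)
```

Phrases that the template map has never seen are simply skipped here; a real system would need its own policy for them.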
- 4. An information mining method, comprising:
extracting multiple key-phrases from unstructured text; obtaining a predetermined number of key-phrases from the multiple key-phrases; generating a layer of template contextual relation map by mapping the predetermined number of key-phrases to a three-dimensional map using a self-organizing map; generating a layer of dynamic information contextual relation map for the received unstructured text by mapping the transformed key-phrases to the three-dimensional map using the self-organizing map; forming phrase clusters for the template contextual relation map; mapping phrases of the information to be classified using the phrase clusters of the generated template contextual relation map; constructing template and dynamic information key-phrase frequency histograms consisting of the frequency of occurrences of key-phrases, respectively, from the generated template contextual relation map and the dynamic information contextual relation map; generating template and dynamic information three-dimensional structured maps from each of corresponding template and dynamic key-phrase frequency histograms; and extracting desired information by mapping the dynamic information three-dimensional structured map on to the template three-dimensional structured map.
- 5. The method of claim 4, further comprising:
receiving unstructured text from various text sources, wherein the text sources are selected from the group comprising product manuals, maintenance manuals, and any documents including unstructured text.
- 6. The method of claim 4, further comprising:
extracting multiple key-phrases from the unstructured text sources; and forming the multiple key-phrases from each of the extracted multiple key-phrases.
- 7. The method of claim 6, wherein extracting multiple key phrases comprises:
extracting multiple key-phrases from the unstructured text sources based on specific criteria selected from the group comprising filtering to remove all words comprising three or fewer letters, filtering to remove general words, and filtering to remove rarely used words.
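The three filtering criteria of claim 7 can be sketched as follows. This is only an illustration: the `GENERAL_WORDS` list, the `min_count` threshold that defines a "rarely used" word, and the regular-expression tokenizer are assumptions, and the sketch applies all three filters even though the claim lets them be selected individually.

```python
import re
from collections import Counter

# Assumed list of "general" words; the claim does not enumerate them.
GENERAL_WORDS = {"this", "that", "with", "from", "have", "when", "then"}

def extract_candidates(text, min_count=2):
    """Return the words that survive the three filters, in document order."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    kept = []
    for w in words:
        if len(w) <= 3:            # remove words of three or fewer letters
            continue
        if w in GENERAL_WORDS:     # remove general words
            continue
        if counts[w] < min_count:  # remove rarely used words (assumed threshold)
            continue
        kept.append(w)
    return kept

sample = "Replace the pump seal when the pump leaks. Check the seal with care."
candidates = extract_candidates(sample)
```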
- 8. The method of claim 6, wherein the key-phrases comprise:
one or more key-words and/or one or more key-phrases.
- 9. The method of claim 6, wherein key-phrases comprise:
one or more extracted key-phrases and associated preceding and following words adjacent to the extracted key-phrases to include contextual information.
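One way to attach the preceding and following words described in claim 9 is a fixed-size window around each extracted word. The function name `with_context` and the window size `span` are hypothetical; the claim does not say how many adjacent words are kept.

```python
def with_context(words, index, span=1):
    """Return the word at `index` together with up to `span` neighbors on each side."""
    start = max(0, index - span)
    return words[start:index + span + 1]

words = ["replace", "worn", "pump", "seal", "immediately"]
middle = with_context(words, 2)   # window around "pump"
edge = with_context(words, 0)     # window at the start of the text is truncated
```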
- 10. The method of claim 4, further comprising:
transforming each of the extracted key-phrases into a unique numerical representation.
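Claim 10 does not state how the unique numerical representation is produced. A minimal sketch, assuming the simplest possible scheme of consecutive integer identifiers assigned in order of first appearance (the function name `encode_phrases` is hypothetical):

```python
def encode_phrases(phrases):
    """Map each distinct key-phrase to a unique integer, in order of first appearance."""
    ids = {}
    for phrase in phrases:
        ids.setdefault(phrase, len(ids))
    return ids

codes = encode_phrases(["pump seal", "oil pump", "pump seal"])
```

Repeated phrases reuse the identifier they were given the first time, so the mapping stays one-to-one.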
- 11. An intelligent information mining method, comprising:
receiving unstructured text; extracting multiple key-phrases from the unstructured text; transforming each of the extracted key-phrases into a unique numerical representation; obtaining a predetermined number of key-phrases from the transformed key-phrases; generating a layer of template contextual relation map by mapping the predetermined number of key-phrases on to a surface of a three-dimensional map using a self-organizing map; generating a layer of dynamic information contextual relation map for the received unstructured text by mapping the transformed key-phrases on to the surface of the three-dimensional map using the self-organizing map; forming phrase clusters for the template contextual relation map; forming phrase clusters for the dynamic information contextual relation map by using the phrase clusters of the generated template contextual relation map; constructing template and dynamic information key-phrase frequency histograms consisting of the frequency of occurrences of key-phrases, respectively, from the generated template contextual relation map and the dynamic information contextual relation map; generating template and dynamic information three-dimensional structured maps from each of corresponding template and dynamic key-phrase frequency histograms; and extracting desired information by mapping the dynamic information three-dimensional structured map on to the template three-dimensional structured map.
- 13. The method of claim 12, wherein the unstructured text is received from sources selected from the group consisting of a data base/data warehouse, a LAN/WAN network, SAN, Internet, a voice recognition system, and a mobile/fixed phone.
- 14. The method of claim 13, wherein the received unstructured text can be in any natural language.
- 15. A method, comprising:
extracting multiple key-phrases from unstructured text; obtaining a predetermined number of multiple key-phrases from the multiple key-phrases; generating a layer of template contextual relation map by mapping the predetermined number of multiple key-phrases on to a surface of a spherical map using a self-organizing map; generating a layer of dynamic information contextual relation map for the received unstructured text by mapping the multiple key-phrases on to the surface of the spherical map using the self-organizing map; forming phrase clusters for the template contextual relation map; forming phrase clusters for the dynamic information contextual relation map by using the phrase clusters of the generated template contextual relation map; constructing template and dynamic information key-phrase frequency histograms consisting of the frequency of occurrences of key-phrases, respectively, from the generated template contextual relation map and the dynamic information contextual relation map; generating template and dynamic information three-dimensional structured maps from each of corresponding template and dynamic key-phrase frequency histograms; and extracting desired information by mapping the dynamic information three-dimensional structured map on to the template three-dimensional structured map.
- 16. The method of claim 15, further comprising:
receiving unstructured text from various text sources, wherein the text sources are selected from the group comprising product manuals, maintenance manuals, and any documents including unstructured text.
- 17. The method of claim 15 further comprising:
extracting multiple key-phrases from the unstructured text sources; and forming the multiple key-phrases from each of the extracted multiple key-phrases.
- 18. The method of claim 17, wherein key-phrases comprise:
one or more extracted key-phrases and associated preceding and following words adjacent to the extracted key-phrases to include contextual information.
- 19. The method of claim 18 wherein extracting multiple key-phrases comprises:
extracting multiple key-phrases from the unstructured text sources based on specific criteria selected from the group comprising filtering to remove all words comprising three or fewer letters, filtering to remove general words, and filtering to remove rarely used words.
- 20. The method of claim 19, further comprising:
transforming each of the extracted key-phrases into a unique numerical representation.
- 21. An intelligent information mining method, comprising:
receiving unstructured text from various unstructured text sources; extracting multiple key-phrases from the unstructured text; transforming each of the multiple key-phrases into a unique numerical representation; obtaining a predetermined number of transformed key-phrases from the transformed multiple key-phrases; generating a first layer template contextual relation map by mapping the predetermined number of key-phrases to a surface of a three-dimensional map using a self-organizing map to categorize each of the predetermined number of transformed key-phrases based on contextual meaning; forming phrase clusters using the first layer template contextual relation map; constructing a template phrase frequency histogram consisting of frequency of occurrences of predetermined number of transformed key-phrases from the first layer template contextual relation map; generating a three-dimensional template structured map using the template phrase frequency histogram so that the generated three-dimensional template structured document map includes text clusters based on similarity of relationship between the formed phrase clusters; generating a dynamic information contextual relation map by mapping remaining transformed key-phrases to a three-dimensional dynamic information map using the self-organizing map to categorize the remaining key-phrases based on the contextual meaning; constructing a dynamic information key phrase frequency histogram consisting of frequency of occurrences of remaining transformed key-phrases from the generated dynamic information contextual relation map; generating a three-dimensional dynamic information structured document map using the dynamic information phrase frequency histogram and the generated dynamic information contextual relation map which includes clusters of information using the self-organizing map such that locations of the information in the clusters determine similarity relationship among the formed clusters; and extracting desired information by mapping the generated three-dimensional dynamic information structured document map over the generated three-dimensional template structured document map.
- 22. The method of claim 21, further comprising:
extracting multiple key-phrases from the unstructured text sources; and forming the multiple key-phrases from each of the extracted multiple key-phrases.
- 23. The method of claim 22, wherein extracting multiple key-phrases comprises:
extracting multiple key-phrases from the unstructured text sources based on specific criteria selected from the group comprising filtering to remove all words comprised of three or fewer letters, and filtering to remove rarely used words.
- 24. The method of claim 23, wherein the key-phrases can comprise:
one or more key-words and/or one or more key-phrases.
- 25. The method of claim 24, wherein the key-phrases comprise:
one or more extracted key-phrases and associated preceding and following words adjacent to the extracted key-phrases to include contextual information.
- 26. The method of claim 15, further comprising:
comparing extracted desired information to an expected desired information to compute any error in the extracted desired information; if the error exists based on the outcome of the comparison, and wherein the error is due to the error in the formation of the template three dimensional structured map using fuzzy prediction algorithm and basis histogram to extract desired information; comparing the outcome of the extracted desired information obtained using fuzzy prediction algorithm and basis histogram to the expected desired information; if the extracted desired information and the expected desired information are substantially same, then the dynamic information three-dimensional structured document map is corrected using learning vector quantization (LVQ) based negative learning error correcting algorithm; if the extracted desired information and the expected desired information are not substantially same, then the template contextual relation map is corrected using the learning vector quantization (LVQ) based negative learning error correcting algorithm to correct the template contextual relation map; extracting desired information using corrected template contextual relation map; comparing the extracted desired information to the expected desired information; and if the extracted desired information is substantially different from the expected desired information based on the outcome of the comparison, then repeating the above steps until the extracted desired information is substantially same as the expected desired information.
- 27. The method of claim 26, wherein using the LVQ based negative learning error correcting algorithm to correct the template three-dimensional structured document map and the template contextual relation map comprises:
applying a substantially small negative and positive learning correction to an outer cover of the correct and incorrect clusters in the template three-dimensional structured document map and the template contextual relation map using the equations:
$$w_j(n+1) = w_j(n) - \eta(n)\,\pi_{j,i(x)}(n)\,[x(n) - w_j(n)]$$
$$w_j(n+1) = w_j(n) + \eta(n)\,\pi_{j,i(x)}(n)\,[x(n) - w_j(n)]$$
wherein $w_j$ = weights of node $j$, $x(n)$ = input at time $n$, and $\pi_{j,i(x)}(n)$ = neighborhood function centered around the winning node $i(x)$.
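The two update equations of claim 27 differ only in sign. A minimal sketch, assuming the usual learning vector quantization convention that the positive form pulls a correct cluster's weights toward the input and the negative form pushes an incorrect cluster's weights away from it; the function name `lvq_update` and the sample values are hypothetical.

```python
def lvq_update(w, x, eta, pi, correct):
    """Apply w(n+1) = w(n) +/- eta * pi * (x - w) to one node's weight vector.

    correct=True uses the positive form, correct=False the negative form
    (assumed assignment of signs; the claim lists both equations).
    """
    sign = 1.0 if correct else -1.0
    return [wi + sign * eta * pi * (xi - wi) for wi, xi in zip(w, x)]

w = [0.0, 1.0]
x = [1.0, 1.0]
attracted = lvq_update(w, x, eta=0.1, pi=1.0, correct=True)
repelled = lvq_update(w, x, eta=0.1, pi=1.0, correct=False)
```

Keeping `eta` small matches the "substantially small" correction the claim calls for, so a single step only nudges the cluster boundary.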
- 28. An information mining method, comprising:
extracting multiple key-phrases from unstructured text; transforming each of the multiple key-phrases into a unique numerical representation; obtaining a predetermined number of transformed multiple key-phrases from the multiple key-phrases; generating a layer of template contextual relation map by mapping the predetermined number of transformed multiple key-phrases on to a surface of a spherical map using a self-organizing map and a gaussian approximation neighborhood technique; generating a layer of dynamic information contextual relation map for the received unstructured text by mapping the transformed multiple key-phrases on to the surface of the spherical map using the self-organizing map and the gaussian approximation neighborhood technique; forming phrase clusters for the template contextual relation map; forming phrase clusters for the dynamic information contextual relation map by using the phrase clusters of the generated template contextual relation map; constructing template and dynamic information key-phrase frequency histograms consisting of the frequency of occurrences of key-phrases, respectively, from the generated template contextual relation map and the dynamic information contextual relation map; generating template and dynamic information three-dimensional structured maps from each of corresponding template and dynamic key-phrase frequency histograms; and extracting desired information by mapping the dynamic information three-dimensional structured map on to the template three-dimensional structured map, respectively.
- 29. The method of claim 28, further comprising:
extracting multiple key-phrases from the unstructured text sources; and forming the multiple key-phrases from each of the extracted multiple key-phrases.
- 30. The method of claim 29, wherein extracting multiple key-phrases comprises:
extracting multiple key-phrases from the unstructured text sources based on specific criteria selected from the group comprising filtering to remove all words comprised of three or fewer letters, and filtering to remove rarely used words.
- 31. The method of claim 30, wherein the key-phrases can comprise:
one or more key-words and/or one or more key-phrases.
- 32. The method of claim 30, wherein the key-phrases comprise:
one or more extracted key-phrases and associated preceding and following words adjacent to the extracted key-phrases to include contextual information.
- 33. The method of claim 28, wherein the gaussian neighborhood function technique, comprises:
updating values of weight vectors of winner category and neighborhood using the equations:
$$w_j(n+1) = w_j(n) - \eta(n)\,\pi_{j,i(x)}(n)\,[x(n) - w_j(n)]$$
$$w_j(n+1) = w_j(n) + \eta(n)\,\pi_{j,i(x)}(n)\,[x(n) - w_j(n)]$$
wherein $w_j$ = weights of node $j$, $x(n)$ = input at time $n$, and $\pi_{j,i(x)}(n)$ = neighborhood function centered around the winning node $i(x)$, given by the gaussian distribution function:
$$\pi_{j,i(x)}(n) = \exp\left(-\frac{d_{i,j}^2}{2\sigma^2(n)}\right)$$
wherein $\eta(n)$ = the learning rate, with typical range [0.1-0.01], $\eta(n) = \eta_0 \exp(-n/\tau_2)$; $\sigma(n)$ = the standard deviation, $\sigma(n) = \sigma_0 \exp(-n/\tau_1)$; $\sigma_0 = 3.14/6$; and $\tau_1$, $\tau_2$ = time constants, where $\tau_1 = 1000/\log(\sigma_0)$ and $\tau_2 = 1000$.
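The schedules and the gaussian neighborhood of claim 33 can be evaluated directly. The sketch below is a literal reading under stated assumptions: `ETA0 = 0.1` is taken from the upper end of the stated learning-rate range, the logarithm is assumed to be natural, the second time constant is read as the one used in the learning-rate schedule, and only the positive (attracting) form of the update is shown.

```python
import math

SIGMA0 = 3.14 / 6                # sigma_0 as given in the claim
TAU1 = 1000 / math.log(SIGMA0)   # natural log assumed; sigma_0 < 1 makes this value negative
TAU2 = 1000                      # time constant of the learning-rate schedule
ETA0 = 0.1                       # assumed: upper end of the stated range [0.1-0.01]

def learning_rate(n):
    """eta(n) = eta_0 * exp(-n / tau_2)."""
    return ETA0 * math.exp(-n / TAU2)

def sigma(n):
    """sigma(n) = sigma_0 * exp(-n / tau_1)."""
    return SIGMA0 * math.exp(-n / TAU1)

def neighborhood(d, n):
    """Gaussian neighborhood exp(-d^2 / (2 sigma(n)^2)) for a node at distance d from the winner."""
    return math.exp(-d * d / (2 * sigma(n) ** 2))

def som_update(w, x, d, n):
    """Positive-form update w + eta(n) * pi(n) * (x - w) for one node's weight vector."""
    g = learning_rate(n) * neighborhood(d, n)
    return [wi + g * (xi - wi) for wi, xi in zip(w, x)]

winner_step = som_update([0.0], [1.0], d=0.0, n=0)
```

The distance `d` is whatever node-to-node distance the map uses; on a spherical map an angular distance would be a natural choice, but the claim does not specify it.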
- 34. The method of claim 28, further comprising:
calculating a cumulative frequency of each category mapped to a cell in the 3D template and 3D dynamic information contextual spherical maps; calculating goodness factor for each calculated cumulative frequency; labeling each category based on the calculated goodness factor; and clustering the labeled categories using least mean square clustering algorithm.
- 35. The method of claim 34, further comprising:
if $\operatorname{index}(\min(d_{m,\,\text{cluster centers}})) \in (i, j)$, then merging the clustered categories based on the labels, wherein $m$ is the midpoint between the centers of clusters $(i, j)$.
- 36. The method of claim 34, wherein calculating the goodness factor comprises:
calculating the goodness factor of all categories $C_i$ with respect to each cell $j$ using the equation:
$$G(C_i, j) = F_{\text{Clust}}(C_i) / F_{\text{Coll}}(C_i)$$
wherein $F_{\text{Cell}}$ relates the category $C_i$ to the other categories in the cell $j$, and $F_{\text{Coll}}$ relates the category $C_i$ to the whole collection, wherein
$$G(C_i, j) = \frac{F_j(C_i)\,F_j(C_i)}{F_j(C_i) + \sum_{i \notin A_1^j} F_j(C_i)}$$
wherein $i \in A_1^j$ if $d(i, j) < r_1$, $r_1$ = the radius of the neutral zone, $r_1 = 3\,d_{i,j}$ with $i, j$ being adjacent, and $F_j(C_i) = f_j(C_i) / \sum_j f_j(C_i)$ is the relative frequency of category $C_i$, wherein $f_j(C_i)$ = the frequency of category $C_i$ in $j$.
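The relative frequency that feeds the goodness factor of claim 36 is the count of a category in one cell divided by its count over all cells. A minimal sketch of that single step; the data layout (one dictionary of raw counts per cell) and the sample categories are hypothetical, and the goodness factor itself is not computed here.

```python
def relative_frequency(freq, category):
    """Return F_j(category) for every cell j, given freq[j][c] = raw count of c in cell j."""
    total = sum(cell.get(category, 0) for cell in freq)
    return [cell.get(category, 0) / total if total else 0.0 for cell in freq]

# Hypothetical raw counts for three map cells.
freq = [{"pump": 3, "valve": 1}, {"pump": 1}, {"valve": 4}]
pump_share = relative_frequency(freq, "pump")
```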
- 37. A computer-implemented system for intelligent information mining, comprising:
a web server to receive unstructured text data from various text sources; a key-word/phrase extractor to extract multiple key-phrases from the unstructured text data; and an analyzer to transform each extracted key-phrase into a unique numerical representation such that the transformed unique numerical representation; wherein the analyzer obtains a predetermined number of extracted multiple key-phrases and generates a layer of template contextual relation map by mapping predetermined number of multiple key-phrases to a three-dimensional map using a self organizing map and a gaussian distribution technique to categorize the predetermined number of multiple key-phrases based on contextual meaning; wherein the analyzer generates a layer of dynamic information contextual relation map by mapping the multiple key-phrases to the three-dimensional map using a self-organizing map and a gaussian distribution technique to categorize the multiple key-phrases based on the contextual meaning; wherein the analyzer forms word clusters for each of the generated contextual relation maps, and the analyzer further constructs a key-phrase frequency histogram consisting of frequency of occurrence of product and query related key-phrases, respectively from each of the generated contextual relation maps; and wherein the analyzer generates template and dynamic information three-dimensional structured document maps from the constructed key-phrase frequency histogram and the generated template and dynamic information contextual relation maps using the self-organizing map and wherein the analyzer further extracts desired information by mapping the dynamic information three-dimensional structured document map to template three-dimensional structured document map.
- 38. The system of claim 37, wherein the various text sources comprise:
text sources selected from the group comprising product manuals, maintenance manuals, and service manuals.
- 39. The system of claim 37, wherein the analyzer extracts multiple key-phrases from the received unstructured text sources, and wherein the analyzer forms multiple key-phrases from each of the extracted multiple key-phrases.
- 40. The system of claim 39, wherein the analyzer extracts multiple key-phrases from the unstructured text sources based on specific criteria selected from the group consisting of filtering to remove all words comprised of three or fewer letters, and filtering to remove rarely used words.
- 41. The system of claim 39, wherein the key-phrases comprise:
one or more key-words and/or one or more key-phrases.
- 42. The system of claim 39, wherein key-phrases comprise:
one or more extracted key-phrases and associated preceding and following words adjacent to the extracted key-phrases to include contextual information.
- 43. A computer-readable medium having computer-executable instructions for intelligent information mining, comprising:
extracting multiple key-phrases from unstructured text; obtaining a predetermined number of key-phrases from the multiple key-phrases; generating a layer of template contextual relation map by mapping the predetermined number of key-phrases to a three-dimensional map using a self-organizing map; generating a layer of dynamic information contextual relation map for the received unstructured text by mapping the transformed key-phrases to the three-dimensional map using the self-organizing map; forming phrase clusters for the template contextual relation map; forming phrase clusters for the dynamic information contextual relation map by using the phrase clusters of the generated template contextual relation map; constructing template and dynamic information key-phrase frequency histograms consisting of the frequency of occurrences of key-phrases, respectively, from the generated template contextual relation map and the dynamic information contextual relation map; generating template and dynamic information three-dimensional structured maps from each of corresponding template and dynamic key-phrase frequency histograms and the contextual relation maps; and extracting desired information by mapping the dynamic information three-dimensional structured map on to the template three-dimensional structured map.
- 44. A computer system for intelligent information mining, comprising:
a processor; an output device; and a storage device to store instructions that are executable by the processor to perform a method of intelligent information mining from unstructured text data, comprising:
extracting key-phrases from unstructured text data; generating three dimensional template contextual self organized maps based on the extracted key-phrases; generating three dimensional dynamic information contextual self organized maps for information to be classified; and identifying desired information from a comparison of the three dimensional template maps with the dynamic information maps.
- 45. An information mining method comprising:
extracting key-phrases from unstructured text data; generating n-dimensional template contextual self organized maps based on the extracted key-phrases; generating n-dimensional template structured self organized maps based on the extracted key-phrases for the training samples; generating n-dimensional dynamic information contextual self organized maps for information to be classified; generating n-dimensional dynamic information structured self organized maps for information to be classified; and identifying desired information from a comparison of the n-dimensional template maps with the dynamic information maps, wherein n is greater than three.
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application is related to the co-pending, commonly assigned U.S. patent application Ser. No. 09/825,577, filed May 10, 2001, entitled “INDEXING OF KNOWLEDGE BASE IN MULTILAYER SELF-ORGANIZING MAPS WITH HESSIAN AND PERTURBATION INDUCED FAST LEARNING”, which is hereby incorporated by reference in its entirety. This application is also related to the co-pending, commonly assigned U.S. patent application Ser. No. 09/860,165, filed May 17, 2001, entitled “A NEURO/FUZZY HYBRID APPROACH TO CLUSTERING DATA”, which is hereby incorporated by reference in its entirety.