This application is related to the co-pending, commonly assigned U.S. patent application Ser. No. 09/825,577, filed May 10, 2001, entitled “INDEXING OF KNOWLEDGE BASE IN MULTILAYER SELF-ORGANIZING MAPS WITH HESSIAN AND PERTURBATION INDUCED FAST LEARNING,” which is hereby incorporated by reference in its entirety. This application is also related to the co-pending, commonly assigned U.S. patent application Ser. No. 09/860,165, filed May 17, 2001, entitled “A NEURO/FUZZY HYBRID APPROACH TO CLUSTERING DATA,” which is also hereby incorporated by reference in its entirety.
This invention relates generally to the field of information mining, and more particularly pertains to an automated intelligent information mining technique.
With the explosive growth of available information sources, it has become increasingly necessary for users to utilize information mining techniques to find, extract, filter, and evaluate desired information. Human translation is generally laborious, expensive, and error-prone, and is not a feasible approach for extracting desired information.
Automating information mining for text documents can be difficult because text documents, while in a human readable and understandable format, lack inherently defined structure and appear as meaningless data to information mining techniques. Text can also come from various sources, such as a database, e-mail, the Internet, and/or a telephone, in different forms. Moreover, text documents coming from various sources can be high dimensional in nature, containing syntactic and semantic (contextual) structure of words/phrases as well as temporal and spatial information, which can cause disorderliness in the information mining process.
Current information mining techniques such as hierarchical keyword searches, statistical and probabilistic techniques, and summarization using linguistic processing, clustering, and indexing dominate the unstructured text processing arena. The most prominent and successful of the current information mining techniques require huge databases of domain-specific keywords, comprehensive domain-specific thesauruses, computationally intensive processing techniques, laborious human interfaces, and human expertise.
There has been a trend in the development of information mining techniques to be domain independent, to be adaptive in nature, and to be able to exploit contextual information present in text documents to improve processing speeds. Current techniques for information mining use self-organizing maps (SOMs) to exploit the contextual information present in the text. SOMs are currently among the most popular artificial neural network algorithms and belong to the category of competitive learning networks. SOMs are generally based on unsupervised learning (training without a teacher), and they provide a topology that preserves the contextual information of an unstructured document by mapping from high dimensional data (the unstructured document) to a two dimensional map (a structured document), also called map units. Map units, or neurons, usually form a two dimensional grid; hence the mapping is from a high dimensional space onto a plane. Thus, SOMs serve as a tool for making clusters to analyze high dimensional data. Word category maps are SOMs that have been organized according to word similarities, measured by the similarity between the short contexts of the words. Contextually interrelated words tend to fall into the same or neighboring map nodes, so nodes may be viewed as word categories.
Co-pending U.S. patent application Ser. No. 09/825,577, filed May 10, 2001, entitled “INDEXING OF KNOWLEDGE BASE IN MULTILAYER SELF-ORGANIZING MAPS WITH HESSIAN AND PERTURBATION INDUCED FAST LEARNING” discloses such an information mining technique using SOMs that is domain independent, adaptive in nature, able to exploit contextual information present in text documents, and able to achieve an improved learning rate without losing short contextual information. One drawback of this technique is that the histogram formed from the clusters is highly dependent on the clusters and is very specific and sensitive to the cluster boundary. Elements in or near the boundary may suffer from this rigidity, which can adversely affect the accuracy of the information mining.
The SOM based algorithm disclosed in the above-mentioned pending application uses heuristic procedures, so termination is not based on optimizing any model of the process or its data. The final weight vectors used in the algorithm usually depend on the input sequence, and different initial conditions yield different results. Altering several parameters of the self-organizing algorithm, such as the learning rate, the size of the update neighborhood, and the strategy used to vary these parameters during learning, from one data set to another can yield useful results. There is a need for an improved adaptive algorithm responsive to changing scenarios and external inputs. There is a further need for uniformity in neighborhood size. There is yet a further need for an algorithm that preserves the neighborhood relationships of the input space even though bordering neurons have fewer neighbors than others.
The present invention provides an automated intelligent information mining technique for various types of information mining applications, such as data and text mining applications, identification of a signal from a stream of signals, pattern recognition applications, and/or natural language processing applications. Unstructured text is received from various text sources, and key-phrases are extracted from the received unstructured text. Each of the extracted key-phrases is transformed into a unique numerical representation. Layers of template and dynamic information contextual relation maps are generated by mapping the transformed key-phrases onto the surfaces of three-dimensional maps, respectively, using a self-organizing map and a Gaussian distribution (a function approximation of the neighborhood). Further, word clusters are formed and corresponding key-phrase frequency histograms are constructed for each of the generated contextual relation maps. Template and dynamic information three-dimensional structured document maps are generated from the constructed key-phrase frequency histograms and the generated contextual maps using the self-organizing map and Gaussian distribution technique. Desired information is extracted by masking the generated dynamic information three-dimensional structured map to the template three-dimensional structured map.
If the extracted information is not substantially the same as the expected information, a fuzzy prediction algorithm using a basis histogram on the histograms obtained from the template and dynamic information contextual relation maps is used to extract the desired information. The desired intelligent information extracted using the fuzzy prediction algorithm is also compared to the expected information. A learning vector quantization (LVQ) based negative learning error correcting algorithm is used to correct the formed 3D template information structured map when the extracted information obtained using the fuzzy prediction algorithm is substantially the same as the expected information.
The LVQ based negative learning error correcting algorithm is used to correct the three-dimensional template contextual relation map when the extracted desired intelligent information obtained using the fuzzy prediction algorithm is not substantially the same as the expected information.
Other aspects of the invention will be apparent on reading the following detailed description of the invention and viewing the drawings that form a part thereof.
This document describes an improved automated information mining technique applicable to various types of information mining applications such as data and text mining applications, identification of a signal from a stream of signals, pattern recognition applications, and/or natural language processing applications.
The process begins with operation 110 by receiving unstructured text from various sources such as a database/data warehouse, a LAN/WAN network, a SAN, the Internet, a voice recognition system, and/or a mobile/fixed phone. Operation 110 can also begin by receiving image signals stored in a buffer, online, and/or in a file.
Operation 110 further includes extracting multiple key-phrases from the received unstructured text. In some embodiments, element 110 also extracts multiple key-words from the received text and can form the multiple key-phrases from the extracted key-words. In these embodiments, element 110 extracts key-words from the received text based on specific criteria, such as filtering to remove all words comprised of three or fewer letters and/or filtering to remove rarely used words. The formed key-phrases can include one or more extracted key-words and any associated preceding and following words adjacent to the extracted key-words, to include contextual information. In some embodiments, element 110 further morphologizes the extracted key-words based on their fundamental characteristics. For example, the element 110 can morphologize in such a way that a morphed (altered) word's pronunciation or meaning remains intact.
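By way of illustration only, a minimal sketch of this extraction step might look as follows; the stop-word list, the length threshold, and the details of the triplet construction are assumptions based on the description above, not the exact procedure of the embodiment.

```python
import re

# Illustrative list of general words to filter; an embodiment would use a larger set.
STOP_WORDS = {"the", "and", "for", "with", "this", "that", "will"}

def extract_key_phrases(text, min_length=4):
    """Extract key-words from text and form key-phrase triplets consisting of
    the preceding word, the key-word, and the following word."""
    words = re.findall(r"[A-Za-z][A-Za-z/()\-,.]*", text)
    phrases = []
    for i, word in enumerate(words):
        # Remove words of three or fewer letters and general (stop) words.
        if len(word) < min_length or word.lower() in STOP_WORDS:
            continue
        preceding = words[i - 1] if i > 0 else ""
        following = words[i + 1] if i + 1 < len(words) else ""
        phrases.append((preceding, word, following))
    return phrases
```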
Operation 120 transforms each of the extracted key-words, phrases, and/or morphed words into a unique numerical representation. Extracted key-words are transformed such that the transformation does not produce multiple similar numerical representations, to avoid ambiguous prediction of the meaning of the translated words in the received text.
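One way to guarantee such uniqueness is to maintain an injective registry of word codes normalized to (0, 1), as sketched below; the sequential code assignment is an assumption for illustration, not the encoding prescribed by the embodiment.

```python
class WordCoder:
    """Assign every distinct word a unique numerical code in (0, 1), so that
    no two different words can collide on the same representation."""

    def __init__(self, capacity=100000):
        self.codes = {}
        self.capacity = capacity

    def code(self, word):
        # Codes are assigned sequentially and evenly spaced, so they stay
        # unique and normalized between 0 and 1.
        if word not in self.codes:
            self.codes[word] = (len(self.codes) + 1) / self.capacity
        return self.codes[word]

    def encode_phrase(self, triplet):
        """Encode a (preceding, key-word, following) triplet as three codes."""
        return tuple(self.code(w) for w in triplet)
```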
Operation 130 generates a layer of a three-dimensional (3D) template contextual relation map using a self-organizing map (SOM) to categorize the extracted key-phrases based on contextual meaning. In some embodiments, the layer of the 3D template contextual relation map is generated by obtaining a predetermined number of key-phrases from the extracted multiple key-phrases. In some embodiments, the 3D template contextual relation map is a spherical template contextual relation map.
Before generating the 3D template contextual relation map, the map parameters are set so that the map naturally converges around a sphere, by considering each row of neurons in the map to represent a horizontal slice of a sphere, with the angle of latitude between adjacent slices being equal. The number of neurons nk in slice k is proportional to the circumference of the slice. The following equation is used to calculate the number of neurons nk given d, the number of slices:
nk = 2d sin(π/2 − θk)
where θk is the angle of latitude of slice k.
The resulting map has an acceptable shape for converging to the topology of a sphere.
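A short sketch of this slice construction follows; the half-step offset of the latitudes, which avoids degenerate slices at the poles, is an assumption added for illustration.

```python
import math

def neurons_per_slice(d):
    """Compute n_k = 2d*sin(pi/2 - theta_k) for each of the d horizontal
    slices of the spherical map, where theta_k is the latitude of slice k."""
    counts = []
    for k in range(d):
        # Latitudes are equally spaced; the half-step keeps slices off the poles.
        theta_k = -math.pi / 2 + (k + 0.5) * math.pi / d
        counts.append(max(1, round(2 * d * math.sin(math.pi / 2 - theta_k))))
    return counts

# For d = 8 slices this yields roughly [3, 9, 13, 16, 16, 13, 9, 3] neurons.
```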
Input patterns (multiple key-phrases) are then presented to the self-organizing map (artificial neural network):
x1, x2, . . . , xn ∈ Rn
where each of x1, x2, . . . , xn is a triplet (a normalized unique representation of a key-phrase, comprising the preceding word, the reference key-word, and the succeeding word) of the high dimensional text data.
Random weights are then initialized using a random number generator and normalized between 0 and 1 (because the inputs to the network are also normalized between 0 and 1). The strengths of the connections between input and output layer nodes are referred to as weights, and the updating of weights is generally called learning.
wi ≡ [wi1, wi2, . . . , win]T ∈ Rn, where wi1, wi2, . . . , win are the random weights and ‘n’ is the dimension of the input layer. Generally, ‘n’ is based on the number of input patterns/vectors (key-phrases). In the following example of assigned random weights, the dimension ‘n’ is set to 10:
0.24, 0.98, 0.47, . . . , 0.25, 0.94, 0.62
In some embodiments, the initial neighborhood radius is set to σ0 = π/6, and the initial neighborhood is taken as the circle with radius σ0.
The distance to all nodes is then computed using modality vectors as follows:
dij = cos−1[(xixj + yiyj + zizj)/(√(xi² + yi² + zi²)·√(xj² + yj² + zj²))]

yielding, for example, 0.6239, 0.81, 0.04, . . .
The winner among all the nodes, namely the node at the minimum distance, is then determined, and the value of the weight vector of the winner and its neighborhood is updated using the following equation:
wj(n+1) = wj(n) + η(n) πj,I(x)(n)[x(n) − wj(n)]
where:

wj = the weight vector of node j;

x(n) = the input at time n;

πj,I(x)(n) = the neighborhood function centered around the winning node I(x), given by exp(−d²ij/2σ²(n));

η(n) = the learning rate, with typical range [0.1–0.01], given by η(n) = η0 exp(−n/τ2);

σ(n) = the standard deviation (neighborhood width), given by σ(n) = σ0 exp(−n/τ1), where σ0 = π/6; and

τ1, τ2 = time constants, with τ1 = 1000/log(σ0) and τ2 = 1000.
The angle subtended by categories (neurons) at the center of the sphere can be taken as a measure of the topographic distance between two neurons. If two neurons are spatially located at (x1, y1, z1) and (x2, y2, z2), then the angular distance is given by

d12 = cos−1[(x1x2 + y1y2 + z1z2)/(√(x1² + y1² + z1²)·√(x2² + y2² + z2²))]
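Putting the above equations together, a sketch of one training step of the spherical SOM might look like the following. The use of Euclidean distance between the input and the weight vectors to pick the winner, and the fixed positive default for τ1, are assumptions (the text's τ1 = 1000/log(σ0) is negative for σ0 = π/6 < 1).

```python
import numpy as np

def angular_distance(p, q):
    """Angle subtended at the center of the sphere by node positions p and q."""
    cos_angle = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def train_step(weights, positions, x, n, eta0=0.1, sigma0=np.pi / 6,
               tau1=1000.0, tau2=1000.0):
    """One spherical-SOM update for input x at time step n.

    weights:   (num_nodes, dim) array of weight vectors w_j
    positions: (num_nodes, 3) array of node coordinates on the sphere
    """
    eta = eta0 * np.exp(-n / tau2)        # learning rate eta(n)
    sigma = sigma0 * np.exp(-n / tau1)    # neighborhood width sigma(n)
    # Winner I(x): the node whose weight vector is closest to the input.
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    for j in range(len(weights)):
        d_ij = angular_distance(positions[j], positions[winner])
        h = np.exp(-d_ij**2 / (2 * sigma**2))  # Gaussian neighborhood pi_j,I(x)(n)
        weights[j] += eta * h * (x - weights[j])
    return winner
```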
Operation 140 includes generating a layer of a 3D dynamic information contextual relation map using the extracted multiple key-phrases. The process used to generate the layer of the 3D dynamic information contextual relation map is similar to the above-described process of generating the layer of the 3D template contextual relation map. In some embodiments, the 3D dynamic information contextual relation map is a spherical 3D dynamic information contextual relation map. Operations 130 and 140 are performed in parallel in one embodiment, but may also be performed serially.
Operation 150 includes forming phrase clusters for the generated template and dynamic information contextual relation maps. In some embodiments, the phrase clusters are formed based on positions obtained from the above-illustrated equations using a least square error algorithm.
Operation 160 includes constructing a key-phrase frequency histogram consisting of the frequencies of occurrence of the multiple key-phrases, using the generated template and dynamic information contextual relation maps. In some embodiments, the key-phrase frequency histogram is constructed by determining the number of times each of the key-phrases appears in each of the generated contextual relation maps.
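As a sketch, once every key-phrase has been mapped to its winning node, the histogram reduces to a count per node; the list-based representation below is an illustrative assumption.

```python
def build_phrase_histogram(winner_nodes, num_nodes):
    """Key-phrase frequency histogram: the number of times key-phrases were
    mapped to each node (cluster) of a contextual relation map."""
    hist = [0] * num_nodes
    for node in winner_nodes:
        hist[node] += 1
    return hist

# Example: phrases mapped to nodes 2, 2, 5 of a 6-node map -> [0, 0, 2, 0, 0, 1]
```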
Operation 170 then includes generating template and dynamic information three-dimensional (3D) structured document maps from the constructed phrase frequency histogram and the generated contextual relation maps, using the self-organizing map, so that each of the generated 3D structured document maps includes phrase clusters based on the similarity relationships between the formed phrase clusters. Operation 180 then obtains desired intelligent information by mapping the generated 3D dynamic information structured map to the template 3D structured map.
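The text does not spell out the masking operation in detail; as one plausible reading, the sketch below matches a dynamic-information histogram against per-category template histograms. The cosine similarity measure and the dictionary of template histograms are assumptions.

```python
import numpy as np

def mask_to_template(dynamic_hist, template_hists):
    """Return the template category whose histogram best matches the
    dynamic-information histogram, together with the similarity score."""
    dynamic = np.asarray(dynamic_hist, dtype=float)
    best_category, best_sim = None, -1.0
    for category, hist in template_hists.items():
        template = np.asarray(hist, dtype=float)
        denom = np.linalg.norm(dynamic) * np.linalg.norm(template)
        sim = float(np.dot(dynamic, template) / denom) if denom else 0.0
        if sim > best_sim:
            best_category, best_sim = category, sim
    return best_category, best_sim
```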
In some embodiments, the template and dynamic information contextual relation maps and the template and dynamic information structured maps are generated by mapping the transformed multiple key-phrases onto the surface of the spherical map using the self-organizing map and the Gaussian approximation neighborhood technique. The Gaussian distribution enables the neighbor neurons selected for weight updating to be independent of the neighborhood structure. In these embodiments, the Gaussian approximation neighborhood technique includes updating the values of the weight factors of the winner category and its neighborhood using the equation:
wj(n+1) = wj(n) + η(n) πj,I(x)(n)[x(n) − wj(n)]
where wj = the weight vector of node j, x(n) = the input at time n, and πj,I(x)(n) = the neighborhood function centered around the winning node I(x), given by the Gaussian distribution function:

exp(−d²ij/2σ²(n))

where η(n) = the learning rate with typical range [0.1–0.01], given by η(n) = η0 exp(−n/τ2); σ(n) = the standard deviation, given by σ(n) = σ0 exp(−n/τ1), with σ0 = π/6; and τ1, τ2 = time constants, where τ1 = 1000/log(σ0) and τ2 = 1000.
The process begins with extracting desired intelligent information from unstructured text at 210, using the 3D template contextual map and the 3D template structured information map as described above. The extracted information is then compared to the expected information.
If the extracted information is not substantially the same as the expected information, operation 230 applies a fuzzy prediction algorithm using a basis histogram on the histograms obtained from the template and dynamic information contextual relation maps to extract the desired intelligent information. One such fuzzy prediction algorithm is described in U.S. patent application Ser. No. 09/860,165, filed May 17, 2001, entitled “A NEURO/FUZZY HYBRID APPROACH TO CLUSTERING DATA,” which is incorporated by reference in its entirety. Operation 240 compares the extracted desired intelligent information obtained using the fuzzy prediction algorithm to the expected information. Operation 242 includes applying a learning vector quantization (LVQ) based negative learning error correcting algorithm to correct the formed 3D template information structured map when the extracted desired intelligent information is substantially the same as the expected information.
Operation 250 includes applying the LVQ based negative learning error correcting algorithm to correct the 3D template contextual relation map when the extracted desired intelligent information obtained using the fuzzy prediction algorithm is not substantially the same as the expected information. The information extraction then continues using the corrected 3D self-organizing maps.
In some embodiments, applying the LVQ based negative learning error correcting algorithm includes applying substantially small negative and positive learning corrections to an outer cover to correct incorrect cluster boundaries in the 3D template structured map and the 3D template contextual relation map, using the equations:
wj(n+1) = wj(n) − η(n) πj,I(x)(n)[x(n) − wj(n)] (negative correction)

wj(n+1) = wj(n) + η(n) πj,I(x)(n)[x(n) − wj(n)] (positive correction)
where wj = the weight vector of node j, x(n) = the input at time n, and πj,I(x)(n) = the neighborhood function centered around the winning node I(x).
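A sketch of this error-correcting step follows, reusing the angular_distance helper from the training sketch above. Applying the corrections to all nodes weighted by the Gaussian neighborhood, rather than only to an outer cover of boundary nodes, is a simplifying assumption.

```python
import numpy as np

def lvq_negative_learning(weights, positions, x, wrong_node, correct_node,
                          eta=0.01, sigma=np.pi / 6):
    """LVQ-based correction: a substantially small negative correction pushes
    weights near the incorrectly winning node away from input x, and a
    matching positive correction pulls weights near the correct node toward x.
    """
    for j in range(len(weights)):
        d_wrong = angular_distance(positions[j], positions[wrong_node])
        d_right = angular_distance(positions[j], positions[correct_node])
        # Negative learning around the wrong winner (first equation above).
        weights[j] -= eta * np.exp(-d_wrong**2 / (2 * sigma**2)) * (x - weights[j])
        # Positive learning around the correct node (second equation above).
        weights[j] += eta * np.exp(-d_right**2 / (2 * sigma**2)) * (x - weights[j])
    return weights
```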
Operations 205–250 are repeated until the extracted desired intelligent information is substantially the same as the expected information.
The process begins with operation 310 by calculating a cumulative frequency for each category mapped to a cell in the 3D template and dynamic information contextual relation maps. Operation 320 includes calculating a goodness factor for each calculated cumulative frequency. In some embodiments, the goodness factor of each category Ci with respect to each cell j is calculated using the equation:
G(Ci, j) = FCell(Ci)/FColl(Ci)
wherein FCell relates the frequency of category Ci to the other categories in the cell j, and FColl relates the frequency of category Ci to the whole collection, wherein
i ∈ A1j if d(i, j) < r1, where r1 is the radius of the neutral zone, r1 = 3·di,j with i, j being adjacent; Fj(Ci) = fj(Ci)/Σj fj(Ci) is the relative frequency of category Ci; and fj(Ci) is the frequency of category Ci in cell j.
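As a sketch, with fj(Ci) stored as per-cell frequency dictionaries, the goodness factor can be computed as below; reading FCell and FColl as relative frequencies within the cell and within the whole collection is an assumption consistent with the definitions above.

```python
def goodness_factor(cell_freqs, cell, category):
    """G(Ci, j) = FCell(Ci) / FColl(Ci).

    cell_freqs[j][Ci] holds f_j(Ci), the frequency of category Ci in cell j.
    """
    in_cell = cell_freqs[cell].get(category, 0)
    cell_total = sum(cell_freqs[cell].values())
    coll = sum(freqs.get(category, 0) for freqs in cell_freqs.values())
    coll_total = sum(sum(freqs.values()) for freqs in cell_freqs.values())
    f_cell = in_cell / cell_total if cell_total else 0.0
    f_coll = coll / coll_total if coll_total else 0.0
    return f_cell / f_coll if f_coll else 0.0
```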
Operation 330 includes labeling each category mapped to a cell in the 3D template and 3D dynamic information contextual relation maps based on the calculated goodness factor. Operation 340 includes clustering the labeled categories by applying a least mean square error clustering algorithm to each of the categories.
Operation 350 then includes comparing each category using the following equation:
index(min(dm, cluster centers)) ∈ (i, j)
The process 300 stops if the above condition is not true; otherwise, operation 360 merges the clustered categories based on the labels. In some embodiments, clusters are merged by finding the midpoint m between the centers of clusters i and j. The distance from m to all cluster centers is then determined, and the above equation is used to merge the clusters, as sketched below.
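In this sketch, representing cluster centers as 3D vectors and replacing a merged pair by the midpoint m are assumptions, since the text specifies only the index(min(dm, cluster centers)) ∈ (i, j) condition.

```python
import numpy as np

def try_merge(centers, i, j):
    """Merge clusters i and j if the cluster center nearest to their
    midpoint m is one of i and j; otherwise leave the clusters unchanged."""
    m = (centers[i] + centers[j]) / 2.0
    distances = [np.linalg.norm(m - c) for c in centers]
    nearest = int(np.argmin(distances))
    if nearest in (i, j):
        merged = [c for k, c in enumerate(centers) if k not in (i, j)]
        merged.append(m)  # the midpoint becomes the new merged center
        return merged, True
    return centers, False
```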
Category 1
PREREQUISITES MAKE SURE THESE SYSTEMS WILL OPERATE: AIR/GROUND SYSTEM (AMM 32-09-02 /201). MAKE SURE THE AIRPLANE IS IN THIS CONFIGURATION: ELECTRICAL POWER (AMM 24-22-00 /201)
Category 2
DO THE ANTENNA AND CABLE CHECK PROCEDURE FOR THE APPLICABLE LEFT (CENTER,RIGHT) ILS RECEIVER (AMM 34-00-00/201). PERFORM PSEU BITE PROCEDURE (FIM 32-09-03,
Category 3
L OR R SYSTEM FAULTS, REPLACE THE PRESSURE SWITCH, S25 (S30), FOR THE ALTERNATING CURRENT MOTOR PUMP (ACMP) IN THE LEFT (RIGHT) HYDRAULIC SYSTEM (AMM 29-11-18 /401 ). C SYSTEM FAULTS, ADJUST THE PRESSURE SWITCH, S10003 (S10016), FOR THE ALTERNATING CURRENT MOTOR PUMP (ACMP) C1 (C2) IN THE CENTER HYDRAULIC SYSTEM (AMM 29-11-19 /401).
Category 4
EXAMINE AND RREPAIR THE CIRCUIT BETWEEN THE FCC CONNECTOR D381A, PIN K3 AND TB127, PIN G43 (WDM 22-15-12.). CHECK FOR CONTINUITY BETWEEN PINS A7 AND A8 OF CONNECTOR D2712A, (WDM 21-31-21. ).
After completing the operations 110 and 120 described above, the word, code, and winner nodes obtained for each category are as follows:
Word, Code & Winner Nodes for Category 1
PREREQUISITES (0.027631 0.030854 0.024407) *22* MAKE (0.030854 0.024407 0.036636) *22* SURE (0.024407 0.036636 0.037852) *22* THESE (0.036636 0.037852 0.036835) *22* SYSTEMS (0.037852 0.036835 0.043527) *22* WILL (0.036835 0.043527 0.028883) *22* OPERATE: (0.043527 0.028883 0.002341) *22* AIR/GROUND (0.028883 0.002341 0.036835) *22* SYSTEM (0.002341 0.036835 0.000068) *22* AMM (0.036835 0.000068 0.018451) *22* MAKE (0.030521 0.024407 0.036636) *22* SURE (0.024407 0.036636 0.002341) *22* AIRPLANE (0.036636 0.002341 0.000013) *22* IN (0.002341 0.000013 0.037857) *22* THIS (0.000013 0.037857 0.006376) *22* CONFIGURATION: (0.037857 0.006376 0.009961) *22* ELECTRICAL (0.006376 0.009961 0.030730) *22* POWER (0.009961 0.030730 0.000068) *22* AMM (0.030730 0.000068 0.015399) *22*
Word, Code & Winner Nodes for Category 2
DO (0.251298 0.250007 0.252589) *61* ANTENNA (0.250007 0.252589 0.255671) *61* CABLE (0.252589 0.255671 0.256019) *61* CHECK (0.255671 0.256019 0.280867) *61* PROCEDURE (0.256019 0.280867 0.250317) *71* FOR (0.280867 0.250317 0.252683) *61* APPLICABLE (0.250317 0.252683 0.272725) *61* LEFT (0.252683 0.272725 0.250155) *61* (CENTER,RIGHT) (0.272725 0.250155 0.250461) *61* ILS (0.250155 0.250461 0.283956) *61* RECEIVER (0.250461 0.283956 0.250068) *71* AMM (0.283956 0.250068 0.267012) *61* PERFORM (0.280567 0.280230 0.280904) *7* PSEU (0.280230 0.280904 0.280904) *7* PSEU (0.280904 0.280904 0.254216) *7* BITE (0.280904 0.254216 0.250001) *5* E (0.254216 0.250001 0.280867) *7* PROCEDURE (0.250001 0.280867 0.250309) *7* FIM (0.280867 0.250309 0.261688) *5* FIG. (0.250309 0.261688 0.254357) *7* BLOCK (0.261688 0.254357 0.252048) *7* ACTION (0.254357 0.252048 0.253202) *7*
Word, Code & Winner Nodes for Category 3
L (0.500011 0.500001 0.500021) *26* OR (0.500001 0.500021 0.500001) *26* R (0.500021 0.500001 0.536835) *26* SYSTEM (0.500001 0.536835 0.511313) *10* FAULTS, (0.536835 0.511313 0.533973) *10* REPLACE (0.511313 0.533973 0.530854) *10* PRESSURE (0.533973 0.530854 0.536723) *10* SWITCH, (0.530854 0.536723 0.500317) *10* FOR (0.536723 0.5003170.502491) *26* ALTERNATING (0.500317 0.502491 0.506677) *26* CURRENT (0.502491 0.506677 0.525109) *26* MOTOR (0.506677 0.525109 0.531013) *10* PUMP (0.525109 0.531013 0.500054) *10* (ACMP) (0.531013 0.500054 0.500013) *26* IN (0.500054 0.500013 0.522725) *26* LEFT (0.500013 0.522725 0.500899) *26* (RIGHT) (0.522725 0.500899 0.516218) *26* HYDRAULIC (0.500899 0.516218 0.536835) *26* SYSTEM (0.516218 0.536835 0.500068) *10* AMM (0.536835 0.500068 0.518451) *26*
C (0.518418 0.500001 0.536835) *26* SYSTEM (0.500001 0.536835 0.511313) *84* FAULTS, (0.536835 0.511313 0.502084) *26* ADJUST (0.511313 0.502084 0.530854) *26* PRESSURE (0.502084 0.530854 0.536723) *26* SWITCH, (0.530854 0.536723 0.500317) *13* FOR (0.536723 0.500317 0.502491) *26* ALTER (0.500317 0.502491 0.526291) *26* NATING (0.502491 0.526291 0.506677) *11* CURRENT (0.526291 0.506677 0.525109) *26* MOTOR (0.506677 0.525109 0.531013) *26* PUMP (0.525109 0.531013 0.500054) *26* (ACMP) (0.531013 0.500054 0.500013) *26* IN (0.500054 0.500013 0.505884) *10* CENTER (0.500013 0.505884 0.516218) *10* HYDRAULIC (0.505884 0.516218 0.536835) *26* SYSTEM (0.516218 0.536835 0.500068) *11* AMM (0.536835 0.500068 0.518451) *26*
Word, Code & Winner Nodes for Category 4
EXAMINE (0.772573 0.760548 0.784599) *79* RREPAIR (0.760548 0.784599 0.756085) *79* *79* BETWEEN (0.756085 0.754019 0.750301) *79* FCC (0.754019 0.750301 0.756376) *79* CONNECTOR (0.750301 0.756376 0.750802) *79* PIN (0.756376 0.750802 0.750802) *79* PIN (0.750802 0.750802 0.751140) *79* WDM (0.750802 0.751140 0.750971) *79* CHECK (0.753168 0.756019 0.750317) *79* FOR (0.756019 0.750317 0.756376) *79* CONTINUITY (0.750317 0.756376 0.754019) *79* BETWEEN (0.756376 0.754019 0.780423) *79* PINS (0.754019 0.780423 0.750021) *79* OF (0.780423 0.750021 0.756376) *79* CONNECTOR (0.750021 0.756376 0.751140) *79* WDM (0.756376 0.751140 0.753758) *79*
Also, for further illustration, categories such as those shown above represent a document vector, which gets mapped to a single cell in the 3D SOM map.
SOM 1 Labeling
Labels generated for each of the SOM 1 elements are shown in the corresponding figure; for the left top element in the figure, the label is calculated as described above.
SOM 1 Clustering
The following illustrates the formation of basis histograms obtained by training and using the template 3D contextual relation and structured maps described above:
The following illustrates the fuzzy prediction performed by inputting the above basis histograms extracted from each category in the process of training:
Input to the fuzzy prediction is 0 0 0 0 3 0 0 0 3 0 0 0 1 0 0 0 0 0 0 0 0 2. The correct classification is category 2. The classification obtained by the 3D dynamic information structured map is category 1, while the classification obtained using fuzzy prediction is category 2. Therefore, LVQ based negative learning is applied to the 3D template information structured map.
The computer-implemented system 500 includes a key-word/phrase extractor 530. The key-word/phrase extractor 530 is connected to the web server 520 and extracts multiple key-phrases from the received text. In some embodiments, the key-word/phrase extractor 530 can also extract multiple key-words from the received text and can form the multiple key-phrases from the extracted key-words. In some embodiments, the key-word/phrase extractor 530 extracts key-words from the received text based on specific criteria, such as filtering to remove all words comprised of three or fewer letters, filtering to remove general words, and/or filtering to remove rarely used words. The formed key-phrases can include one or more extracted key-words and any associated preceding and succeeding (following) words adjacent to the extracted key-words, to include contextual information. In some embodiments, the key-word/phrase extractor 530 can further morphologize the extracted key-words based on their fundamental characteristics. For example, the key-word/phrase extractor 530 can morphologize in such a way that a morphed (altered) word's pronunciation or meaning remains intact.
An analyzer 540, coupled to the key-word/phrase extractor 530, transforms each of the extracted product-related information and query key-words, phrases, and/or morphed words into a unique numerical representation such that the transformation does not result in multiple similar numerical representations, to avoid ambiguous prediction of the meaning of the translated words in the received text. The analyzer 540 also performs the three-dimensional mapping and classification described above.
Block 550 represents an interface for communicating desired information generated by system 500. In some embodiments, block 550 provides the information to a display for display to a user. In further embodiments, block 550 provides the information via a network to another system on the network, or simply stores the information in local or remote storage for later use.
Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 602 of the computer 610. A hard drive, CD-ROM, and RAM are some examples of articles including a computer-readable medium. For example, a computer program 625 capable of extracting desired intelligent information from unstructured text according to the teachings of the present invention may be included on a CD-ROM and loaded from the CD-ROM to a hard drive. The computer-readable instructions allow computer system 610 to provide generic access controls in a COM-based computer network system having multiple clients and servers.
The above-described computer-implemented technique provides, among other things, a method and apparatus for intelligent information mining that can be domain independent, adaptive in nature, and able to exploit contextual information present in text documents. In addition, the technique describes a closed loop system including an error feedback function to reduce clustering errors and cluster boundary sensitivity. The spherical SOM map described with respect to some embodiments is one illustration of the invention; further embodiments utilize a generalized n-dimensional SOM map.