The present invention may take physical form in certain parts and steps, embodiments of which will be described in detail in the following description and illustrated in the accompanying drawings that form a part hereof, wherein:
A plurality of computers, such as computers 102 and 104, may be coupled to user computers 112, 114 and 116 via networks 120 and 130. User computers 112, 114, and 116 may also be coupled to report parsing computer 132. One or more of the computers shown in
One or more networks may be in the form of a local area network (LAN) that has one or more of the well-known LAN topologies and may use a variety of different protocols, such as Ethernet. One or more of the networks may be in the form of a wide area network (WAN), such as the Internet.
The cellular network 190 may comprise a wireless network and a base transceiver station transmitter (not shown). The cellular network may include a second/third-generation (2G/3G) cellular data communications network, a Global System for Mobile communications network (GSM), GPRS, Wi-Fi, UMTS, CDMA, WCDMA, or other wireless communication network such as a WLAN network.
In addition, a broadcasting network 180 may include a radio transmission of IP datacast over DVB-H. The broadcast network 180 may broadcast a service such as a digital or analog television signal and supplemental content related to the service via a transmitter (not shown). The broadcast network 180 may also transmit supplemental content which may include a television signal, audio and/or video streams, data streams, video files, audio files, software files, and/or video games.
A mobile device such as mobile device 192 may comprise a wireless interface configured to send and/or receive digital wireless communications within cellular network 190 or broadcasting network 180. The mobile device may comprise a mobile telephone, a personal digital assistant (PDA), a digital player, a mobile terminal or the like. The information received by mobile device 192 through the cellular network 190 or broadcast network 180 may include voice data, electronic images, audio clips, and video clips. As part of cellular network 190, one or more base stations (not shown) may support digital communications with mobile device 192 while the mobile device 192 is located within the administrative domain of cellular network 190.
Computer devices such as computers 102, 104, and 112-116 may be connected to one or more of the networks via twisted pair wires, coaxial cable, fiber optics, radio waves or other media. It will also be appreciated that the network connections shown are illustrative and that other techniques for establishing a communications link between the computers, such as TCP/IP, Bluetooth, Ethernet, FTP, HTTP, IEEE 802.11x and the like, may be utilized.
In an aspect of the invention, report parsing computer 132 may require information from external sources to process textual report data found in various log files and/or reports. Requests for such information may be transmitted from report parsing computer 132 to a data gathering system 138. Data gathering system 138 may include a processor, memory and other conventional computer components and may be programmed with computer-executable instructions to communicate with other computers and/or telecommunications devices. Data gathering system 138 may access such information from various data stores such as data store 140. Data store 140 may store log files and reports for a specified period of time for later review and analysis. In an embodiment of the invention, all report data may be stored in data store 140, which may be implemented with a group of networked server computers or other storage devices.
Report parsing computer 132 may be programmed with computer-executable instructions to parse log file data. With reference to
In
Returning to
Next, in step 304, a transaction database may be created from the textual tokens. The transaction database may be located external to the computing device, such as in data store 140.
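By way of illustration only, the creation of a transaction database from textual tokens might be sketched as follows. The function name build_transactions, the whitespace tokenization, and the sample log lines are assumptions made for this example and are not prescribed by the description.

```python
def build_transactions(lines):
    """Turn each log line into one transaction: a list of (position, token) pairs.

    Assumption: tokens are separated by whitespace, and each line is one record.
    """
    transactions = []
    for line in lines:
        tokens = line.split()
        # Keep the absolute position of every token for later position checks.
        transactions.append(list(enumerate(tokens)))
    return transactions


# Hypothetical sample data, not taken from any real log.
sample_lines = [
    "User alice logged in",
    "User bob logged in",
]
transactions = build_transactions(sample_lines)
print(transactions[0])
# [(0, 'User'), (1, 'alice'), (2, 'logged'), (3, 'in')]
```

Storing the position alongside each token is one possible design choice; it allows a later filtering step to reason about absolute and relative positions without re-reading the original text.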
In step 306, a search may be conducted to detect frequent patterns as illustrated in step 308. As those skilled in the art will realize, searching for frequent patterns may involve an iterative process that may require several iterations of scanning until frequent patterns emerge.
In an aspect of the invention, a frequent pattern may refer to a pattern whose frequency is at least as great as a frequency threshold. In another aspect of the invention, a frequent pattern may refer to a selection of the most often occurring patterns that emerge during the searching process. In various other embodiments, frequent patterns may comprise frequent sets, free sets and/or closed sets. A frequent pattern mining algorithm, e.g., the Apriori algorithm, may be used to detect the frequent patterns. However, as those skilled in the art will realize, other frequent pattern mining algorithms that are able to find frequent patterns in the data may be utilized. The frequent patterns may be combinations of items (i.e., words) that occur together often in the same transaction (i.e., the number of occurrences meets a specified frequency threshold). In another aspect of the invention, a frequency detection algorithm may be used to detect frequent patterns.
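As a non-limiting sketch, a level-wise search in the style of the Apriori algorithm might look like the following. The function name frequent_itemsets, the parameter min_count, and the sample transactions are hypothetical, and the code is a simplified illustration rather than an optimized or authoritative implementation. A pattern is treated as frequent when its count is at least min_count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Level-wise (Apriori-style) search for word sets found in >= min_count transactions."""
    tx_sets = [frozenset(t) for t in transactions]

    # Level 1: count the transactions containing each individual word.
    counts = {}
    for tx in tx_sets:
        for item in tx:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-sets into k-sets and keep
        # only candidates whose every (k-1)-subset is itself frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One scan of the transactions per level to count the candidates.
        current = {}
        for cand in candidates:
            count = sum(1 for tx in tx_sets if cand <= tx)
            if count >= min_count:
                current[cand] = count
        result.update(current)
        k += 1
    return result


# Hypothetical sample transactions (one list of words per log record).
sample = [
    ["User", "alice", "logged", "in"],
    ["User", "bob", "logged", "in"],
    ["Disk", "full"],
]
patterns = frequent_itemsets(sample, min_count=2)
print(patterns[frozenset({"User", "logged", "in"})])  # 2
```

Each pass of the loop corresponds to one iteration of scanning; the search stops when a level yields no frequent sets.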
In step 310, the detected frequent patterns may be filtered to detect various arrangements of patterns. The filtering of the frequent patterns may include examining each detected frequent pattern for various arrangements of patterns. The filtering may be used so that only patterns that represent message templates remain. Each item of a frequent pattern may be analyzed, and the position of each item in the detected frequent pattern may be determined. As used in various aspects of the invention, position may refer to absolute positions of items within a record and/or relative positions between items. Those skilled in the art will realize that a position may be a distance measured from the beginning or end of the text. Furthermore, relative distances may be measured from the message end, from the middlemost token, from an arbitrary anchor point, and/or relative to other tokens included in a frequent pattern.
The position of each item of the detected frequent pattern may be compared. If the pattern consists of items whose positions within the transactions from which they originate are consecutive, or are separated by gaps of at most n positions, then the pattern is interpreted to represent a message template. The variable n may represent the maximum number of words that a variable field may contain. The variable n may be adjusted, but reasonable results may be obtained with values of n=1, n=2, n=3, and n=4. Those skilled in the art will realize that various other values may also be freely selected for n. The gaps in the pattern may represent variables that have been inserted into the template.
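The position comparison may be illustrated with a minimal sketch. Here the gap between two adjacent items is assumed to be the number of skipped token positions between them, and the function name is_template_candidate is hypothetical.

```python
def is_template_candidate(positions, n):
    """Return True if adjacent items are separated by at most n skipped positions.

    Assumption: `positions` holds the absolute token positions of the pattern's
    items within one transaction; a gap of 0 means the items are consecutive.
    """
    ordered = sorted(positions)
    return all(b - a - 1 <= n for a, b in zip(ordered, ordered[1:]))


print(is_template_candidate([0, 2, 3], n=1))  # True: one skipped word at position 1
print(is_template_candidate([0, 3, 4], n=1))  # False: two skipped words exceed n=1
print(is_template_candidate([0, 3, 4], n=2))  # True
```

In the first call, the skipped position 1 would correspond to a variable field of one word inserted into the template.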
The results of filtering in step 310 may be displayed on display 208. For example,
In step 312, a message template may be generated based on the arrangements of patterns. The generated message templates may be used to parse free-text message data on an automatic basis as shown in step 314. The parsing of free-text message data based on a generated template may allow for processing of legacy log reports for various systems that include audit, financial reporting, and/or other similar systems.
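For illustration, template generation and subsequent parsing might be sketched as below, assuming that each gap holds exactly one whitespace-delimited word and that the template spans the whole message. The helper names make_template, template_to_regex, and parse_message, as well as the sample messages, are hypothetical.

```python
import re


def make_template(tokens, fixed_positions):
    """Keep tokens at the fixed positions; mark every gap between them with None."""
    first, last = min(fixed_positions), max(fixed_positions)
    fixed = set(fixed_positions)
    return [tokens[i] if i in fixed else None for i in range(first, last + 1)]


def template_to_regex(template):
    """Compile a template into a regular expression with one capture group per gap."""
    parts = [r"(\S+)" if tok is None else re.escape(tok) for tok in template]
    return re.compile(r"\s+".join(parts))


def parse_message(pattern, message):
    """Return the variable values if the message fits the template, else None."""
    match = pattern.fullmatch(message.strip())
    return list(match.groups()) if match else None


# Hypothetical example: "User", "logged" and "in" form the frequent pattern.
template = make_template(["User", "alice", "logged", "in"], [0, 2, 3])
pattern = template_to_regex(template)
print(template)                                        # ['User', None, 'logged', 'in']
print(parse_message(pattern, "User carol logged in"))  # ['carol']
print(parse_message(pattern, "Disk full"))             # None
```

Supporting variable fields of up to n words, or variables before the first or after the last fixed token, would require a more permissive expression than the one assumed here.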
In another aspect of the invention, frequent episodes may also be detected. In
In step 366, a search may be conducted to detect frequent episodes as illustrated in step 368. As those skilled in the art will realize, searching for frequent episodes may involve an iterative process that may require several iterations of scanning until frequent episodes emerge.
In step 370, the detected frequent episodes may be filtered to detect various arrangements of patterns. Each item of a frequent episode may be analyzed, and the position of each item in the detected frequent episode may be determined. As used in various aspects of the invention, position may refer to absolute positions of items within a record and/or relative positions between items. Those skilled in the art will realize that a position may be a distance measured from the beginning or end of the text. Furthermore, relative distances may be measured from the message end, from the middlemost token, from an arbitrary anchor point, and/or relative to other tokens included in a frequent episode.
The position of each item of the detected frequent episode may be compared. The results of filtering in step 370 may be displayed on display 208. In step 372, a message template may be generated based on the arrangements of episodes. The generated message templates may be used to parse free-text message data on an automatic basis as shown in step 374.
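As a simplified sketch only, and assuming that an episode is an ordered sequence of tokens occurring close together within a record (an interpretation adopted for this example, not a definition given above), a first level of episode detection restricted to ordered pairs might look like the following. The function name frequent_ordered_pairs and the parameters max_gap and min_count are hypothetical.

```python
from collections import Counter


def frequent_ordered_pairs(token_lists, max_gap, min_count):
    """Count ordered token pairs (a before b) with at most max_gap tokens between them.

    Each pair is counted once per record; pairs seen in >= min_count records are kept.
    """
    counts = Counter()
    for tokens in token_lists:
        seen = set()
        for i, first in enumerate(tokens):
            # Look ahead at most max_gap + 1 positions after token i.
            for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
                seen.add((first, tokens[j]))
        counts.update(seen)
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical sample records.
records = [
    ["User", "alice", "logged", "in"],
    ["User", "bob", "logged", "in"],
]
episodes = frequent_ordered_pairs(records, max_gap=1, min_count=2)
print(episodes)
```

Unlike an unordered frequent set, each pair here preserves the order in which the tokens occur, so ("User", "logged") is counted but ("logged", "User") is not.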
In another aspect of the invention, the methods described above may be applied recursively to log entry chains in order to detect variable log entries in entry chains as illustrated in
While the invention has been described with respect to specific examples including presently preferred modes of carrying out the invention, those skilled in the art will appreciate that there are numerous variations and permutations of the above described systems and techniques that fall within the spirit and scope of the invention.