Contemporary systems may output large quantities of data. Such systems vary widely and may produce a variety of data, for example aircraft health data, financial data, or data from healthcare systems. When such complex systems are monitored, the large quantities of data generated may hinder the understanding of events in the system.
In one aspect, an embodiment of the innovation relates to a method of categorizing events found in data including receiving data related to one or more systems or assets from the one or more systems or assets, receiving data regarding reported events, selecting an event from the data related to the one or more systems or assets based on the data regarding the reported events, and labeling the selected event.
In another aspect, an embodiment of the innovation relates to a system for categorizing events including a computer-searchable database containing data related to one or more systems or assets and data related to reported events, and a labeling module in communication with the database and configured to: receive data from the database, select from the data an event related to the reported events, and label the event based thereon.
An initial pictorial explanation of an embodiment of the innovation may prove useful.
It is contemplated that feature extraction at 12 may be used to process the initial data 10. This may be done in any suitable manner, including by a general-purpose computer running feature extraction algorithms. For example, the initial data 10 may be processed to derive features that make the component health of the system or asset stand out. In this manner, the initial data 10 may be processed so that it provides useful information indicative of a notable event. The processed data may be thought of as unlabeled data information or unlabeled data events 14. Such unlabeled data events 14, along with reported events 16, may be processed and a selection 20 may be made based thereon. More specifically, an event may be selected from the unlabeled data events 14 based on the reported events 16. The selection 20 may also optionally take into consideration human validation 18, although this need not be the case. The selection 20 may also be performed in any suitable manner, including by a general-purpose computer running a selection algorithm. Based on the selection 20, labeled data events 22 may be created. In this manner, the unlabeled data events 14 may be categorized into labeled data events 22.
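The feature extraction at 12 is not specified further. As one hypothetical illustration of deriving unlabeled shift events from raw data, the following Python sketch flags positions in a single parameter series where the mean of the following window differs from the mean of the preceding window by more than a chosen number of noise units. The class `ShiftEvent`, the function `extract_shift_events`, the window length, the use of a pooled standard deviation as the noise level, and the threshold are all assumptions made for this example, not the described feature extraction.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class ShiftEvent:
    asset_id: int
    index: int          # sample position at which the shift starts
    shift: float        # mean after the step minus mean before it
    normalized: float   # shift divided by the estimated noise level


def extract_shift_events(asset_id, series, window=5, min_normalized=3.0):
    """Return candidate (unlabeled) shift events found in one parameter series.

    `window` must be at least 2 so that a variance can be computed.
    """
    events = []
    i = window
    while i <= len(series) - window:
        before = series[i - window:i]
        after = series[i:i + window]
        # Pooled standard deviation of both windows; a partial step inside a
        # window inflates it, which suppresses detections before the true step.
        noise = max(sqrt((variance(before) + variance(after)) / 2), 1e-9)
        shift = mean(after) - mean(before)
        if abs(shift) / noise >= min_normalized:
            events.append(ShiftEvent(asset_id, i, shift, shift / noise))
            i += window  # jump past the step so it is reported only once
        else:
            i += 1
    return events
```

Applied to a hypothetical twenty-sample series that hovers around 10.0 for ten samples and around 8.0 for the next ten, a sketch like this would typically report a single event at index 10 with a negative shift; events produced this way would then form part of the unlabeled data events 14.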
Such labeled data events 22 may then be used for a variety of purposes. For example, classifiers are trained in a supervised manner, which means that a labeled dataset must be available, i.e., a list of events, with their features, each labeled with the nature of the event. Thus, the labeled data events 22 may be used for classifier training and/or classifier testing. Such a classifier may be understood to be a model that performs statistical classification; more specifically, such a model may identify to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known.
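By way of non-limiting illustration of how the labeled data events 22 could feed classifier training and testing, the sketch below trains a nearest-centroid classifier, a deliberately simple stand-in for whatever classifier is actually used, on (features, label) pairs and measures its accuracy on held-out labeled events. The function names, the feature layout (for example, oil pressure shift and vibration shift), and the label names are assumptions for this example only.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def train_nearest_centroid(labeled_events):
    """labeled_events: iterable of (feature_vector, label). Returns label -> centroid."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_events:
        total = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            total[k] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, test_events):
    """Fraction of held-out labeled events whose label the classifier reproduces."""
    hits = sum(classify(centroids, f) == label for f, label in test_events)
    return hits / len(test_events)
```

Splitting the labeled data events 22 into a training portion and a held-out testing portion, as `accuracy` assumes, corresponds to the classifier training and classifier testing uses mentioned above.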
In this manner, the health of a fleet of assets may be monitored by continuously watching performance data and detecting events such as shifts or trends in performance indicators. When an event is detected, the embodiments of the innovation may determine the actual nature of the event, such as whether it is a fault, what type of fault it is, or how severe it is, from the features of the event and from a list of reported events. Using this information to label the dataset in an automated way may be challenging because the dates on which the faults were detected or confirmed may not, in general, match the date on which the event could be observed in the data. The embodiments of the innovation automatically or semi-automatically select the data events that are most likely to be related to each reported event and, if necessary, allow a user to confirm those matches quickly and efficiently.
By way of non-limiting example, a system 50 for categorizing events has been illustrated in
As noted above, embodiments described herein may include a computer program product comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media may be any available media that may be accessed by a general purpose or special purpose computer or other machine with a processor. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of machine-executable instructions or data structures and that can be accessed by a general purpose or special purpose computer or other machine with a processor. When information is transferred or provided over a network or another communication connection (either hardwired, wireless, or a combination of hardwired or wireless) to a machine, the machine properly views the connection as a machine-readable medium. Thus, any such connection is properly termed a machine-readable medium. Combinations of the above are also included within the scope of machine-readable media. Machine-executable instructions comprise, for example, instructions and data, which cause a general-purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
Embodiments will be described in the general context of method steps that may be implemented in one embodiment by a program product including machine-executable instructions, such as program codes, for example, in the form of program modules executed by machines in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that have the technical effect of performing particular tasks or implementing particular abstract data types. Machine-executable instructions, associated data structures, and program modules represent examples of program codes for executing steps of the method disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
Embodiments may be practiced in a networked environment using logical connections to one or more remote computers having processors. Logical connections may include a local area network (LAN) and a wide area network (WAN) that are presented here by way of example and not limitation. Such networking environments are commonplace in office-wide or enterprise-wide computer networks, intranets and the internet and may use a wide variety of different communication protocols. Those skilled in the art will appreciate that such network computing environments will typically encompass many types of computer system configurations, including personal computers, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination of hardwired or wireless links) through a communication network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
An exemplary system for implementing all or portions of the exemplary embodiments might include a general purpose computing device in the form of a computer, including a processing unit, a system memory, and a system bus that couples various system components, including the system memory, to the processing unit. The system memory may include read only memory (ROM) and random access memory (RAM). The computer may also include a magnetic hard disk drive for reading from and writing to a magnetic hard disk, a magnetic disk drive for reading from or writing to a removable magnetic disk, and an optical disk drive for reading from or writing to a removable optical disk such as a CD-ROM or other optical media. The drives and their associated machine-readable media provide nonvolatile storage of machine-executable instructions, data structures, program modules and other data for the computer.
It will be understood that the computer-searchable database 52 may be any suitable database, including a single database having multiple sets of data, multiple discrete databases linked together, or even simple tables of data. Regardless of the type of database, the computer-searchable database 52 may be provided on a storage medium on a computer (not shown) or on a computer-readable medium, such as a database server. It is contemplated that the computer-searchable database 52 may include data to be categorized and data that may aid in categorizing the uncategorized data. Such data may be received by the computer-searchable database 52 in any suitable manner. By way of non-limiting example, the computer-searchable database 52 may include, among other information, data regarding reported events 54 and unlabeled data related to one or more systems or assets 56. It is contemplated that the computer-searchable database 52 may include additional information or additional data 58, such as data to aid in categorizing the data. It is further contemplated that such additional data 58 may include the initial data discussed above.
The labeling module 60 may be executed on a computer 62 configured to access or query the computer-searchable database 52 and classify selected events. It will be understood that the labeling module 60 may access the computer-searchable database 52 via a communication network or computer network coupling the labeling module 60 to the computer-searchable database 52. By way of non-limiting example, such a computer network may be a local area network or a larger network such as the internet. It is contemplated that the labeling module 60 may make repeated queries of the computer-searchable database 52. The labeling module 60 may be configured, for each of the reported events, to determine which of the unlabeled events is the most likely to be related to that reported event. In implementation, such a selection process may be converted to an algorithm that correlates the unlabeled events with the reported events. Such an algorithm may be converted to a computer program comprising a set of executable instructions, which may be executed by the labeling module 60. Additional inputs to the computer program may include inputs from the computer-searchable database 52 and the learning module 70. The computer program may have an executable instruction set for receiving or querying data from the computer-searchable database 52, selecting from the at least one data table an event related to the reported events, and labeling the event based thereon.
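The query, select, and label cycle of the labeling module 60 could, for instance, be backed by a relational store. The sketch below uses Python's built-in `sqlite3` module with a hypothetical two-table schema standing in for the computer-searchable database 52; the table names, column names, the helper functions `candidate_events` and `apply_label`, and the choice of ISO 8601 date strings (which sort chronologically as text) are all assumptions rather than part of the described system.

```python
import sqlite3

# Hypothetical schema: one table of unlabeled data events, one of reported events.
SCHEMA = """
CREATE TABLE unlabeled_events (
    event_id INTEGER PRIMARY KEY,
    asset_id INTEGER NOT NULL,
    event_date TEXT NOT NULL,      -- ISO 8601, so text order equals date order
    oil_pressure_shift REAL,
    vibration_shift REAL,
    label TEXT                     -- NULL until a label is assigned
);
CREATE TABLE reported_events (
    report_id INTEGER PRIMARY KEY,
    asset_id INTEGER NOT NULL,
    report_date TEXT NOT NULL,
    event_type TEXT NOT NULL
);
"""


def candidate_events(conn, asset_id, report_date):
    """Still-unlabeled events on the same asset dated before the report, newest first."""
    return conn.execute(
        "SELECT event_id, event_date, oil_pressure_shift, vibration_shift "
        "FROM unlabeled_events "
        "WHERE asset_id = ? AND event_date < ? AND label IS NULL "
        "ORDER BY event_date DESC",
        (asset_id, report_date),
    ).fetchall()


def apply_label(conn, event_id, label):
    """Write the chosen label back into the database."""
    conn.execute(
        "UPDATE unlabeled_events SET label = ? WHERE event_id = ?", (label, event_id)
    )
    conn.commit()
```

A labeling module built this way would call `candidate_events` once per reported event, choose among the returned rows with one of the selection rules discussed below, and record the result with `apply_label`.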
The learning module 70 may be executed on a computer 72 configured to communicate with the labeling module 60. The learning module 70 and/or the computer 72 may also be configured to communicate with the computer-searchable database 52. The learning module 70 may have a display 74 capable of presenting the selections made by the labeling module 60 to a user 80 for verification. Further, the learning module 70 may present alternative selections for a user to select from. The display 74 may be any suitable display for displaying information, verification methods, and alternative selections to the user 80. Although the display 74 has been illustrated and described as being included within the computer 72, it is contemplated that the display may be a separate device operably coupled to the learning module 70.
The learning module 70 may also have or be operably coupled to one or more cursor control devices 76 and one or more multifunction keyboards 78, which the user 80 may use to interact with the learning module 70 and the display 74. A suitable cursor control device 76 may include any device suitable to accept input from the user 80 and to convert that input to a graphical position on the display of the learning module 70. Various joysticks, multi-way rocker switches, mice, trackballs, and the like are suitable for this purpose. Although the labeling module 60 and learning module 70 have been illustrated separately, it is contemplated that they may be included in a single device. The learning module 70 may also provide the input made by the user 80 to the labeling module 60 and the inputs may be used by the labeling module 60 in its classification of the selected event.
During operation, the labeling module 60 may communicate with the computer-searchable database 52 and may receive data from the computer-searchable database 52 including unlabeled data and reported events data. The labeling module 60 may select an event from the unlabeled data that relates to a specific event in the reported events table, and the labeling module 60 may label the event based thereon.
The labeling module 60 may also determine a confidence in a relationship between the selected event and the reported events. For example, an algorithm may be used for determining the confidence. Such an algorithm may be converted to a computer program comprising a set of executable instructions, which may be executed by the labeling module 60. The labeling module 60 may then determine whether the determined confidence satisfies a predetermined threshold. When the confidence satisfies the predetermined threshold, the labeling module 60 may label the unlabeled event. When it does not, the labeling module 60 may instead activate the learning module 70 for user input regarding the relationship between the selected event and the reported events. For example, when the learning module 70 is activated, the display 74 may request that the user 80 approve the selected event or make an alternative selection. In such an instance, the labeling module 60 may label the selected event with a classification based upon the user input, for example by labeling the selected event in the computer-searchable database 52. It is also contemplated that the labeling module 60 may be self-learning and may include example-based evidentiary reasoning that captures the input from the user 80 in a way that allows the system 50 to build on and learn from the user interaction.
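The threshold decision described above can be sketched as a small function. This is a minimal sketch under stated assumptions: "satisfies the threshold" is taken to mean greater than or equal to it, the learning module 70 is represented by a hypothetical `ask_user` callback that returns the approved or alternative event identifier (or `None`), and the optional `feedback` list is one hypothetical way to capture user decisions for later learning.

```python
from typing import Callable, List, Optional, Tuple


def decide_label(
    candidate_id: int,
    reported_type: str,
    confidence: float,
    threshold: float,
    ask_user: Callable[[int], Optional[int]],
    feedback: Optional[List[Tuple[int, Optional[int]]]] = None,
):
    """Return (event_id, label, source) for one reported event.

    If the confidence satisfies the threshold (assumed here to mean >=), the
    selected event is labeled automatically. Otherwise `ask_user`, a stand-in
    for the learning module, receives the candidate and returns the approved or
    alternative event id, or None if no event should be labeled.
    """
    if confidence >= threshold:
        return candidate_id, reported_type, "automatic"
    chosen = ask_user(candidate_id)
    if feedback is not None:
        # Record the user's decision so it is available for later learning.
        feedback.append((candidate_id, chosen))
    if chosen is None:
        return None, None, "rejected"
    return chosen, reported_type, "user"
```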
In accordance with an embodiment of the innovation,
At 104, the labeling module 60 may receive data regarding reported events. The reported events data may also take any suitable form, including at least one data table. The reported events may include events that have been physically observed, such as known faults that occurred on assets together with the dates on which they were detected and their nature, a list of known economic events, or a list of confirmed medical diagnoses. Where the faults relate to an asset, the reported events may include a log of maintenance records.
At 106, the labeling module 60 may select an event from the unlabeled data based on the data regarding the reported events. Such a selection may be achieved in various ways using various algorithms or computer programs. For example, the labeling module 60 may determine the most important event in the unlabeled data. In such an instance, the labeling module 60 may ignore random fluctuation events and may use a simple algorithm to select the most important event in the recent history of the asset. This requires defining a way of measuring the significance of an event. For example, if shifts in parameters are considered, the importance of a shift can be measured by its size normalized by the noise level, and the importance of an event may be estimated by calculating the sum of the normalized sizes of its shifts in all parameters. This selection method may incorrectly select an event if there is a large step in an unrelated parameter.
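The unweighted significance measure can be sketched as follows, assuming that each data event carries a mapping from parameter name to signed shift size and that a per-parameter noise level is known. The class `DataEvent`, the functions `significance` and `most_important`, and the fixed lookback window used to stand in for "recent history" are hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataEvent:
    event_id: int
    asset_id: int
    day: date
    shifts: dict  # parameter name -> signed shift size


def significance(event, noise):
    """Sum over all parameters of |shift| normalized by that parameter's noise level."""
    return sum(abs(size) / noise[param] for param, size in event.shifts.items())


def most_important(events, asset_id, report_day, noise, lookback_days=30):
    """Most significant event on the asset within `lookback_days` before the report."""
    recent = [
        e for e in events
        if e.asset_id == asset_id and 0 < (report_day - e.day).days <= lookback_days
    ]
    return max(recent, key=lambda e: significance(e, noise), default=None)
```

Because every parameter counts equally here, a large vibration step can outrank a smaller but relevant oil pressure drop, which is the failure mode noted in the paragraph above.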
The labeling module 60 may thus alternatively base the selection on the most important event in one or more relevant parameters. This may be achieved, for each event classification, by assigning weights to all parameters describing how relevant they are for the given event classification. For example, for an event classification of an oil leak, a weight of 90% may be assigned to oil pressure and 10% to sound level if it is known that an oil leak always causes a drop in oil pressure and sometimes causes a specific noise. No weight may be assigned to vibration, as vibration may be determined not to be relevant to an oil leak. In such an instance, a weighted sum of the normalized sizes may be used to determine the most important event observed in the relevant parameters for the event classification.
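The weighted variant can be sketched with the oil leak weights given above (90% oil pressure, 10% sound level, no weight for vibration). The dictionary layout, the parameter key names, and the function names are assumptions for illustration, and the candidates are assumed to have already been filtered by asset and date.

```python
# Relevance weights per event classification, using the oil leak example above;
# a parameter that is absent (here, vibration) implicitly has weight 0.
WEIGHTS = {
    "oil leak": {"oil_pressure": 0.9, "sound_level": 0.1},
}


def weighted_significance(shifts, noise, weights):
    """Weighted sum of |shift| / noise, counting only parameters relevant to the class."""
    return sum(weights.get(p, 0.0) * abs(size) / noise[p] for p, size in shifts.items())


def most_relevant(candidates, classification, noise, weights_by_class=WEIGHTS):
    """Pick the candidate whose shifts matter most for the reported classification."""
    weights = weights_by_class[classification]
    return max(
        candidates,
        key=lambda c: weighted_significance(c["shifts"], noise, weights),
        default=None,
    )
```

With these weights, a pure vibration step scores zero for the oil leak classification, so it can no longer outrank an event with a drop in oil pressure.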
Further still, the labeling module 60 may use predefined classifiers to identify, in a list of data events, which event is, probabilistically, the most likely to be related to a reported event. This may require that models have already been built for the classification of such events. Regardless of how the event is selected from the unlabeled data, at 108 the labeling module 60 may label the selected event based on the data regarding the reported events. More specifically, the selected event may be tied to the reported event.
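If a model has already been built for each event classification, selection can instead pick the candidate with the highest posterior probability for the reported type. The sketch below assumes, purely for illustration, a Gaussian naive Bayes model with made-up priors, means, and standard deviations; the model form, the `MODEL` values, the feature names, and the function names are all hypothetical. The returned probability could also serve as the confidence used by the labeling module 60.

```python
from math import exp, pi, sqrt

# Hypothetical pre-built model: class -> (prior, {feature: (mean, std)}).
MODEL = {
    "oil leak": (0.5, {"oil_pressure": (-2.0, 0.5), "vibration": (0.0, 0.3)}),
    "vibration event": (0.5, {"oil_pressure": (0.0, 0.5), "vibration": (1.0, 0.3)}),
}


def gaussian(x, mu, sigma):
    """Normal probability density at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def class_probabilities(model, features):
    """Posterior P(class | features) under the naive Bayes assumption."""
    scores = {}
    for label, (prior, params) in model.items():
        p = prior
        for name, (mu, sigma) in params.items():
            p *= gaussian(features.get(name, 0.0), mu, sigma)  # missing feature -> 0.0
        scores[label] = p
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()} if total > 0 else scores


def most_probable_event(model, events, reported_type):
    """Event with the highest posterior for the reported type, and that posterior."""
    if not events:
        return None, 0.0
    best = max(events, key=lambda e: class_probabilities(model, e["features"])[reported_type])
    return best, class_probabilities(model, best["features"])[reported_type]
```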
It will be understood that the method of categorizing and labeling events is flexible and that the method 100 is shown merely for illustrative purposes. For example, the sequence of steps depicted is for illustrative purposes only and is not meant to limit the method 100 in any way, as it is understood that the steps may proceed in a different logical order or that additional or intervening steps may be included without detracting from embodiments of the innovation. By way of non-limiting example, the receiving of the data at 102 and 104 may occur in any order, and the data may be received simultaneously. Further, the method 100 may also include determining a confidence level in the selection as illustrated in
If the threshold is satisfied, the method may move on to label the event at 108 as described above. If the threshold is determined to not be satisfied, then the method may require validation of the selection or an alternative selection to be made by a user at 114. This may be done by the learning module 70 as previously described. Once the validation or alternative selection is received, the method may move on to label the event at 108.
Specific examples may prove useful and thus the remainder of this description will pertain to data received regarding an aircraft. The labeling module 60 may receive the unlabeled dataset, which may take any form including that of an unlabeled data events table. A generic example of an unlabeled data events table is shown in Table 1 below.
The table may include a variety of information that may include raw data, data obtained by performing feature extraction on the initial data, asset IDs, the date of the data event, and various features or characteristics. Table 2, below, shows a table that is specific to a fleet of aircraft engines.
As illustrated, Table 2 provides information regarding a number of events related to a fleet of engines. Features have been obtained from the various initial engine data and five unlabeled data events have been shown along with various information regarding each event. More specifically, shifts in pressure, shifts in vibration level, information related to an oil valve, and dates of the selected shifts have been included.
Furthermore, the labeling module 60 may receive the reported events data. Again, such data may take any form including that of a table as shown in Table 3 below.
According to the system and methods described above, each event in the reported events table may be processed in turn. More specifically, for each event, algorithms may be used to determine, from the list of unlabeled events, which one is the most likely to be related to a reported event. For example, an algorithm to determine a most recent event may be used. In such an instance, if a reported event of type T occurred on date D on asset number A, then the last data event that occurred on the asset A before date D is selected to be labeled as T. This relies on the assumption that no data events occurred between the day the fault happened and the day it was discovered. This is not always the case, which is why alternative algorithms and methods such as those described above may be used.
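The most recent event rule (a reported event of type T on date D on asset A labels the last data event on asset A before date D as T) can be sketched as follows. The dictionary keys and function names are assumptions for this example.

```python
from datetime import date


def most_recent_before(events, asset_id, report_day: date):
    """Last data event on `asset_id` that occurred strictly before `report_day`."""
    earlier = [e for e in events if e["asset_id"] == asset_id and e["day"] < report_day]
    return max(earlier, key=lambda e: e["day"], default=None)


def label_by_recency(events, reported_events):
    """Map event id -> reported type for every reported event that finds a match."""
    labels = {}
    for report in reported_events:
        match = most_recent_before(events, report["asset_id"], report["day"])
        if match is not None:
            labels[match["id"]] = report["type"]
    return labels
```

This rule mislabels whenever an unrelated data event falls between the actual fault and its discovery, which is the limitation acknowledged in the paragraph above.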
Thus, from the two datasets, the selected event may be classified, and a labeled data event table may be created based upon such classifications. In such a labeled data table, a label or classification has been assigned to each data event found to match a reported event in terms of asset ID, date, and, ideally, features. The label or classification indicates the nature of the matching reported event. Events in the unlabeled dataset may thus be categorized. The classified events may also take any form, including that of a table as shown below in Table 4.
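Building the labeled data table from a set of matches can be sketched as a join that adds a label column and keeps unmatched events unlabeled. The sketch assumes that dates are stored as ISO 8601 strings (so string order equals date order), that matches are given as a mapping from event ID to report ID, and that only matches agreeing on asset ID and preceding the report date are accepted; the function and key names are hypothetical.

```python
def build_labeled_table(unlabeled_rows, matches, reported):
    """Return a copy of the unlabeled table with a 'label' column added.

    unlabeled_rows: list of dicts with at least 'event_id', 'asset_id', and 'day'
                    ('day' as an ISO 8601 string, so string order is date order)
    matches: event_id -> report_id, as chosen by any selection rule
    reported: report_id -> dict with 'asset_id', 'day', and 'type'
    """
    table = []
    for row in unlabeled_rows:
        labeled = dict(row, label=None)  # unmatched events stay unlabeled
        report_id = matches.get(row["event_id"])
        if report_id is not None:
            report = reported[report_id]
            # Accept only matches on the same asset that precede the report date.
            if report["asset_id"] == row["asset_id"] and row["day"] < report["day"]:
                labeled["label"] = report["type"]
        table.append(labeled)
    return table
```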
Another example may prove useful. Table 5 illustrates a list of unlabeled data events where there have been shifts in performance parameters for an engine.
Table 6 illustrates a portion of a reported events table having one row of the list of reported events.
In the reported events information, an oil leak was detected in asset number 10 on Jun. 18, 2012. In looking for a related event in the unlabeled data events table, the labeling module 60 may take into account that when an oil leak occurs, the oil pressure usually drops, but vibration is not affected. If the simple algorithm described above is used, in which the last data event that occurred on asset A before date D is labeled as T, the classification ‘oil leak’ will be associated with the latest data event that occurred on the asset before the discovery date. In the above example, this is event ID 4, which occurred on June 17. This may be an incorrect selection, as event ID 4 appears to be a minor event, possibly due to random fluctuations in the data. Selecting event ID 3 would also be incorrect, as it appears to be an event related to vibration rather than to the oil leak.
If the labeling module 60 were instead to select the last data event before June 18 that is relevant to one of the classification parameters, for example by assigning weights to all parameters describing how relevant they are for the given event classification, then event ID 2 would be selected. Event ID 2 must have been due to the oil leak, as it is the only event featuring a large drop in oil pressure, and the labeling module 60 may utilize an algorithm that is weighted accordingly.
In some cases, algorithms may be unable to correctly identify the relevant data event, or they may select an event with a high level of uncertainty. This can occur when several data events with similar significance or probabilities are found in the asset near the date of the observed event. If the confidence in the selection is less than a predetermined threshold, the selection may be validated by the user. This may be done through a user interface that displays information based on the unlabeled data. Such graphical illustrations may make it easier for the user to determine the related event quickly and efficiently, instead of requiring the user to look through the data manually.
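One hypothetical way to quantify this ambiguity is to take the best candidate's share of the combined relevance scores of all candidates near the reported date: the share approaches 1 when one event dominates and approaches 1/n when n events score similarly. The scores could come from, for example, a weighted significance measure; the function names and the threshold value of 0.7 are arbitrary assumptions for this sketch.

```python
def selection_confidence(scores):
    """scores: candidate event id -> non-negative relevance score.

    Returns (best_id, confidence), where confidence is the best candidate's share
    of the total score: near 1 when one event dominates, near 1/n when n are similar.
    """
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    return best, (scores[best] / total if total > 0 else 0.0)


def needs_validation(scores, threshold=0.7):
    """True when the selection should be shown to the user before labeling."""
    _, confidence = selection_confidence(scores)
    return confidence < threshold
```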
By way of example, the labeling module may have selected event ID 3 as the oil leak, but with low confidence. A window 200, as shown in
Beneficial effects of the above described embodiments include that large amounts of data gathered from a complex system may be assessed automatically or semi-automatically by a system and that unlabeled events may be categorized. Further, when there is low confidence in such a selection, the information may be easily assessed by a user, as the relevant data may be conveyed to the user quickly and efficiently.
Labeling datasets manually may be very time-consuming, while performing such labeling automatically is not an easy task and may lead to mislabeled datasets, which may in turn produce incorrect models. The above described embodiments provide a variety of benefits, including that datasets may be labeled automatically or semi-automatically in an accurate manner. The above described embodiments may be used to determine probabilistically the nature of events detected in the data and may rapidly provide for the categorization of such events, giving a competitive advantage.
To the extent not already described, the different features and structures of the various embodiments may be used in combination with each other as desired. That one feature may not be illustrated in all of the embodiments is not meant to be construed to mean that it cannot be; this is done for brevity of description. Thus, the various features of the different embodiments may be mixed and matched as desired to form new embodiments, whether or not the new embodiments are expressly described. All combinations or permutations of features described herein are covered by this disclosure.
This written description uses examples to disclose the innovation, including the best mode, and also to enable any person skilled in the art to practice the innovation, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the innovation is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
This application is a national stage application under 35 U.S.C. §371(c) of prior filed, co-pending PCT application serial number PCT/EP2013/072207, filed on Oct. 23, 2013, titled “SYSTEM AND METHOD FOR CATEGORIZING EVENTS”. The above-listed application is herein incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/072207 | 10/23/2013 | WO | 00 |