The subject matter disclosed herein generally relates to the processing of time series data. More specifically, the subject matter relates to correcting errors in monotonically non-decreasing operational data of a data source, for example a locomotive.
Locomotives, for example, are complex electromechanical systems. A typical locomotive is equipped with one or more sensors to measure operational parameters of the locomotive. Continuous monitoring and recording of the operational parameters of the locomotive helps in many ways. The operational parameters that may be monitored include, but are not limited to, speed, braking times, fuel consumption, mileage, distance traveled, and power requirement in terms of kWh. Analysis of such data enables customers to implement cost-effective maintenance schemes.
Several errors may be observed in the measured operational data, and such errors need to be corrected before the data can be used effectively. Observed errors in the measured operational data are due to causes including, but not limited to, faulty sensors, switching of cab panels, and electronic errors. Systematic identification and documentation of the data errors are required to investigate the root causes responsible for generating inaccurate data within the locomotive panel readings. Conventionally, correction of errors in the received operational data is performed by manual processing. The manual processing is labor intensive and not easily repeatable on additional data. Locomotive operational data is classified; hence, in-house processing of the measured data may be preferable, and outsourcing of the manual operation may not be an available option. Also, devising newer techniques for processing locomotive operational data requires access to a vast amount of locomotive operational data during the design and validation phases.
An enhanced technique for correcting the operational data of a data source is desirable.
In accordance with one aspect of the present technique, a method for generating a corrected data for deriving a decision related to a data source is disclosed. The method includes receiving measurement data representative of an operational parameter from the data source. The operational parameter includes a monotonous time series data. The method also includes identifying an event based on the measurement data and determining an event category based on the identified event. The method further includes processing the measurement data using a statistical data correction technique, based on the determined event category, to generate the corrected data for deriving the decision related to the data source.
In accordance with another aspect of the present technique, a system for generating a corrected data for deriving a decision related to a data source is disclosed. The system includes a processor based device configured to receive measurement data representative of an operational parameter from the data source. The operational parameter includes a monotonous time series data. The processor based device is further configured to identify an event based on the measurement data and to determine an event category based on the identified event. The processor based device is further configured to process the measurement data using a statistical data correction technique, based on the determined event category, to generate the corrected data for deriving the decision related to the data source.
In accordance with another aspect of the present technique, a non-transitory computer readable medium encoded with a program to instruct a processor based device for generating a corrected data for deriving a decision related to a data source is disclosed. The program instructs the processor based device to receive measurement data representative of an operational parameter from the data source. The operational parameter includes a monotonous time series data. The program further instructs the processor based device to identify an event based on the measurement data and to determine an event category based on the identified event. The program also instructs the processor based device to process the measurement data using a statistical data correction technique, based on the determined event category, to generate the corrected data for deriving the decision related to the data source.
These and other features and aspects of embodiments of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Embodiments of the present invention relate to a statistical data correction technique applied to a measurement data received from a data source to generate a corrected data for deriving a decision related to the data source. The measurement data is a monotonically non-decreasing time series data representative of an operational parameter of the data source. An event is identified from the received measurement data based on a signal representative of a first derivative of the received measurement data. An event category is determined based on the identified event. The received measurement data is processed using a statistical data correction technique, based on the determined event category, to generate a corrected data.
The processor based device 114 may include a controller, a general purpose processor, or a Digital Signal Processor (DSP). The processor based device 114 may receive additional inputs from a user through a control panel or any other input device such as a keyboard of the computer system 112. The processor based device 114 is configured to access computer readable memory modules including, but not limited to, a random access memory (RAM), and read only memory (ROM) modules. The memory medium may be encoded with a program to instruct the processor based device 114 to enable a sequence of steps to correct errors in the measurement data measured by the sensors 104, 106, 108. In one embodiment the computer system 112 may be a standalone system and may be communicatively coupled to the data collection center 110. In another embodiment, the computer system 112 may be part of the data collection center 110.
The shift is representative of an error condition in the measurement data. In the illustrated embodiment, the shift at the data sample 208 is classified as a non-correcting shift. After the data sample 208, a new linear trend line 216 is generated different from a linear trend line 214 such that the two linear trend lines 214, 216 are not collinear. The data sample 210 of the illustration is classified as a self-correcting shift. The self-correcting shift generates a linear trend line 218 which is collinear with the linear trend line 216. Techniques for identification, classification, and correction of both non-correcting shift and self-correcting shift are explained in greater detail with reference to subsequent figures.
The identified event is indicative of the presence of an error in the measurement data. The error may belong to one among a plurality of categories including a self-correcting event, a non-correcting event, an out-of-range event, an intercept event, and a date error event. The out-of-range event refers to a shift event at the last data sample of the measurement data. The intercept event refers to a deviation of an intercept value of a trend line of the measurement data from an intercept value of an average trend line of a fleet of data sources. A date error event may refer to a missing date, a date after the withdrawal of the data source from service, or a date before the introduction of the data source into service. An event category is determined based on the measurement data and the identified event as explained in the next paragraph with reference to
The identified event at the data sample 508 is referred to as a first event and the data sample 508 is selected as a “first point” of the measurement data. The identified event at the data sample 514 is referred to as a second event. In the illustrated embodiment, the first event at the data sample 508 and the second event at the data sample 514 are adjacent events. A data sample 510, adjacent to the second event at the data sample 514, is selected as a “second point” of the measurement data. The line joining the first point (the data sample 508) to the second point (the data sample 510) is referred to as the secant line 512. Similarly, the secant line 520 is formed with reference to the identified event corresponding to the data sample 514. For the formation of the secant line 520, the event at the data sample 514 is referred to as a first event and the data sample 514 is selected as a “first point”. The identified event at a data sample 516 is referred to as a second event. The first event and the second event at the data samples 514, 516 respectively are mutually adjacent events. A data sample 518, adjacent to the second event at the data sample 516, is selected as a “second point”. The secant line 520 joins the data sample 514 to the data sample 518. Similarly, a secant line is formed for every identified event of the curve 506.
A slope of a secant line is determined based on the coordinates of the first point and the second point joined by the secant line. For example, if the first point has a value y1 and the second point has a value y2, the slope of the secant line is represented by:

$$sl = \frac{y_2 - y_1}{t_2 - t_1} \qquad (1)$$

where t2 is the time instant corresponding to the second point and t1 is the time instant corresponding to the first point.
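The secant-line construction can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, and the sketch assumes that the sample "adjacent to the second event" is the sample immediately preceding it, and that the last sample of the series serves as the second point for the final event. The slope is the usual rise over run, (y2 − y1) / (t2 − t1).

```python
from typing import List, Sequence, Tuple


def secant_points(event_indices: Sequence[int], n_samples: int) -> List[Tuple[int, int]]:
    """Pair each event (first point) with a second point.

    Assumption: the second point is the sample just before the next event;
    for the last event, the final sample of the series is used.
    """
    pairs = []
    for k, first in enumerate(event_indices):
        next_event = event_indices[k + 1] if k + 1 < len(event_indices) else n_samples
        second = next_event - 1
        if second > first:  # a secant needs two distinct samples
            pairs.append((first, second))
    return pairs


def secant_slope(t: Sequence[float], y: Sequence[float],
                 first_idx: int, second_idx: int) -> float:
    """Slope of the line joining sample first_idx to sample second_idx."""
    dt = t[second_idx] - t[first_idx]
    if dt == 0:
        raise ValueError("the two points must have distinct time instants")
    return (y[second_idx] - y[first_idx]) / dt
```

For a series of six samples with events at indices 1 and 4, `secant_points([1, 4], 6)` pairs sample 1 with sample 3 and sample 4 with sample 5.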
A score value corresponding to an identified event is determined based on the slope of the secant line corresponding to the identified event. The score value is represented by:

$$\text{score} = \frac{sl - med}{MAD} \qquad (2)$$

where sl is representative of a slope of the secant line corresponding to the identified event, med is representative of a median of a plurality of the first derivative values of the measurement data, and MAD is the median absolute deviation of a plurality of the first derivative values of the measurement data. In the illustrated embodiment, the score value corresponding to the data sample 508 is (−)32.20768 and the score value corresponding to the data sample 514 is (+)0.3259564. It may be noted herein that the magnitude of the score value corresponding to a non-correcting shift is greater than the magnitude of the score value corresponding to a self-correcting shift.
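A short Python sketch of the score computation is given here for illustration. It assumes the score takes the form of a robust z-score, (sl − med) / MAD; that form is inferred from the definitions of sl, med, and MAD and should be treated as an assumption. The function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def first_derivative(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Finite-difference first derivative between consecutive samples."""
    return [(y[i + 1] - y[i]) / (t[i + 1] - t[i]) for i in range(len(y) - 1)]


def robust_score(slope: float, derivatives: Sequence[float]) -> float:
    """Score of a secant slope relative to the first-derivative values.

    Assumed form: (slope - median) / MAD, where MAD is the median absolute
    deviation of the first-derivative values about their median.
    """
    med = median(derivatives)
    mad = median([abs(d - med) for d in derivatives])
    if mad == 0:
        raise ValueError("MAD is zero; the score is undefined for this data")
    return (slope - med) / mad
```

Because the median and MAD are insensitive to a few large jumps, the score stays meaningful even when the derivative series itself contains the shifts being scored.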
The magnitude of the score value determined as explained in the previous paragraph is compared with a second threshold value. If the magnitude of the score value is greater than the second threshold value, the identified event is declared a non-correcting event. If the magnitude of the score value is smaller than or equal to the second threshold value, the event is declared a self-correcting event. In one exemplary embodiment, the second threshold value may be equal to the first threshold value. The first threshold value and the second threshold value may be chosen based on at least one of historical data and user requirements. In an exemplary embodiment, the first threshold value is determined by empirical methods and the second threshold value is determined based on an average trend line corresponding to a plurality of measurement data.
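The comparison against the second threshold value can be sketched as below. The function name and the threshold of 3.0 used in the usage note are hypothetical and chosen only for illustration; the text does not state a numerical threshold.

```python
def classify_event(score: float, second_threshold: float) -> str:
    """Classify a shift event from the magnitude of its score value."""
    if abs(score) > second_threshold:
        return "non-correcting"
    return "self-correcting"
```

With the score values quoted for the illustrated embodiment and an assumed threshold of 3.0, `classify_event(-32.20768, 3.0)` yields a non-correcting event and `classify_event(0.3259564, 3.0)` yields a self-correcting event.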
The measurement data may be processed using a statistical data correction technique to generate a corrected data for deriving a decision related to the data source. The statistical data correction technique is based on the determined event category. In one exemplary embodiment, the processing involves removing a discontinuity in the measurement data if the determined event category is a non-correcting event. The discontinuity may be removed by aligning the two trend lines generated by the non-correcting event so that they are collinear. In another exemplary embodiment, the processing involves interpolating the measurement data if the determined event category is a self-correcting event. Interpolation refers to an averaging operation performed on a plurality of data samples along a pair of collinear trend lines generated by the self-correcting shift. In another exemplary embodiment, the processing involves extrapolating the measurement data if the determined event category is an out-of-range event. Extrapolation refers to an averaging operation performed on a plurality of data samples along a trend line and extending the trend line to the data sample at which the out-of-range event occurs. If the determined event category is an intercept event, the processing involves replacing the measurement data with fleet level average data. The fleet level average data refers to an average of a plurality of measurement data of the same operational parameter from a plurality of vehicles operating in a similar environment.

In an exemplary embodiment of the processing technique, a date error event is corrected. The processing of a date error event involves at least one of determining a missing date of operation of the data source, correcting a first date prior to a service introduction date of the data source, and correcting a second date after a service completion date (or data retrieval date) of the data source. For example, if the data source has been operating since 1 Jan. 2007, any date entry prior to 1 Jan. 2007 is identified as a date error event. Similarly, if the data source is withdrawn from service on 31 Dec. 2012, date entries after 31 Dec. 2012 are considered date error events. As another example, if data is retrieved from the data source on 4 May 2010, a date entry after 4 May 2010 is considered a date error event. When a date entry for a data sample of the measurement data is not available, a missing date of operation is determined to correct the date error event. For example, if a first data sample has a date entry of 1 Mar. 2008 and a second data sample has a date entry of 1 Apr. 2008, a date error event between the first data sample and the second data sample is corrected by determining a suitable date between 1 Mar. 2008 and 1 Apr. 2008.
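The date checks described above can be sketched in Python as follows. The function names are hypothetical. The text only requires "a suitable date" between the neighboring entries; using the midpoint is an assumption made for this sketch.

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence


def find_date_errors(dates: Sequence[Optional[date]],
                     service_start: date,
                     cutoff: date) -> List[int]:
    """Return indices of date error events.

    An entry is flagged when it is missing (None), earlier than the service
    introduction date, or later than the cutoff (service completion date or
    data retrieval date, whichever applies).
    """
    return [i for i, d in enumerate(dates)
            if d is None or d < service_start or d > cutoff]


def fill_missing_date(previous: date, following: date) -> date:
    """Pick a date between two neighboring entries (assumption: the midpoint)."""
    return previous + timedelta(days=(following - previous).days // 2)
```

For the example in the text, `fill_missing_date(date(2008, 3, 1), date(2008, 4, 1))` returns 16 Mar. 2008, which lies between the two neighboring entries.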
The decision related to the data source generated by the statistical data correction technique includes, but is not limited to, prognostics information about the data source. The decision may also be related to the end of life of one or more individual components of the data source. The decision related to the data source helps to build accurate reliability models that are used in estimating the price of maintenance contracts for the data source and in predicting the short and long term profitability of offerings from the service provider.
A first derivative of the data samples of the measurement data is computed 1006 after the date correction. Thereafter, the first derivative is compared with a first threshold value 1008. If the first derivative corresponding to a data sample is greater than the first threshold value, an event is identified at the corresponding data sample and the time instant corresponding to the data sample is recorded 1012. If the first derivative is less than the first threshold value, the measurement data at the corresponding data sample is considered error free data 1010.
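The event identification step can be sketched as below. The function name is hypothetical. Two assumptions are made: the magnitude of the first derivative is compared with the threshold, so that downward jumps in nominally non-decreasing data are also flagged, and the event is attributed to the sample at which the jump lands.

```python
from typing import List, Sequence, Tuple


def identify_events(t: Sequence[float], y: Sequence[float],
                    first_threshold: float) -> List[Tuple[int, float]]:
    """Return (sample index, time instant) for each identified event.

    Assumptions: the derivative magnitude is thresholded, and the event is
    recorded at the later of the two samples forming the difference.
    """
    events = []
    for i in range(1, len(y)):
        derivative = (y[i] - y[i - 1]) / (t[i] - t[i - 1])
        if abs(derivative) > first_threshold:
            events.append((i, t[i]))
    return events
```

Samples whose derivative does not exceed the threshold are left untouched and treated as error free.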
For each identified event, a score value is determined 1014 based on the date corrected measurement data and the identified event. The score value is determined by constructing a secant line at the identified event, determining a slope of the secant line using equation (1), and computing a statistical value based on the determined slope value using equation (2). The score value is then compared with a second threshold value 1016 and an event category of the identified event is determined based on the comparison. If the score value is greater than the second threshold value, the identified event is determined to be a non-correcting event 1018. If the score value is less than or equal to the second threshold value, the identified event is determined to be a self-correcting event 1020.
The measurement data is processed based on the determined event category to correct one or more errors. Furthermore, events are corrected in the following sequence: the self-correcting event, the out-of-range event, the non-correcting event, and the intercept event. The measurement data is interpolated 1022 at the self-correcting event to correct a self-correcting error. If the identified event corresponds to the last data sample among the plurality of data samples, an out-of-range event is identified and the measurement data is extrapolated 1024 to correct the error. In the case of the non-correcting event, the measurement data is processed to remove the discontinuity 1026. If the identified event is an intercept event, the intercept value of the measurement data of the data source is replaced by fleet level average data 1028 to correct the error condition. The processed data 1030 is free of date errors and shift errors.
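Two of the correction steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the function names are hypothetical, the pre-event trend line is estimated by an ordinary least-squares fit, the discontinuity is removed by subtracting a constant offset from all samples at and after the event, and plain linear interpolation between the bounding good samples stands in for the averaging operation described in the text.

```python
from typing import List, Sequence, Tuple


def fit_line(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def remove_discontinuity(t: Sequence[float], y: Sequence[float],
                         event_idx: int) -> List[float]:
    """Shift samples from event_idx onward onto the pre-event trend line."""
    if event_idx < 2:
        raise ValueError("need at least two samples before the event")
    slope, intercept = fit_line(t[:event_idx], y[:event_idx])
    # Offset between the observed value and the value the trend predicts.
    offset = y[event_idx] - (slope * t[event_idx] + intercept)
    return list(y[:event_idx]) + [yi - offset for yi in y[event_idx:]]


def interpolate_segment(t: Sequence[float], y: Sequence[float],
                        start: int, end: int) -> List[float]:
    """Replace samples start..end-1 using the good samples start-1 and end."""
    out = list(y)
    t0, y0, t1, y1 = t[start - 1], y[start - 1], t[end], y[end]
    for i in range(start, end):
        out[i] = y0 + (y1 - y0) * (t[i] - t0) / (t1 - t0)
    return out
```

Following the sequence stated above, a caller would apply `interpolate_segment` to self-correcting events before applying `remove_discontinuity` to non-correcting events, so that transient excursions do not distort the trend-line fit.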
The exemplary statistical data correction technique facilitates building accurate reliability models of the data source. When the data source is a self-propelled vehicle such as a locomotive, for example, the exemplary statistical data correction technique provides inputs to models that competitively price, and predict the short and long term profitability of, maintenance contracts associated with the vehicle.
It is to be understood that not necessarily all such objects or advantages described above may be achieved in accordance with any particular embodiment. Thus, for example, those skilled in the art will recognize that the systems and techniques described herein may be embodied or carried out in a manner that achieves or improves one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.
While the technology has been described in detail in connection with only a limited number of embodiments, it should be readily understood that the invention is not limited to such disclosed embodiments. Rather, the technology can be modified to incorporate any number of variations, alterations, substitutions or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the claims. Additionally, while various embodiments of the technology have been described, it is to be understood that aspects of the invention may include only some of the described embodiments. Accordingly, the invention is not to be seen as limited by the foregoing description, but is only limited by the scope of the appended claims.