Many utilities have begun the transition from traditional analog or digital meters to smart meters installed at customer sites. Smart meters provide a number of technological advantages, one of which is the ability to communicate usage information directly to the utility. With the advent of smart meter technology, the traditional utility meter reader is being replaced by automated communication between the utility and the smart meter. Once smart meters are installed in a utility's territory, there is no longer a need to send meter reading personnel to read each individual meter. A drawback of losing these eyes in the field is a reduction in theft leads: traditionally, many reports of energy theft came directly from meter readers, who were able to observe signs of meter tampering while reading the meters. New ways of generating theft detection leads are therefore required. Fortunately, smart meters provide a significant amount of data that can be used in the theft detection process, such that leads can be generated from data analysis instead of direct visual observation of the meter.
Data analysis of streaming data is not limited to theft detection for a utility. Processes outside the utility industry create streams of data analogous to the streams of interval meter read data created by smart meters. One can envision a series of credit card transactions as a stream of data, scans at a check-out counter for a particular cashier as a stream of data, or a series of shows watched on TV as reported by a cable box as a stream of data. These streams of data can all be monitored remotely and fused with other available data in order to evaluate the likelihood of the presence of an activity. The activity may be theft related or may be completely unrelated, such as monitoring a stream of data for signs of a change in household demographics.
Accordingly, there is a need for alternative systems, program products on machine-readable media, and methods for detecting the presence of an activity remotely by analyzing data streams along with other information available which may impact the data points in a data stream, as will be discussed in further detail below.
With reference to the figures where like elements have been given like numerical designations to facilitate an understanding of the present subject matter, various embodiments of a system and method for activity detection are described. In order to more fully understand the present subject matter, a brief description of applicable circuitry will be helpful.
Embodiments of the present disclosure provide a description of an activity detection process for use in a utility with smart meter implementations, or in any other industry that can provide streams of user activity. The process is applicable to both electricity and gas theft detection, but is not limited to utility based theft detection. As discussed in further detail below, embodiments of the present disclosure combine information fusion techniques with machine learning algorithms in order to evaluate an entity's potential involvement in an activity based on three individual measures: comparison to peers, comparison to self, and comparison to truth. By fusing interval data, externally acquired customer attributes, and system alerts such as meter events with state-of-the-art machine learning processes, a robust activity detection process is created.
As used herein, the term “entity” may refer to an individual person and/or customer and/or end user, a single household, a single premises, a single structure, an individual card (e.g., credit card, debit card, or similar financial-related device), or any similar thing or unit consistent with the application for which the disclosed system and method is employed.
With attention drawn to
A first scoring mechanism, or module or process, is a peer comparison module 120 which evaluates an entity's data stream as compared to the data streams of the entity's peers and calculates a peer comparison score. A second scoring mechanism is a self comparison module 130 which compares an entity's present data stream to the entity's historical data stream and calculates a self comparison score. A third scoring mechanism is a truth comparison module 140 which compares the entity's data stream to the data streams of known actors of an activity and calculates a truth comparison score. Each of the peer comparison, self comparison, and truth comparison modules functions using one or more of the set of inputs 110, where the inputs used by one module may be different than the inputs used for another module. The set of inputs 110 includes data streams 101, system alerts 102, ground truth data streams 103, impact variables 104, and external data 105. Each of these is discussed in more detail below.
After the three comparison scores are calculated, the score fusion process 150 fuses the three individual scores into a single activity confidence score. The score fusion process 150 may take one or more different forms. One form may determine an activity confidence score using a simple weighted average of the scores from the peer comparison module 120, the self comparison module 130, and the truth comparison module 140. In an embodiment, a more complex form may utilize machine learning techniques, such as a Bayesian network, in order to learn over time the correlation of the various modules' scores with actual target activity detection. The score fusion process 150 may also make use of internal system alerts to boost activity confidence scores where applicable. In a non-limiting example, in an energy theft detection scenario, the presence of meter alerts may be included in the score fusion process. Meter alerts may be correlated with the self comparison and truth comparison scoring modules in order to boost the confidence in those scoring modules.
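As a non-limiting illustration, the simple weighted-average form of the score fusion process 150 might be sketched as follows. The function name `fuse_scores`, the default weights, and the `alert_gain` multiplier are assumptions made for this sketch only; the disclosure does not prescribe how an alert changes the weighting.

```python
def fuse_scores(peer, self_cmp, truth, weights=(1.0, 1.0, 1.0),
                has_alert=False, alert_gain=1.5):
    """Fuse three comparison scores in [0, 1] into one activity confidence score.

    The result is a weighted average, so it also lies in [0, 1].
    """
    w_peer, w_self, w_truth = weights
    if has_alert:
        # Assumption: a system alert raises trust in the self and truth
        # comparison modules only, mirroring the meter alert example.
        w_self *= alert_gain
        w_truth *= alert_gain
    total = w_peer + w_self + w_truth
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_peer * peer + w_self * self_cmp + w_truth * truth) / total


confidence = fuse_scores(0.2, 0.8, 0.8)
confidence_with_alert = fuse_scores(0.2, 0.8, 0.8, has_alert=True)
```

Scaling the weights rather than adding a fixed bonus keeps the fused value inside the 0 to 1 range without clipping.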
A lead (i.e., an indicator that an entity is engaged in the target activity) may be generated in the lead generation block 160 from the activity confidence score. Additionally, the lead may be sent to an investigation unit 170 which may perform further investigation on the entity associated with the lead. In an embodiment, leads are generated based on the activity confidence scores coupled with any system alerts 102 that may be helpful in identifying the target activity. Leads may be prioritized based on any number of factors. As a non-limiting example, in an energy theft detection scenario, leads may be prioritized based solely on the activity confidence score, based on likelihood to recover revenue, or based on total amount of potential revenue recovery from an entity.
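As a non-limiting illustration, lead generation and prioritization might be sketched as follows. The `Lead` record, the 0.7 confidence threshold, the `potential_recovery` field, and the use of confidence multiplied by potential recovery as a stand-in for likelihood-weighted recovery are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    entity_id: str
    confidence: float          # activity confidence score in [0, 1]
    potential_recovery: float  # hypothetical estimate of recoverable revenue


def generate_leads(scores, recovery, threshold=0.7):
    """Create a lead for every entity whose confidence meets the threshold."""
    return [Lead(entity, score, recovery.get(entity, 0.0))
            for entity, score in scores.items() if score >= threshold]


def prioritize(leads, by="confidence"):
    """Sort leads, highest priority first, by the chosen criterion."""
    keys = {
        "confidence": lambda lead: lead.confidence,
        "potential_recovery": lambda lead: lead.potential_recovery,
        # Assumption: expected recovery = confidence x potential recovery.
        "expected_recovery": lambda lead: lead.confidence * lead.potential_recovery,
    }
    if by not in keys:
        raise ValueError(f"unknown criterion: {by}")
    return sorted(leads, key=keys[by], reverse=True)


scores = {"a": 0.9, "b": 0.75, "c": 0.4}
recovery = {"a": 100.0, "b": 500.0, "c": 1000.0}
leads = generate_leads(scores, recovery)
```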
In an embodiment, once activity leads are generated by the lead generation block 160, a feedback process may be put in place to provide feedback to the score fusion process 150. The investigation unit 170 minimally needs to provide information as to whether or not the target activity was actually observed for an entity. The feedback process may be an automated process based on responses received by entities targeted by a target activity lead list from lead generation block 160. As a non-limiting example, in an energy theft detection scenario, the feedback from the investigation unit 170 may result from technicians in the field reporting back to the system the status of service investigations performed on the actual meters of the suggested theft detection leads. If the lead resulted in catching a thief, the feedback would be positive. If the lead was a false alarm the technician would report that back to the system as well. Feedback from the investigation unit may be provided back to the score fusion block 150 to refine the process used to calculate activity confidence scores.
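As a non-limiting illustration, feedback from the investigation unit could refine the fusion weights one investigated lead at a time. The sketch below substitutes a single stochastic gradient step of logistic regression for the learning technique; this is a stand-in chosen for brevity, not the Bayesian network mentioned above, and the names `update_fusion_weights` and `lr` are hypothetical.

```python
import math


def update_fusion_weights(weights, bias, scores, confirmed, lr=0.1):
    """Apply one logistic-regression gradient step from a single investigation.

    weights   -- current weights for the (peer, self, truth) scores
    scores    -- the three comparison scores of the investigated entity
    confirmed -- True if the target activity was actually observed
    """
    z = bias + sum(w * s for w, s in zip(weights, scores))
    predicted = 1.0 / (1.0 + math.exp(-z))
    error = (1.0 if confirmed else 0.0) - predicted
    # Modules that scored high on a confirmed lead gain weight; modules that
    # scored high on a false alarm lose weight.
    new_weights = [w + lr * error * s for w, s in zip(weights, scores)]
    return new_weights, bias + lr * error


new_weights, new_bias = update_fusion_weights(
    [0.0, 0.0, 0.0], 0.0, (1.0, 0.0, 0.5), confirmed=True)
```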
In an embodiment, a data stream 101 is required for each of the peer comparison, self comparison, and truth comparison modules. Typically, the data stream is associated with a single entity. The contents of a data stream may vary by industry, but a data stream typically comprises a time series of data points for an entity. As a non-limiting example, in an electric utility industry scenario a data stream for an entity typically includes time series data corresponding to energy consumption data by the entity. In an embodiment, the entity in the case of an electric utility scenario is a single premises, and the data stream may be provided in relatively small time intervals such as every 15 minutes, hourly, or daily. As another non-limiting example, in a gas utility industry scenario the data stream for an entity would typically correspond to the amount of gas used by the entity during each reported time interval, which is typically reported in therms. As yet another non-limiting example, in a cable television industry scenario, the data stream for an entity may correspond to the series of channels watched by the entity, e.g., by the people in a particular household.
In an embodiment, a system alert 102 includes data available from, for example, an infrastructure. The system alerts can be used to help identify an activity of interest. In a non-limiting example, for an electric utility theft detection process, system alerts may correspond with alerts generated by smart meters, such as tamper alerts, customer account transitions (e.g., new account, canceled account), bill payment history, service information such as cuts in electric service or activation of a meter, as well as system wide outage information. In other industries the system alerts may correspond to a schedule of known programming, holiday schedules, or store hours. Other examples of system alerts 102 include, but are not limited to, alerts generated by cable set-top boxes, vehicles, or computer alerts such as an incorrect username and/or password. Those of skill in the art will understand that the system alerts are not limited to the system alerts described above but include other alerts that may be available from, for example, the infrastructure. In an embodiment, system alerts 102 are used by the score fusion block 150.
In an embodiment, ground truth 103 includes data streams of known actors of an activity, e.g., a data stream for a known energy thief. The ground truth 103 typically includes one or more entities and their corresponding data streams that display attributes of a desired activity that is being monitored. In an energy theft scenario, the ground truth 103 would correspond to a set of the meter read data (i.e., data stream) of one or more known energy thieves. In an embodiment, ground truth 103 is used by the truth comparison block 140.
In an embodiment, the peer comparison 120, self comparison 130, and/or truth comparison 140 modules may use regression analysis or other forecasting techniques to predict data streams moving forward for an individual. The data streams may correlate with impact variables where the impact variables may affect a data stream. Impact variables 104 include, but are not limited to, an hour of the day, a day of the week, a temperature value local to an entity, a cloud cover value local to an entity, a humidity value local to an entity, a minutes of sun value local to an entity, a holiday schedule for an entity, a television schedule for an entity, and combinations thereof. In an embodiment, external variables needed for data stream forecasting are provided to the activity detection system and/or process as impact variables. As a non-limiting example, in an energy theft scenario external variables include weather, season, and day of the week. In other industries the impact variables may be different, as appropriate for that industry. In an embodiment, impact variables 104 are used by the peer comparison block 120, the self comparison block 130, and the truth comparison block 140.
In an embodiment, external data 105 includes attributes of an entity that typically are particular to the entity itself. As a non-limiting example, in an energy theft scenario, external data 105 may correspond, where appropriate, with attributes detailing an entity's physical premises, an entity's demographics, an entity's financial state, or combinations thereof. Additionally, external data 105 may include one or more of premises attributes such as square footage, type of construction materials, the presence of a basement, a local air conditioning code, and a location. Furthermore, external data 105 may include one or more of demographic attributes such as an age of an entity or the ages of the occupants of a premises, a number of persons in the premises, ethnicity of an entity, an indicator of an environmental interest of an entity, and whether the entity owns or rents a premises. Still further, external data 105 may include one or more of financial attributes such as attributes of a financial cluster to which an entity may belong, a credit score for the entity, a mortgage amount for the entity, and a credit state for the entity. In an embodiment, external data 105 is used by the peer comparison block 120.
Considering
In an embodiment, the peer group assignment process 200 is used to identify one or more peer groups for an entity when those peer groups are not known a priori. If peer groups for an entity have already been identified, those peer group assignments can be used and the peer group assignment process 200 may be skipped. For an entity or sets of entities with unknown peer groups, identifying the peer groups to which each of those entities may belong may follow a process similar to the peer group assignment process 200. In an embodiment, data streams 101 for an entity and any predetermined impact variables 104 are used in order to create an individual model 221 for each entity. The individual model 221 may include a regression model with several different coefficients. If regression modeling is not the best suited model for the type of data stream available, other modeling approaches may be used. Once an individual model 221 is generated for each entity, a model clustering process 222 can be employed on the individual models. The model clustering process 222 operates to group entities with similar individual models. A non-limiting example of a model clustering approach includes using a Gaussian mixture model coupled with an expectation maximization fitting algorithm to group entities into clusters. The output of the model clustering process 222 is used to train the predictive model used in the cluster to attribute correlation process 223.
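As a non-limiting illustration, the individual model 221 and the model clustering process 222 might be sketched as follows. The sketch assumes a one-variable linear regression of usage on temperature as the individual model and clusters entities on the fitted slope alone with a one-dimensional Gaussian mixture fitted by expectation maximization; the function names, the initial means, and the sample data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def normal_pdf(v, mean, var):
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(values, init_means, iters=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM; returns means and soft memberships."""
    k, n = len(init_means), len(values)
    means = list(init_means)
    center = sum(values) / n
    start_var = max(min_var, sum((v - center) ** 2 for v in values) / n)
    variances, mix = [start_var] * k, [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: probability that each value belongs to each component.
        resp = []
        for v in values:
            dens = [mix[j] * normal_pdf(v, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(min_var, sum(
                r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj)
            mix[j] = nj / n
    return means, resp


temps = [10.0, 20.0, 30.0]
true_slopes = [0.10, 0.12, 0.11, 0.90, 0.95, 0.92]  # hypothetical entities
slopes = [fit_line(temps, [5.0 + s * t for t in temps])[1] for s in true_slopes]
means, resp = em_gmm_1d(slopes, init_means=[0.0, 1.0])
clusters = [max(range(2), key=lambda j: r[j]) for r in resp]
```

The soft memberships in `resp` correspond to the case where an entity is assigned a likelihood of belonging to several clusters rather than exactly one.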
In an embodiment, external data 105, as discussed above, is used to predict the peer group assignments in process 225 without considering the data streams of the entities, as follows. In the cluster to attribute correlation process 223, a correlation process is utilized in which external data 105 is used to predict particular clusters of attributes of the entities. In an embodiment, the output of the model clustering process 222 is used to train the predictive model used in the cluster to attribute correlation process 223, as stated above. The correlation process may include one or more different forms of correlation. One such form may be a decision tree type of analysis. Once attribute to peer group correlations are learned in the cluster to attribute correlation process 223, the attribute correlation model is used to assign each entity to a cluster in the attribute based cluster assignment process 224. Then, the entities are assigned to one or more peer groups in the peer group assignment process 225.
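As a non-limiting illustration, the cluster to attribute correlation process 223 and the attribute based cluster assignment process 224 might be sketched with the simplest decision tree, a single-split stump, trained on cluster labels and then applied to external attributes only. The attribute names `sqft` and `occupants` and the function names are hypothetical.

```python
from collections import Counter


def fit_stump(rows, labels):
    """Find the single attribute threshold that best reproduces cluster labels.

    rows   -- list of dicts of numeric external attributes, one per entity
    labels -- cluster id of each entity from the model clustering step
    Returns (attribute, threshold, label_if_at_or_below, label_if_above).
    """
    best = None
    for attr in rows[0]:
        values = sorted({row[attr] for row in rows})
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2
            left = [l for row, l in zip(rows, labels) if row[attr] <= thr]
            right = [l for row, l in zip(rows, labels) if row[attr] > thr]
            left_label, left_hits = Counter(left).most_common(1)[0]
            right_label, right_hits = Counter(right).most_common(1)[0]
            hits = left_hits + right_hits
            if best is None or hits > best[0]:
                best = (hits, attr, thr, left_label, right_label)
    if best is None:
        raise ValueError("need an attribute with at least two distinct values")
    return best[1:]


def assign_cluster(stump, row):
    """Assign an entity to a cluster from its external attributes alone."""
    attr, thr, left_label, right_label = stump
    return left_label if row[attr] <= thr else right_label


rows = [{"sqft": 900, "occupants": 2}, {"sqft": 1000, "occupants": 4},
        {"sqft": 2500, "occupants": 3}, {"sqft": 2700, "occupants": 2}]
stump = fit_stump(rows, [0, 0, 1, 1])
new_entity_cluster = assign_cluster(stump, {"sqft": 1200, "occupants": 5})
```

Because assignment uses only external attributes, an entity whose consumption has been altered is still grouped with the peers it would otherwise resemble.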
In an embodiment, entities are assigned to clusters based on similarity between their data stream models. Once clusters of similar data streams have been identified, external attributes are used to predict to which cluster an entity should belong, such that an entity is clustered only by that entity's external attributes, not by that entity's data stream and/or individual model. An entity may be assigned to exactly one cluster, or an entity may be assigned a likelihood of belonging to several clusters. As a non-limiting example, a peer group for the electric utility industry may correspond to a group of entities that use a similar amount of energy and respond similarly to impact variables such as weather.
In an embodiment, once the peer group assignments 225 have been made (based on the entity's external data 105 as described above), the peer comparison scoring process 300 compares, at the normalcy scoring process 226, a first entity's data stream 101 with the data streams of other entities assigned to a particular peer group that includes the first entity. This comparison can take on many forms and provides a method of assessing how similar the first entity is to the other entities in the particular peer group. A normalcy score is calculated, individually, for each peer group cluster to which the first entity has been assigned. A typical normalcy scoring process may use the cumulative distribution function of the peer group cluster to estimate how far away an entity is from the majority of its peers in the particular peer group. As a non-limiting example, in an electric energy theft scenario, the peer groups may be characterized by a mean daily kWh usage, and a distribution, such as a Gamma distribution, is fit to the data. The normalcy score of an entity for an assigned cluster is then the value of the fitted Gamma cumulative distribution function at the entity's mean daily kWh usage. Thus, the normalcy score provides an indication of the percentage of an entity's peers that use less energy than the entity does.
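As a non-limiting illustration, the normalcy scoring process 226 for the Gamma distribution example might be sketched as follows. The sketch assumes a method-of-moments fit and evaluates the cumulative distribution function with a power series that is adequate for moderate arguments; the function names and the sample peer usages are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gamma_moments(values):
    """Method-of-moments Gamma fit: returns (shape, scale)."""
    mean = fmean(values)
    var = pvariance(values, mean)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and non-zero spread")
    return mean * mean / var, var / mean


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0
    for n in range(1, 10000):
        term *= z / (shape + n)
        total += term
        if term < 1e-15 * total:
            break
    log_value = -z + shape * math.log(z) - math.lgamma(shape + 1.0) + math.log(total)
    return min(1.0, math.exp(log_value))


def normalcy_score(entity_mean_kwh, peer_mean_kwh):
    """Estimated fraction of peers that use less energy than the entity."""
    shape, scale = fit_gamma_moments(peer_mean_kwh)
    return gamma_cdf(entity_mean_kwh, shape, scale)


peers = [20.0, 25.0, 30.0, 35.0, 40.0]  # mean daily kWh of a peer group
low_user = normalcy_score(10.0, peers)
typical_user = normalcy_score(30.0, peers)
```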
Since an entity may be assigned with certain probabilities to more than one peer group, the entity's peer group scores for all peer groups to which the entity is assigned are combined in the normalcy score combination process 227. The normalcy score combination may take many forms. In an embodiment, a simple approach may use a weighted average of all normalcy scores weighted by the probability that an individual belongs to that peer group.
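As a non-limiting illustration, the probability-weighted form of the normalcy score combination process 227 might be sketched as follows; the function name and the dictionary layout are assumptions of this sketch.

```python
def combine_normalcy(scores_by_group, membership):
    """Average per-group normalcy scores, weighted by membership probability.

    scores_by_group -- {peer_group: normalcy score of the entity in that group}
    membership      -- {peer_group: probability the entity belongs to it}
    """
    total = sum(membership[group] for group in scores_by_group)
    if total <= 0:
        raise ValueError("membership probabilities must sum to a positive value")
    weighted = sum(scores_by_group[group] * membership[group]
                   for group in scores_by_group)
    return weighted / total


combined = combine_normalcy({"A": 0.2, "B": 0.8}, {"A": 0.75, "B": 0.25})
```

Dividing by the total membership keeps the result in the range of the input scores even when the probabilities do not sum exactly to one.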
After the normalcy score combination process, the score transformation process 228 is utilized to transform the combined normalcy score into a standardized value, such as a 0 to 1 value. A non-limiting example of a score transformation process that may be used is a sigmoid function. The standardized value is the peer comparison score 229. In an embodiment, a peer comparison score closer to 1 provides more indication of the target activity, and a peer comparison score of 0 provides less indication of the target activity. As a non-limiting example, in the case of the energy theft detection peer comparison model, a peer comparison score close to 1 would indicate that an entity had considerably lower usage than the entity's peers which may be an indication of energy theft by the entity.
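As a non-limiting illustration, a sigmoid-based score transformation process 228 might be sketched as follows. The sketch assumes the energy theft convention described above, in which a low combined normalcy score (usage below that of most peers) should map to a peer comparison score near 1, so the sigmoid is applied with a negative slope; the `midpoint` and `steepness` parameters are hypothetical tuning values.

```python
import math


def peer_comparison_score(combined_normalcy, midpoint=0.5, steepness=10.0):
    """Map a combined normalcy score to a 0 to 1 peer comparison score.

    Assumption: the mapping is decreasing, so an entity that uses less than
    most of its peers (low normalcy) receives a score close to 1.
    """
    return 1.0 / (1.0 + math.exp(steepness * (combined_normalcy - midpoint)))


suspicious = peer_comparison_score(0.05)
ordinary = peer_comparison_score(0.95)
```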
In an embodiment, the self comparison module, or process, 130 compares updates to an entity's data stream to the entity's historic data stream. The process involves generating an individual model 431 of the entity's data stream 101 taking into account impact variables 104. The individual model 431 may be, but does not have to be, the same as the individual model 221 in
At block 436, if the observed change is sustained for a certain, predetermined, period of time, the entity is assigned to a false positive reduction process 437. The false positive reduction process is used to search for explanations for the observed change which may entail using input from external data 105. If, at block 436, the observed change is not sustained for the certain, predetermined, period of time, the entity's data stream is considered to be normal, the entity's individual model 431 is updated and the entity continues to be monitored for future unexpected changes.
As a non-limiting example, in an electric energy theft scenario, the false positive reduction process 437 may employ a model that looks to detect if the energy consumption profile for the entity is consistent with an occupied premises versus a vacant premises, or may correlate changes in a premises' demographics to drops in energy consumption. Entities that have a sustained observed change in their data stream and who pass all false positive checks in the false positive reduction process 437 are provided a high self comparison score by the self comparison score process 438 which, in an embodiment, is a score closer to 1.0. Entities whose data streams are consistently similar to their forecasted value, as determined in change detection block 434, are provided self comparison scores, by the self comparison score process 438, which, in an embodiment, are closer to 0.
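As a non-limiting illustration, the sustained change test of block 436 and the scoring of process 438 might be sketched as follows. The forecast is taken as given, and the threshold, the minimum run length, the `explained` flag standing in for the false positive reduction process 437, and the fixed 0.9 and 0.1 output values are all assumptions of this sketch.

```python
def longest_run_above(observed, forecast, threshold):
    """Length of the longest consecutive run where |observed - forecast| > threshold."""
    longest = current = 0
    for obs, exp in zip(observed, forecast):
        if abs(obs - exp) > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def self_comparison_score(observed, forecast, threshold, min_run,
                          explained=False, high=0.9, low=0.1):
    """Return a high score only for a sustained, unexplained change."""
    run = longest_run_above(observed, forecast, threshold)
    if run < min_run:
        return low   # change not sustained: stream is treated as normal
    if explained:
        return low   # a false positive check found a benign explanation
    return high


forecast = [10.0] * 6
sustained = self_comparison_score([10, 10, 4, 4, 4, 4], forecast, 3.0, 3)
vacancy = self_comparison_score([10, 10, 4, 4, 4, 4], forecast, 3.0, 3, explained=True)
noisy = self_comparison_score([10, 4, 10, 4, 10, 4], forecast, 3.0, 3)
```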
Now considering
In an embodiment, the truth comparison module, or process, 140 compares an entity's data stream to data streams of entities exhibiting the target activity. As a non-limiting example, in an energy theft scenario, the truth comparison module 140 may compare the energy consumption patterns of known energy thieves, i.e., ground truth 103, to energy consumption patterns (i.e., data streams) of other entities in the system. If the interval data of a utility customer entity matches the patterns exhibited by known thieves of the utility, the entity would be given, in an embodiment, a high score for the truth comparison module 140.
In an embodiment, the truth comparison module 140 contains three main steps. The activity models process 541 defines models of the activity that is being monitored, i.e., the target activity. Since there may be more than one indicator of a target activity, there may be more than one activity model 541. In a non-limiting example, in the case of energy theft detection, one activity model may use the ground truth data stream 103 to model known thieves that have stolen energy by intermittently bypassing their meter on nights and weekends, while another activity model may model those thieves that steal energy by tampering with the meter, causing it to issue large numbers of false reads and alerts. The data stream 101 of an entity is then compared to the activity models 541 at the activity detection process 542, which results in a score at scoring process 543. If an entity's data stream 101 matches an activity model 541, the entity is given, in an embodiment, a high score from the truth comparison scores process 544. If an entity's data stream 101 does not match any activity model 541, the entity is given, in an embodiment, a low score by the truth comparison scores process 544.
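As a non-limiting illustration, the comparison of an entity's data stream to one or more activity models might be sketched as follows. The sketch assumes that each activity model is a template profile of the same length as the entity's data stream and uses Pearson correlation as the matching measure; both choices, along with the function names and sample values, are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def truth_comparison_score(stream, activity_models):
    """Score in [0, 1]: best positive correlation with any activity model."""
    if not activity_models:
        raise ValueError("at least one activity model is required")
    best = max(pearson(stream, template) for template in activity_models.values())
    return max(0.0, best)


# Hypothetical template: consumption drops in a repeating bypass pattern.
models = {"intermittent_bypass": [1, 1, 0, 0, 1, 1, 0, 0]}
matching = truth_comparison_score([5, 5, 1, 1, 5, 5, 1, 1], models)
unrelated = truth_comparison_score([5, 4, 5, 4, 5, 4, 5, 4], models)
```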
With attention now drawn to
In a still further embodiment, the system includes a transmitter 1203 for transmitting to an investigation unit 1210 an identifier for the first entity, and a receiver 1204 for receiving from the investigation unit 1210 information regarding an analysis of the target activity associated with the first entity. Additionally, the processor 1202 modifies the activity confidence score for the first entity based on the analysis of the target activity associated with the first entity.
In another embodiment, the system 1200 calculates a peer comparison score and includes circuitry for generating an individual model for the first entity based on at least one of a data stream for the first entity and an impact variable for the first entity; circuitry for assigning the first entity to one or more clusters in a first set of clusters based on the individual model for the first entity; circuitry for assigning the first entity to one or more clusters in a second set of clusters based on a set of external data for the first entity; circuitry for correlating the assigning of the first entity to one or more clusters in the first set of clusters with the assigning of the first entity to one or more clusters in the second set of clusters, where the results of the correlation are used to refine the assigning of the first entity to the second set of clusters; and circuitry for assigning the first entity to at least one peer group based at least in part on the results of the correlation.
In yet another embodiment, the system 1200 further calculates the peer comparison score using circuitry for assigning other entities (i.e., not the first entity) of the plurality of entities to one or more peer groups; circuitry for calculating a first normalcy score for the first entity in a first peer group including comparing the data stream for the first entity with the corresponding data streams of other entities assigned to the first peer group; circuitry for calculating a second normalcy score for the first entity in a second peer group including comparing the data stream for the first entity with the corresponding data streams of the other entities in the second peer group; circuitry for calculating a combined normalcy score for the first entity based on the first normalcy score and the second normalcy score; and circuitry for normalizing the combined normalcy score for the first entity.
In still yet another embodiment, the system 1200 calculates the self comparison score using circuitry for generating an individual model for the first entity based on at least one of a data stream for the first entity and an impact variable for the first entity; circuitry for calculating a forecast data stream for the first entity based at least in part on the individual model for the first entity; circuitry for calculating a difference between the individual model for the first entity and the forecast data stream for the first entity; circuitry for calculating a time period for which the calculated difference is greater than a first predetermined threshold; and circuitry for assigning the self comparison score to the first entity based on at least one of the calculated difference and the time period.
In still yet a further embodiment, the system 1200 calculates the truth comparison score using circuitry for comparing a ground truth data stream for the target activity with a data stream model for the first entity; and circuitry for assigning a truth comparison score to the first entity based at least in part on a result of the comparison of the ground truth data stream with the data stream model for the first entity.
Other embodiments of the present disclosure include a machine-readable medium having stored thereon a plurality of executable instructions to be executed by a processor, the plurality of executable instructions comprising instructions to: calculate for each entity in a plurality of entities: a peer comparison score for the activity for the entity; a self comparison score for the activity for the entity; a truth comparison score for the activity for the entity; and an activity confidence score for the entity; and select a first entity from the plurality of entities, where the selecting is based at least in part on the activity confidence score for the first entity.
A further embodiment includes additional executable instructions comprising instructions to: transmit to an investigation unit an identifier for the first entity; receive from the investigation unit information regarding an analysis of the activity associated with the first entity; and modify the activity confidence score for the first entity based on the analysis of the activity associated with the first entity.
Another embodiment includes additional executable instructions comprising instructions to calculate the activity confidence score based at least in part on one or more of the peer comparison score, the self comparison score, and the truth comparison score.
Still another embodiment includes additional executable instructions comprising instructions to calculate the peer comparison score by: generating an individual model for the first entity wherein the individual model is based on at least one of a data stream for the first entity and an impact variable for the first entity; assigning the first entity to one or more clusters in a first set of clusters, wherein the assigning to the one or more clusters in the first set of clusters is based on the individual model for the first entity; assigning the first entity to one or more clusters in a second set of clusters, wherein the assigning to the one or more clusters in the second set of clusters is based on a set of external data for the first entity; correlating the assigning of the first entity to one or more clusters in the first set of clusters with the assigning of the first entity to one or more clusters in the second set of clusters, wherein the results of the correlation are used to refine the assigning of the first entity to the second set of clusters; and assigning the first entity to at least one peer group, wherein the assigning to the at least one peer group is based at least in part on the results of the correlation.
Yet still another embodiment includes additional executable instructions comprising instructions to calculate the peer comparison score by: assigning other entities of the plurality of entities to one or more peer groups; calculating a first normalcy score for the first entity in a first peer group, wherein the calculating of the first normalcy score includes comparing the data stream for the first entity with the corresponding data streams of other entities assigned to the first peer group; calculating a second normalcy score for the first entity in a second peer group, wherein the calculating of the second normalcy score includes comparing the data stream for the first entity with the corresponding data streams of the other entities in the second peer group; calculating a combined normalcy score for the first entity, wherein the combined normalcy score is based on the first normalcy score and the second normalcy score; and normalizing the combined normalcy score for the first entity.
Yet a further embodiment includes additional executable instructions comprising instructions to calculate the self comparison score by: generating an individual model for the first entity wherein the individual model is based on at least one of a data stream for the first entity and an impact variable for the first entity; calculating a forecast data stream for the first entity, wherein the forecast data stream is based at least in part on the individual model for the first entity; calculating a difference between the individual model for the first entity and the forecast data stream for the first entity; calculating a time period for which the calculated difference is greater than a first predetermined threshold; and assigning the self comparison score to the first entity based on at least one of the calculated difference and the time period.
Yet still a further embodiment includes additional executable instructions comprising instructions to calculate the truth comparison score by: comparing a ground truth data stream for the activity with a data stream model for the first entity; and assigning a truth comparison score to the first entity based at least in part on a result of the comparison of the ground truth data stream with the data stream model for the first entity.
While some embodiments of the present subject matter have been described, it is to be understood that the embodiments described are illustrative only and that the scope of the invention is to be defined solely by the appended claims when accorded a full range of equivalence, many variations and modifications naturally occurring to those of skill in the art from a perusal hereof.
Number | Date | Country | |
---|---|---|---|
20140280208 A1 | Sep 2014 | US |