The embodiments herein generally relate to sensor data processing and more particularly, to a system and method for forecasting sensor data using a deep learning model.
In present times, urban environments are monitored using multiple sensors to make their infrastructure smart. The sensors are located at public places, for example ATMs, banks, administrative areas, buildings, shopping areas, petrol stations, airports, transport areas, health care or hospital areas, natural-geographical locations, rest areas, hang-outs, tourist sights, museums, restaurants etc. These sensors help in smart decision making and in automation of the city administration. These sensors may detect noise, environmental parameters, vehicles etc. to measure and monitor various infrastructure and operational aspects of a city.
At present, the sensors by themselves are not very reliable and have limitations due to the occurrence of errors while measuring. Causes of error may be power cuts, Wi-Fi connection loss, artifacts, manufacturing defects, environmental aspects such as dust etc. Further, the lifespan of a functioning sensor is also not predictable in an outdoor environment. To overcome these limitations, usually multiple sensors are deployed for automation and an estimation is made considering data from all sources. Existing solutions for data optimization are based on anomaly detection only. Anomaly detection may identify an anomaly with respect to the historical data of that sensor alone, which is not sufficient to arrive at close, accurate or appropriate probable sensor values for a faulty sensor. Also, in existing systems it is not possible to identify the origin of the error, or to suggest a correction value with a small margin of error. Thus, human input is required to overcome the sensor errors, and existing approaches cannot correct the faulty values with a minimum margin of error.
Accordingly, there remains a need for a comprehensive approach for predicting or forecasting the sensor data for automation in an urban environment.
In an embodiment, a sensor data forecasting system that forecasts sensor data using a deep learning model is provided. The sensor data forecasting system includes a memory that stores a set of instructions and a processor that executes the set of instructions and is configured to generate a database of time stamped and indexed sensor data, wherein the sensor data is received from a plurality of sensors of a plurality of sensor types implemented in a location, characterized in that, the processor is configured to (i) determine a false value by analyzing the time stamped and indexed sensor data, wherein the false value is determined based on predetermined parameters that comprise one or more of a constant value, an abnormally high or low value, a false value that is determined to be impossible or improbable, or a calibration error, (ii) determine a category of the false value by analyzing one or more of (a) historical sensor data of a first sensor, (b) comparative sensor data of the first sensor and a second sensor, and (c) comparative sensor data of one or more third sensors and the first sensor, wherein the first, second and third sensors are selected from the plurality of sensors, wherein the first sensor and the second sensor belong to a first sensor type of the plurality of sensor types and the one or more third sensors belong to a second sensor type of the plurality of sensor types, (iii) determine an imputation method based on the category of the false value, wherein the imputation method employs one or more of (1) a Kalman filter, (2) a nearest neighbor value, (3) a statistical analysis of repeating sensor values of the plurality of sensors, (iv) impute the false value or determine an erroneous sensor from the plurality of sensors, (v) implement the Kalman filter that determines a sensor variance at each data point of the sensor data to generate an optimum sensor value, (vi) forecast sensor data for subsequent time stamps based on the optimum sensor values as determined at each data point by a trained Recurrent Neural Net (RNN) model, and (vii) perform automation of tasks at urban infrastructure based on the forecasted sensor data for urban management by generating commands at predetermined events or instances as determined by the forecasted sensor data.
In some embodiments, the processor executed set of instructions are configured to (i) receive the sensor data from the plurality of sensors, wherein the plurality of sensor types comprises one or more of weather data, geo-profile and events data in the location and (ii) train the Recurrent Neural Net (RNN) model using the sensor data and the plurality of sensor types to identify a false value based on contextual understanding for each sensor type of the plurality of sensor types based on a user input.
In some embodiments, the processor executed set of instructions are configured to train the RNN model with one or more of (a) the sensor data of a time lag of a predetermined duration, (b) weather data that comprises a temperature, a wind speed, a humidity, a presence or absence of rain, a presence or absence of clouds and luminosity, (c) a presence or absence of a predetermined point of interest that is analyzed using a geo-profile of the location, (d) prescheduled events, or (e) determined cyclic events of weekdays or week-ends, days of a month and year.
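As an illustration only, the training inputs listed above might be assembled into feature rows along the following lines. This is a minimal sketch, not the disclosed implementation: the function name `build_features`, the choice of a rain flag as the only weather input, and the weekend and hour-of-day encoding are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def build_features(values, timestamps, lag, rain_flags):
    """Build (features, target) rows for a forecasting model.

    Hypothetical layout: `lag` previous readings, a rain flag,
    a weekend flag and the hour of day, followed by the target reading.
    """
    rows = []
    for i in range(lag, len(values)):
        ts = timestamps[i]
        features = list(values[i - lag:i])                  # (a) time-lagged sensor data
        features.append(1.0 if rain_flags[i] else 0.0)      # (b) presence or absence of rain
        features.append(1.0 if ts.weekday() >= 5 else 0.0)  # (e) weekend vs. weekday
        features.append(float(ts.hour))                     # (e) position in the daily cycle
        rows.append((features, values[i]))
    return rows

# Usage: four hourly readings starting on a Saturday, with a lag of two steps.
start = datetime(2019, 7, 6, 0, 0)
stamps = [start + timedelta(hours=h) for h in range(4)]
rows = build_features([1.0, 2.0, 3.0, 4.0], stamps, 2, [False, False, True, False])
```

Each row could then be fed to whichever recurrent model is chosen; the model itself is outside the scope of this sketch.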
In some embodiments, the processor executed set of instructions are configured to determine a false value indicating the constant value for a predetermined threshold number of consecutive time-stamps specific to the sensor type by analyzing historical sensor data.
In some embodiments, the processor executed set of instructions are configured to determine the abnormally high or low value as determined by predetermined threshold values specific to the sensor type.
In some embodiments, the processor executed set of instructions are configured to perform comparative sensor data analysis of the first sensor and a second sensor of the first sensor type, wherein the analysis indicates the false value that is determined to be impossible or improbable based on the sensor type.
In some embodiments, the processor executed set of instructions are configured to determine the calibration error based on constant higher or lower value readings for a sensor as determined by the comparative sensor data analysis.
In some embodiments, the processor executed set of instructions are configured to detect abnormal variance of the first sensor from the plurality of sensors by the comparative sensor analysis using Levene's test, wherein the first sensor is indicated as an erroneous sensor.
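For orientation, the statistic behind Levene's test can be computed as sketched below. This is a hedged illustration rather than the disclosed implementation: the function name `levene_w` is hypothetical, absolute deviations are taken from each group mean (the original form of the test), and the decision step of comparing W against an F distribution with (k - 1, N - k) degrees of freedom is left to the caller. Statistical libraries such as SciPy provide a complete test via `scipy.stats.levene`.

```python
from statistics import mean

def levene_w(groups):
    """Return Levene's W statistic for a list of sample groups.

    A large W suggests that the groups do not share the same variance.
    Requires at least two groups and some spread in the deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every reading from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((x - zm) ** 2 for zi, zm in zip(z, z_means) for x in zi)
    return (n_total - k) / (k - 1) * between / within

# Usage: a steady sensor compared with a peer whose readings swing widely.
steady = [10, 11, 9, 10, 10]
noisy = [10, 20, 0, 15, 5]
w_different = levene_w([steady, noisy])
w_same = levene_w([steady, [20, 21, 19, 20, 20]])
```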
In some embodiments, the processor executed set of instructions are configured to impute the sensor data by taking an average of a particular time stamp of repeating sensor value trends over a period of time and replacing a false value with the average value for the time stamp.
In some embodiments, the processor executed set of instructions are configured to impute the sensor data by replacing a false value with a nearest neighbor value using a KNN algorithm.
In some embodiments, the processor executed set of instructions are configured to impute the sensor data by replacing the false value by an interpolation value, wherein the previous and subsequent time stamp values are processed to determine a mid-value for a data point of the false value.
In some embodiments, the processor executed set of instructions are configured to impute the sensor data by replacing the false value by an interpolation of at least two repeating sensor value trends over a period of time.
In another aspect, a method of forecasting sensor data at urban infrastructure using a sensor data forecasting system is provided. The method comprises the steps of: generating a database of time stamped and indexed sensor data, wherein the sensor data is received from a plurality of sensors implemented in a location, characterized in that, determining a false value by analyzing the time stamped and indexed sensor data, wherein the false value is determined based on predetermined parameters that comprise one or more of a constant value, an abnormally high or low value, a false value that is determined to be impossible or improbable, or a calibration error, determining a category of the false value by analyzing one or more of (a) historical sensor data of a first sensor, (b) comparative sensor data of the first sensor and a second sensor, and (c) comparative sensor data of a third sensor and the first sensor, wherein the first, second and third sensors are selected from the plurality of sensors, wherein the first sensor and the second sensor belong to a first sensor type and the third sensor belongs to a second sensor type, determining an imputation method based on the category of the false value, wherein the imputation method employs one or more of (1) a Kalman filter, (2) a nearest neighbor value, (3) a statistical analysis of repeating sensor values of the plurality of sensors, imputing, using the processor of the sensor data forecasting system, the false value or determining an erroneous sensor from the plurality of sensors, implementing the Kalman filter that determines a sensor variance at each data point of the sensor data to generate an optimum sensor value, forecasting sensor data for subsequent time stamps based on the optimum sensor values as determined at each data point by a trained Recurrent Neural Net (RNN) model, and performing automation of tasks at the urban infrastructure based on the forecasted sensor data for urban management by generating commands at predetermined events or instances.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended mainly to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
Various embodiments disclosed herein provide a sensor data prediction system and a method thereof. Referring now to the drawings, and more particularly to
Along with an increasing or decreasing trend, most urban environment data have some form of seasonality, i.e. variations specific to a particular time frame. For example, if the sales of a woolen jacket over time are analyzed, there are higher sales in winter than in summer. Most sensor data is by nature time series data. For an urban environment, the sensor data is not independent; it also depends on various other dynamic factors. For example, a typical set of contextual data includes weather, events, weekdays, weekends, vacations, and points of interest in that location such as hospitals and schools.
In some embodiments, multiple sensor domains are identified. A threshold value is determined specific to a domain of a sensor. The sensor 112 may be determined to be erroneous if the sensor data continuously or intermittently shows values that cross the predetermined threshold. In some embodiments, a false value is identified based only on historical data analysis of a sensor over a period of time. In some embodiments, a false value is identified based on comparative analysis of multiple sensors of the same sensor type. The sensor type may be a location or type of the sensor based on the sensor data the sensor transmits or the mechanism of collecting or transmitting the sensor data. In some embodiments, a false value is identified based on cross domain contextual understanding of sensor data. For example, the waste bin fill rate pattern is different for a bin outside a restaurant compared to other bins in the same location. Also, the bin fill rate is higher in the evening than in the morning of a day. As another example, waste bins outside cinema halls may fill when shows start or end. The presence or absence of a restaurant, cinema hall, school etc. changes the waste bin fill rate, and this is identified and used for forecasting the filling of bins in an urban waste management system.
In some embodiments, the nearest neighbor sensor values are used to impute. In an embodiment, KNN is an algorithm that is used for matching a point with its closest k neighbors in a multi-dimensional space. KNN may be used for data that is continuous, discrete, ordinal and categorical, which makes it useful for dealing with all kinds of missing data. The reason for using KNN for missing values is that a point value can be approximated by the values of the points that are closest to it, based on other variables.
In some embodiments, a Kalman filter is used for imputing sensor values based on the previous timestamp. The Kalman filter operates on state-space models, the details of which are explained elsewhere herein.
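The nearest neighbor idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `knn_impute`, the use of Euclidean distance, the averaging of the k neighbor values, and the context vector of "other variables" (for example hour of day and temperature) are all choices made for the example and are not specified in this description.

```python
import math

def knn_impute(target_context, neighbors, k=3):
    """Estimate a missing value from the k closest known points.

    `neighbors` is a list of (context_vector, value) pairs; distance is
    measured on the context vectors only, since the target value is unknown.
    """
    ranked = sorted(neighbors, key=lambda item: math.dist(target_context, item[0]))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Usage: three nearby points and one distant point in a 2-D context space.
known = [((0, 0), 10.0), ((1, 0), 12.0), ((0, 1), 14.0), ((10, 10), 100.0)]
estimate_k3 = knn_impute((0, 0), known, k=3)
estimate_k1 = knn_impute((0, 0), known, k=1)
```

The distant point does not influence the estimate for k=3, which is the property that makes the approach attractive for imputing from comparable sensors.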
In some embodiments, threshold values are predetermined for a domain or a type of a sensor. The value anomaly identification module 204 records the number of continuous reoccurrences of a sensor value, and if the number of reoccurrences of the sensor value is more than the predetermined threshold for a given type of sensor 112, the value is identified as a constant value anomaly data point. The time period for which a constant value is acceptable depends on the domain. For example, getting the same parking occupancy for a few hours is acceptable, but getting the exact same value of environment temperature for long hours indicates malfunctioning of the sensor 112.
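The run-length check described above can be sketched as follows. The function name `constant_value_anomalies` and the parameter `max_repeats` are hypothetical; the threshold itself would be chosen per sensor domain as described.

```python
def constant_value_anomalies(values, max_repeats):
    """Return indices of readings that belong to a run of identical
    consecutive values longer than `max_repeats`."""
    flagged = []
    run_start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start > max_repeats:
                flagged.extend(range(run_start, i))
            run_start = i
    return flagged

# Usage: a run of four identical readings exceeds a threshold of three.
stuck = constant_value_anomalies([1, 2, 2, 2, 2, 3, 3], max_repeats=3)
```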
In some embodiments, the value anomaly correction module 206 removes all the identified constant value anomaly data points and replaces them with different methods of correction. In some embodiments, a nearest neighbor sensor value replaces the identified constant value anomaly data points.
In some embodiments, the value anomaly correction module 206 removes all spike anomaly values. In some embodiments, the value anomaly correction module 206 imputes the spike anomaly using a Kalman filter. For example, if the domain does not have drastic changes in values, the Kalman filter is applied to the entire time range of the sensor data. The Kalman filter provides the optimal estimates of the states for t = 1, 2, . . . , T. An example is the imputation of temperature sensor data.
In some embodiments, when the values are of high variance and follow a repeating trend, an average of the particular time frame is taken, and that value is used to correct the missing value. For example, for values following a daily trend, the average of each hour is taken and the value anomaly correction module 206 imputes the value for the unavailable hour using the historical average for that hour.
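As a sketch of the hourly-average correction, assuming readings are available as (hour of day, value) pairs with `None` marking an unavailable value; the function name `impute_with_hourly_average` and this data layout are assumptions for the example.

```python
from collections import defaultdict

def impute_with_hourly_average(readings):
    """Fill missing values with the historical average for the same hour.

    `readings` is a list of (hour_of_day, value_or_None) pairs.
    Hours with no history at all are left as None.
    """
    buckets = defaultdict(list)
    for hour, value in readings:
        if value is not None:
            buckets[hour].append(value)
    filled = []
    for hour, value in readings:
        if value is None and buckets[hour]:
            value = sum(buckets[hour]) / len(buckets[hour])
        filled.append(value)
    return filled

# Usage: two days of readings at 08:00 and 09:00, then a day with gaps.
history = [(8, 10.0), (9, 20.0), (8, 14.0), (9, None), (8, None)]
filled_hourly = impute_with_hourly_average(history)
```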
In some embodiments, the values do not follow any repeating trend, so the value anomaly correction module 206 uses interpolation to impute the value. The sensor values at the previous and subsequent time stamps are averaged to obtain the unknown middle sensor value.
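A minimal sketch of this midpoint interpolation, assuming `None` marks the missing value. The function name `impute_midpoint` is hypothetical, and the sketch deliberately leaves runs of several consecutive missing values untouched, since the description only covers a single gap between two known readings.

```python
def impute_midpoint(values):
    """Replace an isolated missing value with the mean of its two neighbors."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        before, after = values[i - 1], values[i + 1]
        if values[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled

# Usage: the single gap is filled; the double gap is not.
filled_mid = impute_midpoint([1.0, None, 3.0, None, None, 6.0])
```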
In an embodiment, if the values have an hourly cyclicity and a daily trend, the value anomaly correction module 206 uses interpolation on the daily trend and overlays it with the variance of the hourly cycle.
In some embodiments, a boundary of 3 to 5 standard deviations (3 to 5 sigma) is used to set the normal range.
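The sigma boundary can be illustrated as below. This sketch assumes that the normal range is centered on the mean of the sensor's historical readings and uses the sample standard deviation; the description does not state either choice, and the function name `sigma_outliers` is hypothetical.

```python
from statistics import mean, stdev

def sigma_outliers(history, candidates, n_sigma=3.0):
    """Return the candidate readings that fall outside
    mean(history) +/- n_sigma * stdev(history)."""
    mu = mean(history)
    sd = stdev(history)
    low, high = mu - n_sigma * sd, mu + n_sigma * sd
    return [x for x in candidates if x < low or x > high]

# Usage: temperature-like history around 20 with new readings to check.
past = [20, 21, 19, 20, 22, 18, 20, 21, 19, 20]
outliers = sigma_outliers(past, [20, 23, 30, 10], n_sigma=3.0)
```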
In some embodiments, the value anomaly correction module 206 imputes the outlying value anomaly using the Kalman filter. For example, if the domain does not have drastic changes in values, the Kalman filter is applied to the entire time range of the sensor data. The Kalman filter gives the optimal estimates of the states for t = 1, 2, . . . , T. Data is imputed via the measurement equation $y_t = Z\alpha_t + \varepsilon_t$, $\varepsilon_t \sim N(0, H)$, as mentioned elsewhere herein. An example is the imputation of temperature sensor data.
In some embodiments, when the values are of high variance and follow a repeating trend, an average of the particular time frame is taken, and that value is used to correct the missing value. For example, for values following a daily trend, the average of each hour is taken and the value anomaly correction module 206 imputes the value for the unavailable hour using the historical average for that hour.
In some embodiments, the values do not follow any repeating trend, so the value anomaly correction module 206 uses interpolation to impute the value. The sensor values at the previous and subsequent time stamps are averaged to obtain the unknown middle sensor value.
In an embodiment, if the values have an hourly cyclicity and a daily trend, the value anomaly correction module 206 uses interpolation on the daily trend and overlays it with the variance of the hourly cycle.
In some embodiments, to find a calibration error, the value anomaly identification module 204 ranks the sensors based on their values at each timestamp and then aggregates all rankings for each sensor 112. In an embodiment, if the aggregated ranking lies outside the range of 10 to 90, the sensor 112 is determined to be faulty.
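One possible reading of this ranking scheme is sketched below. The description does not define how rankings are aggregated or what scale the range of 10 to 90 refers to, so this example assumes that each rank is converted to a 0 to 100 percentile among the peer sensors and that the percentiles are averaged over time. The function name `flag_calibration_errors` and the data layout are likewise hypothetical.

```python
def flag_calibration_errors(readings, low=10.0, high=90.0):
    """Flag sensors whose average rank percentile lies outside [low, high].

    `readings` maps sensor id -> list of values at aligned timestamps.
    Needs at least two sensors. Assumed scale: rank 0 (lowest reading)
    maps to 0 and the highest rank maps to 100.
    """
    sensors = list(readings)
    n_steps = len(readings[sensors[0]])
    totals = {s: 0.0 for s in sensors}
    for t in range(n_steps):
        ordered = sorted(sensors, key=lambda s: readings[s][t])
        for rank, s in enumerate(ordered):
            totals[s] += 100.0 * rank / (len(sensors) - 1)
    scores = {s: totals[s] / n_steps for s in sensors}
    return [s for s in sensors if scores[s] < low or scores[s] > high]

# Usage: sensor "e" reads consistently higher than its four peers.
data = {
    "a": [20, 21, 22, 20],
    "b": [21, 20, 21, 22],
    "c": [22, 22, 20, 21],
    "d": [20.5, 21.5, 21.5, 20.5],
    "e": [26, 27, 27, 26],
}
suspects = flag_calibration_errors(data)
```

A sensor that always sits at the top or bottom of the ranking is consistent with a constant offset, which matches the calibration error described above.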
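$$y_t = Z\alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H)$$
$$\alpha_{t+1} = T\alpha_t + \eta_t, \qquad \eta_t \sim N(0, Q)$$
$$\alpha_1 \sim N(a_1, P_1)$$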
where $y_t$ is the observed series (possibly with missing values) but $\alpha_t$ is fully unobserved. The first equation (the “measurement” equation) says that the observed data is related to the unobserved states in a particular way. The second equation (the “transition” equation) says that the unobserved states evolve over time in a particular way.
The Kalman filter operates to find optimal estimates of $\alpha_t$ ($\alpha_t$ is assumed to be Normal: $\alpha_t \sim N(a_t, P_t)$, so what the Kalman filter actually does is compute the conditional mean and variance of the distribution of $\alpha_t$ conditional on observations up to time t).
In the typical case (when observations are available), the Kalman filter uses the estimate of the current state and the current observation $y_t$ to do the best it can to estimate the next state $\alpha_{t+1}$, as follows:
$$a_{t+1} = T a_t + K_t (y_t - Z a_t)$$
$$P_{t+1} = T P_t (T - K_t Z)' + Q$$
where $K_t$ is the “Kalman gain”.
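The description does not write out the Kalman gain. For reference, in the standard treatment of this state-space form, and consistent with the two update equations above, the gain is usually expressed through the prediction error variance $F_t$; the symbol $F_t$ is introduced here for illustration and does not appear elsewhere in this description.

```latex
F_t = Z P_t Z' + H, \qquad K_t = T P_t Z' F_t^{-1}
```

A large $H$ (noisy measurements) makes $F_t$ large and the gain small, so the filter leans on the transition equation; a small $H$ has the opposite effect.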
When there is no observation, the Kalman filter may still compute $a_{t+1}$ and $P_{t+1}$ in the best possible way. Since $y_t$ is unavailable, the Kalman filter cannot make use of the measurement equation, but it can still use the transition equation. Thus, when $y_t$ is missing, the Kalman filter instead computes:
$$a_{t+1} = T a_t$$
$$P_{t+1} = T P_t T' + Q$$
Essentially, the imputation module determines that, given $\alpha_t$, the most probable estimate of $\alpha_{t+1}$ in the absence of data is simply the evolution specified in the transition equation. Imputation can be performed for any number of time periods with missing data.
If there is data $y_t$, then the first set of filtering equations takes the most probable value determined at the time stamp of the missing data and corrects it by an amount based on how accurate the previous estimate turned out to be.
Once the Kalman filter has been applied to the entire time range, optimal estimates of the states, $a_t$ and $P_t$, are available for t = 1, 2, . . . , T. Imputing data is then simple via the measurement equation. In particular, the imputed value is calculated as:
$$\hat{y}_t = Z a_t$$
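The filtering and imputation steps above can be sketched for the simplest case in which the state and the observation are both scalars. This is an illustration under stated assumptions rather than the disclosed implementation: the function name `kalman_impute`, the default values of Z, T, H, Q, the near-diffuse initial variance P1, and the use of `None` for a missing observation are all choices made for the example. The gain uses the standard form $K_t = T P_t Z / (Z P_t Z + H)$, which is not written out in this description.

```python
def kalman_impute(y, Z=1.0, T=1.0, H=1.0, Q=0.1, a1=0.0, P1=1e6):
    """Run a scalar Kalman filter over `y` and fill missing entries.

    Missing observations (None) are replaced by Z * a_t, where a_t is the
    state estimate carried forward by the transition equation.
    """
    a, P = a1, P1
    filled = []
    for obs in y:
        if obs is None:
            filled.append(Z * a)          # imputed value: y_hat_t = Z a_t
            a = T * a                     # a_{t+1} = T a_t
            P = T * P * T + Q             # P_{t+1} = T P_t T' + Q
        else:
            filled.append(obs)
            F = Z * P * Z + H             # prediction error variance (assumed standard form)
            K = T * P * Z / F             # Kalman gain
            a = T * a + K * (obs - Z * a) # a_{t+1} = T a_t + K_t (y_t - Z a_t)
            P = T * P * (T - K * Z) + Q   # P_{t+1} = T P_t (T - K_t Z)' + Q
    return filled

# Usage: a steady series with one missing reading.
series = kalman_impute([10.0, 10.0, None, 10.0])
```

The sketch uses only past observations to fill a gap; a smoothing pass that also uses later observations would be a natural extension but is not shown.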
The system further includes a user interface adapter 19 that connects a keyboard 15, mouse 17, speaker 24, microphone 22, and/or other user interface devices such as a touch screen device (not shown) or a remote control to the bus 12 to gather user input. Additionally, a communication adapter 20 connects the bus 12 to a data processing network 25, and a display adapter 21 connects the bus 12 to a display device 23 which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
The advantage of the sensor data forecasting system is that it understands and interprets various kinds of data accurately, leading to a robust automation system while handling a huge amount of data generated from a large number of sensors covering multiple locations. The system aids in safety, urban management, waste management etc. and provides solutions for urban planning for big and small cities across various parameters in a user friendly, comprehensive, interactive environment.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
201941029307 | Jul 2019 | IN | national |