Embodiments described herein relate to methods and systems for communicating information to a user.
The increasing deployment of different types of sensors around cities provides opportunities for applications to update citizens with real-time information about the environment. In order for this information to be useful, and to inform immediate decisions, it is desirable for the information to be translated into an easily understood description of the sensor readout. As one example, it is more useful to inform users that it is difficult to breathe in a particular area of a city than to simply provide those users with numerical measurements of carbon dioxide concentration, humidity and temperature.
In order to translate the numerical data into a meaningful description, it is necessary to provide semantic labels that correspond to the sensor data. Conventional systems achieve this manually, by using dedicated external annotators or experts to tag the sensor data offline. Such systems can provide information to users with short delays. However, they can be inefficient in terms of energy and bandwidth usage, as well as being costly to implement.
Embodiments of the invention will now be described by way of example with reference to the accompanying drawings in which:
According to a first embodiment, there is provided a computer implemented method for communicating information to a user; the method comprising:
In some embodiments, the semantic data is stored in association with values of the parameters that have been received in the same time window as the semantic data, and/or which have been received from the same location as said semantic data.
In some embodiments, the semantic data comprises one or more words or phrases provided by the users; wherein
In some embodiments, the method comprises receiving further semantic data from the one or more users in response to the request, and storing the further semantic data in association with sensor data that is received in the same time window as the further semantic data or which originates from the same location as the further semantic data.
In some embodiments, the level of confidence with which a respective word or phrase is considered to reflect the determined value(s) of the one or more parameter(s) is determined at least in part based on the number of times the word or phrase appears in the semantic data that is stored in association with values of sensor data that are deemed to correspond to the determined value(s) of the parameter(s) at the specified location.
In some embodiments, the values of stored sensor data that are deemed to correspond to the determined value(s) of the parameter(s) at the specified location are values that lie within a predetermined range of the determined value(s).
In some embodiments, the sensor data contains values of a plurality of parameters and the method comprises:
In some embodiments, a determination is made as to the level of confidence with which the set of values of parameters can be considered to reflect the value of each parameter in the specified location.
In some embodiments, the semantic data comprises one or more words or phrases provided by the users; wherein
In some embodiments, the method comprises receiving further semantic data from the one or more users in response to the request, and storing the further semantic data in association with sensor data that is received in the same time window as the further semantic data or which originates from the same location as the further semantic data.
In some embodiments, the level of confidence with which a respective word or phrase is considered to reflect the determined set of values is determined at least in part based on the number of times the word or phrase appears in the semantic data that is stored in association with values of sensor data that are deemed to correspond to the determined set of values.
In some embodiments, the values of stored sensor data that are deemed to correspond to the determined set of values of the parameters at the specified location are values that lie within a predetermined range of the determined set of values.
In some embodiments, the one or more sensors are environmental sensors, and the sensor data indicates values of one or more environmental parameters.
In some embodiments, the environmental parameters include one or more of temperature, humidity and noise level in the vicinity of the sensor(s).
In some embodiments, knowledge is created in the form of machine generated, human interpretable information, by mapping the values of measured parameters to the received semantic data.
According to a second embodiment, there is provided a non-transitory computer readable medium comprising computer executable instructions that when executed by a computer will cause the computer to carry out a method according to any one of the preceding claims.
According to a third embodiment, there is provided a computer system for receiving and communicating information to a user; the system comprising:
In embodiments described herein, a system including a plurality of sensors is provided, which can autonomously create knowledge in the form of an association between semantic labels and numerical data, whilst minimising the need for external input. The system gathers the required semantic labels by crowd sourcing them through its own users, who may supply the labels using personal communication devices, including mobile phones, laptops, tablets etc. The crowd sourcing works in combination with iterative model building and, as such, is driven both by uncertainty in the model and by user requests.
Each region also has an associated data aggregator 105, 107. The data aggregator is used to collect sensor data in the form of sensor measurements from the various sensors and semantic data from the users' personal communication devices. The semantic data comprises labels or tags, i.e. short textual descriptions of the parameters being measured by the sensors. For example, in the case where the sensors are used to monitor temperature, the semantic data may include statements such as “hot”, “warm”, “cold” etc. The users enter the semantic data onto their personal communication devices through a standard user interface; this may comprise the use of a dedicated software application, or alternatively may involve the user drafting an SMS text message or other form of written message on their device.
The sensors S transmit the sensor data over one or more communications channels to the respective data aggregator in their region. The communication channel(s) may include any one of a number of standard channels as known in the art, including a wired connection, cellular network, wireless LAN etc. Similarly, the users may transmit the semantic data to the data aggregator over one or more communication channels, which may be the same or different channel(s) as used for sending the sensor data. Each data aggregator aggregates the sensor data received from the sensors in its region. The aggregator in turn forwards the aggregated sensor data, together with the semantic data received from the user devices, to a server 109. In this way, the server 109 receives aggregated sensor readings and semantic data from the different regions 101, 103. The server can then use the sensor readings and semantic data to build up a knowledge base for associating particular values of the measured parameters with descriptions of those parameters. In essence, the server is able to accrue knowledge in the form of machine generated, but human interpretable information, by mapping the values of measured parameters to descriptions of those parameters.
By recording the time and location from which the semantic data and the sensor data originate, it is possible to correlate those data with one another; in the event that sensor data and semantic data originate from the same time and place, one can infer that those data are likely to reflect the same parameter value(s).
The database 401 is used to store data received from the data aggregator(s) in the respective regions. As described above, the sensor data and semantic data may be stored in association with one another based on the time at which they are generated, and/or the location or region from which they originate.
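By way of illustration only, the association of sensor data with semantic data by time and location might be sketched as follows. The record types `SensorReading` and `SemanticLabel`, the function `associate` and the 600 second default window are hypothetical and are not taken from the embodiments; the sketch also assumes that both the region and the time window must match, whereas the embodiments permit either criterion to be used alone.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    region: str        # region from which the reading originates
    timestamp: float   # seconds since an arbitrary epoch
    values: dict       # e.g. {"temperature": 28.0, "humidity": 50.0}


@dataclass
class SemanticLabel:
    region: str
    timestamp: float
    user: str
    tags: tuple        # e.g. ("warm", "unpleasant")


def associate(readings, labels, window_s=600.0):
    """Pair each label with the readings from the same region whose
    timestamps lie within window_s seconds of the label's timestamp."""
    pairs = []
    for label in labels:
        matched = [
            r for r in readings
            if r.region == label.region
            and abs(r.timestamp - label.timestamp) <= window_s
        ]
        if matched:  # labels with no matching reading are left unpaired
            pairs.append((label, matched))
    return pairs
```

Each returned pair could then be written to the database as one stored association between a label and the sensor values it is inferred to describe.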
The data mining module 403 is configured to analyse the data in the database and to establish relationships between the two types of data within the database; the data mining module is used to establish a link between a particular numerical value, or group of numerical values, and a particular semantic label. For example, the data mining module may determine that values of temperature above a certain threshold tend to be associated with a semantic label of “hot”, whilst those beneath that threshold tend to be associated with a different semantic label, such as “cold”.
The knowledge providing engine 405 is used to respond to users' requests for information concerning a measured parameter in a particular region. The steps involved in providing this information are summarised in the flow-chart of
The knowledge providing engine will only send a semantic tag or label to the user if it determines that the certainty with which the particular semantic label is associated with the measured sensor readings is above a threshold. For example, in the case where the sensor data relates to temperature, the knowledge providing engine will not send a reading of “hot” to the user unless it is determined to within a specified degree of certainty that the term “hot” is a true reflection of the temperature in the region of interest. The certainty of association between the sensor data and a particular semantic label is determined in step S503, in conjunction with the data mining module; as described below, there may be different ways of establishing whether or not the certainty is great enough to permit the semantic data to be sent to the user.
In the event that the knowledge providing engine determines that it does not possess sufficient certainty to warrant sending of a particular semantic label to the user, the knowledge providing engine may prompt the crowd sourcing engine to issue a request for users in the region of interest to provide updated semantic labels, reflective of the current value of the parameter in question (step S505). The crowd sourcing engine 407 may issue the request in the form of an email, SMS message or other electronic communication, which may be received at the users' personal communication devices. The semantic labels received from the users in response to the crowd sourcing request can be used to respond to the initial user's request for information about the region of interest. In addition, the newly received semantic labels can be added to the database (step S506), where they can aid the data mining module in thereafter establishing appropriate semantic labels to match with particular values of sensor data.
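The decision described above, between answering from stored data and falling back to crowd sourcing, might be sketched as follows for a single parameter. This is a minimal sketch under stated assumptions: certainty is taken to be the relative frequency of a label among stored records lying within a tolerance of the current value, and the names `label_confidence`, `answer_request`, `min_confidence`, `min_samples` and the `crowd_source` callback are hypothetical.

```python
from collections import Counter


def label_confidence(records, current_value, tolerance):
    """records is a list of (value, label) pairs. Returns the share of
    each label among records within tolerance of current_value, and the
    number of such records."""
    nearby = [label for value, label in records
              if abs(value - current_value) <= tolerance]
    if not nearby:
        return {}, 0
    counts = Counter(nearby)
    return {label: n / len(nearby) for label, n in counts.items()}, len(nearby)


def answer_request(records, current_value, tolerance,
                   min_confidence, min_samples, crowd_source):
    # Certainty check (corresponding loosely to step S503).
    confidences, n_nearby = label_confidence(records, current_value, tolerance)
    if confidences and n_nearby >= min_samples:
        best = max(confidences, key=confidences.get)
        if confidences[best] >= min_confidence:
            return best  # sufficient certainty: answer from stored data
    # Insufficient certainty: request fresh labels (step S505).
    new_labels = list(crowd_source())
    # Add the new labels to the stored records (step S506).
    records.extend((current_value, label) for label in new_labels)
    if not new_labels:
        return None
    # Answer the original request from the newly received labels.
    return Counter(new_labels).most_common(1)[0][0]
```

Because every crowd sourcing round extends `records`, later requests for similar values become more likely to be answered directly from stored data.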
The process described above may be repeated over time. As the amount of data stored in the database 401 increases with each crowd sourcing request, there will be a concomitant increase in the certainty with which the data mining module/machine learning module and knowledge providing engine are able to correlate particular numerical values of sensor data with particular semantic labels. Thus, at a certain point, the knowledge providing engine will no longer need to prompt the crowd sourcing engine to request input from users, but will be able to identify an appropriate semantic label to send to a user based on the data already stored in the database and the relationships identified by the data mining/machine learning module. At this point, the method will proceed to steps S507 and S508. The steps of the method according to the present embodiment are also shown pictorially in
A number of means may be employed for defining the certainty with which a particular semantic label can be said to reflect the value of one or more parameters in the sensor data. In some embodiments, machine learning may be used to identify associations between the received sensor data and semantic data. In one example, the system may wait until a predetermined number of results has been obtained (for example, the system may require that a threshold number of crowd sourcing requests has been issued), after which the system may associate a particular sensor data value with the semantic label that is most commonly seen to be associated with that sensor data value in the database. The server may still continue to send crowd sourcing requests at intervals (repeating steps S505 and S506 of
In another embodiment, pattern mining may be used. Pattern mining operates over categorical data and outputs frequent combinations of data values. Pattern mining is applicable for cases in which the sensor data comprises more than one parameter; for example, pattern mining may be applicable where the sensor data includes measurements of both temperature and humidity, rather than just temperature alone. In one embodiment in which pattern mining is used, the server may derive the probability that a particular set of sensor measurements reflects the true value of those parameters in the region of interest.
By way of example, continuing with the case in which the sensor data relates to temperature and humidity readings, the server will receive multiple readings of both temperature and humidity from the sensors located in the region of interest. In this case, the server may determine an aggregate vector “m”, where m comprises a single value for each one of the sensed parameters. The vector m may be represented as m={5° C., 5% humidity}, for example. The server will estimate a probability density function “P” of the sensor measurements using kernel density estimation. Following this, the server will compute P([m−r, m+r]), i.e. the probability mass that P assigns to the interval extending a distance r either side of m, where r is an application specific parameter. If the value of P([m−r, m+r]) is large enough, the server can determine that there is sufficient certainty about this vector of measurements; that is, the server can determine that the selected combination of values for the different parameters in the vector m provides a true reflection of the conditions in the region of interest.
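One possible way of computing P([m−r, m+r]) is sketched below, assuming a kernel density estimate built from a product of Gaussian kernels with a fixed bandwidth per parameter. The kernel, the bandwidth values and the function name `box_probability` are assumptions made for illustration; the embodiments do not specify them. With this kernel the integral over the box has a closed form, so no numerical integration is needed.

```python
import math


def _normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def box_probability(samples, m, r, bandwidth):
    """Probability mass that a Gaussian product-kernel density estimate
    assigns to the box [m - r, m + r].

    samples   : list of measurement tuples, one entry per parameter
    m, r      : tuples giving the centre and half-width per parameter
    bandwidth : tuple of kernel bandwidths per parameter (assumed values)
    """
    if not samples:
        return 0.0
    total = 0.0
    for x in samples:
        mass = 1.0
        for x_d, m_d, r_d, h_d in zip(x, m, r, bandwidth):
            upper = _normal_cdf((m_d + r_d - x_d) / h_d)
            lower = _normal_cdf((m_d - r_d - x_d) / h_d)
            mass *= upper - lower  # mass of this kernel inside the box
        total += mass
    return total / len(samples)  # average over all kernels
```

The returned value lies between 0 and 1 and can be compared with a chosen threshold to decide whether there is sufficient certainty about the vector m.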
The server will next query the database to identify users and tags that are stored in association with sensor measurements in the region [m−t, m+t] where t is a user defined threshold. Having done so, a single relation data mining technique such as frequent item-set mining can be applied on the results to find the most popular (and, by extension, the most relevant) combination of tags for the current sensor data measurements.
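The selection of the most popular combination of tags might be sketched as follows. This is a brute-force illustration rather than an optimised frequent item-set algorithm such as Apriori, and it assumes that the "most popular combination" is the largest set of tags whose support meets a minimum count; the function name and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def most_frequent_tag_combination(tag_sets, min_support=2):
    """tag_sets holds one collection of tags per stored user entry whose
    sensor values lie in [m - t, m + t]. Returns the largest combination
    of tags that appears in at least min_support entries, or None."""
    counts = Counter()
    for tags in tag_sets:
        unique = sorted(set(tags))
        # Count every non-empty subset of the tags in this entry.
        for size in range(1, len(unique) + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    if not frequent:
        return None
    # Prefer larger combinations, then higher support, then
    # alphabetical order so that the result is deterministic.
    return min(frequent, key=lambda c: (-len(c), -frequent[c], c))
```

Enumerating all subsets grows exponentially with the number of tags per entry, which is acceptable for short lists of tags; a dedicated item-set mining library would be preferable for larger data.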
If P([m−r, m+r]) is too small (using a user defined threshold), then there will be insufficient certainty in the database about the vector of measurements. In this case, the server will initiate crowd sourcing via the gateways for which the current vector of measurements is close to m. After it receives all the information and the database is updated, pattern mining can be performed. These steps are summarised in the flow-chart of
In the present example, there are two users (user1 and user2) located in the first region 801. A third user (user3) located in the second region 803 sends a request for information about the first region 801 to the server 809.
The table shown in
In the present case, the mean temperature and humidity readings obtained from the most recent batch of sensor data in the first region 801 are T: 28° C. and H: 50%, where the letters T and H stand for temperature and humidity, respectively. Thus, the table includes rows for which the sensor data lies in the interval T: 28° C. +/−5% and H: 50% +/−5%.
As can be seen, the table includes 2 entries from user1, and 3 entries from user2. The knowledge providing engine determines the most frequent combination of tags that users agree on, i.e. “warm” and “unpleasant”. Following this, the knowledge providing engine is able to infer that conditions in the interval T: 28° C. +/−5% and H: 50% +/−5% are considered as warm and unpleasant. The knowledge providing engine in turn generates a message for sending to user3 of the form “most people think that current conditions in region 1 are warm and unpleasant.”
In response to a user's enquiry about the current noise level at the site, the knowledge providing engine extracts a feature vector from the most recent set of sensor measurements; as before, the feature vector comprises a list of values for the different parameters, in this case the different noise parameters described above. The server then consults the database to identify semantic labels that correspond to the values in the feature vector. Referring still to
Embodiments described herein provide an improved system in terms of flexibility/cost, average delay of response, average energy consumption of the mobile devices of the users and average bandwidth usage. Embodiments provide increased flexibility since they do not require external experts to provide labels for the sensor data. As a result, applications can be launched directly and provide knowledge to users immediately through the dynamic synergy of model building and crowd sourcing.
It can be seen that the average delay of response for the system employing a continuous data update (line 1003) is very small; this is because the system always has the most recent data in hand for sending to a user upon receipt of that user's request. Thus, there is no lag time between receiving a request for information and transmitting the data in response. For systems that use user triggered data updates (line 1005), a delay is incurred each time a user requests information as the system needs to first source the semantic data from the users before responding. For both of these conventional types of system, the average delay in responding to the users' requests remains constant over time.
In contrast, in embodiments described herein, the average delay in responding to a user's request for information is initially larger than in the conventional systems, but decreases over time. The delay is initially larger because the processing (data mining) carried out for every user request after crowd-sourcing is very heavy. However, as more data is gathered, and more knowledge is produced and stored, there is less need to crowd-source for semantic data and less need for processing as well. The sensor data is also periodically updated. Therefore, the average delay of response in embodiments decreases as the system is used and converges to a level which is similar to that of continuous data update systems and smaller than that of the user triggered systems. At this point, there is almost no need to crowd source and process data in response to a user's request. The rate at which the average delay decreases will depend on the true data distribution (its skewness, variance etc.).
While certain embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the invention. Indeed, the novel methods, devices and systems described herein may be embodied in a variety of forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the invention. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/GB2015/052633 | 9/11/2015 | WO | 00 |