This disclosure relates generally to apparatuses, methods, computer programs and computer program products for generating activity information for a cell.
As used herein the term “cell” is used broadly to encompass any extent of space or surface (e.g., an area, a location, a telecommunications network cell).
Location based services (LBSs) are services that provide information to a user and/or perform a task for the user based on the user's location. LBSs are becoming extremely popular. For example, U.S. Patent Publication No. 2012/0290434 describes in one embodiment a recommendation system that is configured to select items to recommend to a user based on the user's location context. This increase in popularity is being driven partly by the fact that the majority of today's communication devices (e.g., smartphones) contain a Global Positioning System (GPS) receiver (and/or other means) that enables the communication device to accurately determine the device's position. Many LBSs have been developed that can provide a recommendation to a user based on the user's location and preferences. For instance, a recommendation system can select applications (“apps”) to recommend to a user based on the user's current location and historical activity information regarding apps that other users used at or near that location. Thus, the recommendation system may recommend to a user in a train station a particular train schedule app because many other users in the train station used that particular app within the last month.
A drawback of such a recommendation system is that the database of historical activity information on which its recommendations are based may be sparse. For example, for some cells, the database may not include any historical activity information on which to base a recommendation for a user.
The present disclosure discloses apparatuses, methods, computer programs and computer program products that are designed to overcome the above described drawback. More specifically, the present disclosure discloses apparatuses, methods, computer programs and computer program products for generating activity information for a cell, such as a cell for which historical activity information is lacking. Advantages provided by the technical improvements described herein include, but are not limited to: lessening the impact of the classic sparse data problem experienced by many data mining systems; providing a basis for context-aware recommendation systems, such as app predictions/recommendations; enabling an app coverage map to be made when there are limited data collectors (terminal recordings); supporting location based marketing; predicting future activity; creating profiles for various locations; and supporting geospatial analysis tasks.
In one aspect, a method for generating activity information for a cell is disclosed. In some embodiments, the method is performed by a data processing system comprising a processor. The method includes, for each cell identifier included in a set of cell identifiers, storing meta-data (e.g., a cell type value, a set of usage intensity values, etc.) concerning a cell identified by the cell identifier so that the meta-data is associated with the cell identifier. For at least a subset of the set of cell identifiers, the method further includes storing activity information so that the activity information is associated with the cell identifier, the activity information being related to activities that have occurred within the cell identified by the cell identifier. The method also includes determining that an amount of activity information associated with a first cell identifier included in the set of cell identifiers is less than a threshold activity amount and using the meta-data associated with the first cell identifier and meta-data associated with other cell identifiers included in the set of cell identifiers to determine a group of one or more cells that are similar to the cell identified by the first cell identifier. For each cell included in the group of similar cells, the method also includes obtaining the activity information associated with the cell identifier that identifies the similar cell; generating activity information for the first cell identifier using the obtained activity information; and storing the generated activity information so that it is associated with the first cell identifier.
In this manner, activity information can be generated for a cell and associated with a cell. This is highly advantageous because a recommendation system can then use the generated activity information to provide intelligent recommendations to users located in the cell, even if there is no actual historical activity information for the cell. Without this method, the recommendation system may not have sufficient activity information to make such an intelligent recommendation.
In some embodiments, each cell identifier included in the set of cell identifiers is a character string formed by encoding a geographic coordinate.
In some embodiments, the method also includes receiving from a wireless communication device (WCD) an application activity item, the application activity item comprising: a) an application identifier identifying an application used by a user of the WCD, b) location information identifying a location of the WCD at the time the user of the WCD used the identified application, and c) a timestamp identifying a point in time at which the user of the WCD used the identified application in the identified location. The method may also include, after receiving the application activity item, using the location information to select from the set of cell identifiers one of the cell identifiers; and, after selecting the cell identifier, storing the application identifier and timestamp so that they are associated with the selected cell identifier.
In some embodiments, the obtained activity information comprises: a first value associated with a certain activity that occurred within a first cell included in the group and a second value associated with a certain activity that occurred within a second cell included in the group, and generating the activity information comprises calculating a value using the first and second values.
In some embodiments, the method also includes using the generated activity information to select an item to recommend to a user located in the cell identified by the first cell identifier.
In some embodiments, determining that an amount of activity information associated with the first cell identifier is less than a threshold activity amount consists of determining that no activity information is associated with the first cell identifier.
In some embodiments, the method also includes: determining a likelihood (L) that the cell identified by the first cell identifier has an amount of activity and determining that L exceeds a threshold, and generating activity information for the first cell identifier using the obtained activity information is performed as a result of (1) determining that L exceeds the threshold and (2) determining that an amount of activity information associated with the first cell identifier is less than a threshold activity amount.
In another aspect, the present disclosure describes an apparatus for generating activity information for a cell. In some embodiments, the apparatus includes a data storage system storing a cell database, the cell database for storing: i) a set of cell identifiers, including a first cell identifier, each cell identifier included in the set identifying a cell, ii) for each of the cell identifiers, meta-data concerning the cell identified by the cell identifier so that the meta-data is associated with the cell identifier, and iii) for at least a subset of the cell identifiers, activity information so that the activity information is associated with the cell identifier, the activity information being related to activities that have occurred within the cell identified by the cell identifier. The apparatus also includes a data processing system comprising a processor. The processor is adapted to: determine that an amount of activity information associated with the first cell identifier is less than a threshold activity amount; use the meta-data associated with the first cell identifier and meta-data associated with other cell identifiers included in the set of cells to determine a group of one or more cells that are similar to the cell identified by the first cell identifier; for each cell included in the group of similar cells, obtain the activity information associated with the cell identifier that identifies the similar cell; generate activity information for the first cell identifier using the obtained activity information; and store the generated activity information so that it is associated with the first cell identifier.
In another aspect, the present disclosure describes a computer program for generating activity information for a cell. The computer program comprises computer readable instructions which, when run on a data generation system, cause the data generation system to: for each cell identifier included in a set of cell identifiers, store meta-data concerning the cell identified by the cell identifier so that the meta-data is associated with the cell identifier; and for at least a subset of the set of cell identifiers, further store activity information so that the activity information is associated with the cell identifier, the activity information being related to activities that have occurred within the cell identified by the cell identifier. The instructions further enable the data generation system to determine that an amount of activity information associated with a first cell identifier included in the set of cell identifiers is less than a threshold activity amount; use the meta-data associated with the first cell identifier and meta-data associated with other cell identifiers included in the set of cell identifiers to determine a group of one or more cells that are similar to the cell identified by the first cell identifier; for each cell included in the group of similar cells, obtain the activity information associated with the cell identifier that identifies the similar cell; generate activity information for the first cell identifier using the obtained activity information; and store the generated activity information so that it is associated with the first cell identifier.
In another aspect, a computer program product is provided. The computer program product comprises a non-transitory computer readable medium storing the above described computer program.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
For example, consider a user 101 that is in possession of and using a communication device 102 (e.g., a smartphone, computer, tablet, etc.) within a particular cell (e.g., the South Beach neighborhood of Miami Beach). The user may cause the communication device 102 to transmit, via access point 104 (e.g., Wi-Fi access point, base station, etc.) and network 110, a request to an on-line shopping system (not shown) that includes recommendation system 112. In response to the request, the on-line shopping system may generate a page of information containing information the user requested as well as information about an item (e.g., sunglasses) that recommendation system 112 selected to recommend to user 101 based on activity information associated with the cell in which user 101 is located. The activity information could include actual historical activity information (e.g., purchase history information indicating that sunglasses are popular items within the cell) and/or generated activity information (such as assumed or estimated activity information).
Process 300 may begin in step 302, where data generation system 120 stores a set of cell identifiers including a first cell identifier, where each cell identifier included in the set identifies a cell. Data generation system 120 may store the cell identifiers in cell database 208. Step 302 is optional because, in some embodiments, the set of cell identifiers may be stored by another system. In some embodiments, one or more cell identifiers included in the set of cell identifiers is a character string (e.g., a string of letters, numbers, and/or other characters) formed by encoding a geographic coordinate (e.g., a pair of geographic coordinate values). For example, in some embodiments, a cell identifier can be a character string formed by a geocode service based on a postal address or a pair of geographic coordinate values (e.g., a latitude and longitude value pair), such as the geocode service available at geohash.org, retrieved on Sep. 19, 2013. In other embodiments, a cell identifier is a telecommunications network cell (or base station) identifier that is used in a cellular telecommunications system. Accordingly, a “cell” can be any arbitrary area, such as an area defined by geographic coordinates as well as an area defined by cells of a cellular telecommunication system.
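By way of non-limiting illustration, the encoding of a latitude and longitude value pair into a character-string cell identifier can be sketched as follows. The sketch implements the publicly documented geohash encoding locally rather than calling the geohash.org service; the function name `encode_geohash` and the `precision` parameter are assumptions introduced here for illustration only.

```python
# Illustrative sketch only: a local implementation of the public geohash
# encoding. The names below are hypothetical and are not part of the disclosure.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a (latitude, longitude) pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits accumulated for the current character
    bit_count = 0   # number of bits accumulated so far (0..5)
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits map to one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


cell_id = encode_geohash(57.64911, 10.40744, precision=4)  # "u4pr"
```

A shorter identifier is always a prefix of a longer identifier for the same coordinate, so the chosen precision effectively sets the size of the cell.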
In step 304, for each of the cell identifiers, data generation system 120 obtains and stores meta-data concerning the cell identified by the cell identifier so that the meta-data is associated with the cell identifier. For example, the meta-data concerning the cell may be stored in cell database 208 along with the cell identifier identifying the cell or may be linked to the cell by being stored in another database (not shown) and associated with the cell identifier, e.g., by a link/pointer stored in the cell database 208 or in the other database. In some embodiments, the meta-data includes one or more of: a cell type value identifying a cell type (the set of cell types may include: suburban, rural, urban, exurban, city, town, village, café, university, airport, train station, hotel, etc.); and usage intensity information (e.g., a set of tuples, where each tuple includes a usage intensity value and a time-of-day identifier). Additionally, the meta-data may also contain information identifying the relative frequency of existence of certain objects, businesses, institutions, etc. For example, the meta-data for a cell having a cell type of “suburb” may include relative frequency of existence information that identifies the relative frequency of existence of banks, schools, cafes, and bus stops within the cell. Referring to
In step 306, for each of at least a subset of the cell identifiers, data generation system 120 further obtains and stores activity information for the cell corresponding to the cell identifier so that the activity information is associated with the cell identifier. That is, the activity information for a cell may be stored in cell database 208 along with the cell identifier identifying the cell and the meta-data for the cell. The activity information is related to activities that have occurred within the cell identified by the cell identifier. In some embodiments the activity information comprises a set of one or more activity items, where each activity item includes an identifier identifying an activity (e.g., an app identifier identifying an app) and a timestamp. An activity item may also include a geographic coordinate or other location information.
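For illustration only, one possible in-memory shape for an entry of cell database 208, holding a cell identifier together with its meta-data and activity items, is sketched below. The class names, field names and the use of a Python dictionary as the database are assumptions made for this sketch; the disclosure does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ActivityItem:
    activity_id: str                                 # e.g., an app identifier
    timestamp: float                                 # assumed: seconds since the epoch
    location: Optional[Tuple[float, float]] = None   # optional (lat, lon)


@dataclass
class CellRecord:
    cell_id: str                                     # e.g., a geohash string
    cell_type: str                                   # e.g., "university"
    # (time-of-day identifier, usage intensity value) tuples
    usage_intensity: List[Tuple[str, float]] = field(default_factory=list)
    # relative frequency of existence of objects, e.g., {"bank": 0.1}
    object_frequency: Dict[str, float] = field(default_factory=dict)
    activity: List[ActivityItem] = field(default_factory=list)


def store_activity(db: Dict[str, CellRecord], cell_id: str, item: ActivityItem) -> None:
    """Associate an activity item with the cell identified by cell_id."""
    db[cell_id].activity.append(item)


cell_db: Dict[str, CellRecord] = {
    "u4pr": CellRecord(cell_id="u4pr", cell_type="train station",
                       usage_intensity=[("morning", 0.8), ("night", 0.1)]),
}
store_activity(cell_db, "u4pr", ActivityItem("train.schedule.app", 1380000000.0))
```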
In some embodiments, data collector 204 receives from a communication device 102 (e.g., a wireless communication device (WCD)) application activity information (i.e., information related to use of an app), where the application activity information includes one or more application activity items. In some embodiments, an application activity item includes: a) an app identifier identifying a specific app, b) location information identifying a location of the WCD at the time the user of the WCD used the identified application, and c) a timestamp indicating the date and time of day the app was used. In response to receiving an application activity item from the WCD, data collector 204 may convert the location information included in the item to a cell identifier (using, for example, a service provided by geohash.org) and then perform step 306 (e.g., data collector 204 may store the application activity item (or at least some portion thereof) such that it is associated with the cell identifier that was created based on the location information).
Referring to
In step 308, data generation system 120 determines that an amount of activity information associated with the first cell identifier is less than a threshold activity amount. The threshold may be one activity item or any number of activity items. In some embodiments, determining that an amount of activity information associated with the first cell identifier is less than a threshold activity amount consists of determining that no activity information is associated with the first cell identifier. In other embodiments, determining that an amount of activity information associated with the first cell identifier is less than a threshold activity amount comprises determining that the number of activity items included in the activity information that satisfy a set of one or more criteria is less than or equal to a threshold amount.
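The determination of step 308 can be sketched, purely as an example, as a count of the activity items that satisfy a set of criteria followed by a comparison against the threshold. The function names, the dictionary representation of an activity item and the strict less-than comparison are assumptions of this sketch; an inclusive comparison could equally be used.

```python
from typing import Callable, Dict, Iterable, Sequence

Item = Dict[str, object]               # hypothetical activity item representation
Criterion = Callable[[Item], bool]


def count_matching(items: Iterable[Item], criteria: Sequence[Criterion] = ()) -> int:
    """Count the activity items that satisfy every criterion."""
    return sum(1 for item in items if all(c(item) for c in criteria))


def below_threshold(items: Iterable[Item], threshold: int,
                    criteria: Sequence[Criterion] = ()) -> bool:
    """Return True when the matching activity amount is less than the threshold."""
    return count_matching(items, criteria) < threshold


items = [
    {"activity_id": "app.a", "timestamp": 100.0},
    {"activity_id": "app.b", "timestamp": 900.0},
]
recent = lambda item: item["timestamp"] >= 500.0   # example criterion
sparse = below_threshold(items, threshold=2, criteria=[recent])
```

With a threshold of one activity item and no criteria, the check reduces to determining that no activity information is associated with the cell identifier.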
In step 310, data generation system 120, using the meta-data associated with the first cell identifier and meta-data associated with other cell identifiers included in the set of cell identifiers, determines a group of one or more cells that are similar to the cell identified by the first cell identifier. In some embodiments, given a threshold θ between 0 and 1, a pair of points p_i and p_j is defined to be similar if the following holds: sim(p_i, p_j) ≥ θ.
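As a hedged example of the thresholded similarity test of step 310, the sketch below uses cosine similarity over non-negative meta-data feature vectors, which yields a score between 0 and 1. The disclosure does not specify the similarity function; the choice of cosine similarity, the feature dictionaries and the function names are all assumptions made for illustration.

```python
import math
from typing import Dict, List

Features = Dict[str, float]   # hypothetical meta-data features, e.g., object frequencies


def cosine_similarity(a: Features, b: Features) -> float:
    """Cosine similarity of two sparse, non-negative feature vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_cells(target_id: str, features: Dict[str, Features], theta: float) -> List[str]:
    """Return the identifiers of cells whose similarity to the target is at least theta."""
    target = features[target_id]
    return [cid for cid, f in features.items()
            if cid != target_id and cosine_similarity(target, f) >= theta]


features = {
    "c1": {"cafe": 3.0, "bank": 1.0},
    "c2": {"cafe": 6.0, "bank": 2.0},
    "c3": {"school": 5.0},
}
group = similar_cells("c1", features, theta=0.9)
```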
In one embodiment, data generation system 120 determines a group of cells that are similar to the first cell by determining neighborhood cells. The neighborhood cells of a given cell could be described by the Euclidean distance between the cells. One way of classifying cells into clusters in this Euclidean space is by calculating K-nearest cells, where the cells are described by geographical place type (P), time (T), context (C) and data type (D). An example method for calculating K-nearest neighbors is the K-medoids method. It works as follows:
Method 1: K-MEDOIDS
Input: List of top 15 used categories per cell
Output: clustered cells
Begin
Repeat the following until the medoids stop moving:
end repeat
Return regions list with corresponding cell indices
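The listing above does not reproduce the body of the repeat loop. For illustration, a conventional alternating K-medoids iteration is sketched below; it is offered as one common formulation of the technique and is not asserted to be the exact procedure of Method 1. The numeric feature vectors, the naive choice of the first k points as initial medoids and the function names are assumptions of this sketch, and the points are assumed to be distinct.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]   # hypothetical per-cell feature vector


def k_medoids(points: Sequence[Point], k: int, max_iter: int = 100) -> Dict[int, List[int]]:
    """Cluster points around k medoids; returns {medoid index: member indices}."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    medoids = list(range(k))              # naive deterministic initialization
    clusters: Dict[int, List[int]] = {}
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: in each cluster, pick the member with the lowest
        # total distance to the other members as the new medoid.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(math.dist(points[c], points[o]) for o in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):   # medoids stopped moving
            return clusters
        medoids = new_medoids
    return clusters


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
regions = k_medoids(points, k=2)
```

Unlike a centroid-based method, each cluster representative here is an actual cell, so the activity information of the medoid cell and its members is directly available for later steps.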
In step 312, for each cell included in the group of similar cells, data generation system 120 obtains the activity information associated with the cell identifier that identifies the similar cell. In step 314, data generation system 120 uses the obtained activity information to generate activity information for the first cell identifier. The obtained activity information may include or be used to obtain: a first value associated with a certain activity that occurred within a first cell included in the group (e.g., a value representing the number of tweets sent by users in the first cell of the group) and a second value associated with the certain activity that occurred within a second cell included in the group (e.g., a value representing the number of tweets sent by users in the second cell of the group). In some embodiments, generating the activity information comprises calculating a value using the first and second values (e.g., averaging the first and second values). In step 316, data generation system 120 stores the generated activity information (e.g., the average value and an identifier of the activity) so that it is associated with the first cell identifier. In embodiments where data generation system 120 does not perform step 302 because another system performs this step, data generation system 120 may accomplish step 316 by transmitting the generated activity information to the other system, whereby the other system stores the generated activity information so that it is associated with the first cell identifier.
In this way, for example, if cell 1 has no activity information associated with it, but cells 2 and 3 do have activity information and cells 2 and 3 are similar to cell 1, then we can assume activity information for cell 1. For instance, consider the following scenario: 5000 tweets were sent by users in cell 2 within the last month; 10000 tweets were sent by users in cell 3 within the last month; and cells 2 and 3 are equally similar to cell 1. In this scenario we can generate pseudo historical activity information for cell 1 by taking the average of the number of tweets associated with cell 2 and the number of tweets associated with cell 3 and associating that information with cell 1. That is, in this hypothetical scenario the pseudo historical activity information will indicate that 7500 tweets were sent by users in cell 1. In some embodiments, the degree of similarity is taken into account when generating the activity information for cell 1. For instance, if the similarity score between cell 1 and cell 2 is s1 and the similarity score between cell 1 and cell 3 is s2, then we can determine a weighted average for the number of tweets as follows: (s1×5000+s2×10000)/2, and assign this number of tweets to cell 1 for the relevant period (i.e., the last month).
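The arithmetic of this scenario can be worked through in a short sketch. The first function evaluates the expression as written in the text, which divides by the number of similar cells and therefore reproduces the plain average of 7500 when both similarity scores equal 1. The second function is a conventional normalized weighted mean that divides by the sum of the scores; it is included only as an assumed alternative for comparison and is not taken from the text. Function names are hypothetical.

```python
from typing import Sequence


def weighted_value_as_written(values: Sequence[float], scores: Sequence[float]) -> float:
    """Expression from the text: sum(s_i * v_i) / n, with n similar cells."""
    return sum(s * v for s, v in zip(scores, values)) / len(values)


def normalized_weighted_mean(values: Sequence[float], scores: Sequence[float]) -> float:
    """Assumed alternative: sum(s_i * v_i) / sum(s_i)."""
    total = sum(scores)
    if total == 0:
        raise ValueError("similarity scores must not all be zero")
    return sum(s * v for s, v in zip(scores, values)) / total


tweets = [5000, 10000]                                          # cells 2 and 3
equal = weighted_value_as_written(tweets, [1.0, 1.0])           # 7500.0
as_written = weighted_value_as_written(tweets, [0.75, 0.25])    # 3125.0
normalized = normalized_weighted_mean(tweets, [0.75, 0.25])     # 6250.0
```

When the scores do not sum to the number of cells, the two expressions diverge, so the choice between them determines whether the generated value stays within the range of the observed values.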
As discussed above, recommendation engine 210 may use the generated activity information to select an item to recommend to a user of a communication device 102 located in the cell identified by the first cell identifier. The recommendation may be sent to the communication device 102 via e.g. a web server being a part of the recommendation system 112 and communicating with the communication device 102 with the help of HTTP (Hypertext Transfer Protocol) messages, like HTTP Get messages and HTTP response messages.
Referring now to
As discussed above, in some embodiments, the information identifying a cell may include or consist of one or more geographic coordinates. In other embodiments, the information identifying a cell may be a telecommunications cell identifier that identifies a cell of a cellular telecommunications system.
In the embodiments where the information identifying the cell includes one or more geographic coordinates, the information identifying the cell may consist of i) two or more geographic coordinate values specifying a center point of the cell and ii) a size value identifying a size (e.g., area) of the cell. In other embodiments, such as embodiments in which the cell is a polygon (e.g., a quadrilateral), the information identifying the cell may consist of four geographic coordinates, each geographic coordinate specifying a corner of the cell.
For the sake of illustration, we shall assume that the information identifying the cells includes one or more geographic coordinates.
In step 406, cell manager 202 obtains a geographic coordinate for the cell (e.g., cell manager 202 may determine the geographic coordinate (latitude/longitude values) that defines the center point of the cell if that information was not provided by the user). In step 408, cell manager 202 encodes the obtained geographic coordinate to generate a cell identifier (cell id) for identifying the cell. For example, in step 408, cell manager 202 may use a geohashing service to generate a geohash for the geographic coordinate based on the geographic coordinate values of the geographic coordinate. In step 410, cell manager 202 stores the generated cell id (e.g., the generated cell id may be stored in database 208). Steps 402-410 repeat if the user wishes to define an additional cell.
Referring now to
For instance, in some embodiments where the cell id is a geohash, data collector 204 may use the selected cell id to obtain the meta-data by sending to a geographic information system (GIS) (such as, for example, Open Street Maps, www.openstreetmap.org, retrieved on Sep. 20, 2013) a query containing the cell id. The GIS may use the cell id included in the query to obtain meta-data associated with the cell id. Such meta-data may include the above described relative frequency of existence information.
As another example, in other embodiments, data collector 204 may use the selected cell id to obtain the meta-data by obtaining a geographic coordinate for the cell and sending to the GIS a query containing the geographic coordinate to obtain information concerning an area in which the geographic coordinate is located. Such information may include the above described relative frequency of existence information.
In step 606, data collector 204 stores the obtained meta-data with the cell id (see e.g.,
Referring now to
In step 704, after obtaining the activity information, which includes at least one activity item, data collector 204 selects an activity item that is included in the activity information. As discussed above, an activity item may include information identifying an activity (usage of specific app) and a timestamp. In some embodiments, for each activity item included in the set of activity information, the activity information includes location information identifying the location in which the activity was performed.
In step 706, for the selected activity item, data collector 204 uses the location information identifying the location in which the activity was performed to determine the cell in which the activity took place. And in step 708, data collector 204 stores the activity information in association with the cell id that identifies the cell determined in step 706. If not all of the activity items that were included in the obtained activity information have been selected and processed, then the process may repeat, otherwise it may end.
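Steps 704 to 708 can be sketched, as an example only, as a loop that maps each activity item to a cell and stores the item under that cell's identifier. The grid-based `grid_cell_id` lookup is a hypothetical stand-in for whatever cell determination is used in step 706, and the tuple layout of an activity item and the dictionary used as the database are likewise assumptions.

```python
import math
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Tuple

Location = Tuple[float, float]                 # (lat, lon)
ActivityItem = Tuple[str, float, Location]     # (activity id, timestamp, location)
Database = DefaultDict[str, List[Tuple[str, float]]]


def grid_cell_id(location: Location, cell_size_deg: float = 1.0) -> str:
    """Hypothetical cell lookup: snap a coordinate to a regular lat/lon grid."""
    lat, lon = location
    return f"{math.floor(lat / cell_size_deg)}:{math.floor(lon / cell_size_deg)}"


def ingest(items: Iterable[ActivityItem],
           locate: Callable[[Location], str],
           db: Database) -> Database:
    for activity_id, timestamp, location in items:     # step 704: select an item
        cell_id = locate(location)                     # step 706: determine the cell
        db[cell_id].append((activity_id, timestamp))   # step 708: store with the cell id
    return db


db: Database = defaultdict(list)
ingest([("maps.app", 1000.0, (25.78, -80.13)),
        ("news.app", 1005.0, (25.79, -80.14)),
        ("maps.app", 1010.0, (40.71, -74.00))],
       grid_cell_id, db)
```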
Referring now to
In step 806, activity estimator 206 determines whether an amount of activity information associated with the selected cell identifier is less than a threshold activity amount (T1). T1 may be one activity item or any number of activity items. In some embodiments, determining whether an amount of activity information associated with the selected cell identifier is less than T1 consists of determining whether cell database 208 does not contain any activity information associated with the cell identifier. In other embodiments, determining whether an amount of activity information associated with the selected cell identifier is less than T1 comprises determining whether the amount of the filtered activity information is less than T1. For example, determining whether an amount of the filtered activity information is less than T1 comprises determining whether the number of activity items selected in step 804 is less than T1.
In step 808, which is performed as a result of activity estimator 206 determining that an amount of activity information associated with the selected cell identifier is less than T1, activity estimator 206 determines a likelihood (L) of the cell identified by the selected cell identifier having an amount of activity information that exceeds T1. This probability determination may be based on the average amount of activity information that is associated with all cells that are similar to the cell identified by the selected cell identifier.
In step 809, activity estimator 206 determines whether L is greater than a likelihood threshold (T2). If it is not, then the process may proceed back to step 802; otherwise it may proceed to step 810. In step 810, activity estimator 206 determines a group of cells that are similar to the cell identified by the selected cell identifier, as described above with respect to step 310. In step 812, for each cell in the group, activity estimator 206 obtains activity information for the cell (e.g., for each cell in the group, activity estimator 206 retrieves from cell database 208 the activity information associated with the cell identifier that identifies the cell). In step 814, activity estimator 206 uses the obtained activity information to generate activity information for the selected cell, as described above with respect to step 314. In step 816, activity estimator 206 stores the generated activity information in association with the cell identifier that identifies the selected cell.
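The flow of steps 806 through 816 can be sketched for a single cell and a single activity as shown below. The text does not define how the likelihood L is computed; the sketch assumes, for illustration only, that L is the fraction of similar cells whose activity count exceeds T1, and it therefore looks up the similar cells before the likelihood test. The function name, the count dictionaries and the `similar` callback are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def estimate_for_cell(cell_id: str,
                      counts: Dict[str, int],
                      similar: Callable[[str], List[str]],
                      generated: Dict[str, float],
                      t1: int,
                      t2: float) -> Optional[float]:
    """Generate and store an activity count for a sparse cell, or return None."""
    if counts.get(cell_id, 0) >= t1:                 # step 806: enough data already
        return None
    peers = similar(cell_id)                         # step 810: similar cells
    peer_counts = [counts.get(p, 0) for p in peers]  # step 812: their activity
    if not peer_counts:
        return None
    # Step 808 (assumed definition): share of similar cells with more than T1 items.
    likelihood = sum(1 for c in peer_counts if c > t1) / len(peer_counts)
    if likelihood <= t2:                             # step 809: L must exceed T2
        return None
    value = sum(peer_counts) / len(peer_counts)      # step 814: plain average
    generated[cell_id] = value                       # step 816: store the result
    return value


counts = {"cell1": 0, "cell2": 5000, "cell3": 10000}
generated: Dict[str, float] = {}
result = estimate_for_cell("cell1", counts, lambda cid: ["cell2", "cell3"],
                           generated, t1=1, t2=0.5)
```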
Referring now to
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be rearranged, and some steps may be performed in parallel.
Furthermore, while data generation system 120 has been described in connection with recommendation system 112, this was done merely for illustration because, as described above, data generation system 120 need not be a component of or function together with a recommendation system. For example, in other embodiments, data generation system 120 can be used with systems that: i) show top lists of an area; ii) use top lists and statistics for presenting the most relevant information first (for example, where there is too much information to present on a small screen); and iii) provide a search tool. With respect to search tools, if a user is searching for a particular item (e.g., thing or place) using a search tool, those items with the highest predicted relevance for the location (cell id) would be shown first. An example of this would be when an ad company wants to display something at a university campus from which no history data has been collected; the company would base its ad filtering on the usage patterns of other university campuses. Another example would be if a user is searching for a term but spells the term wrong; the search engine could instead use the meta-data about the location (e.g., a university campus), relate it to other search terms in places with history data, and suggest a corrected spelling of the term to create a better, smoother and leaner service for the user of the system. Yet another example is that data generation system 120 could be used in a system for determining and/or predicting application usage for a given area/cell in order to determine future or current demands on a network, so as to be able to provide the right quality of service network performance within an area/cell where certain applications are likely to be used. In other words, data generation system 120 can be used in, for example, an Operations Support System (OSS).
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SE2013/051181 | 10/8/2013 | WO | 00 |