The ability to predict demographic profiles of customers can be very important to many businesses. In particular, businesses may desire demographic information such as age, sex, income, and ethnic background when planning marketing campaigns or product releases. In some situations, businesses can rely upon government-gathered demographic information. For example, many countries around the world perform a census or some other form of demographic gathering activity on a periodic basis, where much of this information is gathered and divided up by geographical area.
For many businesses, however, government gathered demographic information is either too out-of-date or not gathered with enough frequency to be of maximum value. Specifically, many businesses would benefit from being able to track demographic data between each census. The ability to access up-to-date demographic information can provide an advantage to a business that is preparing to release or market a new product.
Additionally, a business may desire to gather demographic data about a particular geographic area during a particular time period. For instance, most demographic data is based upon the geographical area where individuals are domiciled. In many cases, however, a particular geographic area may have a different demographic make-up at night than it has during the workday. A business may be interested in knowing what the workday demographics of a particular area are when determining where to locate a new restaurant, for example. Similarly, a business may be interested in knowing the daytime demographics of the residents of the particular area when determining whether to launch a door-to-door marketing campaign in the area.
Accordingly, there is a need for methods and systems for providing up-to-date and/or geographically customizable demographics.
Embodiments disclosed herein relate to methods, systems, and computer program products for determining the demographics of a particular geographical area. In particular, in at least one embodiment, the real-time demographics of a geographical area can be approximated based upon a demographic profile that is associated with each individual mobile phone within the geographical area. Additionally, in at least one embodiment, a residential demographic can be associated with one or more mobile phones by determining the domicile of the mobile phone user.
Embodiments disclosed herein relate to a method for predicting the demographic characteristics of people within a geographic area that has cellular coverage. For example, the method can include determining that a first mobile phone user is domiciled within a first geographic area. Upon determining that the first mobile phone user is domiciled within the first geographic area, a demographic profile can be associated with the user. The present invention can then detect that the first mobile phone user has relocated domiciles to a second geographic area. The present invention can calculate an updated demographic profile of the second geographic area by incorporating the specific demographic profile associated with the first user into a demographic profile associated with the second geographic area.
Another embodiment disclosed herein relates to a system for predicting the demographic characteristics of people within a geographic area with cellular coverage. The system implements a method for automatically identifying one or more mobile phones that are present within a particular geographic area. After identifying the mobile phones within the particular area, demographic profiles can be associated with the one or more mobile phones. Next, the present invention can determine, based upon the frequency with which a new mobile phone is present within the particular geographic area during a particular time of day, that a new user associated with the new mobile phone is domiciled within the particular geographic area. In at least some embodiments, the new user may have previously been determined to be domiciled in another geographic area. Further, the present system can include updating a demographic profile associated with the particular geographic area to include information from a demographic profile associated with the new user. The demographic profile associated with the new user may be based upon a previous domicile of the new user.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
To further clarify the advantages and features of the various embodiments of the invention, a more particular description will be rendered by reference to specific embodiments that are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The various embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Embodiments of the present invention relate to methods, systems, and computer program products for determining the demographics of a particular geographical area. In particular, in at least one embodiment, the real-time demographics of a geographical area can be approximated based upon a demographic profile that is associated with each individual mobile phone within the geographical area. Additionally, in at least one embodiment, a residential demographic can be associated with one or more mobile phones by determining the domicile of the mobile phone user.
Many modern mobile phone systems provide various methods and systems for tracking the location of the mobile phone, and by association the mobile phone user. For example, many mobile phones contain a Global Positioning System (GPS) module that provides highly accurate location information to both the mobile phone user and potentially to a cellular network that is communicating with the mobile phone. Cellular networks may also be able to track the geographic location of mobile phones that do not contain GPS modules by using various localization methods. For instance, multiple cellular receiving stations can be used to localize a mobile phone by analyzing the signal strength between each respective cellular receiving station and the mobile phone.
Once a location of a mobile phone has been determined, various attributes of the mobile phone user can be inferred. For example, if the location of a mobile phone is consistently within the same geographic area between 11:00 pm and 4:00 am for an extended number of days, it can be inferred that the mobile phone user is domiciled within the geographic area. This feature may be of particular value when a home address of the mobile phone user is not otherwise known. Accordingly, in at least one embodiment the present invention provides a method for determining the geographical area in which a customer is domiciled.
Once a domicile of a mobile phone user has been determined, various demographic attributes can be inferred and associated with the mobile phone user. For instance, demographic data associated with the geographical area can be associated directly with the mobile phone user. In at least one embodiment, the demographic data can be based upon information gathered in a recent government census. For example, within the United States Census, information is gathered relating to ethnicity, race, income, education, and other specifics. This previously gathered demographic data, or a subset of the data, can be associated with the mobile phone user, such that the mobile phone user is assumed to be representative of the demographics of the geographic area.
In addition to determining a domicile and associated demographics of a mobile phone user, in at least one embodiment, the present invention can also provide real-time demographics of a geographical area. For example, a cellular network can be used to identify each of the mobile phone users that are within the geographical area of interest during the time of interest. The individual demographics of each identified mobile phone user can then be accessed to determine a real-time demographic profile of the area of interest.
An embodiment of the present invention for performing at least some of the above-described functions is depicted in
In at least one embodiment, the demographic processing module 100 is the central processing unit for determining demographic data within the system 150. The demographic processing module 100 is in communication with the storage unit 110. The storage unit 110 can store census data that was previously gathered by a governmental organization or some other organization. The storage unit 110 can also store updated demographic information that the demographic processing module 100 has calculated.
The demographic processing module 100 is also in communication with a location tracking processor 120 that provides the demographic processing module 100 with information relating to the location of various mobile phone users. In particular, the location tracking processor 120 sends information to and receives information from a location module 130. The location module 130 in turn is in communication with a variety of location determination components. For example, the location module 130 can receive location information from GPS units that are integrated into various mobile phones. Additionally, the location module 130 can use various cellular stations 210 to perform various localization techniques to determine the location of mobile phones.
When the location tracking processor 120 determines a location of a mobile phone, the location tracking processor 120 can save that location within the mobile location storage unit 140. Based upon the various locations of the mobile phone over time, the location tracking processor 120 can infer specific attributes of the mobile phone user. For example, if the location tracking processor 120 identifies that a particular mobile phone has been located within a particular geographical area during nighttime hours for a threshold number of days, the location tracking processor 120 can infer that the user of the mobile phone is domiciled within the particular geographic area.
In at least one embodiment, nighttime hours can comprise the hours between 8:00 pm-8:00 am, between 9:00 pm-7:00 am, between 10:00 pm-6:00 am, between 11:00 pm-5:00 am, between 12:00 am-4:00 am, between 1:00 am-3:00 am, or through some other span of hours during which individuals would normally be sleeping. Additionally, the threshold number of days required to infer a domicile can comprise a set span of days (e.g., one week, two weeks, one month, two months, three months, etc.), or can comprise a specific ratio of days. For example, the threshold may designate a mobile phone user as being domiciled within the geographic area that is the most common nighttime location of an associated mobile phone over a period of time. For instance, the location tracking processor 120 can determine that a mobile phone user is domiciled within a particular geographic area if an associated mobile phone was located in that geographic area during nighttime hours more often than it was located in any other geographic area during nighttime hours over a one-month period, or some other period of choice. Similarly, the location tracking processor 120 can determine that a mobile phone user is domiciled within a particular geographic area if the phone is located within that area during nighttime hours at least 2 out of every 3 days.
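The domicile inference described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the location tracking processor 120: the function names, the `(timestamp, area_id)` observation format, the fixed 11:00 pm to 5:00 am window, and the rule of crediting each night to the area seen most often that night are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed nighttime window (11:00 pm to 5:00 am), one of the spans listed above.
NIGHT_START_HOUR = 23
NIGHT_END_HOUR = 5


def is_nighttime(ts: datetime) -> bool:
    """Return True if the timestamp falls inside the assumed nighttime window."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def infer_domicile(observations, min_ratio=2 / 3):
    """Infer a domicile area from (timestamp, area_id) observations.

    Each night is credited to the area where the phone was seen most often
    during that night. The area credited with the most nights is returned if
    it accounts for at least `min_ratio` of the observed nights, else None.
    """
    per_night = {}
    for ts, area_id in observations:
        if not is_nighttime(ts):
            continue  # daytime sightings say nothing about domicile
        # Sightings after midnight belong to the night that began the day before.
        if ts.hour >= NIGHT_START_HOUR:
            night = ts.date()
        else:
            night = ts.date() - timedelta(days=1)
        per_night.setdefault(night, Counter())[area_id] += 1
    if not per_night:
        return None
    nights_by_area = Counter(c.most_common(1)[0][0] for c in per_night.values())
    area, nights = nights_by_area.most_common(1)[0]
    return area if nights / len(per_night) >= min_ratio else None
```

With the default `min_ratio` the sketch mirrors the "2 out of every 3 days" rule; lowering `min_ratio` toward zero approximates the "most common nighttime location" rule.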
Once a domicile for a particular mobile phone has been determined, the demographic processing module 100 can update demographic information with relation to the particular mobile phone. For example, in at least one embodiment, if the mobile phone has not been previously associated with a demographic profile, the demographic processing module 100 can associate the demographics of the geographic area of the determined domicile with the mobile phone (and by association the mobile phone user). In at least one embodiment, this action may comprise the demographic processing module 100 assigning the mobile phone a demographic profile derived from a governmental census. In alternative embodiments, the demographic processing module 100 may assign the mobile phone a demographic profile that has been updated by the demographic processing module 100 since the previous government census.
As an example, assume that the demographic processing module 100 is analyzing mobile phones within geographic areas 200 and 202, as depicted in
Further, in this example, suppose that in geographic area 200 the U.S. census has identified the following ethnic background distribution: a) white (50%), b) Latino/Hispanic (25%), c) African Americans (5%), and d) Asians (20%). In at least one embodiment, this demographic data is represented in vector form as [0.5, 0.25, 0.05, 0.2]. Similarly, suppose that in geographic area 202 the U.S. census identified the following ethnic background distribution: a) white (40%), b) Latino/Hispanic (30%), c) African Americans (10%), and d) Asians (20%). Accordingly, this demographic data can be represented in vector form as [0.4, 0.3, 0.1, 0.2]. As such, in some cases within this application a demographic profile is referred to as a demographic vector, and vice versa.
Using the vectors stated above, the demographic processing module 100 can associate each mobile phone that is determined to be domiciled within geographical area 200 or 202 with the appropriate respective demographic vector. For example,
Stated more broadly, the demographic processing module 100 can characterize the demographics within each area as a vector X0=(x1,x2,x3,x4), where x1+x2+x3+x4=1. Additionally, the demographic processing module 100 can associate multiple different vectors with each mobile phone, where each vector represents a different demographical attribute. For example, the demographic processing module 100 can associate a vector relating to ethnicity, a vector relating to age, and a vector relating to income with each mobile phone. Each of the associated vectors can be distinct or part of a matrix of vectors.
In addition to calculating a demographic profile associated with a mobile phone user, the demographic processing module 100 can also recalculate and update the demographic profile that is associated with a particular geographic area. For example,
While the demographics of the mobile phone users in
In
X1=(N*X0+D1+D2+ . . . +DM)/(N+M)
[0.48, 0.26, 0.06, 0.2]=(4*[0.5, 0.25, 0.05, 0.2]+[0.4, 0.3, 0.1, 0.2])/(4+1)
Because no new mobile users with different demographic profiles moved into geographic area 202, the demographics of area 202 will remain the same, [0.4, 0.3, 0.1, 0.2]. Once the demographic processing module 100 calculates the new demographics of area 200 and 202, the updated demographics can be stored within the storage unit 110, and accessed for later calculations. Using this approach, demographics vectors corresponding to geographical areas can be recalculated for any time increment.
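The area update described above, in which the shared profile of N existing residents is blended with the individual profiles of M newcomers, can be sketched as follows. This is a minimal illustration; the function name `update_area_profile` and its parameters are hypothetical and do not appear in the description.

```python
def update_area_profile(area_vector, resident_count, newcomer_vectors):
    """Blend newcomer profiles into an area profile.

    Computes (N * X + D1 + ... + DM) / (N + M), where X is the current area
    vector, N is the number of existing residents, and D1..DM are the
    demographic vectors of the M newly domiciled users.
    """
    m = len(newcomer_vectors)
    if resident_count + m == 0:
        raise ValueError("area has no residents and no newcomers")
    # Start from the existing residents' combined contribution, N * X.
    total = [resident_count * x for x in area_vector]
    for d in newcomer_vectors:
        if len(d) != len(area_vector):
            raise ValueError("all vectors must have the same dimension")
        total = [t + v for t, v in zip(total, d)]
    return [t / (resident_count + m) for t in total]


# Worked example from the text: four residents of area 200 plus one newcomer
# who carries the profile of area 202.
updated = update_area_profile([0.5, 0.25, 0.05, 0.2], 4, [[0.4, 0.3, 0.1, 0.2]])
print([round(v, 2) for v in updated])  # [0.48, 0.26, 0.06, 0.2]
```

Because each input vector sums to 1 and the result is a weighted mean of those inputs, the updated vector also sums to 1.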
The above described calculations and demographics are based upon the mobile phone users that are determined to be domiciled within each respective area. This demographic information may be valuable to an advertising company in determining whether to run a particular ad campaign in that demographic area. In future calculations of demographics, each of the mobile users 220, 222, 224, 226, 230 can maintain their demographic profile as depicted in
In contrast, in at least one embodiment, after updating a demographic profile for a particular geographic area, the demographic processing module 100 can cause each mobile phone user 220, 222, 224, 226, 230 to inherit the updated demographic of the geographic area. In the example described above, this would mean that after updating the demographic associated with geographic area 200, each of the mobile phone users 220, 222, 224, 226, 230 would have their individual demographic profiles updated to [0.48, 0.26, 0.06, 0.2] to reflect the updated demographic profile of geographic area 200. In future demographic calculations, these mobile phone users would all be treated as if their respective demographic profile was [0.48, 0.26, 0.06, 0.2].
Similarly, when a new mobile phone that has not previously been associated with a demographic profile is detected within a particular geographic area 200, 202, the new mobile phone can inherit the demographic profile of the area where the location tracking processor 120 determines the mobile phone to be domiciled. For example, if a new mobile phone user is determined to be domiciled within area 200, then the demographic processing module 100 can associate the new mobile phone user with the demographic profile of geographic area 200 (e.g., [0.5, 0.25, 0.05, 0.2]).
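The inheritance rule for phones without a prior profile can be illustrated with a short sketch. The dictionary-based storage and the name `profile_for_phone` are assumptions made for the example, not structures named in the description.

```python
def profile_for_phone(phone_id, domicile_area, phone_profiles, area_profiles):
    """Return the phone's profile, inheriting the domicile area's profile if none exists.

    `phone_profiles` maps phone identifiers to demographic vectors and
    `area_profiles` maps area identifiers to demographic vectors; both
    mappings are hypothetical stand-ins for the storage unit.
    """
    if phone_id not in phone_profiles:
        # Copy the list so a later update of the area does not silently
        # change the profile that this phone carries with it.
        phone_profiles[phone_id] = list(area_profiles[domicile_area])
    return phone_profiles[phone_id]
```

A phone that already carries a profile, such as a user who moved in from another area, keeps that profile under this sketch; only previously unseen phones inherit from the area.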
The above description is directed towards determining the demographic profiles of various mobile phone users that are domiciled within a particular geographic area. In at least one embodiment, however, it may be beneficial to determine the real-time demographics of a particular geographic area. For example, a company that is trying to determine a location for a future restaurant may be interested in knowing the demographics of a particular geographic area during a work week lunch break.
For instance,
For example,
DA=(X1+X2+X3+ . . . +Xn)/n
Where DA is equal to the demographics vector at real time t, and X1, . . . , Xn are the demographics vectors of the mobile users whose location, or the location of whose serving cell tower, at time t is within the geographical area of interest. As applied to
[0.38, 0.35, 0.1, 0.17]=([0.3, 0.45, 0.1, 0.15]+[0.5, 0.2, 0.1, 0.2]+[0.6, 0.2, 0.05, 0.15]+[0.3, 0.3, 0.15, 0.25]+[0.2, 0.6, 0.1, 0.1])/5
In other words, using the above stated formula the demographic processing module 100 can identify that the real-time demographic profile of Area 200 is [0.38, 0.35, 0.1, 0.17] (38% white, 35% Latino/Hispanic, 10% African Americans, and 17% Asians). Using the demographic profile generated from the limited number of mobile phone users, a business or customer can infer that the entire geographic area 200 has a similar demographic.
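The real-time calculation is a plain element-wise mean of the profiles of the users currently present. A minimal sketch follows; the function name `realtime_profile` is hypothetical, and the five input vectors are the ones used in the worked example above.

```python
def realtime_profile(user_vectors):
    """Average the demographic vectors of all users currently in the area."""
    n = len(user_vectors)
    if n == 0:
        raise ValueError("no mobile phone users detected in the area")
    dims = len(user_vectors[0])
    return [sum(v[k] for v in user_vectors) / n for k in range(dims)]


present = [
    [0.3, 0.45, 0.1, 0.15],
    [0.5, 0.2, 0.1, 0.2],
    [0.6, 0.2, 0.05, 0.15],
    [0.3, 0.3, 0.15, 0.25],
    [0.2, 0.6, 0.1, 0.1],
]
print([round(v, 2) for v in realtime_profile(present)])  # [0.38, 0.35, 0.1, 0.17]
```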
In addition to the methods described above, there are additional methods for calculating a demographic profile for a geographic area, based upon the demographics of mobile phone users. For example, a weighted moving average can be used to calculate a demographic profile for a geographic area. In this case, the demographic processing module 100 relies upon several previous demographics vectors of a given geographic area. Accordingly, an exemplary formula for calculating a weighted average demographic profile is provided below:
X_{i+1} = w_k*X_{i−k} + w_{k−1}*X_{i−k+1} + . . . + w_1*X_{i−1} + w_0*X_i
In this equation, the demographics processing module 100 is predicting home demographics vector Xi+1 at a “next” time increment i+1. The inputs to the equation include previously observed values of home demographics vectors Xj at previous time increments j. Additionally, weighting factors (“wj”) are applied to each previous demographic vector, such that w0+w1+ . . . +wk=1. In at least one embodiment, the weights can all be equal. In an alternate embodiment, a higher weighting can be associated with more recent demographic vectors.
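A weighted moving average of this kind can be sketched as below. The function name `weighted_forecast`, the oldest-first ordering of `history`, and the convention that `weights[0]` applies to the most recent vector are assumptions chosen for the example.

```python
def weighted_forecast(history, weights):
    """Predict the next demographic vector as a weighted sum of past vectors.

    `history` lists past vectors oldest first, ending with the current one.
    `weights[j]` is applied to the vector j increments before the current
    one, so weights[0] multiplies the most recent observation.
    """
    if not weights or len(weights) > len(history):
        raise ValueError("need at least one weight and no more weights than vectors")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    dims = len(history[-1])
    forecast = [0.0] * dims
    for j, w in enumerate(weights):
        past = history[-1 - j]  # vector observed j increments ago
        for d in range(dims):
            forecast[d] += w * past[d]
    return forecast


# Two-coordinate toy history with heavier weight on recent observations.
history = [[0.5, 0.5], [0.4, 0.6], [0.3, 0.7]]
print([round(v, 2) for v in weighted_forecast(history, [0.5, 0.3, 0.2])])  # [0.37, 0.63]
```

Because the weights sum to 1 and every input vector sums to 1, the forecast also sums to 1 and stays nonnegative.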
An additional method that can be used to calculate current and future demographic profiles of a geographic area can include calculating a “velocity” and “acceleration” of previous demographical change. For example, the following equations can be used to calculate velocity and acceleration, respectively:
V(i)=X(i+1)−X(i)
a(i)=V(i+1)−V(i)
Velocity (“V(i)”) is determined by calculating the demographic profile vector at time “i,” and then again at time “i+1.” The two resulting demographic vectors are then subtracted from each other to generate a “velocity” associated with the change in demographics between time interval “i” and time interval “i+1.” In some cases, however, the demographic processing module 100 will not have to calculate the demographic vector for time interval “i” and time interval “i+1,” but instead can retrieve those demographic vectors from the storage unit 110, if they were previously calculated.
After calculating a demographic velocity, the demographic processing module 100 can predict a demographic vector of a geographic area at time increment j, where j is greater than i, by using only X(i) and the previously calculated V(i) and given that all coordinates of a vector X(j) stay nonnegative:
X(j)=X(i)+(j−i)*V(i)
Similarly, the demographic processing module 100 can predict a demographics vector for a particular area at time increment j, where j is greater than i, by using a demographic vector (“X(i)”), a demographic velocity (“V(i)”) and a demographic acceleration (“a(i)”). In particular, a predicted demographic vector of a particular geographic area at time “j” can be calculated using the below equation, given that all coordinates of a vector X(j) stay nonnegative:
X(j)=X(i)+(j−i)*V(i)+a(i)*(j−i)*(j−i−1)/2
In at least one embodiment, the demographic processing module can recalculate the estimates of V(i) and a(i), at each current time increment “i”, by using previously observed values of X(i), X(i−1) and X(i−2) and using formulas provided above.
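The velocity and acceleration extrapolation can be sketched as follows. The function names are hypothetical, and raising an error when a coordinate would become negative is one assumed way of honoring the nonnegativity condition stated above; the description does not say how such a case is handled.

```python
def difference(later, earlier):
    """Element-wise difference, used for both V(i) = X(i+1) - X(i) and a(i) = V(i+1) - V(i)."""
    return [b - a for a, b in zip(earlier, later)]


def predict(x_i, v_i, steps, a_i=None):
    """Extrapolate `steps` = j - i increments ahead of X(i).

    Uses X(i) + steps * V(i), plus a(i) * steps * (steps - 1) / 2 when an
    acceleration vector is supplied.
    """
    result = []
    for d in range(len(x_i)):
        value = x_i[d] + steps * v_i[d]
        if a_i is not None:
            value += a_i[d] * steps * (steps - 1) / 2
        result.append(value)
    if any(c < 0 for c in result):
        # Assumed handling: the formulas only apply while coordinates stay nonnegative.
        raise ValueError("extrapolation produced a negative coordinate")
    return result


# Three observed two-coordinate vectors at increments 0, 1, and 2.
x0, x1, x2 = [0.5, 0.5], [0.48, 0.52], [0.45, 0.55]
v0 = difference(x1, x0)
v1 = difference(x2, x1)
a0 = difference(v1, v0)
# With acceleration included, extrapolating two steps from x0 reproduces x2.
print([round(c, 2) for c in predict(x0, v0, 2, a0)])  # [0.45, 0.55]
```

The discrete term `steps * (steps - 1) / 2` is what results from summing a velocity that grows by a(i) at each increment, which is why the two-step prediction matches the observed vector exactly in this example.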
In addition to using demographic velocity and/or acceleration to predict a future demographic, in at least one embodiment, the demographic processing module 100 can also use historical demographic data relating to other geographic areas that have similar attributes to the geographic area of interest. For example, the demographic processing module 100 can divide various portions of the demographic data relating to a plurality of geographic areas into multi-dimensional “bins.” For instance, the demographic processing module 100 can create a plurality of different bins for various demographic vectors. In the above discussed exemplary cases, the bins may comprise 4 dimensions, such that the bins are sized to fit the 4-dimensional demographic vectors. Each coordinate from within a demographic vector can then fall within a single 1-dimensional bin. For example, a 1-dimensional bin may be configured to receive a coordinate relating to the percentage of white/Caucasians within a particular geographic area. After creating the bins, the demographic processing module 100 can divide the particular demographic coordinate for the demographic vectors into the plurality of bins, such that similar vector values are placed within the same bin.
Continuing with this example, the demographic processing module 100 can now utilize the bins to predict the next time increment demographics profile for a particular geographic area. To do so, the demographics processing module 100 first identifies the bin to which the current value of X(i) belongs. Then the demographics processing module 100 takes into account all observed demographics vectors Yk, corresponding to different geographical areas, that fall within the same demographics bin as vector X(i). Next, the demographic processing module 100 observes what actually happened to all vectors Yk at subsequent time increments. In the below equation these values are denoted as Yk(+1). Using the below equation, Yk(+1) can be used to predict the value of X(i+1).
X(i+1)=average of (Yk(+1))
Additionally, the demographic processing module 100 can determine the quality of the above prediction formula by measuring the standard deviation of the Euclidean norms of differences |(Yk(+1)−Yk)|. Smaller standard deviations relate to a higher confidence in the above prediction formula. Accordingly, the demographic processing module 100 can define the demographic bin corresponding to X(i) as a stable bin if:
Average (Yk(+1)−Yk)=E, and Euclidean norm |E| is a very small positive number close to 0, and
Std|(Yk(+1)−Yk)|=E1, and |E1| is less than some small threshold value
In contrast, the demographics processing module can define a demographics bin as unstable if the above conditions are not satisfied. In general, stable bins usually are related to the homophily property, which basically states that some people “tend to live among the people similar to themselves”.
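The bin-based prediction and the stability test can be sketched together. Several details here are assumptions made for illustration: equal-width bins of width 0.1 per coordinate, the `(Yk, Yk(+1))` pair format for historical observations, the population standard deviation, and the two tolerance values, none of which are specified in the description.

```python
import math
from statistics import pstdev


def bin_key(vector, width=0.1):
    """Map a vector to a multi-dimensional bin: one equal-width bin index per coordinate."""
    # The small offset guards against floating-point results such as 0.3 / 0.1 < 3.
    return tuple(math.floor(c / width + 1e-9) for c in vector)


def predict_from_bin(x_i, transitions, width=0.1, mean_tol=0.02, std_tol=0.02):
    """Predict X(i+1) from other areas that shared X(i)'s bin.

    `transitions` is a list of (Yk, Yk_next) pairs. Returns the average of
    the matching Yk_next vectors and a flag saying whether the bin looks
    stable under the assumed tolerances; returns (None, False) if no
    historical vector falls in the same bin.
    """
    key = bin_key(x_i, width)
    matches = [(y, y_next) for y, y_next in transitions if bin_key(y, width) == key]
    if not matches:
        return None, False
    n, dims = len(matches), len(x_i)
    prediction = [sum(y_next[d] for _, y_next in matches) / n for d in range(dims)]
    diffs = [[y_next[d] - y[d] for d in range(dims)] for y, y_next in matches]
    mean_diff = [sum(diff[d] for diff in diffs) / n for d in range(dims)]
    norms = [math.sqrt(sum(c * c for c in diff)) for diff in diffs]
    mean_norm = math.sqrt(sum(c * c for c in mean_diff))
    stable = mean_norm < mean_tol and pstdev(norms) < std_tol
    return prediction, stable
```

A stable result under this sketch means that areas which started in the same bin barely moved, and moved by similar amounts, over the following increment.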
Similar to the above recited method of predicting demographics of a particular geographic area, in at least one embodiment, the demographic processing module can use a multivariate regression model. Specifically, the model can use previously observed home demographics vectors X(i) (where i is less than or equal to j) as inputs to predict the next time increment home demographics vector X(j+1).
One skilled in the art will appreciate that, for this and other processes and methods disclosed herein, the functions performed in the processes and methods may be implemented in differing order. Furthermore, the outlined steps and operations are only provided as examples, and some of the steps and operations may be optional, combined into fewer steps and operations, or expanded into additional steps and operations without detracting from the essence of the disclosed embodiments.
Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are physical non-transitory storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: physical non-transitory storage media and transmission media.
Physical non-transitory storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to physical storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile physical storage media at a computer system. Thus, it should be understood that physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
In the above detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein. It will also be understood that any reference to a first, second, etc. element (for example first purchase information) in the claims or in the detailed description, is not meant to imply numerical sequence, but is meant to distinguish one element from another unless explicitly noted as implying numerical sequence.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.