The present invention relates to the field of digital identities and digital advertising. In particular, but not by way of limitation, the present disclosure teaches techniques for determining related digital identities.
The internet has changed the mass media landscape forever. Before the internet became a mainstream mass media system, advertisers were generally limited to communicating with potential customers using television, radio, and print media (newspapers and magazines) advertising. With the popularization of the global internet, advertisers can now advertise to billions of computer users as those computer users browse the World Wide Web on the internet.
Internet advertising has become a very large industry. Two of the most commonly used advertising channels on the World Wide Web are internet search advertising and banner advertisements. Internet search advertising operates by allowing users to enter search keywords into an internet search service and then interspersing advertisements (generally related to the search keywords) within the results of the internet search. Banner advertisements are defined areas of a web page that contain advertisements in the same manner that traditional magazines and newspapers use newsprint area for advertising. Both internet search result advertisements and internet banner advertisements have a significant advantage over prior advertising systems since the recipient of an internet advertisement may click on the internet advertisement to obtain more information or proceed directly to an internet retailer for an immediate sale.
The internet advertising industry for advertising on personal computer systems on the internet has matured and become very sophisticated. Internet advertisers and internet advertising services use several techniques for obtaining information about internet users such that the most appropriate advertisements may be selected for each internet user. For example, internet advertisement services may track the web browsing history from a particular personal computer to determine the interests of that user and thus create a demographic profile of that internet user. Furthermore, the contents of a web page that is being delivered to a personal computer may be analyzed to help select an appropriate banner advertisement that is closely related to the contents of the web page.
Although internet advertising to personal computer users that are browsing the World Wide Web has become relatively sophisticated, the overall internet advertising industry is still in its infancy. There are now many new digital electronic devices that use the internet and can be used to deliver advertising to their user. For example, cellular telephones, video game consoles, set-top video streaming boxes, internet radio devices, and tablet computer systems can all be used to deliver internet advertisements to their respective users. The techniques used to select and deliver advertisements to the users of these emerging internet platforms are relatively primitive. Thus, it would be desirable to provide tools that provide an improved ability to select appropriate internet advertisements to users of these new internet-connected digital electronic devices.
In the drawings, which are not necessarily drawn to scale, like numerals describe substantially similar components throughout the several views. Like numerals having different letter suffixes represent different instances of substantially similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the invention. It will be apparent to one skilled in the art that specific details in the example embodiments are not required in order to practice the present invention. For example, although some of the embodiments are mainly disclosed with reference to cellular telephones, the techniques disclosed in this document may be used with other types of digital electronic devices such as tablet computer systems and video game systems. The example embodiments may be combined, other embodiments may be utilized, or structural, logical and electrical changes may be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. Furthermore, all publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
The present disclosure concerns digital computer systems.
In a networked deployment, the machine of
The example computer system 100 of
The computer system 100 may include a video display adapter 110 that drives a video display system 115 such as a Liquid Crystal Display (LCD) in order to display visual output to a user. The computer system 100 may also include other output systems such as signal generation device 118 that drives an audio speaker.
Computer system 100 includes a user input system 112 for accepting input from a human user. The user input system 112 may include an alphanumeric input device such as a keyboard, a cursor control device (e.g., a mouse or trackball), touch sensitive pad (that may be overlaid on top of video display 115), a microphone, or any other device for accepting input from a human user.
The computer system 100 may include a disk drive unit 116 for storing data. The disk drive unit 116 includes a machine-readable medium 122 on which is stored one or more sets of computer instructions and data structures (e.g., instructions 124 also known as ‘software’) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 124 may also reside, completely or at least partially, within the main memory 104 and/or within a cache memory 103 associated with the processor 102. The main memory 104 and the non-volatile memory 106 associated with the processor 102 also constitute machine-readable media. The non-volatile memory 106 may comprise a removable flash memory device.
The computer system 100 may include one or more network interface devices 120 for transmitting and receiving data on one or more networks 126. For example, wired or wireless network interfaces 120 may couple to a local area network 126. Similarly, a cellular telephone network interface 120 may be used to couple to a cellular telephone network 126. The various different networks 126 are often coupled directly or indirectly to the global internet 101. The instructions 124 and data 125 used by computer system 100 may be transmitted or received over network 126 via the network interface device 120. Such transmissions may occur utilizing any one of a number of well-known transfer protocols such as the File Transfer Protocol (FTP).
Note that not all of the parts illustrated within
While the machine-readable medium 122 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies described herein, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, battery-backed RAM, and magnetic media.
For the purposes of this specification, the term “module” includes an identifiable portion of code, computational or executable instructions, data, or computational object to achieve a particular function, operation, processing, or procedure. A module need not be implemented in software; a module may be implemented in software, hardware/circuitry, or a combination of software and hardware.
The global internet has become a mass medium that connects publishers of information with consumers of information. Some of the publishing on the internet is done on a subscription or paid-for basis wherein a consumer pays for access to specific information. For example, a news publisher may create a news web site that provides specific premium content only to customers that pay a subscription fee. However, a very large portion of the informational content available on the global internet is freely available. Web site content, videos, podcasts, and games are all freely available on the internet. To fund much of the freely available content on the global internet, internet publishers rely on advertisers that pay to have their advertisements displayed alongside and embedded within an internet publisher's content.
Although internet advertising started with advertisements to personal computer users browsing the World Wide Web with web browser programs, the internet advertising market has grown significantly beyond that early stage.
Thus, as illustrated in
In order to really provide substantial value to internet advertisers, internet advertising must be well targeted. Advertising suntan lotion during the winter to a person living in Minnesota does not provide much value to the suntan lotion manufacturer. Thus, internet advertising services attempt to learn as much as possible about their audience in order to select the most appropriate internet advertisements. As set forth in the background, internet web site publishers that advertise to users running web browsers on personal computers have developed many different techniques for learning about their users. These techniques may include:
However, these advertising targeting techniques developed for personal computer based internet advertising often cannot be used with other internet-connected digital devices. For example, there may be no web browsing history on the video game console 259 of household A 250. Even though the cellular telephones 252 and 262 and the tablet computer system 257 may have web browsing histories to draw from, the users may only rarely use those devices for web browsing such that their very limited browsing history does not provide an accurate demographic profile of the user to guide advertisement selection. It is therefore often much more difficult to provide well-targeted advertising to internet-connected digital devices other than personal computer systems.
Referring to
When user X brings laptop computer system 251 and cellular phone 252 to workplace W 210, those two devices can no longer use Wi-Fi router 265 to access the internet 201. Instead, user X connects laptop computer system 251 to the local area network 229 at work and configures cellular phone 252 to use a local Wi-Fi network provided by wireless access point 225. With these two internal connections, user X will then be able to access the internet 201 using laptop computer system 251 and cellular phone 252 through the firewall/proxy server 221 at the workplace 210.
When user X is at home A 250, the laptop computer system 251 and the cellular phone 252 will both use a single IP address A 263 that is on Wi-Fi router 265. Similarly, when that same user X is at workplace 210, laptop computer system 251 and cellular phone 252 will both use the single IP address W 223 that is on firewall/proxy server 221. Given that specific internet usage pattern data, an astute observer that knows nothing about user X could make the rational inference that laptop computer system 251 and cellular phone 252 are very likely used by the same person since laptop computer system 251 and cellular phone 252 are used together at both household 250 and at workplace 210. After having made such an inference, an advertiser may link together a digital identifier associated with laptop computer system 251 and a digital identifier associated with cellular phone 252 for advertising purposes. Such a pairing of distinct digital identities to a single user is referred to as digital identity pairing.
Assuming that a digital identity pairing has been performed accurately (the two paired digital devices are actually used by the same person) then this digital identity pairing may be used to greatly improve the targeting of internet advertising to both of the linked platforms. For example, if an advertising service correctly deduces that laptop computer system 251 and cellular phone 252 are used by the same person then the advertiser can advertise cellular telephone accessories for the specific brand of cellular phone 252 when that person uses their laptop computer system 251 to browse the World Wide Web. More importantly, when that person (user X) uses an advertising supported application on cellular phone 252, advertisers can leverage the longer and more detailed user profile data that has been collected from laptop computer system 251 to accurately select targeted advertisements for display on that cellular phone 252. Thus, digital identity pairing can greatly improve the quality of internet advertising on digital devices that have been correctly identified as belonging to the same user.
Referring to
The client identifier is used to identify a specific client device, a client program, a user identity, or any other type of digital identity. Examples of client identifiers include web browser cookies, cellular telephone device identifiers, MAC addresses, userIds, and any other similar identifier that is linked to a specific client device, client program, or user. The teachings of the present disclosure may be used with a wide variety of different client identifiers. In example digital identity pairings that will be disclosed with reference to
The common source/destination identifier is the identity of some source or destination that client devices (as identified by their client identifiers) will likely have in common if the two client devices are related. In the situation depicted in
The timestamps in each data triad may be used to ensure that the data used is relevant. The ownership of internet connected devices may change over time such that very old internet usage data should not be used. Furthermore, many Internet Protocol addresses are “dynamic addresses” that may be used by different entities at different times. Thus, internet usage data observations should have relatively close temporal relations in order to provide accurate digital identity pairing results. In addition to ensuring that internet usage observations are temporally proximate, certain embodiments of the disclosed system use the timestamps of internet usage data triads in a more sophisticated manner as will be disclosed in a later section of this document.
The triads of internet usage data (client identifier, common source/destination identifier, and timestamp) may be collected by internet servers that track each internet server request received. In particular, internet advertisement services and internet web publishers that track each advertisement or web page served are excellent sources of internet usage data information. Individual application programs (such as games, media aggregators, utilities, etc.) that run on client devices and report usage information to servers on the internet are also excellent sources of usage data.
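The data triad described above can be sketched as a small record type together with a recency filter. This is only an illustration: the class name `UsageTriad`, the function `recent_triads`, the 30-day window, and the sample identifiers and addresses are all assumptions made for the example and are not part of the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UsageTriad:
    """One internet usage observation (hypothetical record layout)."""
    client_id: str       # e.g. a browser cookie or a device identifier
    common_id: str       # e.g. the source IP address of the request
    timestamp: datetime  # when the observation was made


def recent_triads(triads, now, max_age=timedelta(days=30)):
    """Keep only observations newer than max_age (30 days is an assumed window)."""
    cutoff = now - max_age
    return [t for t in triads if t.timestamp >= cutoff]


# Hypothetical observations; the addresses come from documentation ranges.
now = datetime(2024, 1, 31)
log = [
    UsageTriad("cookie-CX", "203.0.113.7", datetime(2024, 1, 30)),
    UsageTriad("device-DX", "203.0.113.7", datetime(2024, 1, 29)),
    UsageTriad("cookie-CX", "198.51.100.4", datetime(2023, 6, 1)),  # stale
]
fresh = recent_triads(log, now)
```

Discarding stale observations before any pairing analysis reflects the point that device ownership and dynamic IP address assignments change over time.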
Referring back to
In embodiments that use IP addresses as common source/destination identifiers, two different techniques have been used to select potential digital identity pairs for further analysis. A first strategy is to examine the number of different digital identities known to use the same IP address. Specifically, if less than a threshold number of digital identities are known to use a specific IP address then all of the different logical pairings of digital identities from that single IP address may be viewed as potential digital identity pairs. The reasoning is that if there are just a few different digital identities related to a single common IP then there is a good probability that some of those different digital identities are associated with the same person and that one may be able to statistically link the digital identities belonging to that same person. For example, a family household that shares a single internet account will likely have family members that use more than one digital identity that can be statistically linked.
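The first selection strategy can be sketched as follows. The function name `candidate_pairs`, the input layout of (client identifier, IP address) tuples, and the sample data are assumptions for illustration; the cutoff is left as a parameter.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(observations, max_identities=6):
    """Return potential identity pairs from sparsely shared IP addresses.

    observations: iterable of (client_id, ip_address) tuples (assumed layout).
    max_identities: cutoff on distinct identities per IP address (a parameter).
    """
    ids_by_ip = defaultdict(set)
    for client_id, ip in observations:
        ids_by_ip[ip].add(client_id)

    pairs = set()
    for ids in ids_by_ip.values():
        # Addresses shared by many identities (offices, cafes) are skipped.
        if len(ids) <= max_identities:
            pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical data: four identities at a home address, ten at an office.
home = [(cid, "ip-A") for cid in ("CX", "DX", "CY", "DY")]
office = [(f"w{i}", "ip-W") for i in range(8)] + [("CX", "ip-W"), ("DX", "ip-W")]
pairs = candidate_pairs(home + office)
```

In this sketch only the home address contributes pairs, since the office address exceeds the cutoff.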
In one embodiment, the threshold value is set to six such that if six or fewer digital identities are seen at a particular IP address then the various logical combinations of those digital identities may be considered potential digital identity pairs. For example, in
In another embodiment, the digital identity pairing system considers the specific IP address origin and determines if that IP address is an address where paired digital identities are likely to be found (such as household residences as set forth above). All of the static IP addresses on the internet are allocated by the Internet Corporation for Assigned Names and Numbers (ICANN). By examining who owns a particular IP address, one may make a determination as to whether related digital identities are likely to be located at that IP address. Thus, for example, IP addresses that are used by an internet service provider (ISP) to provide residential internet service may be good IP addresses to use when identifying potential digital identity pairs. Various other systems of identifying residential household IP addresses may also be used. In addition, other techniques of identifying likely digital identity pairs may also be used in stage 330 in addition to or instead of the systems for identifying residential households.
After selecting sets of potential digital identity pairs at stage 330, the digital identity pairing system then processes the gathered internet usage data at stage 340 to determine association scores for the potential digital identity pairings. Those digital identity pairings with the most favorable association scores will be deemed most likely to be associated with the same human user. Many different techniques may be used to calculate association scores for the potential digital identity pairings. A detailed explanation of one particular method of calculating association scores (and other related metrics that may also be used) is presented in a later section of this document but various other different scoring systems may be used.
Since the observed internet usage data will vary over time and certain chance activities may cause false digital identity associations to be detected, the association scores may be post-processed to remove noise. For example, association scores may be smoothed out over time using various techniques. Thus, at stage 360, the association score data that has been generated over time may be post-processed such that outlier data points are largely filtered out. The end result of stage 360 is a set of high probability digital identity pairings.
Finally, at stage 390, the identified high probability digital identity pairs are used to improve the targeting of internet advertising to both of the paired digital identities. The accumulated profile data from the two separate digital identities may be combined in a synergistic manner to provide a detailed profile of the human user associated with the paired digital identities. This detailed profile may then be used to select the best internet advertisements for that human user when either of the digital identities requires an internet advertisement.
As set forth in the flow diagram of
User Z, who resides at household B 280, regularly uses laptop computer system CZ 281 and cellular phone DZ 282. While at household B 280, both CZ 281 and DZ 282 will use IP address B 283 that is assigned to Wi-Fi router 285 in use at household B 280. Both user X and user Z work together at workplace W 210 such that CX 251, DX 252, CZ 281, and DZ 282 are regularly used at workplace W 210. While at workplace W 210 those digital devices will all use IP address W 223 that is assigned to firewall/proxy 221 at workplace W 210. Many other digital devices (211, 212, 213, 214, 215, and 216) will also use IP address W 223 at workplace W 210.
Finally,
After collecting internet usage data (as set forth in stage 310), the next step in identifying digital identity pairs is to select a set of potential digital identity pairs as set forth in stage 320 of
After identifying a set of potential digital identity pairs, the digital identity pairing system then calculates association scores for all of the potential digital identity pairs as set forth in stage 340 of
In one particular embodiment, the digital identity pairing system uses a variation of Bayesian probability analysis to calculate an association score for each of the potential cookie and deviceID digital identity pairs. In addition, a “support” score and a “confidence” score may also be calculated. The support, confidence, and association scores may be defined as follows:
Support=P(cookie, deviceID)
Confidence=P(cookie|deviceID)
Association(cookie→deviceID)=P(cookie|deviceID)/P(cookie)
These three scores may be used to identify digital identity pairings and to rate the confidence in a digital identity pairing that has been made. The support score gives an indication of how much data support there is for the analysis of this particular cookie and deviceID pair. The confidence score gives an indication of how much confidence there is in the association score. The association score provides a rating of how closely the cookie and deviceID are associated.
In the present disclosure, the support, confidence, and association scores are calculated using the set of internet usage observations on the various digital identities being considered that were collected in stage 310 of
co-occurrences(cookie, deviceID)=number of times both cookie and deviceID were observed at the same location (same source identifier IP address)
P(cookie, deviceID)=co-occurrences(cookie, deviceID)/total sample size
P(cookie|deviceID)=co-occurrences(cookie, deviceID)/occurrences(deviceID)
P(cookie)=occurrences(cookie)/total sample size
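The three scores can be computed directly from per-location observation counts, as in the following minimal sketch. The function name `frequency_scores` and the sample counts are hypothetical. The sketch also assumes that co-occurrences at a location are counted as the smaller of the two identities' observation counts at that location, which is one reading of the definition above and not necessarily the only one.

```python
from fractions import Fraction


def frequency_scores(counts_a, counts_b, total_samples):
    """Return (support, confidence, association) for identity a given identity b.

    counts_a / counts_b map a location (e.g. an IP address) to the number of
    times that identity was observed there.
    """
    shared = set(counts_a) & set(counts_b)
    # Assumed counting rule: at each shared location, the co-occurrence count
    # is the smaller of the two observation counts.
    co = sum(min(counts_a[loc], counts_b[loc]) for loc in shared)
    occ_a = sum(counts_a.values())
    occ_b = sum(counts_b.values())

    support = Fraction(co, total_samples)       # P(a, b)
    confidence = Fraction(co, occ_b)            # P(a | b)
    p_a = Fraction(occ_a, total_samples)        # P(a)
    association = confidence / p_a              # P(a | b) / P(a)
    return support, confidence, association


# Hypothetical counts, not taken from the figures.
cookie = {"home": 2, "work": 6}
device = {"home": 3, "work": 5}
support, confidence, association = frequency_scores(cookie, device, total_samples=40)
print(support, confidence, float(association))  # 7/40 7/8 4.375
```

Exact fractions are used so that intermediate values can be compared against hand calculations without rounding.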
To best illustrate the manner in which support, confidence, and association scores may be calculated in one embodiment, the association scores for some potential digital identity pairings in
To illustrate how the digital identity pairing system works, an example is hereby presented wherein the digital identity pairing system attempts to pair cellular phone DX 252 from household A 250 with one of the laptop computers CX 251 or CY 261 used at household A 250. To calculate an association score for DX and CY the following information from the table in
The above observations are then used to calculate Association(CY→DX) as follows:
P(CY)=# of occurrence (CY)/total sample size=(2+7)/97=11/97
P(CY|DX)=co-occurrences(CY, DX)/occurrences(DX)=2/(4+8)=1/6
Association(CY→DX)=P(CY|DX)/P(CY)=(1/6)/(11/97)=1.47
The other potential pairing for cellular phone DX 252 is the pairing of cellular phone DX 252 with laptop computer CX 251. To calculate an association score for the pair of CX and DX the following information from the table in
The above observations are then used to calculate Association(CX→DX) as follows:
P(CX)=# of occurrences(CX)/total sample size=(3+10)/97=13/97
P(CX|DX)=co-occurrences(CX, DX)/occurrences(DX)=(3+8)/(4+8)=11/12
Association(CX→DX)=P(CX|DX)/P(CX)=(11/12)/(13/97)=6.84
When comparing the two association scores, the higher association score is selected. In this case, the Association(CX→DX)=6.84 score is much higher than the Association(CY→DX)=1.47 score such that the pairing of laptop computer system CX 251 and cellular phone DX 252 is deemed to be a high-probability digital identity pair. The support and confidence scores for this pairing are as follows:
Support=P(CX, DX)=co-occurrences(CX, DX)/total sample size
Support=(3+8)/97=11/97
Confidence=P(CX|DX)=co-occurrences(CX, DX)/occurrences(DX)
Confidence=(3+8)/(4+8)=11/12
The Support metric may be used to filter out observations that do not have enough statistical significance. In one embodiment the minimum Support metric is calibrated based on a desired Precision/Recall measurement score. The Confidence metric is a value calculated as part of the Association score.
In household B 280, the only possible digital identity pairing is of laptop computer system CZ 281 and cellular phone DZ 282. To calculate an association score for CZ and DZ the following information from the table in
The above observations are then used to calculate Association(CZ→DZ) as follows:
Since CZ and DZ is the only possible digital identity pairing of digital identities at household B 280 there are no other association scores to directly compare it against. Thus, some threshold value may be used to determine whether that association score is large enough to determine that the two devices should be paired. The 19/97 support score and 19/22 confidence score may also be used to help determine if CZ and DZ will be considered as a digital identity pair.
The association scores calculated for a potential digital identity pair will vary over time depending on the specific digital identity usage observations that are being used. To eliminate this ‘noise’ in the data, various techniques may be used to smooth out the data and provide more consistent results. Thus, as set forth in stage 360 of
One simple means of post-processing association scores to improve the results is to discard association scores that fall below a particular designated threshold level. Association scores that fall below a designated threshold level may simply be irrelevant noise.
To further reduce the noise in the association score data, a digital identity pairing system may post-process association scores to eliminate some of the outlying data samples. For example, if a person goes on a vacation then that person's digital device usage pattern may vary dramatically and thus eliminate a digital identity pair that was discovered. Proper post-processing of the association scores may prevent a temporary change in usage patterns from eliminating an accurately made digital identity pairing.
In one embodiment, the digital identity pairing system collects a number of association scores and calculates a statistical mode of the association scores that have been gathered over time. The statistical modes of different association scores are then compared against each other to determine the high-probability pairs (instead of comparing recently calculated association scores directly). Using statistical modes may effectively reduce the noise in the sampled digital identity usage data.
One method of calculating a statistical mode involves first creating a set of different association score range buckets. The width of the association score range buckets will vary depending on the data density. Then, the collected set of association scores (such as the scores from
Various other post-processing methods may also be used to smooth out the association scores. For example, instead of using a statistical mode, other implementations may use a median, a mean, or another method of smoothing out the association scores.
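A bucket-based mode over a history of association scores might look like the sketch below. The function name `binned_mode`, the fixed bucket width of 0.5, the tie-breaking rule, and the sample score history are all assumptions for illustration; as noted above, the bucket width would in practice depend on the data density.

```python
from collections import Counter
from statistics import median


def binned_mode(scores, bucket_width=0.5):
    """Return the midpoint of the most populated score bucket.

    bucket_width is an assumed fixed value; ties go to the lower bucket.
    """
    buckets = Counter(int(s // bucket_width) for s in scores)
    best_bucket, _ = max(buckets.items(), key=lambda kv: (kv[1], -kv[0]))
    return (best_bucket + 0.5) * bucket_width


# Hypothetical score history for one potential pair; two samples are outliers
# (for example, collected while the user was travelling).
history = [6.1, 6.3, 6.4, 1.2, 6.2, 0.9, 6.6]

print(binned_mode(history))  # 6.25
print(median(history))       # 6.2
```

Both the binned mode and the median ignore the two low outliers in this example, whereas a plain mean would be pulled toward them.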
As set forth in the previous sections, one embodiment of the digital identity pairing system operates by counting the frequency of observations of a digital identity (such as a cookie or a device identifier) at a particular source/destination address (such as an IP address). The internet usage frequency data is then used to identify a set of potential digital identity pairs that are then processed to determine a set of association scores for each of the potential digital identity pairs.
The internet usage frequency data can be heavily biased by the number of times that a digital identity is linked to a particular source/destination identifier. For example, a user that spends a very large amount of time using a laptop computer system at work but only rarely uses that laptop computer system while at home will have results that are heavily biased by the large number of data samples collected from the user's work location. To reduce this bias, some embodiments employ a Boolean-based association score system that does not operate by counting the number of times (frequency) a digital identity is observed at a source/destination in a specified time period. Instead, the Boolean association score system only counts whether a particular digital identity was observed or not at a source/destination identifier during the relevant time period.
P(cookie)=number of occurrences(cookie)/possible locations
To fully explain the Boolean association score calculation methodology, an example is presented with reference to the Boolean usage data disclosed in
The above observations are then used to calculate Association(CX→DY) as follows:
P(CX)=# of occurrence (CX)/total sample size=(1+1)/4=1/2
P(CX|DY)=co-occurrences(CX, DY)/occurrences(DY)=1/2
Association(CX→DY)=P(CX|DY)/P(CX)=(1/2)/(1/2)=1
The other potential pairing for CX is the pairing of CX and DX. To calculate a Boolean system association score of CX and DX the following information from the table in
The above observations are then used to calculate Association(CX→DX) as follows:
P(CX)=# of occurrence (CX)/total sample size=(1+1)/4=1/2
P(CX|DX)=co-occurrences(CX, DX)/occurrences(DX)=(1+1)/(2)=1
Association(CX→DX)=P(CX|DX)/P(CX)=(1)/(1/2)=2
As with the frequency count based system, when comparing the two Boolean system association scores, the higher association score may be selected. In this case, the Association(CX→DX)=2 score is higher than the Association(CX→DY)=1 score such that the pairing of laptop computer system (CX) 251 and cellular phone (DX) 252 is deemed to be a high-probability digital identity pair (not the pairing of CX and DY).
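The Boolean calculation can be sketched with sets of locations, where each identity is recorded at most once per location. The function name `boolean_association` and the location labels are hypothetical; only the counts (four possible locations, two locations per identity, and one or two shared locations) follow the worked example above.

```python
from fractions import Fraction


def boolean_association(locs_a, locs_b, possible_locations):
    """Association(a -> b) using presence or absence per location.

    locs_a / locs_b are the sets of locations where each identity was
    observed at least once during the relevant time period.
    """
    co = len(locs_a & locs_b)
    p_a = Fraction(len(locs_a), possible_locations)  # P(a)
    p_a_given_b = Fraction(co, len(locs_b))          # P(a | b)
    return p_a_given_b / p_a


# Location labels are hypothetical; the counts match the worked example.
cx = {"home_A", "work_W"}
dx = {"home_A", "work_W"}
dy = {"home_A", "elsewhere"}

print(boolean_association(cx, dx, possible_locations=4))  # 2
print(boolean_association(cx, dy, possible_locations=4))  # 1
```

Because each location contributes at most one observation, heavy usage at a single location (such as a workplace) cannot dominate the score.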
Both the frequency counting system and the Boolean counting system have their own advantages. To benefit from the advantages in both approaches, some embodiments of the digital pairing system employ a weighted average of the frequency counted association score and the Boolean counted association score.
Initially, at stage 710, the digital identity internet usage data is collected in the same manner as previously described. Next, at stage 720, the digital identity internet usage data is analyzed to select a set of potential digital identity pairings that will further be analyzed. After this point, the internet usage data is then analyzed with the two different scoring systems: a frequency counting system and a Boolean counting system.
Along a first path on the left side of
Along the second path on the right side of
At stage 770, the processed association scores from the two different association scoring systems are then combined. Various different methods of combining the two different association scores may be used. In one embodiment, the frequency counting association scores and the Boolean counting association scores are combined in a weighted manner with the following basic equation:
ScoreComb=αScoreFreq+βScoreBool
Various different methods may be used to determine the best α and β weighting factors that are used to combine the two different association scores. In one embodiment, cross validation was used to calculate the best α and β weighting factors. Specifically, digital identity usage data was collected for several pairs of digital devices where each pair of devices was known to belong to a single user. Since those pairs of digital devices were known to be actually associated with a single user, the values of the α and β weighting factors were selected to maximize the combined association scores of those known paired devices. Again, it must be emphasized that additional association scores may also be considered such that a weighted score may be created from several different association scores.
In some embodiments linear regression analysis may be used to determine how to combine the two association scores. Specifically, with both frequency counting association scores and the Boolean counting association scores there are two different predictors. Thus, using a set of known accurate digital identity pairings, linear regression may be used to determine how to combine the two different predictors in a manner that provides accurate results.
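As a rough sketch of the regression approach, the weights α and β can be fitted by ordinary least squares on labeled examples. Everything named here is an assumption made for illustration: the functions `fit_weights` and `combined_score`, the use of a 1/0 label for known pairs and known non-pairs, the absence of an intercept term, and the sample values themselves.

```python
def fit_weights(samples):
    """Least-squares fit of label ~ alpha*score_freq + beta*score_bool (no intercept).

    samples: iterable of (score_freq, score_bool, label) triples.
    """
    sff = sfb = sbb = sfy = sby = 0.0
    for f, b, y in samples:
        sff += f * f
        sfb += f * b
        sbb += b * b
        sfy += f * y
        sby += b * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sff * sbb - sfb * sfb
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; weights are not identifiable")
    alpha = (sfy * sbb - sfb * sby) / det
    beta = (sff * sby - sfb * sfy) / det
    return alpha, beta


def combined_score(score_freq, score_bool, alpha, beta):
    """Weighted combination of the two association scores."""
    return alpha * score_freq + beta * score_bool


# Hypothetical training data: label 1 = known same-user pair, 0 = known non-pair.
training = [
    (5.0, 2.0, 1),
    (3.3, 2.0, 1),
    (0.7, 1.0, 0),
    (0.4, 0.5, 0),
]
alpha, beta = fit_weights(training)
print(alpha, beta, combined_score(3.27, 2.0, alpha, beta))
```

A production system would more likely hold out part of the known pairs to check the fitted weights, in line with the validation approach described above.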
After creating a combined association score at stage 770, the digital pairing system may use the combined association score to select high-probability digital identity pairs at stage 780. The high-probability digital identity pairs may be selected from competing potential digital identity pairs. Finally, at stage 790, the combined profile information from the two digital identities in a digital identity pair may be used to accurately select targeted advertisements for both digital identities in the digital identity pair.
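Selecting among competing pairings at stage 780 amounts to keeping, for each digital identity, the candidate with the highest combined score. The sketch below assumes a hypothetical `select_pairs` helper and a hypothetical `min_score` cutoff; the text does not specify a cutoff, so it defaults to accepting any score.

```python
def select_pairs(scores, min_score=0.0):
    """Pick the best-scoring candidate for each device.

    scores: dict mapping (device, candidate) -> combined association score.
    min_score: hypothetical cutoff below which a candidate is ignored.
    """
    best = {}
    for (device, candidate), score in scores.items():
        if score < min_score:
            continue
        if device not in best or score > best[device][1]:
            best[device] = (candidate, score)
    return {device: candidate for device, (candidate, _) in best.items()}


# Scores for the two competing pairings of laptop CX discussed in the text.
competing = {("CX", "DX"): 2.0, ("CX", "DY"): 1.0}
print(select_pairs(competing))
```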
In the preceding examples, the common source/destination identifier used was a source IP address that a digital device was using to access the internet. However, as the name implies the common source/destination identifier may identify a destination that is accessed by a particular client device. An example of the destination identifier is presented with reference to
As illustrated in
The same techniques disclosed in the previous sections can be used to calculate association scores for the destination addresses. Once again, the pairing of laptop computer CX 251 to one of the two cellular phones DX 252 or DY 262 will be performed. To calculate an association score of CX and DY the following information from the table in
The above observations are then used to calculate Association(CX→DY) as follows:
P(CX)=# of occurrence (CX)/total sample size=(5+2+4)/56=11/56
P(CX|DY)=co-occurrences(CX, DY)/occurrences(DY)=2/(9+5)=1/7
Association(CX→DY)=P(CX|DY)/P(CX)=(1/7)/(11/56)=0.73
The other potential digital identity pairing for laptop computer CX 251 is the pairing of CX and DX. To calculate an association score for the pair of CX and DX the following information from the table in
The above observations are then used to calculate Association(CX→DX) as follows:
P(CX)=# of occurrence (CX)/total sample size=(5+2+4)/56=11/56
P(CX|DX)=co-occurrences(CX,DX)/occurrences(DX)=(5+4)/(16)=9/14
Association(CX→DX)=P(CX|DX)/P(CX)=(9/14)/(11/56)=3.27
Again, the higher association score may be selected among competing digital identity pairings. In this case, the Association(CX→DX)=3.27 score is much higher than the Association(CX→DY)=0.73 score such that the pairing of laptop computer system CX 251 and cellular phone DX 252 is deemed to be a high-probability digital identity pair. Note that the Boolean Association score may be calculated in the same manner with regard to destination identifiers.
User destination information can also be used to help measure the accuracy of digital identity pairings made with common source identifiers. As set forth in earlier sections of this document, digital identity pairings may be inferred by observing which digital identities are often seen using the same source IP address. However, inaccurate pairings may occasionally occur due to random coincidences and noise in the data. For example, if two co-workers often go to lunch together such that their mobile internet devices are seen at both a workplace and a lunch place together, a digital identity pairing system may mistakenly pair those two digital devices. Thus, to further verify the accuracy of a digital identity pairing made from common source identifiers, the destination addresses frequented by the digital identities may also be examined.
Given enough user history, a digital device user's visits to destination websites will show a stable pattern that can be recognized. Thus, if two paired digital identities share similar destination website visits then the pairing of the digital identities is probably accurate. However, if the two digital identities have very different destination website visits then a digital identity pairing may be discarded as inaccurate.
If two different computer systems are used by the same user then those two computer systems will generally not have identical web browsing histories since the user will not visit the exact same web pages that have already been viewed. However, the user's interests will generally be consistent such that the user will typically access web sites in the same general interest areas in the same proportions with both digital identities. Thus, if the two digital identities in a digital identity pairing visit web sites in the same general interest areas and in the same proportions, that website viewing pattern is evidence supporting that the digital identity pairing is accurate. To quantify a digital identity's browsing patterns, a ‘user entropy’ value for a digital identity may be defined as:
H_ID = Σ_i (P_i * log P_i)
where P_i is the percentage of accesses to interest grouping i.
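A user entropy value of this kind can be computed directly from per-category access shares. The sketch below makes several assumptions that are not taken from the text: it uses the conventional Shannon form with a leading minus sign and the natural logarithm, and the category names are invented. Because the log base and scaling behind the numeric values quoted in this document are not stated, the absolute numbers produced here should not be expected to match them.

```python
import math


def user_entropy(shares):
    """Shannon entropy, -sum(p * ln p), of a mapping category -> access share."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must contain at least one positive value")
    h = 0.0
    for value in shares.values():
        p = value / total  # normalize so the shares sum to 1
        if p > 0:
            h -= p * math.log(p)
    return h


# Hypothetical interest groupings for one digital identity.
profile = {"news": 0.4, "sports": 0.2, "finance": 0.1, "other": 0.3}
print(user_entropy(profile))
```

Normalizing inside the function lets callers pass raw access counts as well as percentages.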
The same human user browsing on different computer systems will typically have the same level of user entropy. Thus, if two digital identities have very similar user entropy levels then this is evidence backing the assertion that the same user may be using both computer systems. To compare the user entropy values for different users, an ‘entropy gain’ metric may be defined as
H_Δ = H_{A,B} − 0.5*H_A − 0.5*H_B
where H_{A,B} is the combined entropy score of digital identities A and B.
Thus, to test if a digital pairing that has been made is accurate, the user entropy levels for both digital identities may be calculated and then an entropy gain may be calculated. For example, referring back to
H_CX = Σ_i (P_i * log P_i)
H_CX = (0.4*log 0.4 + 0.2*log 0.2 + 0.1*log 0.1 + 0.3*log 0.3) = 6.03
H_TX = (0.35*log 0.35 + 0.25*log 0.25 + 0.15*log 0.15 + 0.25*log 0.25) = 5.72
The other computer systems in the household (laptop CY 961 and desktop TY 963) have a different browsing history and thus different user entropy values.
H_CY = (0.3*log 0.3 + 0.3*log 0.3 + 0.4*log 0.4) = 3.32
H_TY = (0.25*log 0.25 + 0.35*log 0.35 + 0.4*log 0.4) = 3.35
To calculate the entropy gain between laptop CX 951 and desktop TX 953, first the combined entropy score HCX,TX is calculated as follows:
H_{CX,TX} = (0.375*log 0.375 + 0.225*log 0.225 + 0.125*log 0.125 + 0.275*log 0.275) = 5.84
Then the entropy gain is calculated with:
H_Δ = H_{CX,TX} − 0.5*H_CX − 0.5*H_TX
H_Δ = 5.84 − (0.5*6.03) − (0.5*5.72) = −0.035
This is a very small change in entropy thereby providing evidence that the digital pairing is correct. The same is true for the pairing of laptop CY 961 and desktop TY 963 where HCY,TY=3.33 and the entropy gain is
H_Δ = H_{CY,TY} − 0.5*H_CY − 0.5*H_TY
H_Δ = 3.33 − (0.5*3.32) − (0.5*3.35) = −0.01
In both the laptop CX 951 and desktop TX 953 pairing and the laptop CY 961 and desktop TY 963 pairing, the entropy gains are only slightly negative, indicating a very small decrease in user activity diversity. In general, very small negative or positive entropy gains indicate correct matching. However, if one compares the user entropy of laptop CX 951 and laptop CY 961, the two systems have very different user entropy levels. First, the combined entropy score HCX,CY is calculated as follows:
H_{CX,CY} = (0.2*log 0.2 + 0.25*log 0.25 + 0.2*log 0.2 + 0.35*log 0.35) = 5.65
The entropy gain value is
H_Δ = H_{CX,CY} − 0.5*H_CX − 0.5*H_CY = 5.65 − (0.5*6.03) − (0.5*3.32) = 0.97.
This relatively large gain in entropy indicates a significant gain in the diversity of the aggregated access pattern, thus suggesting that a pairing of laptop CX 951 and laptop CY 961 is incorrect.
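The comparison above can be sketched as code. The sketch rests on assumptions: entropy is taken in the conventional Shannon form (negative sign, natural log), the combined pattern is the equal-weight average of the two access patterns (which is consistent with the combined percentages shown above), and the grouping labels `g1` to `g4` are placeholders. The alignment of the three CY shares to `g2`, `g3`, and `g4` is inferred from the combined CX,CY terms. Under these assumptions the gain is never negative, so the absolute values differ from the figures in the text, but the ordering of the two candidate pairings is the same.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log) of a mapping grouping -> share."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def mix(a, b):
    """Equal-weight average of two access patterns over the union of groupings."""
    keys = sorted(set(a) | set(b))
    return {k: 0.5 * a.get(k, 0.0) + 0.5 * b.get(k, 0.0) for k in keys}


def entropy_gain(a, b):
    """H(A,B) - 0.5*H(A) - 0.5*H(B); small values suggest similar patterns."""
    return entropy(mix(a, b)) - 0.5 * entropy(a) - 0.5 * entropy(b)


# Shares taken from the worked example; grouping labels are placeholders.
cx = {"g1": 0.40, "g2": 0.20, "g3": 0.10, "g4": 0.30}
tx = {"g1": 0.35, "g2": 0.25, "g3": 0.15, "g4": 0.25}
cy = {"g2": 0.30, "g3": 0.30, "g4": 0.40}

print("CX,TX gain:", entropy_gain(cx, tx))
print("CX,CY gain:", entropy_gain(cx, cy))
```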
The entropy gain metric can also be used to evaluate the accuracy of matching devices across different platforms. As illustrated in
Instead of analyzing the web browsing history on a cellular phone, a system may analyze the usage of application programs on the cellular phone. Apple iOS, Android, RIM Blackberry, and other cellular smartphone and tablet systems have thousands of small application programs for accomplishing a wide variety of tasks. These different application programs can be categorized based on what features and tools the application programs provide. However, the usage of smartphone and tablet application programs cannot be directly compared to web browsing histories. To handle this, a large user history of web browsing and smartphone/tablet application program usage has been analyzed. There is a correlation between smartphone/tablet application program usage and web browsing patterns such that one can infer a likely web browsing pattern from a known smartphone/tablet application program usage history. The inferred web browsing pattern can then be used in user entropy comparisons.
For example, the user of cellular phone DX 952 is a light mobile application user that has the following application program usage pattern: 30% on casual games, 30% on lifestyle, 20% on news, 10% on finance, and 10% on other application programs. The user of cellular phone DY 962 has the following application program usage pattern: 50% on action games, 20% on news, 20% on finance, and 10% on other apps. With a large amount of historical information on web browsing and application usage for known users, a system for inferring web browsing patterns from application usage patterns may be constructed. For example, from the application usage pattern of cellular phone DX 952 an inference system may determine that such a user may allocate their web browsing as follows: 20% on fashion, 25% on news, 15% on finance websites, and 40% on others. In the same manner, the user of cellular phone DY 962 may web browse 25% on news, 20% on finance, 20% on games, and 35% on others.
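One simple way to picture such an inference system is as a table of weights that redistributes each application category's share across web browsing categories. The text does not say how the inference is actually performed, so this is purely illustrative: the function `infer_web_pattern` and the `mapping` table are hypothetical, and the weights were invented and tuned only so that the output reproduces the inferred pattern quoted above for cellular phone DX 952. A real system would estimate such weights from the historical data on known users.

```python
def infer_web_pattern(app_usage, mapping):
    """Redistribute app-category shares across web categories.

    app_usage: app category -> share of usage (shares sum to 1).
    mapping:   app category -> {web category -> weight}, each row summing to 1.
    """
    web = {}
    for app_category, share in app_usage.items():
        for web_category, weight in mapping[app_category].items():
            web[web_category] = web.get(web_category, 0.0) + share * weight
    return web


# Application usage of cellular phone DX 952, as stated in the text.
dx_apps = {
    "casual_games": 0.30,
    "lifestyle": 0.30,
    "news": 0.20,
    "finance": 0.10,
    "other": 0.10,
}

# Hypothetical weights, chosen only to reproduce the example figures.
mapping = {
    "casual_games": {"others": 1.0},
    "lifestyle": {"fashion": 2 / 3, "others": 1 / 3},
    "news": {"news": 1.0},
    "finance": {"finance": 1.0},
    "other": {"news": 0.5, "finance": 0.5},
}

dx_web = infer_web_pattern(dx_apps, mapping)
print(dx_web)
```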
From these inferred web browsing patterns, the user entropy metrics may be calculated. The inferred user entropy values would be:
H_DIX = (0.20*log 0.20 + 0.25*log 0.25 + 0.15*log 0.15 + 0.4*log 0.4) = 5.81
H_DIY = (0.25*log 0.25 + 0.20*log 0.20 + 0.20*log 0.20 + 0.35*log 0.35) = 5.65
Assuming that laptop CX 951 and desktop TX 953 were correctly paired then one can compare the inferred web browsing history of cellular phone DX 952 with the group laptop CX 951 and desktop TX 953.
H_{DIX,CX,TX} = (0.2875*log 0.2875 + 0.2375*log 0.2375 + 0.1375*log 0.1375 + 0.3375*log 0.3375) = 5.75
The entropy gain may then be calculated as follows:
H_Δ = H_{DIX,CX,TX} − 0.5*H_DIX − 0.5*H_{CX,TX} = 5.75 − (0.5*5.81) − (0.5*5.84) = −0.075
Such a relatively small entropy gain would provide evidence supporting that the devices are used by the same user. If one attempts to match cellular phone DX 952 with laptop CY 961 and desktop TY 963 then the entropy gain is 1.42 (H_Δ = H_{DIX,CY,TY} − 0.5*H_DIX − 0.5*H_{CY,TY} = 1.42). Note that the web browsing inference process may not always be accurate such that higher entropy gains may appear. Thus, inference data may be better used for helping select from a set of possible matches. For example, the entropy gain of matching cellular phone DY 962 with laptop CX 951 and desktop TX 953 is 2.64 whereas the entropy gain of matching cellular phone DY 962 with laptop CY 961 and desktop TY 963 is 1.46, such that it would be better to match cellular phone DY 962 with laptop CY 961 and desktop TY 963 instead of with laptop CX 951 and desktop TX 953. Thus, even with an imperfect inferred website access pattern (which is quite different from an actual recorded website pattern) the system can still determine that cellular phone DX 952 matches laptop CX 951 and desktop TX 953 better than cellular phone DY 962 does, while cellular phone DY 962 matches laptop CY 961 and desktop TY 963 better than cellular phone DX 952 does.
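Using inferred data to choose among candidate matches, rather than to accept or reject a single match, can be sketched as picking the device group with the smallest entropy gain in absolute value. The helper names `gain` and `best_groups` are hypothetical, and the use of the absolute value is an assumption based on the earlier remark that very small negative or positive gains indicate a correct match. The entropy values and gains are the ones quoted in the text.

```python
def gain(h_combined, h_a, h_b):
    """Entropy gain H(A,B) - 0.5*H(A) - 0.5*H(B) from precomputed entropies."""
    return h_combined - 0.5 * h_a - 0.5 * h_b


def best_groups(gains):
    """For each device, keep the candidate group whose gain is closest to zero."""
    best = {}
    for (device, group), value in gains.items():
        if device not in best or abs(value) < abs(best[device][1]):
            best[device] = (group, value)
    return {device: group for device, (group, _) in best.items()}


gains = {
    # Derived from H(DIX,CX,TX)=5.75, H(DIX)=5.81, H(CX,TX)=5.84.
    ("DX", "CX+TX"): gain(5.75, 5.81, 5.84),
    ("DX", "CY+TY"): 1.42,
    ("DY", "CX+TX"): 2.64,
    ("DY", "CY+TY"): 1.46,
}

print(best_groups(gains))
```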
Occasionally, this system may encounter two different users who have similar website access patterns, and the entropy gain metric will erroneously consider digital identities from the two different users to be from the same user. However, even when this mismatch occurs, it will not decrease the power of advertisement targeting because these two users have the same interests.
In the previous sections, the source/destination identifier was the main distinguishing factor used to identify high-probability pairs. However, the timestamp can also be used to help identify digital identity pairs. Specifically, different digital devices that are often used at the same location around the same time may have a higher probability of being related to the same user. Thus, some embodiments of a digital pairing system use the timestamps in the triads of observed usage data to help calculate association scores.
Various different methods may be used to calculate the session threshold amount of time. In one embodiment, an analysis of time gaps between digital identity usage observations is performed to calculate a session threshold amount of time. Referring to
After determining a method of splitting observations into temporally distinct sessions, the session data may be processed with the same association score methodology set forth in the previous sections. For example,
The techniques disclosed in the previous sections may be used to calculate association scores for the four different sessions depicted in
The above internet usage observations are then used to calculate Association(CX→DY) as follows:
P(CX)=# of occurrence (CX)/total sample size=(4+1+14)/74=19/74
P(CX|DY)=co-occurrences(CX, DY)/occurrences(DY)=2/(12+7+1)=1/10
Association(CX→DY)=P(CX|DY)/P(CX)=(1/10)/(19/74)=0.39
The other potential digital identity pairing for laptop computer CX 251 is the pairing of CX and DX. To calculate an association score for the pair of CX and DX the following information from the table in
The above observations are then used to calculate Association(CX→DX) as follows:
P(CX)=# of occurrence (CX)/total sample size=(5+2+4)/56=11/56
P(CX|DX)=co-occurrences(CX,DX)/occurrences(DX)=(2+9)/(11)=1
Association(CX→DX)=P(CX|DX)/P(CX)=1/(11/56)=5.09
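The session-based scores above presuppose that each device's observations have first been grouped into temporally distinct sessions. A minimal sketch of that grouping step follows, assuming timestamps expressed in seconds and a session threshold supplied by the caller. The function name `split_sessions` and the 1800-second threshold in the example are hypothetical; how the threshold itself is derived from the observed time gaps is not reproduced here.

```python
def split_sessions(timestamps, threshold_seconds):
    """Group timestamps into sessions.

    A new session starts whenever the gap to the previous observation
    exceeds threshold_seconds.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold_seconds:
            sessions[-1].append(t)  # continue the current session
        else:
            sessions.append([t])  # gap too large: open a new session
    return sessions


# Hypothetical observation times for one digital identity at one source identifier.
observed = [0, 60, 120, 4000, 4030]
print(split_sessions(observed, threshold_seconds=1800))
```

Once observations are labeled by session, the session label can take the place of the bare source identifier when counting occurrences and co-occurrences.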
With data depicted in
In the preceding sections of this document various techniques have been disclosed for identifying digital identities that are likely to belong to the same human user. When such digital identity pairs are determined, the advertising to both digital identities can be synergistically improved by combining digital profile information collected on both digital identities. However, the process does not have to end with simple pairings of digital identities. The technique can be extended to combine multiple digital identities together thereby further improving the accuracy of advertising targeting. The technique of combining together the digital profiles collected from multiple digital identities may be referred to as ‘digital identity chaining’.
Referring back to
Digital identity chaining can synergistically improve the targeting of internet advertisements. For example, the accumulated information collected from laptop computer 251, cellular telephone 252, and video game console 259 may be used to create a very detailed digital profile of user X. This detailed digital profile of user X may then be used whenever an internet advertisement must be selected for laptop computer 251, cellular telephone 252, or video game console 259.
Note that in the above example, the cellular telephone 252 and video game console 259 were linked together using a specific application program installed onto cellular telephone 252. However, the two devices could have been paired together using the association score systems described in earlier sections. Thus, digital identity chaining may use many different methods of linking together different digital devices.
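Chaining can be viewed as merging pairwise links into connected groups, regardless of how each link was established. The sketch below uses a union-find structure for this; the choice of data structure, the function name `chain_identities`, and the second pair of devices (`device_A`, `device_B`) are assumptions made for illustration, while the first two links correspond to the devices named in the example above.

```python
def chain_identities(links):
    """Merge pairwise links into chains (connected groups) using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for member in list(parent):
        groups.setdefault(find(member), set()).add(member)
    return sorted(sorted(group) for group in groups.values())


links = [
    ("laptop_251", "phone_252"),    # e.g. paired by association score
    ("phone_252", "console_259"),   # e.g. linked by an installed application
    ("device_A", "device_B"),       # hypothetical unrelated pair
]
print(chain_identities(links))
```

All profile information collected for the members of one group could then be pooled when selecting an advertisement for any member of that group.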
The preceding technical disclosure is intended to be illustrative, and not restrictive. For example, the above-described embodiments (or one or more aspects thereof) may be used in combination with each other. Other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the claims should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
The Abstract is provided to comply with 37 C.F.R. §1.72(b), which requires that it allow the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
The present Nonprovisional U.S. Patent Application claims the benefit of the previous U.S. Provisional Patent Application entitled “System and Method for Determining Related Digital Identities” filed with the U.S. Patent Office on May 10, 2012 and having Ser. No. 61/645,549.
Number | Date | Country |
---|---|---|
61645549 | May 2012 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 13891764 | May 2013 | US |
Child | 14992893 | | US |