This application is related to U.S. Non-Provisional patent application Ser. No. 10/717,441, filed Nov. 18, 2003, naming Banister et al. as inventors, which claims domestic priority under 35 U.S.C. 119 from prior U.S. Provisional Patent application No. 60/428,134, filed Nov. 20, 2002, naming Banister et al. as inventors, and 60/482,883, filed Jun. 25, 2003, naming Banister et al. as inventors, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.
This application is related to U.S. Provisional patent application No. 60/545,609, filed Feb. 17, 2004, entitled “C
This application is related to U.S. Provisional patent application No. 60/574,530, filed May 25, 2004, entitled “C
This application is related to U.S. patent application No. 10/856,693, filed May 28, 2004, entitled “E
The present invention generally relates to electronic message delivery in a networked system. The invention relates more specifically to techniques for determining the reputation of a message sender.
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
The use of electronic message communication systems has increased significantly in the recent past. However, numerous users of such systems, whether they are message senders or receivers, find such systems inconvenient and cumbersome to use. Similar problems are associated with telephone, facsimile, and e-mail communications, and others.
In the e-mail context, in one past approach, senders marketing commercial products or services would acquire or develop lists of e-mail addresses and then periodically send mass unsolicited e-mail messages (“spam”) to all addresses in the lists. Using modern electronic systems, the cost of sending millions of such messages has been negligible, and a response rate of even less than one percent has been considered worthwhile. Thus, successful delivery of unsolicited messages to valid in-boxes of recipients normally translates into income for the sender.
Unfortunately, this approach causes receivers to receive unwanted messages. The direct and indirect costs of receiving “spam” are high. In response, receivers have adopted a variety of approaches to prevent receipt or viewing of unwanted messages.
In one approach, receivers use filtering, marking, or blocking technologies that attempt to classify messages as “spam” or not spam by examining various aspects of the message. For example, some filters look for keywords in the message subject line and reject or quarantine messages that contain keywords matching a list of prohibited words. In another approach, receivers use “blacklists” to identify and prohibit or less easily admit messages from suspect senders of unsolicited messages. Some receivers augment these technologies with personal “white lists” of friends or other acceptable senders; messages from senders in the white list are admitted or more easily admitted. The white lists and blacklists also may come from networked sources. Techniques for performing blacklist lookups are described at the “ip4r” HTML document that is available online at the time of this writing at the “support” subdirectory of the “junkmail” directory of the “declude” commercial domain of the World Wide Web, and at the “bill” section of the “scconsult” commercial domain of the World Wide Web. Example blacklists include the series of blacklists provided by the “njabl” organization domain of the World Wide Web. Example white lists could include lists of Fortune 500 companies and other reputable senders.
One problem with these approaches is that some messages that receivers want may not reach the intended receivers because they are identified as “spam” by the filtering or blocking technologies. Receivers who use filtering or blocking technologies regularly fail to receive some legitimate messages because the filtering and blocking technologies cannot always properly distinguish legitimate messages from unsolicited messages. For example, certain industry-standard terms or technical abbreviations may be identical to prohibited keywords, confusing the “spam” filter.
Further, receivers continue to receive large volumes of unwanted messages that are not properly trapped by the “spam” filter. As a result, many receivers now refuse to disclose their address except under limited circumstances. In response, many legitimate senders, such as reputable commercial enterprises, have developed “opt-in” procedures in which the addresses of receivers, such as customers, are not used at all unless the receiver affirmatively agrees to receive messages. Even when this is done, the filtering or blocking technologies may delete or quarantine even those messages from legitimate senders that are directed to receivers who have “opted in.” Consequently, the value of e-mail as a marketing tool for responsible communications directed to receivers who have “opted in” is decreasing. Many receivers remain essentially defenseless to the daily onslaught of “spam” arriving in their e-mail in-boxes. Although many states have enacted legislation that imposes civil or criminal penalties for sending “spam,” these remedies are time-consuming for receivers to pursue. In addition, while many Internet Service Providers (“ISPs”) actively identify and refuse to communicate or do business with those who send “spam,” policing such improper activity imposes a significant cost on the ISP. ISPs are also burdened with the aggregated network and disk usage costs associated with sending and receiving the unwanted messages. End users may also be burdened with bandwidth costs associated with downloading these messages.
ISPs also incur costs associated with processing messages directed to recipients who do not hold an account with the ISP. For these recipients, the ISP's mail system typically generates an automatic “bounce” message that states that the recipient is unknown. Indeed, a “double bounce” may occur when a message bears an invalid sender address and is sent to an invalid recipient. Costs are associated with maintaining the equipment, network bandwidth, and software that generate the bounce messages and with dispatching the bounce messages back into the network to the sender. Thus, there is a need for a system or method that can reduce the number of “bounce” and “double bounce” events experienced by ISPs and derived from unwanted messages.
Thus, the problem of “spam” in the Internet e-mail context is essentially a war of attrition. There are legitimate marketing organizations that send promotional messages by bulk e-mail, and other senders who send valid bulk messages. In general, however, no one benefits from the activities of “spammers,” other than the “spammers” themselves. ISPs, business enterprises, and end users all suffer inconvenience, costs, and annoyances.
Even when ISPs and enterprises use anti-“spam” technologies, large numbers of “spam” messages may not be identified as spam, and many non-spam messages may be misclassified as spam. This costs e-mail marketers, and causes senders to lose confidence in the benefits of e-mail marketing. Moreover, end users are required to invest time in monitoring, checking, delivering, and negotiating blacklists, white lists, and similar mechanisms. The information from these lists can be conflicting, and therefore making a decision for a particular email sender based on the information in these lists can be difficult.
While the foregoing example problems exist in the context of e-mail, instant messaging, chat-room applications, Web message boards, telephone, and facsimile communications suffer from analogous problems.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Techniques for determining the reputation of a message sender are described. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Embodiments are described herein according to the following outline:
The needs identified in the foregoing Background, and other needs and objects that will become apparent from the following description, are achieved in the present invention, which comprises, in one aspect, a method for determining the reputation of a message sender. In other aspects, the invention encompasses a computer apparatus and a computer readable medium configured for determining the reputation of a message sender.
Generally, herein are provided techniques by which message receivers can determine the reputation of a message sender by obtaining two or more lists from two or more list providers; determining which lists of the two or more lists indicate the message sender; and determining the reputation score for the message sender based on which lists of the two or more lists indicate the message sender.
In a related feature, the techniques further include the step of storing information from the two or more lists in an aggregate list data structure, and where the step of determining what lists indicate the message sender includes the step of querying the aggregate list data structure. In a related feature, a particular list is one of the two or more lists and the particular list contains one or more entries, and where the step of storing information from the two or more lists in the aggregate list data structure includes the steps of determining the difference of the particular list with a previous version of the particular list; storing entries of the particular list that were not in the previous version of the particular list in the aggregate list data structure; and removing from the aggregate list data structure entries that are not in the particular list but were in the previous version of the particular list.
In a related feature, the step of determining the reputation score includes the steps of determining an individual score for each list of the two or more lists; and determining an output score based on the individual score for each list in the two or more lists. In a related feature, the step of determining the output score includes the steps of determining an aggregate score based on the individual score for each list of the two or more lists; determining a normalized score based on the aggregate score; and determining the output score based on the normalized score.
In a related feature, the individual score for each list in the two or more lists each includes an individual probability and a list of probabilities includes the individual probability for each list in the two or more lists, and where the step of determining the aggregate score based on the individual score for each list of the two or more lists includes performing a Chi Squared calculation on the list of probabilities. In a related feature, the techniques further include the step of receiving a request for the reputation of the message sender. In a related feature, the step of receiving the request for the reputation of the message sender includes receiving a request formatted as a DNS request. In a related feature, the message sender is associated with a particular IP address and the step of determining what lists of the two or more lists indicate the message sender includes determining for a particular list of the two or more lists whether the particular IP address of the message sender is contained in an IP address range indicated by the particular list. In a related feature, the techniques further include, if a particular list indicates an IP address range, setting a bit corresponding to the particular list in a particular list bit mask data structure corresponding to the IP address range.
In a related feature, the step of setting the bit corresponding to the particular list is performed for each list of the two or more lists, and where the message sender corresponds to a particular IP address, the particular IP address is contained within a first IP address range that has associated with it a first list bit mask, the particular IP address is contained within a second IP address range associated with a second list bit mask, and the method further includes the step of determining which lists of the two or more lists indicate the message sender by performing the steps of performing an “or” operation on the first list bit mask and second list bit mask to produce a third list bit mask; and determining what bits are set in the third list bit mask.
In another aspect techniques are provided for receiving a message from a message sender; obtaining a reputation score of the message sender, where the reputation score of the message sender was determined by performing the steps of obtaining two or more lists from two or more list providers; determining which lists of the two or more lists indicate the message sender; determining the reputation score for the message sender based on which lists of the two or more lists indicate the message sender; and if the reputation score is worse than a first predefined threshold, indicating that the message is unsolicited.
In a related feature, the techniques further include the step of, if the reputation score is better than a second predefined threshold, indicating that the message is valid, where the first predefined threshold is different from the second predefined threshold. In a related feature, the techniques further include the step of, if the reputation score is better than the first predefined threshold and worse than the second predefined threshold, indicating that the message is not estimated as either valid or invalid. In a related feature, the techniques further include the step of sending a request for the reputation score of the message sender, and where the step of obtaining the reputation score of the message sender includes receiving a response to the request for the reputation score of the message sender. In a related feature, the step of sending the request for the reputation score of the message sender includes sending a particular request formatted as a DNS request.
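The two-threshold decision described above can be illustrated with a short Python sketch. It is only an illustration: the function name, the threshold values, the returned labels, and the convention that a higher score means a worse reputation are assumptions made for this example and are not specified by the text.

```python
def classify_message(score, first_threshold=0.8, second_threshold=0.3):
    """Classify a message from its sender's reputation score.

    Assumption for this sketch: higher scores mean a worse reputation,
    so "worse than" is ">" and "better than" is "<". The threshold
    values are hypothetical.
    """
    if score > first_threshold:
        # Worse than the first predefined threshold.
        return "unsolicited"
    if score < second_threshold:
        # Better than the second predefined threshold.
        return "valid"
    # Between the two thresholds: not estimated as valid or invalid.
    return "undetermined"
```

Under these assumptions, a sender scored 0.9 would be indicated as unsolicited, a sender scored 0.1 as valid, and a sender scored 0.5 would be left undetermined.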
In other aspects, the invention encompasses a computer apparatus and a computer-readable medium configured to carry out the foregoing steps.
2.0 Structural Overview
2.1 Example System Organization
A list aggregator unit 110 is communicatively coupled to two or more list providers 150. In the example shown, the list aggregator unit 110 is communicatively coupled to three list providers 150a, 150b, 150c. The list aggregator unit 110 is also communicatively coupled to a reputation provider unit 120. The reputation provider unit 120 is communicatively coupled to a network 130. A reputation requester 140 is also communicatively coupled to the network 130. In various embodiments, the network 130 is a wireless network, dial up access, the Internet, a LAN, a WAN, or any other communication network.
The list aggregator unit 110 and reputation provider unit 120 are each logical machines. Logical machines may comprise one or more computer programs or other software elements. Each logical machine may run on separate physical computing machines or may run on the same physical computing machine as one or more of the other logical machines. Various embodiments of computers and other physical computing machines are described in detail below in the section entitled Hardware Overview.
The reputation requester 140 can be any appropriate machine, user, or process capable of communicating a request over a network. For example, in one embodiment, a reputation requester 140 is a mail server running on a computer that has a network interface, and the mail server is capable of formulating a request for the reputation of an electronic message sender. In other embodiments, the reputation requester 140 could be any mechanism requesting reputation information for a mail sender including an access server, gateway, firewall, mail transfer agent, mail client, mail filtering mechanism, etc.
The list providers 150a, 150b, 150c are any appropriate mechanism for providing lists 160a, 160b, 170 related to reputations of mail senders. For example, in one embodiment, the list providers 150a, 150b, 150c are modified domain name servers (DNSs) running on computers with network interfaces that are capable of providing lists 160a, 160b, 170 related to reputations of mail senders. In other embodiments, each of the list providers 150a, 150b, 150c is an FTP server, HTTP server, or any other appropriate mechanism capable of providing lists 160a, 160b, 170 related to reputations of mail senders.
2.2 Sample Data Structure
The aggregate list data structure 200 is an example of a data structure that can be used to efficiently store and provide information related to multiple mail senders. The techniques described herein are in no way limited to the use of this particular data structure. Any appropriate data structure or data set stored in a machine-readable medium could be used to store reputation information from multiple lists.
The aggregate list data structure 200 comprises a bit length hash table 210, an IP (Internet Protocol) address range hash table 220 as the value for each key in the bit length hash table 210, and a list bit mask 230 as the value for each key in the IP address range hash table 220. Although the example of
The use of the aggregate list data structure 200 is described in more detail below. However, a brief description is instructive as to its structure. The aggregate list data structure 200, as the name suggests, provides a single data structure in which reputation data from multiple reputation lists can be stored. In various embodiments, a reputation list can contain a positive or negative association with a single IP address or a range of IP addresses. In other embodiments, reputations are associated with something other than IP address, such as domain name, email address, geography, or any other appropriate value. For simplicity in explanation, in the examples given herein, reputations will be described as being associated with IP addresses and ranges.
A reputation list 160a, 160b, 170 from a reputation list provider 150a, 150b, 150c could take on any appropriate form, such as a blacklist of IP addresses and ranges from which electronic messages have a high likelihood of being unsolicited electronic messages, a white list of IP addresses and ranges from which there is a low likelihood of unsolicited electronic messages being sent, or any other appropriate type of list.
The keys 212A, 212B, 212C, 212D in the bit length hash table 210 represent the number of defined significant bits of an IP address range associated with a reputation. Typically, IP addresses are 32 bits long, so the range of possible entries for a 32 bit IP address would be from /0 (no significant bits are defined) to /32 (all the bits are defined). For example, “/8” refers to a range where only the first eight bits are defined and is associated with key 212D. An example /8 entry could be “152.*.*.*” (where “*” represents a wildcard and signifies that the corresponding bits are not defined). IP addresses “152.2.128.152” and “152.123.234.4” would fall into the /8 range of “152.*.*.*”. The IP address “153.2.128.152” would not fall into the /8 range of “152.*.*.*”. In one embodiment, a key 212A, 212B, 212C, 212D is only added to the bit length hash table 210 if a range of IP addresses corresponding to that length is received in one of the reputation-related lists.
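The range test described above can be sketched in Python using the standard ipaddress module. The function name in_range is hypothetical, and writing the wildcard range “152.*.*.*” as the address “152.0.0.0” with a bit length of 8 is a notational assumption made for this example.

```python
import ipaddress


def in_range(ip, prefix, bit_length):
    """Return True if the first bit_length bits of ip equal those of prefix.

    ip and prefix are dotted-quad IPv4 strings; bit_length is 0 to 32.
    """
    ip_int = int(ipaddress.IPv4Address(ip))
    prefix_int = int(ipaddress.IPv4Address(prefix))
    # Discard the undefined (wildcard) low-order bits before comparing.
    shift = 32 - bit_length
    return (ip_int >> shift) == (prefix_int >> shift)
```

With this sketch, “152.2.128.152” and “152.123.234.4” both match the /8 range “152.0.0.0”, while “153.2.128.152” does not.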
There is one IP address range hash table 220 for each key 212A-D in the bit length hash table 210. Each IP address range hash table 220 has a key 222A-N for each IP address range of the particular range length that is received from a list provider. For example, if two “/8” IP address ranges “152.*.*.*” and “159.*.*.*” were received from one or more list providers as part of one or more reputation lists, then two keys would be added to the IP address range hash table for /8: one corresponding to each of “152.*.*.*” and “159.*.*.*”.
There is a list bit mask 230 corresponding to each entry 222A-222N in the IP address range hash table 220. The list bit mask 230 records which blacklists or white lists include the IP address or range value of the entry 222A-222N that references the list bit mask 230. In one embodiment, each list provider 150a-150c has a corresponding bit 232A-232N in the list bit mask 230. In another embodiment, two or more list providers 150a-150c correspond to a single bit 232A-232N. In yet another embodiment, one list provider 150a corresponds to one or more bits 232A-232N. For simplicity in explanation, in the examples herein each list provider 150a-150c corresponds to a single bit 232A-232N. In one embodiment, if a list indicates or includes a particular IP address range of an entry 222A-222N, then a bit corresponding to that list is set to “1”.
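One possible in-memory layout for such a structure is sketched below using nested Python dictionaries, with an integer serving as each list bit mask. The bit assignments, the choice of the first N bits of a range as its key, and the membership of the two example ranges in particular lists are all hypothetical and are chosen only to make the example concrete.

```python
# Hypothetical bit assignments: one bit per list provider.
LIST_BITS = {"150a": 0b001, "150b": 0b010, "150c": 0b100}

# bit length -> {range key (first N bits as an integer) -> list bit mask}
aggregate = {
    8: {
        # 152.*.*.* is assumed to appear on the lists of 150a and 150c.
        152: LIST_BITS["150a"] | LIST_BITS["150c"],
        # 159.*.*.* is assumed to appear only on the list of 150b.
        159: LIST_BITS["150b"],
    },
}


def lists_for_mask(mask):
    """Return the provider labels whose bit is set in mask, sorted by label."""
    return sorted(name for name, bit in LIST_BITS.items() if mask & bit)
```

Reading a mask back with lists_for_mask shows which lists indicate a given range without consulting the individual lists.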
As an example, in the context of
3.0 Functional Overview
3.1 Maintaining Aggregate Lists
In various embodiments, one or more reputation lists are provided by reputation list providers. In one embodiment, system initialization includes determining at what interval updates to the lists will be obtained or determining what will trigger obtaining updates to the lists. In a related embodiment, determining when to obtain updates to the lists is based on how often a list is updated. For example, a blacklist of IP addresses could be known to be updated every few seconds, minutes, hours, days, weeks, etc., and obtaining updates to the list could be based on that known updating frequency. In various embodiments, the updating of the list is signaled by the list provider, is detectable by the list aggregator unit, or is otherwise signaled or detectable.
The steps of
In step 310, a particular list is obtained from a list provider. The particular list can be obtained in any number of ways. In various embodiments, the particular list is obtained using a DNS zone transfer; database export and later import; obtaining a file containing the list by file transfer protocol (FTP), hypertext transfer protocol (HTTP), secure HTTP (HTTPS), or the rsync protocol; or any other appropriate means. In various related embodiments, the step 310 of obtaining a list is initiated by a signal from the list provider or by the detection of a change in the list. In various embodiments, the step 310 of obtaining a list is initiated after a predefined period of time. In a related embodiment, the predefined period of time to wait before obtaining the list is based on a predetermined schedule of updates to the list.
A particular list obtained from a list provider can take any appropriate form. An example of an appropriate form could be a list of IP address ranges and IP addresses. For example, in the context of
In step 320, the difference between the current version of the particular list and any previous version of the particular list is determined. In one embodiment, if there is no previous version of the particular list then the difference between the particular list obtained in step 310 and “the previous list” is defined as the full list obtained in step 310. In various embodiments, if there is a previous version of the particular list, the difference between the version of the particular list obtained in step 310 and the previous version of the particular list is determined by using any appropriate tool, such as the Unix “diff” command, for example.
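As an alternative to an external tool, the difference of step 320 could be computed with set operations, as in the following Python sketch. The function name diff_lists and the representation of list entries as strings are assumptions made for illustration.

```python
def diff_lists(previous, current):
    """Return (added, removed) entries between two versions of a list.

    previous may be None when no earlier version exists, in which case
    every entry of current counts as added and nothing is removed.
    """
    prev = set(previous or ())
    curr = set(current)
    added = curr - prev      # entries new in the current version
    removed = prev - curr    # entries dropped since the previous version
    return added, removed
```

The added entries would then be handled in step 330 and the removed entries in step 340.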
As noted above, there are numerous possible embodiments for the aggregate list and, therefore, there are numerous possible embodiments for steps 330 and 340. Steps 330 and 340, for sake of clarity of description, will be described in terms of data structures similar to the aggregate list data structure 200 of
In step 330, the new entries are added to the aggregate list data structure. An example method for adding entries to an aggregate list data structure is depicted in and described herein with respect to
When an entry is deleted from a particular list, its corresponding entries must be deleted from the aggregate list data structure as part of step 340. Deleting an entry from an aggregate list data structure can be accomplished by finding the IP address range hash table associated with the appropriate length entry in the bit length hash table; finding the list bit mask associated with the appropriate entry in the IP address range hash table; and setting the bit in the list bit mask corresponding to the particular list to “0”. For example, in the context of
Various embodiments of the techniques described in
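The removal performed in step 340 can be sketched in Python as follows, assuming the aggregate list data structure is held as a dictionary that maps each bit length to a dictionary of range keys and integer list bit masks. The function name and argument names are hypothetical.

```python
def remove_entry(aggregate, bit_length, range_key, list_bit):
    """Clear list_bit in the mask stored for (bit_length, range_key).

    aggregate maps bit length -> {range key -> list bit mask}.
    Entries that are not present are silently ignored.
    """
    ranges = aggregate.get(bit_length)
    if ranges is None or range_key not in ranges:
        return
    # Set the bit for the particular list to "0"; other bits are untouched.
    ranges[range_key] &= ~list_bit
```

In this sketch an entry whose mask becomes zero is left in place; whether such entries are pruned is a design choice the text does not address.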
In one embodiment, the process of determining the reputation of a message sender is initiated by receiving a request for the reputation of an electronic message sender. In various embodiments, the request is received in extensible markup language (XML), hypertext markup language (HTML), formatted as a DNS request, or in any appropriate format. In various embodiments, the request is received via HTTP, HTTPS, TCP (Transmission Control Protocol)/IP sockets, User Datagram Protocol (UDP), or via any other appropriate means. For example, a request for the reputation of an email sender could come in the form of a DNS request using TCP/IP or UDP.
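The text does not specify how a DNS-formatted request encodes the sender. As one possibility, the following Python sketch builds a query name using the reversed-octet convention that DNS-based blacklists commonly use. The zone name reputation.example.test and the function name are hypothetical, and the sketch only constructs the name; it does not send a query.

```python
import ipaddress


def reputation_query_name(ip, zone="reputation.example.test"):
    """Build a DNS query name by reversing the IPv4 octets and appending zone.

    The zone is a placeholder; the reversed-octet layout is an assumption
    borrowed from common DNS blacklist practice.
    """
    # IPv4Address validates the input and normalizes it to dotted-quad form.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```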
As noted above, in one embodiment and in the examples used herein senders are identified by IP address. However, in other embodiments any other sender identification values may be used.
In step 410, two or more lists are obtained from two or more list providers. In various embodiments, these lists are obtained using DNS zone transfers; database exports and later imports; obtaining files containing the lists via file transfer protocol (FTP), hypertext transfer protocol (HTTP), secure HTTP (HTTPS), or the rsync protocol; or any other appropriate means. For example, in the context of
In step 420, the lists that contain the sender are determined. In various embodiments, step 420 comprises parsing each list from each list provider or querying an aggregate list, an aggregate list data structure, or other appropriate mechanism. For example, in the context of
In order to determine whether an IP address is contained in a range represented in the IP address range hash table 220, the first X significant bits of the IP address are compared to the first X significant bits of the IP address ranges in entries of the table, where X is the number of bits defined by the corresponding key 212A-212D of the bit length hash table 210. In one embodiment, determining whether there is a corresponding entry 222A-222N in the IP address range hash table 220 comprises determining whether a key 222A-222N exists in the IP address range hash table 220 for the first X bits of the IP address.
In one embodiment, in order to determine which lists contain the IP address, the steps above are performed for each individual list separately or all lists are checked at once. In a related embodiment, there are two or more list bit masks 230 corresponding to matching entries 222A-222N in two or more IP address range hash tables 220 corresponding to two or more entries in the bit length hash table 210. Further, determining which lists contain the IP address comprises performing the “or” operation on the two or more list bit masks to produce a result bit mask. The result bit mask will have “1”s in any place that any individual list bit mask 230 has a “1” and will have a “0” only at those bits where no list bit mask 230 has a “1”. In other embodiments, other logical or mathematical functions could be used to combine the list bit masks 230, such as addition, weighted addition, bitwise averaging, bitwise exclusive or, or any other appropriate function. In one embodiment, an aggregate list bit mask is used to store which lists indicate the IP address of the sender.
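The lookup and “or” combination described above can be sketched in Python as follows. The sketch assumes the aggregate list data structure is a dictionary mapping each bit length to a dictionary whose keys are the first bits of a range, taken as an integer, and whose values are integer list bit masks; the function name is hypothetical.

```python
import ipaddress


def lookup_mask(aggregate, ip):
    """OR together the list bit masks of every range that contains ip.

    aggregate maps bit length -> {first bit_length bits as int -> mask}.
    Returns 0 when no range contains the address.
    """
    ip_int = int(ipaddress.IPv4Address(ip))
    result = 0
    for bit_length, ranges in aggregate.items():
        # Keep only the first bit_length significant bits of the address.
        key = ip_int >> (32 - bit_length)
        # A missing key contributes no bits to the result.
        result |= ranges.get(key, 0)
    return result
```

Because one dictionary lookup is made per bit length present in the structure, the cost of a query in this sketch depends on the number of distinct range lengths rather than on the number of lists.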
In step 430, a reputation score is determined based on which lists contain the sender. In various embodiments, the reputation score is determined as a weighted sum of the aggregate list bit mask or as a polynomial of the aggregate list bit mask. In one embodiment, determining the reputation score is based on which lists contain the IP address of the sender. Such an embodiment is depicted in and described with respect to
Various embodiments of
3.2 Adding Entries to an Aggregate Data Structure
In step 510, the next item in the list of items to be added is obtained. In one embodiment, the list of items to be added is associated with a particular list and the particular list is associated with a particular bit in each list bit mask. In one embodiment, if there are no more items in the list, then no more steps are taken. In various embodiments, obtaining the next item in the list comprises obtaining the next item from a structured list, obtaining the next item from a linked list, querying a data structure containing one or more items, or any appropriate means.
In step 520 a check is made to determine whether a corresponding entry exists in the bit length hash table. In various embodiments, this comprises determining the length of the item obtained in step 510. For example, in the context of
If a corresponding entry does not exist, then an appropriate entry is added in step 530. In various embodiments, adding an appropriate entry comprises adding an appropriate key to a bit length hash table or any appropriate action.
After an appropriate entry is added in step 530 or if an entry already exists for that range (step 520), then a check is performed to determine whether the IP address range for the new entry already exists in the IP address range hash table. For example, in the context of
If there is no corresponding entry 222A-222N in the IP address range hash table 220, then in step 550 an entry is added to the appropriate data structure corresponding to the item obtained in step 510. In one embodiment, adding an entry comprises setting all the bits in the corresponding list bit mask 230 to zeros. For example, in the context of
If an entry has been added or there is already a corresponding entry in the IP address range hash table, then in step 560, the list bit mask corresponding to the IP address range hash table entry for the added item is altered to indicate the particular list. For example, in the context of
Various embodiments of
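The steps of adding entries described in this section, from step 510 through step 560, can be sketched in Python as follows. The sketch assumes a dictionary-based aggregate structure and assumes that each item has already been reduced to a bit length and an integer range key; both the representation and the function name are hypothetical.

```python
def add_entries(aggregate, items, list_bit):
    """Add the items of one list to the aggregate and return the aggregate.

    aggregate maps bit length -> {range key -> list bit mask}.
    items is an iterable of (bit_length, range_key) pairs.
    list_bit is the bit assigned to the particular list.
    """
    for bit_length, range_key in items:                # step 510: next item
        # Steps 520 and 530: find or create the table for this bit length.
        ranges = aggregate.setdefault(bit_length, {})
        # Range check and step 550: a new entry starts with all bits zero.
        ranges.setdefault(range_key, 0)
        # Step 560: mark the entry as indicated by the particular list.
        ranges[range_key] |= list_bit
    return aggregate
```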
3.3 Example Reputation Score Calculations
In step 610, a score is obtained corresponding to each list. In one embodiment, this score is obtained by determining, for each blacklist 160a, 160b, whether the sender's IP address is in the particular list. If the IP address is indicated in the particular list, then the score for the list represents a certain percentage likelihood that the message is an unsolicited electronic message (often higher than 50%). If the IP address is not indicated in the particular list, then the score for the list still represents a certain percentage likelihood that the message is an unsolicited message (often less than 50%).
In one embodiment, this score is obtained by determining, for each “white” list, whether the sender's IP address is in the particular list. A white list is a list of IP addresses and ranges that are believed to be associated with senders of legitimate electronic messages. If the IP address is indicated in the particular list, then the score for the list represents a certain percentage likelihood that the message is unsolicited (often less than 50%). If the IP address is not indicated in the particular list, then the score for the list represents a certain percentage likelihood that the message is unsolicited (often higher than 50%).
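The per-list scoring of step 610 can be sketched in Python as follows, assuming that list membership has already been summarized in a result bit mask. The function name and the probability values in the usage note are hypothetical and are chosen only to follow the pattern described above.

```python
def list_scores(mask, lists):
    """Return one probability per list for a sender's result bit mask.

    lists is a sequence of (bit, p_if_listed, p_if_not_listed) tuples,
    where each probability is the likelihood that a message is unsolicited.
    """
    return [p_in if mask & bit else p_out for bit, p_in, p_out in lists]
```

For example, with two blacklists configured as (0b001, 0.90, 0.40) and (0b010, 0.80, 0.45) and one white list configured as (0b100, 0.10, 0.60), a sender whose mask is 0b101 would receive the scores 0.90, 0.45, and 0.10.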
In other embodiments, a white list or blacklist will contain ranges of IP addresses and exceptions to those IP addresses, thereby including all IP addresses in a range except those that are excluded. In various embodiments, the white lists and blacklists contain integer or floating point values indicating scores for IP address ranges and IP addresses, and these scores are used to determine an aggregate score for an IP address with respect to the lists. In one embodiment, the aggregate list data structure 200 of
In step 620, an aggregate score is generated based on the scores for each list determined in step 610. In one embodiment, the score for each list is a percentage likelihood that a message is unsolicited, and the aggregate score is an aggregate percentage likelihood that is generated based on the individual percentage likelihoods. In various embodiments, this aggregate percentage likelihood is based on a weighted average of the individual percentage likelihoods, a sum or product of the individual percentage likelihoods, a polynomial of the individual percentage likelihoods, or any appropriate calculation. In various embodiments, the aggregate percentage is based in part on the Chi Squared function over the probabilities, a Robinson calculation, a Bayes calculation, or any other appropriate mechanism. A particular embodiment of the Chi Squared function is depicted in the Python Programming Language (see the “python” commercial domain of the World Wide Web) code of Appendix A.
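One way to combine per-list likelihoods with a Chi Squared function is sketched below. It follows a commonly used Fisher-style combining scheme and is an assumption, not a reproduction of the Appendix A code, which may differ; the function names `chi2q` and `combine_chi_squared` and the clamping constant `eps` are hypothetical.

```python
import math


def chi2q(x2, dof):
    """Probability that a chi-squared variable with an even `dof` exceeds x2."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = total = math.exp(-m)
    # Sum the series exp(-m) * m**i / i! for i = 0 .. dof/2 - 1.
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine_chi_squared(probabilities, eps=1e-6):
    """Combine per-list likelihoods (0.0 to 1.0) into one aggregate likelihood."""
    if not probabilities:
        return 0.5  # assumption: no evidence means an undecided result
    # Clamp so that log(0) never occurs.
    ps = [min(max(p, eps), 1.0 - eps) for p in probabilities]
    n = len(ps)
    unsolicited = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), 2 * n)
    legitimate = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), 2 * n)
    return (unsolicited - legitimate + 1.0) / 2.0
```

With this scheme, inputs that all equal 0.5 yield 0.5, and inputs that agree in one direction push the aggregate toward 0 or 1 faster than a plain average would.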
In step 630, the aggregate score is mapped to a normalized score. In one embodiment, the aggregate score is an aggregate percentage, the normalized score is a mapped percentage in the range from 0% to 100%, and step 630 is performed by mapping the aggregate percentage to that normalized range. In various embodiments, this mapping is linear, piecewise linear, cubic, polynomial, or uses any other appropriate function. In one embodiment, a piecewise linear method of mapping the aggregate percentage is used and comprises determining the known lowest possible probability (LP), the known average probability (AP), and the known highest possible probability (HP), and linearly mapping percentages from LP to AP onto 0% to 50% and percentages from AP to HP onto 50% to 100%. In equation form, with aggregate probability represented as P, this can be represented as:

$$\text{normalized}(P) = \begin{cases} 50\% \cdot \dfrac{P - LP}{AP - LP}, & LP \le P \le AP \\ 50\% + 50\% \cdot \dfrac{P - AP}{HP - AP}, & AP < P \le HP \end{cases}$$
For example, if LP is 30%, AP is 40% and HP is 80%, then percentages from 30% to 40% would map to 0% to 50%; and percentages from 40% to 80% would map to 50% to 100%. In such an example, 35% would map to 25% and 60% would map to 75%.
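The piecewise linear mapping and the worked example above can be checked with a short Python sketch. The function name `normalize` and the clamping of out-of-range inputs are assumptions added for illustration.

```python
def normalize(p, lp, ap, hp):
    """Map aggregate percentage p onto 0-100 through (lp, 0), (ap, 50), (hp, 100)."""
    if not lp < ap < hp:
        raise ValueError("expected lp < ap < hp")
    # Assumption: values outside [lp, hp] are clamped to the nearest bound.
    p = min(max(p, lp), hp)
    if p <= ap:
        return 50.0 * (p - lp) / (ap - lp)
    return 50.0 + 50.0 * (p - ap) / (hp - ap)


# Worked example from the text: LP = 30%, AP = 40%, HP = 80%.
low_half = normalize(35, 30, 40, 80)   # 25.0
high_half = normalize(60, 30, 40, 80)  # 75.0
```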
In related embodiments, LP is determined by performing the calculations of step 620 using the lowest possible score (e.g. percentage) for each of the lists, and HP is determined by performing the calculations of step 620 using the highest possible score (e.g. percentage) for each of the lists, and AP is determined by performing the calculations of step 620 using a random sample of possible values and averaging the result.
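The determination of LP, HP, and AP can be sketched as below. The sketch assumes a weighted average as the step 620 calculation, assumes each list yields one of two possible scores, and samples those scores uniformly; the function names, weights, sample count, and seed are all hypothetical.

```python
import random


def weighted_average(scores, weights):
    """One possible step 620 calculation: a weighted average of per-list scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def estimate_bounds(score_ranges, weights, samples=10000, seed=0):
    """Return (LP, AP, HP) given each list's (lowest, highest) possible score."""
    rng = random.Random(seed)
    # LP and HP: run the aggregate calculation on the extreme scores of every list.
    lp = weighted_average([lo for lo, _ in score_ranges], weights)
    hp = weighted_average([hi for _, hi in score_ranges], weights)
    # AP: average the aggregate over a random sample of possible score combinations.
    total = 0.0
    for _ in range(samples):
        draw = [rng.choice(pair) for pair in score_ranges]
        total += weighted_average(draw, weights)
    return lp, total / samples, hp


lp, ap, hp = estimate_bounds([(0.40, 0.90), (0.10, 0.60)], weights=[1.0, 1.0])
```

In practice the sample would more plausibly be drawn from observed traffic than from a uniform choice, which is why the sampling strategy is flagged as an assumption.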
In step 640, the normalized score is mapped to an output score. In one embodiment, a mapped percentage is mapped to an output (mapped) score. In various embodiments, this mapping is linear, piecewise linear, cubic, piecewise cubic, polynomial, piecewise polynomial, exponential, piecewise exponential, or any appropriate mapping. In one embodiment, this mapping is performed by using a piecewise function such as:
where lo_k and hi_k are constants. It may be beneficial to use hi_k and lo_k values in the range of approximately 0.5 to 2.0. It may also be beneficial to use hi_k and lo_k values in the narrower range of approximately 0.6 to 1.0. The constants hi_k and lo_k may have the same value or different values.
Various embodiments depicted in
3.4 Example Process for Estimating Whether a Message is Unsolicited
When a message arrives at a mail transfer agent or other system, it has a sender associated with it. The sender can be defined by any appropriate identification mechanism. In various embodiments, the sender is identified by IP address, domain name, email address, geographical location, or any other appropriate mechanism. In the examples used to describe
In step 710, the reputation score of the message sender is obtained. In one embodiment, the process of
In step 720, the reputation score is compared to a first predefined threshold to determine whether it is worse than that threshold. If the reputation score is worse than the first predefined threshold, then the message is indicated as unsolicited in step 730. In various embodiments, if the message is indicated as unsolicited, the message is deleted, put in a trash folder, put in a “bulk mail folder”, flagged to indicate that it is estimated as unsolicited, or subjected to any other appropriate action. After step 730 is performed, the process completes.
If the reputation score is not worse than the first predefined threshold (step 720), then a check is made in step 740 to determine whether the reputation score is better than a second predefined threshold. If the reputation score is better than the second predefined threshold, then in step 750, it is indicated that the message is estimated as valid. In various embodiments, indicating that the message is estimated as valid comprises sending the message to the recipient's inbox without further filtering, sending the message to the recipient's inbox after limited filtering, allowing the message to bypass regular filtering, flagging the message as valid, or any appropriate action. After step 750 is performed, the process completes.
If the reputation score for the sender is not better than the second predefined threshold (step 740), then in step 760 it is indicated that the message is not estimated as either valid or invalid. In various embodiments, indicating that the message is not estimated as either valid or invalid comprises applying filters to the message, forwarding the message to the recipient, not flagging the message as either valid or invalid, or any appropriate action.
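The two-threshold decision of steps 720 through 760 can be sketched in Python. The function name, the string labels, the example threshold values, and the convention that a higher score means a better reputation are assumptions; the specification only speaks of scores being "worse" or "better" than a threshold.

```python
def classify(reputation_score, first_threshold, second_threshold):
    """Return a label for a message based on its sender's reputation score."""
    # Assumption: higher scores are better, so "worse" means numerically lower.
    if reputation_score < first_threshold:
        return "unsolicited"   # step 730
    if reputation_score > second_threshold:
        return "valid"         # step 750
    return "undetermined"      # step 760: neither valid nor invalid


# Hypothetical thresholds on a -10 to +10 scale.
labels = [classify(s, -3.0, 5.0) for s in (-7.5, 0.0, 8.2)]
```

Messages labeled "undetermined" are the ones that would still pass through ordinary content filtering.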
Various embodiments of
4.0 Implementation Mechanisms—Hardware Overview
Computer system 800 may be coupled via bus 802 to a display 812, such as a cathode ray tube (“CRT”), for displaying information to a computer user. An input device 814, including alphanumeric and other keys, is coupled to bus 802 for communicating information and command selections to processor 804. Another type of user input device is cursor control 816, such as a mouse, trackball, stylus, or cursor direction keys for communicating direction information and command selections to processor 804 and for controlling cursor movement on display 812. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
The invention is related to the use of computer system 800 for electronic message delivery approaches. According to one embodiment of the invention, electronic message delivery approaches are provided by computer system 800 in response to processor 804 executing one or more sequences of one or more instructions contained in main memory 806. Such instructions may be read into main memory 806 from another computer-readable medium, such as storage device 810. Execution of the sequences of instructions contained in main memory 806 causes processor 804 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 804 for execution. Such a medium may take many forms, including but not limited to non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 810. Volatile media includes dynamic memory, such as main memory 806.
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 804 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 800 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal and appropriate circuitry can place the data on bus 802. Bus 802 carries the data to main memory 806, from which processor 804 retrieves and executes the instructions. The instructions received by main memory 806 may optionally be stored on storage device 810 either before or after execution by processor 804.
Computer system 800 also includes a communication interface 818 coupled to bus 802. Communication interface 818 provides a two-way data communication coupling to a network link 820 that is connected to a local network 822. For example, communication interface 818 may be an integrated services digital network (“ISDN”) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 818 may be a local area network (“LAN”) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 818 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 820 typically provides data communication through one or more networks to other data devices. For example, network link 820 may provide a connection through local network 822 to a host computer 824 or to data equipment operated by an Internet Service Provider (“ISP”) 826. ISP 826 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 828. Local network 822 and Internet 828 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 820 and through communication interface 818, which carry the digital data to and from computer system 800, are exemplary forms of carrier waves transporting the information.
Computer system 800 can send messages and receive data, including program code, through the network(s), network link 820 and communication interface 818. In the Internet example, a server 830 might transmit a requested code for an application program through Internet 828, ISP 826, local network 822 and communication interface 818. In accordance with the invention, one such downloaded application provides for electronic message delivery approaches as described herein.
The received code may be executed by processor 804 as it is received, and/or stored in storage device 810, or other non-volatile storage for later execution. In this manner, computer system 800 may obtain application code in the form of a carrier wave.
5.0 Extensions and Alternatives
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
A.1 Function for Summing Terms for Chi Squared:
A.2 Function for Calculating Chi Squared Value for a List of Probabilities
Number | Date | Country |
---|---|---|
20060031314 A1 | Feb 2006 | US |