The advent of global communications networks such as the Internet has presented commercial opportunities for reaching vast numbers of potential customers. Electronic mail (“email”) is becoming increasingly pervasive as a means for disseminating unwanted advertisements and promotions (also denoted as “spam”) to network users. The Radicati Group, Inc., a consulting and market research firm, estimated as of August 2002 that two billion junk email messages are sent each day—a number expected to triple every two years. Individuals and entities (e.g., businesses, government agencies) are becoming increasingly inconvenienced and oftentimes offended by junk messages. As such, junk email is now, or soon will become, a major threat to trustworthy computing.
A key technique utilized to thwart junk email is employment of filtering systems/methodologies. One proven filtering technique is based upon a machine learning approach—machine learning filters assign to an incoming message a probability that the message is junk. In this approach, features typically are extracted from two classes of example messages (e.g., junk and non-junk messages), and a learning filter is applied to discriminate probabilistically between the two classes. Since many message features are related to content (e.g., words and phrases in the subject and/or body of the message), such types of filters are commonly referred to as “content-based filters.”
Some junk/spam filters are adaptive, which is important because multilingual users and users who speak rare languages need a filter that can adapt to their specific needs. Furthermore, not all users agree on what is, and is not, junk/spam. Accordingly, by employing a filter that can be trained implicitly (e.g., via observing user behavior), the respective filter can be tailored dynamically to meet a user's particular message identification needs.
One approach for filtering adaptation is to request a user(s) to label messages as junk and non-junk. Unfortunately, such manually intensive training techniques are undesirable to many users due to the complexity associated with such training, let alone the amount of time required to properly effect it. In addition, such manual training techniques are often flawed in practice. For example, subscriptions to free mailing lists are often forgotten by users and thus can be incorrectly labeled as junk mail by a default filter. Since many users rarely check the contents of a junk folder, legitimate mail can be blocked indefinitely from the user's mailbox. Another adaptive filter training approach is to employ implicit training cues. For example, if the user(s) replies to or forwards a message, the approach assumes the message to be non-junk. However, using only message cues of this sort introduces statistical biases into the training process, resulting in filters of lower accuracy.
Despite various training techniques, spam or junk filters are far from perfect and quite often misclassify electronic messages. Unfortunately, this can result in a few junk messages appearing in the inbox and a few good messages lost in a junk folder. Users may mistakenly open spam messages delivered to their inbox and, as a result, expose themselves to lewd or obnoxious content. In addition, they may unknowingly “release” their email address to spammers via web beacons. Improvements in spam filtering are highly desirable in order to facilitate reducing or even eliminating these unwanted emails.
The following presents a simplified summary of the subject matter in order to provide a basic understanding of some aspects of subject matter embodiments. This summary is not an extensive overview of the subject matter. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the subject matter. Its sole purpose is to present some concepts of the subject matter in a simplified form as a prelude to the more detailed description that is presented later.
The subject matter relates generally to email classification, and more particularly to systems and methods for detecting email spam. Decision trees populated with classifier models are leveraged to provide enhanced spam detection utilizing separate email classifiers for each feature of an email. This provides a higher probability of spam detection through tailoring of each classifier model to facilitate in more accurately determining spam on a feature-by-feature basis. Classifiers can be constructed based on linear models such as, for example, logistic-regression models and/or support vector machines (SVM) and the like. The classifiers can also be constructed based on decision trees. “Compound features” based on internal and/or external nodes of a decision tree can be utilized to provide linear classifier models as well. Smoothing of the spam detection results can be achieved by utilizing classifier models from other nodes within the decision tree if training data is sparse. This forms a base model for branches of a decision tree that may not receive substantial training data.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of embodiments are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the subject matter may be employed, and the subject matter is intended to include all such aspects and their equivalents. Other advantages and novel features of the subject matter may become apparent from the following detailed description when considered in conjunction with the drawings.
The subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject matter. It may be evident, however, that subject matter embodiments may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the embodiments.
As used in this application, the term “component” is intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a computer component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. A “thread” is the entity within a process that the operating system kernel schedules for execution. As is well known in the art, each thread has an associated “context” which is the volatile data associated with the execution of the thread. A thread's context includes the contents of system registers and the virtual address space belonging to the thread's process. Thus, the actual data comprising a thread's context varies as it executes.
The subject matter utilizes a class of models to detect whether or not an email is spam. In particular, a tree of classifiers is employed for email classification where the internal nodes of the tree correspond to features and each out-going edge from an internal node corresponds to a value for that feature. Within each leaf of the tree is a separate classifier. Thus, the decision tree facilitates determining which classifier is utilized to decide if the email is likely to be spam. Overall classification accuracy is increased by dividing emails into a plurality of partitions and employing a classifier tailored for each specific partition. The partitioning is based on features of the emails such as, for example, length, font size, word choice, sender, and/or subject line and the like. For example, the length of an email message and the number of recipients can together dictate that different classifiers are employed, optimizing the classification for each combination of features.
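As a concrete illustration, the following minimal Python sketch shows one way such a tree of classifiers could be represented (the names and stub classifiers are hypothetical, not the patented implementation): internal nodes test a feature, out-going edges correspond to feature values, and each leaf holds a separate classifier.

```python
# Minimal sketch of a tree of classifiers (hypothetical names). Internal
# nodes test an email feature; each out-going edge corresponds to a value
# of that feature; leaves hold classifiers tailored to their partition.

class Leaf:
    def __init__(self, classifier):
        self.classifier = classifier      # any model with a predict(email) method

class Internal:
    def __init__(self, feature_fn, children):
        self.feature_fn = feature_fn      # maps an email to a feature value
        self.children = children          # dict: feature value -> subtree

def route(node, email):
    """Traverse from the root to the leaf whose classifier should score this email."""
    while isinstance(node, Internal):
        node = node.children[node.feature_fn(email)]
    return node.classifier

# Example tree mirroring the length/recipients example in the text.
class StubClassifier:
    def __init__(self, name): self.name = name
    def predict(self, email): return 0.5  # placeholder spam probability

tree = Internal(
    lambda e: len(e["body"]) <= 400,
    {
        True:  Leaf(StubClassifier("short messages")),
        False: Internal(
            lambda e: len(e["recipients"]) <= 2,
            {
                True:  Leaf(StubClassifier("long, few recipients")),
                False: Leaf(StubClassifier("long, many recipients")),
            },
        ),
    },
)

email = {"body": "x" * 1000, "recipients": ["a@example.com"]}
print(route(tree, email).name)            # -> "long, few recipients"
```

Routing an email down the tree selects the classifier tailored to that email's partition; any model exposing a scoring method could sit at a leaf.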
One skilled in the art can appreciate that the techniques described infra can also be augmented with preprocessing, for example, to remove commented and/or otherwise invisible text, etc. from HTML and other formatted emails to enhance spam detection. This helps prevent spammers from hiding classification features inside emails in order to sway detection results. User interfaces and/or automated techniques can also be incorporated to facilitate personalizing and/or updating the infra techniques. This can be accomplished, for example, by collecting user data such as hand-classified spam or desired emails, as well as other data such as emails responded to and emails not responded to. This type of data can be utilized to facilitate the training of trees of classifiers. Personalization makes it much harder for spammers to defeat filtering techniques because every user has a different filter. The augmented techniques can also include white lists that contain, for example, email addresses that should not be categorized as spam regardless of classification. Automated data collection, features such as reverse IP lookups for address fraud, and/or challenges (requests of postage—computational and/or monetary) and the like are some additional techniques that can be utilized to augment the trees of classifiers to facilitate performance as well.
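For instance, a minimal preprocessing sketch along these lines (the regular expressions are illustrative assumptions only; robust HTML sanitization requires a real parser) might strip commented text before feature extraction:

```python
import re

# Illustrative preprocessing sketch: strip HTML comments and crudely drop
# tags before feature extraction, so text hidden in comments cannot sway
# the classifier. Real sanitization must also handle tricks such as
# zero-size or invisible fonts.

def strip_hidden_text(html: str) -> str:
    text = re.sub(r"<!--.*?-->", " ", html, flags=re.DOTALL)  # remove commented text
    text = re.sub(r"<[^>]+>", " ", text)                      # drop remaining tags
    return re.sub(r"\s+", " ", text).strip()                  # normalize whitespace

print(strip_hidden_text("Buy <!-- hidden words --> now!<br>"))
# -> "Buy now!"
```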
The decision tree classification component 210 employs the decision tree to determine an optimum classifier to utilize for evaluation of the email 204. Once the email 204 is classified, the decision tree classification component 210 provides the classified email 206 as an output. The classified email 206 can represent a spam classified email and/or other types of classified email such as “move,” “delete,” and/or “ignore” email and the like. In other words, the email 204 can be processed by one instance of the email classification system 200 to determine if the email 204 is spam. In another personalized instance of the email classification system 200, the email 204 can be determined to not be spam, but can be categorized as unimportant to a user. In this scenario, the email 204 may be classified as spam, move, delete, and/or ignore and the like by the user.
The leaves (i.e., classifiers) of the decision tree can be tailored to email containing specific features. The classifiers themselves can be of different types such as, for example, linear classifiers and/or even other decision trees and the like. Linear classifiers utilized by the email classifier component 302 can include, but are not limited to, support vector machines (SVM) and/or logistic-regression classifiers and the like. The decision tree classification component 310 employs the decision tree to determine an optimum classifier to utilize for evaluation of the email 304. Once the email 304 is classified, the decision tree classification component 310 provides the classified email 306 as an output. The classified email 306 can represent a spam classified email and/or other types of classified email such as “move,” “delete,” and/or “ignore” email and the like. In other words, the email 304 can be processed by one instance of the email classifier component 302 to determine if the email 304 is spam. In another personalized instance of the email classification system 300, the email 304 can be determined to not be spam, but can be categorized as unimportant to a user. In this scenario, the email 304 may be classified as spam, move, delete, and/or ignore and the like by the user.
Thus, this technology provides a simple generalization of spam classification where a decision tree is utilized to facilitate the determination of which classifier to use for a given email. Separate classifiers are then employed on a feature-by-feature basis, increasing the accuracy of the spam detection. For example, suppose that the best classifier to use differs depending on the length of an email message and the number of recipients. An example decision tree 400 with a root node 402, child node 404, and leaf nodes 406-410 for this situation is illustrated in the drawings.
To classify a particular email, the process starts at the root node 402 of the decision tree 400. Because the root node 402 is an internal node, the process looks at the corresponding feature of the email, namely its length. Suppose the email has 1000 characters; the process therefore traverses to the right child node 404 of the root node 402. The process then looks at the number of recipients on the email and, assuming there is exactly one, traverses to the left child and ends at the “Classifier 2” leaf node 406. The process then classifies the email using the classifier corresponding to this leaf node 406. The subject matter allows any type of classifier to be used in the leaf nodes 406-410, including a decision tree classifier. One instance utilizes a logistic-regression classifier in the leaves, which is of the same form as an SVM but is trained differently.
Data sparsity can become a problem with a decision tree of classifiers. For example, the bottom nodes might be trained on only one fourth as many examples as a normal classifier would be trained on, generally meaning that there is not enough data to do an accurate job. Some words might not appear at all in many messages in a leaf node. This can be alleviated by smoothing across nodes. For example, a linear classifier corresponding to an internal node of the tree can be trained based on all data relevant to that node and can be utilized as a “prior” for the nodes below it. The final classification of an email would be a function of all of the classifiers within the tree that are on the path from the root node to the leaf.
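A sketch of one plausible smoothing scheme (an assumed formulation for illustration; the text leaves the exact combination function open) weights each node's score on the root-to-leaf path by the amount of training data that node saw:

```python
import math

# Assumed smoothing formulation: each node on the root-to-leaf path
# contributes its classifier's log-odds score, weighted by the amount of
# training data the node saw, so sparsely trained leaves fall back toward
# their better-trained ancestors' "prior" models.

def smoothed_spam_probability(path):
    """path: list of (log_odds_score, n_training_examples) from root to leaf."""
    total_weight = sum(n for _, n in path)
    combined = sum(score * (n / total_weight) for score, n in path)
    return 1.0 / (1.0 + math.exp(-combined))  # logistic link: log-odds -> probability

# Root saw 100k examples, internal node 25k, leaf only 500: the leaf's
# extreme score is pulled back toward the ancestors' scores.
print(smoothed_spam_probability([(0.2, 100_000), (0.8, 25_000), (3.0, 500)]))
```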
If a decision tree consists of linear classifiers, the entire tree can often be converted into a single linear model by creating “compound” features that incorporate the structure of the tree. For example, for the tree 400 above, the weight for the feature word “click” in Classifier 1 408 might be C1. For Classifier 2 406 it would be C2, and for Classifier 3 410 it would be C3. The process can create the compound feature “≦400 characters AND click” with weight C1. This is a feature that occurs only if the word click appears in a message with ≦400 characters. Similarly, the process can have “>400 characters AND number of recipients ≦2 AND click” with weight C2, which would be true if the word click occurred in a longer message with at most 2 recipients. Similarly, the process can create a compound feature for “click” in messages with >400 characters and more than 2 recipients, with weight C3. If this is done for all words, then the resulting model is a linear model with the same performance as the decision tree of linear classifiers.
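A short sketch of this conversion, using the example tree and hypothetical weights, shows how mutually exclusive compound features reproduce the per-leaf weights in a single linear model:

```python
# Sketch of "compound" features (illustrative weights): the word feature
# "click" is split into three mutually exclusive compound features, one per
# leaf of the example tree, so a single linear model reproduces the tree of
# linear classifiers exactly.

C1, C2, C3 = 0.9, 0.4, 1.5  # hypothetical per-leaf weights for "click"

def compound_features(email):
    short = len(email["body"]) <= 400
    few = len(email["recipients"]) <= 2
    has_click = "click" in email["body"].lower()
    return {
        "<=400 chars AND click":                   has_click and short,
        ">400 chars AND <=2 recipients AND click": has_click and not short and few,
        ">400 chars AND >2 recipients AND click":  has_click and not short and not few,
    }

WEIGHTS = {
    "<=400 chars AND click": C1,
    ">400 chars AND <=2 recipients AND click": C2,
    ">400 chars AND >2 recipients AND click": C3,
}

def linear_score(email):
    return sum(WEIGHTS[f] for f, on in compound_features(email).items() if on)

email = {"body": "please click here " * 60, "recipients": ["a@x.com"]}
print(linear_score(email))  # long message, 1 recipient -> weight C2 = 0.4
```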
This equivalency can be employed to build even more interesting models. For example, the process can build a decision tree in the way described supra. The process can then create compound features for all nodes, including internal nodes. A single model can then be learned for this set of compound features. This is another way of alleviating the data sparsity problem. The resulting model is a linear model at least as powerful as the decision tree model.
Decision Trees for Spam Filtering
Machine learning techniques can be utilized to construct a decision tree that partitions messages into sets and then learn a separate filter on the messages in each of these partitions. For example, HTML (hypertext markup language) messages can be partitioned from plain text ones and a separate model learned for each of them. Utilizing this process can increase spam detection by as much as 20% over current spam filters that utilize text features alone. Employing decision trees can also increase accuracy on sets of messages with differing properties, specifically where things learned from messages of one set do not generalize well to messages of the other sets. This can happen for messages of different languages, for messages of different sizes, and/or for messages with different encodings, etc. Decision trees also allow training with more email because a large set of email can be partitioned into many smaller sets, and MaxEnt training only needs to be applied on these partitions (and not on the whole data set).
Feature Description
Decision trees are a well known and much studied technique for classifying data (see Breiman, Friedman, Olshen, and Stone, Classification and Regression Trees, Wadsworth & Brooks, Monterey, Calif., 1984; and Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, Calif., 1993). However, the subject matter herein employs a decision tree to partition email messages into sets that have similar feature properties and then trains separate models using classifiers on each of these sets. By employing classifier models built using decision trees, for example, a 20% increase in spam email detection can be achieved over spam detection by a single monolithic model.
The power of the decision tree approach is that it allows separation of groups of messages that might otherwise confuse the classifier learning process. For example, if the text ‘font’ appears in an HTML message there is a good chance it is being used to make the display font very tiny—a common trick spammers use to try to confuse spam filters. The same text, ‘font’, in a plain text message cannot possibly change the display font, and thus is probably not a spammer trick. Separating plain text from HTML messages thus allows message content to be treated in a more refined way and, consequently, spam to be filtered more effectively.
The distinction between HTML and plain text messages is intuitively clear, but there are many other potentially useful distinctions (see TABLE 1 infra for a list of example properties/features that can be utilized to partition email into sets). Data gathering techniques and machine learning algorithms can be employed to automatically determine how to partition messages to best filter spam. The algorithm takes as input a set of email messages that have been labeled as spam or non-spam and a set of properties which should be considered as ways to partition the set of email. The algorithm works roughly as follows. The utility of each partitioning property is estimated by (1) using it to partition the training data, (2) learning classifiers on the data in each of the resulting sets, and (3) recording how well the resulting models distinguish spam from good email. The partitioning property that resulted in the best spam classification is selected and added as a test to the decision tree, and the learning algorithm is then called recursively on the resulting data partitions. The recursion terminates when no further partitioning results in a better model than learning a single model on all the remaining data.
Algorithm Description
TABLE 2 contains example pseudo-code for the algorithm described supra. The algorithm starts with a tree that has a single leaf node. It recursively grows the tree by adding splits on message properties, partitioning the data according to the splits, and recursing on the partitions until some termination conditions are met.
Partitionings are evaluated based on their classification accuracy on holdout data. At the beginning of a learning run, each training message is randomly assigned to the TreeTrain set with 70% probability; messages not in TreeTrain are placed into a Holdout set. Each potential partitioning is scored by (1) using it to split the data (both TreeTrain and Holdout), (2) learning a classifier on the training data in each partition, and (3) computing the classification accuracy of each model on its partition's holdout data (at the 50% probability threshold). The final score for partitioning the data by one of the properties is the weighted average of the scores of the models built on the data partitions; the weight for each model is simply the fraction of messages used to train that model. The partition function with the highest score is compared to a single classifier model built on all of the training data and evaluated on all of the holdout data. If the single model is better, the recursion along that path is terminated. If one of the partition functions is better, a split based on that partition function is added to the tree, the data is partitioned, and the learning algorithm is called recursively on each partition.
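The following condensed Python sketch illustrates this partition-scoring step (the helper names and the toy majority-class classifier are hypothetical stand-ins for the actual MaxEnt training; TABLE 2's pseudo-code is not reproduced here):

```python
import random

# Condensed sketch of partition scoring: messages are split 70/30 into
# TreeTrain and Holdout; a candidate property is scored by the data-weighted
# average of per-partition holdout accuracy, and only wins if it beats a
# single monolithic model.

def train_classifier(messages):
    spam_fraction = sum(m["spam"] for m in messages) / max(len(messages), 1)
    label = spam_fraction >= 0.5
    return lambda m: label                 # toy model: predicts the majority class

def accuracy(model, messages):
    if not messages:
        return 0.0
    return sum(model(m) == m["spam"] for m in messages) / len(messages)

def score_property(prop, tree_train, holdout):
    score, n_total = 0.0, len(tree_train)
    for value in {prop(m) for m in tree_train}:
        part_train = [m for m in tree_train if prop(m) == value]
        part_hold = [m for m in holdout if prop(m) == value]
        model = train_classifier(part_train)
        # weight by the fraction of messages used to train this model
        score += (len(part_train) / n_total) * accuracy(model, part_hold)
    return score

random.seed(0)
data = [{"spam": random.random() < 0.7, "attachment": random.random() < 0.4}
        for _ in range(1000)]
tree_train, holdout = [], []
for m in data:
    (tree_train if random.random() < 0.7 else holdout).append(m)

baseline = accuracy(train_classifier(tree_train), holdout)
split_score = score_property(lambda m: m["attachment"], tree_train, holdout)
print(f"monolithic: {baseline:.3f}  split on attachment: {split_score:.3f}")
```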
For example, consider the property ‘Message contains an attachment’. To evaluate a split based on this property, all the training messages (from the TreeTrain set) that do contain an attachment are put into a set, call it contain_yes, and all the training messages that do not contain an attachment are put into a set, call it contain_no. For the sake of the example, suppose 40% of the data is in contain_yes and 60% of the data is in contain_no. A classifier is then built on the data in each of these sets, and one is also built on all of the data in TreeTrain. The holdout data is then split into sets in a similar manner; call these sets h_contain_yes and h_contain_no. The three models are evaluated by determining their classification accuracy on their respective holdout sets. Suppose that the model built on contain_yes achieves 80% accuracy on the data in h_contain_yes, the model built on contain_no achieves 100% accuracy on the data in h_contain_no, and the model built on TreeTrain achieves 90% accuracy on the data in Holdout. The score for the ‘Message contains an attachment’ partitioning property is the weighted average of the model built on contain_yes and the one built on contain_no, and is 0.92 (that is, 0.4×0.8 + 0.6×1.0). The score for the partition is better than the score for the single monolithic model, and so a split is put in the tree and the algorithm is called recursively on the data in contain_yes and contain_no. The algorithm evaluates partitions based on all of the properties and selects the best (whereas this example evaluated a partition on only a single property).
This algorithm uses two types of features for each message: the partition properties and the Text/IP/hostname features that are used by the classifier. Partition properties are used as tests in internal nodes of the decision tree, and the Text/IP/hostname features are used in the classifier models at the leaves. The Text/IP/hostname features can also be utilized as tests in the decision tree, but doing so introduces additional complexities.
Some of the message properties utilized for partitioning take numeric values (e.g., message size is expressed in bytes). For such attributes the learning algorithm considers partitioning data based on a series of thresholds and selects the threshold that works best. Thresholds are chosen empirically based on the distribution of the property's values in the training data. For example, for the size attribute, the algorithm can consider: splitting messages <1000 bytes from those ≧1000 bytes; splitting messages <4000 bytes from those ≧4000 bytes; etc.
Some of the message properties utilized for partitioning have more than two values (e.g., the character set of the message). For such properties the learning algorithm considers a binary split for each possible value. For example, it considers: splitting all messages in US-ASCII from all other messages; splitting all messages in ANSI/OEM Japanese from all other messages; etc.
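A brief sketch of candidate-split generation for both kinds of properties (the quantile-based threshold choice and helper names are illustrative assumptions):

```python
# Sketch of candidate-split generation: numeric properties yield a series
# of threshold splits chosen empirically from the observed distribution;
# multi-valued properties yield one one-vs-rest binary split per value.

def numeric_splits(values, n_thresholds=4):
    """Pick thresholds empirically from the sorted distribution (quantiles)."""
    ordered = sorted(values)
    step = len(ordered) // (n_thresholds + 1)
    thresholds = sorted({ordered[(i + 1) * step] for i in range(n_thresholds)})
    return [lambda v, t=t: v < t for t in thresholds]   # v < t  vs  v >= t

def categorical_splits(values):
    """One binary (value vs. all-others) split per observed value."""
    return [lambda v, c=c: v == c for c in sorted(set(values))]

sizes = [120, 950, 3200, 8000, 400, 15000, 700, 2100]
charsets = ["us-ascii", "utf-8", "shift-jis", "us-ascii"]
print(len(numeric_splits(sizes)), "numeric splits,",
      len(categorical_splits(charsets)), "categorical splits")
```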
Decision Tree Based Examples
In one example experiment, training data was drawn from 300,000 messages received before a certain date, and the testing data consisted of 40,000 messages from a period occurring two weeks later. Tree training was carried out using 50,000 features to score the MaxEnt models at the leaves. Once the structure of the decision tree was fixed, 500,000 features were evenly divided among the leaves of the decision tree, and models were learned utilizing MaxEnt code. For example, if the learned decision tree had 10 leaves, a MaxEnt model with 50,000 features was learned for each of them.
TABLE 2 shows the attributes that were considered for splitting on in the experiments. Notice that this is a subset of the attributes listed in TABLE 1. All of the attributes shown in TABLE 2 were used for learning a depth-5 tree, except for the length of the messageID. The features listed in TABLE 2 are roughly in order of the value that they added to the decision tree.
TABLE 3 shows summary information about the decision trees learned with different depth limits. In these experiments, depth 4 appeared to give the best results.
Alternative Implementations
There is an alternative implementation strategy that results in a single model (instead of a decision tree with one model at each leaf). This can be easier to display to clients and, in some situations, it can have better results than the implementation described supra. Models can be built utilizing data from very different time periods and a determination made as to how much the structure of the learned decision tree changes over time. If it changes a substantial amount, it is indicative of the necessity to build new decision trees regularly. If it does not, then less frequent updates to the decision tree can be adequate to maintain performance. Models can also be built utilizing up-to-date IP features. An application can be written to extract all the features from, for example, TABLE 1 to determine if they facilitate the process. However, if they do not help, the learning algorithm automatically determines this and does not use them.
Splits can be evaluated based on their classification accuracy at thresholds of concern. Models can be learned using different sigma-squared values; it is possible that a value determined to be best for text data (e.g., 0.001) can be orders of magnitude off of the best value for decision trees (e.g., 1.0). A set of holdout users can be constructed instead of a set of holdout data: two messages to the same person can have important hints inside the messages (e.g., the person's name) which a learner can key off of, building a model that does not generalize well across users.
Words and/or phrases can be utilized to partition data. For example, a model can be learned for all messages with the phrase ‘order confirmation’ in them, and one model for all other messages. Partitioning can be based on the presence of any token output by a cracking process, on a handcrafted list of candidate words and phrases, or both.
Utilizing Linear Models with Trees
Different types of linear classifiers can be utilized in a decision tree for detecting email spam. In general, weighted sums of email features are determined and compared to a threshold value, forming a linear model. For example, lists of “spam-like” and “non-spam-like” words can be created. A good example of a spam-like word is “wet” and a good example of a non-spam-like word is “weather.” Weights—numbers—to be associated with each of these words are then learned. The more spam-like a word is, the larger the weight. In addition to word-based features, other features can be utilized as well, such as the time of day when the message was sent (spam is typically more likely to be sent at night). Weights are associated with these other features as well. Generally, any message information can have an associated weight.
When a new email is partitioned via a decision tree to a linear classifier leaf, the list of features in that message is determined (i.e., what words are present, etc.) and all the weights associated with those features are then summed. If the total weight is more than a threshold value, the email is labeled as spam; if it is less than that value, the email is labeled as good. The total weight can also be converted into a probability and then compared to a threshold. Because the probability is derived from the total weight, this is equivalent to utilizing the weights directly.
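A minimal sketch of this weighted-sum decision, with illustrative weights rather than learned ones:

```python
# Minimal linear spam classifier sketch (illustrative weights): sum the
# weights of the features present in a message and compare to a threshold.

WEIGHTS = {            # hypothetical learned weights; positive = spam-like
    "wet": 2.0,
    "weather": -1.5,
    "make money": 3.0,
    "sent_at_night": 0.8,
}
THRESHOLD = 1.0

def is_spam(features):
    total = sum(WEIGHTS.get(f, 0.0) for f in features)
    return total > THRESHOLD, total

print(is_spam({"wet", "sent_at_night"}))      # (True, 2.8)
print(is_spam({"weather", "sent_at_night"}))  # (False, -0.7)
```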
Feature sets can be relatively simple. Specific words in an email tend to be among the most important features. Typically, features also tend to be binary—that is, either the feature is present, or it is not. For example, one feature can be whether the word “wet” occurs in an email and another whether the word “weather” occurs. Phrases can also be utilized as features. For example, the phrases “SEE YOU,” “NEXT WEEK,” and “SCHEDULED FOR” tend to indicate that an email is not spam, while phrases like “MAKE MONEY” and “DEAR FRIEND” tend to indicate that an email is spam. Features can also distinguish between whether a word occurs in the subject or the body of an email. For example, if the word “QUESTION” occurs in the subject of an email, it tends to indicate not-spam, while if it occurs in the body, it tends to indicate spam. Additional types of features can be utilized as well. These include, for example, whether the email has attachments, how large the email is, whether there are non-alphanumeric words in the email body or subject, how many recipients there are, the time the email was sent, whether there are all-uppercase words in the email body or subject, the percentage of uppercase words in the body, and/or the percentage of non-alphanumeric characters in the body and the like.
In general, it is not practicable to utilize, for example, every word that occurs in every email as a feature—there are simply too many. However, there are a number of techniques for determining which words to use. One technique is called “mutual information” which measures how much information a word gives about whether or not an email is spam. A word that occurs only in spam or only in good messages has high mutual information; a word that occurs equally often in spam and non-spam has zero mutual information. Frequently, mutual information favors rare words (a word that only occurs a few times but always in good mail or spam has high mutual information).
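The following sketch computes mutual information between a word's presence and the spam label (the add-one smoothing is an implementation assumption to avoid log(0) on the toy data):

```python
import math

# Sketch of mutual-information feature selection: score each word by how
# much information its presence/absence carries about the spam label.

def mutual_information(messages, word):
    n = len(messages)
    counts = {(w, c): 1.0 for w in (0, 1) for c in (0, 1)}  # add-one smoothing
    for text, is_spam in messages:
        counts[(int(word in text.split()), int(is_spam))] += 1
    total = n + 4
    mi = 0.0
    for (w, c), joint in counts.items():
        p_wc = joint / total                                 # P(word=w, class=c)
        p_w = (counts[(w, 0)] + counts[(w, 1)]) / total      # P(word=w)
        p_c = (counts[(0, c)] + counts[(1, c)]) / total      # P(class=c)
        mi += p_wc * math.log2(p_wc / (p_w * p_c))
    return mi

data = [("make money fast", True), ("wet wild", True),
        ("weather report tomorrow", False), ("see you next week", False)]
for w in ("money", "weather", "next"):
    print(w, round(mutual_information(data, w), 3))
```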
The values or weights assigned to these features can also be determined by applying specific algorithms such as support vector machine (SVM) techniques. The basic idea of SVM techniques is to find a set of weights such that all the spam email scores above a threshold and all the good email scores below that threshold, with the spam emails as far above the threshold as possible and the non-spam as far below as possible. A perfect separator between spam and good emails can be difficult to determine due to such factors as, for example, mislabeled training data and/or the inherent difficulty of the problem (e.g., a friend forwards spam email to a user; the forwarded message can be labeled as non-spam, but its contents look substantially like spam).
The SVM techniques determine the best weights with a maximal separation of good and spam emails. SVM algorithms that are robust to misclassifications can also be employed even when there is not a perfect separation of email types. In that case, the SVM algorithm finds a line that does as good a job as possible. However, users typically do not want to miss any of their wanted email, so the threshold that the SVM finds is generally too aggressive for most users. For emails that are right on the threshold, there is about a 50% chance that an email has been misclassified. Fortunately, the further an email is from the threshold, the less likely it is that the email is misclassified. Thus, a separation line can be selected with the same slope (same weights) but a different, more conservative threshold. Some spam may reach a user, but the probability of filtering out legitimate emails is substantially reduced.
The distance from the optimal split is a useful way to estimate the probability that an email is spam. When a message is substantially near this threshold, it is difficult to determine if the email is spam or not—the probability is 50/50. As emails get further and further from this line, the probability increases. The distance from this line is the sum of the weights, minus the threshold value—when the distance is zero (right on the threshold), the probability is 50/50. An example function for mapping this distance is:
exp(a×distance + b)/(1 + exp(a×distance + b))   (Eq. 1)
where exp is the exponentiation function, and a and b are constants. The constants can be determined, for example, by utilizing an algorithm that considers how many points in the training data are on the wrong side of the line and how far various points are from the line. The threshold can be set in terms of the probability, rather than in terms of any particular distance. For example, a probability threshold of 96.5% can classify approximately 60% of email as spam while missing almost none of the wanted mail. User interfaces can be utilized to allow users to control how aggressive the filter is. Thus, the user can adjust the probability threshold, with a default setting of 96.5% working well; users who want to be more aggressive or more conservative can adjust it in either direction.
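A short sketch of Eq. 1 in code, with hypothetical constants a and b (in practice they would be fit to training data as described):

```python
import math

# Sketch of Eq. 1: map the signed distance from the separating threshold to
# a spam probability, then filter at a user-adjustable probability threshold.

A, B = 1.7, 0.0  # hypothetical constants; normally fit on training data

def spam_probability(distance):
    z = A * distance + B
    return math.exp(z) / (1.0 + math.exp(z))   # equivalently 1/(1 + exp(-z))

def filter_email(distance, probability_threshold=0.965):
    return spam_probability(distance) >= probability_threshold

for d in (0.0, 1.0, 2.5):
    print(d, round(spam_probability(d), 3), filter_email(d))
```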
In view of the exemplary systems shown and described above, methodologies that may be implemented in accordance with the embodiments will be better appreciated with reference to the flow charts of the accompanying drawings.
The embodiments may be described in the general context of computer-executable instructions, such as program modules, executed by one or more components. Generally, program modules include routines, programs, objects, data structures, etc., that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various instances of the embodiments.
In order to provide additional context for implementing various aspects of the embodiments, the following discussion is intended to provide a brief, general description of a suitable computing environment in which the various aspects of the embodiments may be implemented.
As used in this application, the term “component” is intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and a computer. By way of illustration, an application running on a server and/or the server can be a component. In addition, a component may include one or more subcomponents.
With reference to the exemplary operating environment 1100, an exemplary system for implementing the various aspects of the embodiments includes a conventional computer 1102 having a processing unit 1104, a system memory 1106, and a system bus 1108 that couples various system components, including the system memory, to the processing unit 1104.
The system bus 1108 may be any of several types of bus structure including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of conventional bus architectures such as PCI, VESA, Microchannel, ISA, and EISA, to name a few. The system memory 1106 includes read only memory (ROM) 1110 and random access memory (RAM) 1112. A basic input/output system (BIOS) 1114, containing the basic routines that help to transfer information between elements within the computer 1102, such as during start-up, is stored in ROM 1110.
The computer 1102 also may include, for example, a hard disk drive 1116, a magnetic disk drive 1118, e.g., to read from or write to a removable disk 1120, and an optical disk drive 1122, e.g., for reading from or writing to a CD-ROM disk 1124 or other optical media. The hard disk drive 1116, magnetic disk drive 1118, and optical disk drive 1122 are connected to the system bus 1108 by a hard disk drive interface 1126, a magnetic disk drive interface 1128, and an optical drive interface 1130, respectively. The drives 1116-1122 and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, etc. for the computer 1102. Although the description of computer-readable media above refers to a hard disk, a removable magnetic disk and a CD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, and the like, can also be used in the exemplary operating environment 1100, and further that any such media may contain computer-executable instructions for performing the methods of the embodiments.
A number of program modules may be stored in the drives 1116-1122 and RAM 1112, including an operating system 1132, one or more application programs 1134, other program modules 1136, and program data 1138. The operating system 1132 may be any suitable operating system or combination of operating systems. By way of example, the application programs 1134 and program modules 1136 can include an email classification scheme in accordance with an aspect of an embodiment.
A user can enter commands and information into the computer 1102 through one or more user input devices, such as a keyboard 1140 and a pointing device (e.g., a mouse 1142). Other input devices (not shown) may include a microphone, a joystick, a game pad, a satellite dish, a wireless remote, a scanner, or the like. These and other input devices are often connected to the processing unit 1104 through a serial port interface 1144 that is coupled to the system bus 1108, but may be connected by other interfaces, such as a parallel port, a game port or a universal serial bus (USB). A monitor 1146 or other type of display device is also connected to the system bus 1108 via an interface, such as a video adapter 1148. In addition to the monitor 1146, the computer 1102 may include other peripheral output devices (not shown), such as speakers, printers, etc.
It is to be appreciated that the computer 1102 can operate in a networked environment using logical connections to one or more remote computers 1160. The remote computer 1160 may be a workstation, a server computer, a router, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 1102, although for purposes of brevity, only a memory storage device 1162 is illustrated. The logical connections depicted can include a local area network (LAN) 1164 and a wide area network (WAN) 1166.
When used in a LAN networking environment, for example, the computer 1102 is connected to the local network 1164 through a network interface or adapter 1168. When used in a WAN networking environment, the computer 1102 typically includes a modem (e.g., telephone, DSL, cable, etc.) 1170, or is connected to a communications server on the LAN, or has other means for establishing communications over the WAN 1166, such as the Internet. The modem 1170, which can be internal or external relative to the computer 1102, is connected to the system bus 1108 via the serial port interface 1144. In a networked environment, program modules (including application programs 1134) and/or program data 1138 can be stored in the remote memory storage device 1162. It will be appreciated that the network connections shown are exemplary and other means (e.g., wired or wireless) of establishing a communications link between the computers 1102 and 1160 can be used when carrying out an aspect of an embodiment.
In accordance with the practices of persons skilled in the art of computer programming, the embodiments have been described with reference to acts and symbolic representations of operations that are performed by a computer, such as the computer 1102 or remote computer 1160, unless otherwise indicated. Such acts and operations are sometimes referred to as being computer-executed. It will be appreciated that the acts and symbolically represented operations include the manipulation by the processing unit 1104 of electrical signals representing data bits which causes a resulting transformation or reduction of the electrical signal representation, and the maintenance of data bits at memory locations in the memory system (including the system memory 1106, hard drive 1116, floppy disks 1120, CD-ROM 1124, and remote memory 1162) to thereby reconfigure or otherwise alter the computer system's operation, as well as other processing of signals. The memory locations where such data bits are maintained are physical locations that have particular electrical, magnetic, or optical properties corresponding to the data bits.
It is to be appreciated that the systems and/or methods of the embodiments can be utilized in email classification facilitating computer components and non-computer related components alike. Further, those skilled in the art will recognize that the systems and/or methods of the embodiments are employable in a vast array of electronic related technologies, including, but not limited to, computers, servers and/or handheld electronic devices, and the like.
What has been described above includes examples of the embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of the embodiments are possible. Accordingly, the subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.