The invention relates to electronic communications and, in particular, to the classification and management of electronic messages.
The process of sending an electronic message can be broken down into a common set of steps. These steps apply most directly to text messages, but they also apply to the preparation of purely audio (speech), visual (images/video), or multimedia and mixed-content messages. As shown in
These steps occur in four distinct zones of control, ownership, or responsibility, also shown in
When e-mail originated, it was used primarily for informal, collaborative communication in a relatively small community. Most messages were desirable, and a premium was placed on reliable delivery through the system. E-mail now carries a much wider range of messages between people in many organizations. It is used to transmit confidential information to associates and for ordinary business and personal communication among individuals, individuals acting as representatives of organizations, and automated data processing systems. There is a growing problem with undesirable messages being transmitted through the system, including, but not limited to:
(1) Unsolicited messages sent to a recipient who is unwilling and unhappy to receive them (spam);
(2) Messages from one member of an organization to another member of the same organization which the recipient is unwilling and unhappy to receive (harassment, vicarious liability);
(3) Messages from a member of an organization to another member of the same organization which carry information that is inappropriate for the recipient (Chinese wall, insider information);
(4) Messages between members of separate organizations which carry content that is legally proscribed or controlled, such as content regulated under Sarbanes-Oxley or HIPAA, or communications during SEC blackout periods;
(5) Messages between members of separate organizations which violate the policy or business practices of the sender's organization, such as sending confidential information to a competitor;
(6) Messages which are unclear, cryptic, or could be taken or construed as having a different meaning out of context; and
(7) Messages which are important to the sender, but which may be blocked by content or other mail filters during steps C, D, or E above.
Undesirable messages are often blocked by the recipient client or by forwarding servers in steps C, D, and E above, using a variety of techniques such as, but not limited to, blacklisting, header analysis, and content analysis. Messages that are undesirable from the sender's point of view are occasionally blocked during step C, but far less often.
Managing messages while they are still under the control of the sender is in many cases the best solution. In particular, it is frequently better to block undesirable messages during step A, while control of the message is still in zone 1. However, although organizations may create email policies and train users about what is appropriate to send in an email message, there is usually no enforcement or advisory mechanism to ensure that policy is followed during step A. Once a message has completed step A, it becomes difficult or impossible to recall an injudicious, inappropriate, or unlawful message. Once sent, a message becomes part of a set of electronic records that investigating parties may retrieve in civil and other legal proceedings. Further, many company processes that are applied to mail entering or leaving the company in steps C or D are not applied to mail within the company. In addition, many of the policies that an organization needs to implement vary with the organizational role of the user. Rules that are appropriate for a legal department may not be appropriate for the engineering department, for example, and rules that are appropriate for an office worker may not be appropriate for the CEO.
What has been needed, therefore, is a method and system that allows the management of the content of electronic messages before they leave the client email or other electronic messaging application.
The present invention is a system that allows senders to manage electronic messaging content at the point of origin by analyzing messages before they leave the client application. The system integrates with the client application used to prepare the message for sending. In general, it can be invoked when the user presses the “send” button to request transmission of a message or presses a “check compliance” button; alternatively, as the user enters new text, the system can automatically track the content of the message as it changes, analyze it in real time, and offer advice.
In one aspect of the present invention, a send request is intercepted inside the email client. The system runs a series of message analysis steps, in parallel or in sequence, that analyze the sender, recipient, message, any attachments to the message, and/or related content and information. The output of the message analysis steps is made available for use with rules that can specify the performance of a number of actions including, but not limited to, refusing to send the message, offering the user a chance to edit the message, warning the user, automatically removing specific content, filing the content in a user accessible folder, file, or database, filing the content in a non-user accessible folder, file, or database, forwarding a copy of the message to another person for other action, adding user- or company-determined text to the top or bottom of the message or to the message subject, and allowing the administrator or implementer of the system to add application specific functionality as appropriate, such as playing audible sounds using a multimedia device or setting off inaudible alarms. The content analysis steps and the actions taken may be determined by the sender, or they may be centrally managed and determined by the organization, or a combination of the two.
The present invention is a method and system that allows senders to manage electronic messaging content at the point of origin. The present invention analyzes messages and then advises and interacts with the sender in order to prevent undesirable email from completing the step of preparing the message for transmission inside a client application (step A) and entering step B (sending the message).
The system of the invention integrates with the client application being used to prepare the message for sending before it enters step B. In general, it can be invoked in one of three ways:
In an example embodiment for analyzing a message for action and advice, either during step A or at the time that step B has been requested by the user, the rules and actions can be resident on the sender system, can be centrally located and centrally managed, or can be some combination of the two. For convenience, the system of this embodiment is now described in terms of analysis and advice provided at the time that step B has been requested. Extrapolation of these steps to the alternative scenarios will be clear to one of ordinary skill in the art.
First, the system intercepts a message at the moment that the request to send it has been made. In a preferred embodiment, the request is intercepted in the client email application using standard programming interfaces offered by the client application. In alternate embodiments, the request is intercepted inside the email client using at least one of the many other techniques known in the art such as, but not limited to, code injection, event hooking, and reverse engineering.
Next, the system runs a series of message analysis steps, in parallel or in sequence, that analyze the sender, recipient, message, any attachments to the message (documents, images, video, and audio), and/or related content and information. These analysis steps may be performed on the local machine, or may be requested from a remote server. These analyses may include, but are not limited to:
In the case of probabilistic classifiers, the output of each classifier is divided into three ranges that are configured by two numbers: a score below which a message is assumed not to be in the category, and a score above which a message is assumed to be in the category. A score between these two values indicates that the classifier is unsure. This third range can be used to trigger an interactive request for the user to classify the message, as well as to trigger further actions after classification. Asking the user to make an auditable decision about the classification of the message allows the system to continue training so that it makes more accurate unassisted classifications, and also offers the opportunity to capture additional data that can be stored in a centralized database or distributed to other designated users to improve the automatic classification of the messages they send.
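The three-range scheme can be sketched in Python as follows. This is an illustrative sketch only: it assumes classifier scores in [0, 1], and the names `Verdict`, `three_range`, and `resolve`, as well as the `ask_user` callback and the list used as an audit log, are hypothetical. A score exactly equal to either threshold is treated as unsure, since it is neither below the lower value nor above the upper one.

```python
from enum import Enum


class Verdict(Enum):
    """Outcome of mapping a classifier score onto the three configurable ranges."""
    NOT_IN_CATEGORY = "no"
    UNSURE = "unsure"
    IN_CATEGORY = "yes"


def three_range(score: float, low: float, high: float) -> Verdict:
    """Map a probabilistic score onto one of three ranges.

    `low` is the score below which the message is assumed NOT to be in the
    category; `high` is the score above which it is assumed to be in it.
    Scores in the closed interval [low, high] are reported as UNSURE.
    """
    if not low <= high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return Verdict.NOT_IN_CATEGORY
    if score > high:
        return Verdict.IN_CATEGORY
    return Verdict.UNSURE


def resolve(score, low, high, ask_user, audit_log):
    """Return True/False, asking the user (and recording the answer) only when unsure."""
    verdict = three_range(score, low, high)
    if verdict is Verdict.UNSURE:
        answer = bool(ask_user())
        # The recorded decision is auditable and could later be used to
        # retrain the classifier or be shared through a central database.
        audit_log.append((score, answer))
        return answer
    return verdict is Verdict.IN_CATEGORY
```

With thresholds of, say, 0.2 and 0.8, only messages scoring between them interrupt the sender; everything else is classified silently.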
The output of the message analysis steps is made available for use with rules that can specify the performance of a number of actions including, but not limited to:
In this example, the rules then generate the warning dialog shown in
A currently preferred implementation of the invention is a program written in Python. However, the program can be written in any ordinary programming language; other highly suitable languages include, but are not limited to, Perl, Java, C++, Lisp, Visual Basic, and C#. The currently preferred client email program is Outlook 2003; however, extensions to other versions of Outlook and to other email clients, such as Notes, Eudora, and other clients known or creatable in the art, are ordinary extensions of the program shown here. Extension to web-mail clients including, but not limited to, Hotmail and Gmail is also possible using ordinary browser-based extensions such as Internet Explorer Browser Helper Objects.
The example code in Table 1 defines a probabilistic classifier for analyzing whether a message is personal mail, according to one implementation of an embodiment of the present invention.
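Table 1 is not reproduced in this text. To illustrate the kind of component it describes, the sketch below substitutes a textbook technique, a two-class multinomial naive Bayes classifier with add-one (Laplace) smoothing, rather than the patent's own classifier; the class name, method names, and tokenizer are hypothetical. Its `score` output is a probability in [0, 1], which is the form the three-range thresholding described above expects.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Two-class multinomial naive Bayes with add-one smoothing.

    score() returns P(in category | text), e.g. "this is personal mail".
    """

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text: str, in_category: bool) -> None:
        self.word_counts[in_category].update(tokenize(text))
        self.doc_counts[in_category] += 1

    def _log_likelihood(self, tokens, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Smoothed class prior, so an untrained class never has probability 0.
        prior = (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        logp = math.log(prior)
        for tok in tokens:
            logp += math.log((counts[tok] + 1) / (total + vocab_size))
        return logp

    def score(self, text: str) -> float:
        tokens = tokenize(text)
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        vocab_size = max(len(vocab), 1)
        diff = (self._log_likelihood(tokens, True, vocab_size)
                - self._log_likelihood(tokens, False, vocab_size))
        # Numerically stable logistic of the log-odds.
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1.0 + z)
```

In practice the training examples would come from the sender's own mail and from the auditable decisions the user makes in the unsure range.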
The example code in Table 2 defines a regular expression classifier for detecting confidential personal information in the form of a Social Security number, according to one implementation of an embodiment of the present invention.
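Table 2 itself is not reproduced here. As a hedged sketch of what such a pattern might look like, the Python below uses the standard `re` module; the pattern, the name `SSN_PATTERN`, and the helper `find_ssns` are illustrative assumptions. It accepts nine digits written as `123-45-6789`, `123 45 6789`, or `123456789`, requires the same separator in both positions, and skips ranges the Social Security Administration does not issue (area 000, 666, and 900 to 999; group 00; serial 0000).

```python
import re

# Hypothetical Social Security number pattern.
SSN_PATTERN = re.compile(
    r"(?<!\d)"                 # not preceded by another digit
    r"(?!000|666|9\d\d)\d{3}"  # area number: excludes 000, 666, 900-999
    r"([- ]?)"                 # optional separator, captured
    r"(?!00)\d{2}"             # group number: excludes 00
    r"\1"                      # the same separator again (or none)
    r"(?!0000)\d{4}"           # serial number: excludes 0000
    r"(?!\d)"                  # not followed by another digit
)


def find_ssns(text: str) -> list:
    """Return every substring of `text` that looks like a Social Security number."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text)]
```

The digit lookarounds keep the pattern from firing inside longer numbers such as order or account numbers, a common source of false positives for this kind of classifier.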
The example code in Table 3 defines a set of keywords for detecting references to competitive products or companies, according to one implementation of an embodiment of the present invention.
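A keyword pattern of the kind described for Table 3 might be sketched as follows. The competitor names are placeholders, and `build_keyword_matcher` and `find_keywords` are hypothetical helpers; a real deployment would load the keyword set from the organization's rules file. Matching is case-insensitive and whole-word, and longer phrases are tried first so that a multi-word name is not cut short by a shorter keyword that is its prefix.

```python
import re

# Placeholder competitor names and products.
COMPETITOR_KEYWORDS = {"Acme Corp", "Globex", "Initech", "WidgetPro"}


def build_keyword_matcher(keywords):
    """Compile a case-insensitive, whole-word matcher for keywords or phrases."""
    # Longest first, so that multi-word phrases win over their prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def find_keywords(matcher, text):
    """Return the distinct keywords found (lower-cased), in order of first appearance."""
    seen = []
    for m in matcher.finditer(text):
        hit = m.group(0).lower()
        if hit not in seen:
            seen.append(hit)
    return seen
```

Because `\b` is used, keywords that begin or end with punctuation would need a different boundary rule.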
The example code in Table 4 defines a rule which sends a blind carbon copy of the outgoing e-mail to a compliance officer for review when either the Social Security number pattern above has detected confidential information in the e-mail or a probabilistic classifier has determined that the message is probably confidential, according to one implementation of an embodiment of the present invention.
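The logic of such a rule can be sketched as a function that returns requested actions instead of modifying the message, leaving the client to carry them out. Everything here is an assumption for illustration: the simplified SSN regular expression, the `Message` record, the `("bcc", address)` action tuple, the `confidential_score` callable standing in for a trained classifier, and the 0.8 threshold used to mean "probably confidential".

```python
import re
from dataclasses import dataclass, field

# Simplified stand-in for the Social Security number pattern.
SSN_RE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


@dataclass
class Message:
    subject: str
    body: str
    to: list = field(default_factory=list)


def compliance_bcc_rule(message, confidential_score, officer, threshold=0.8):
    """Request a BCC to `officer` if an SSN is present or the message is
    probably confidential (classifier score above `threshold`)."""
    text = message.subject + "\n" + message.body
    has_ssn = SSN_RE.search(text) is not None
    probably_confidential = confidential_score(text) > threshold
    if has_ssn or probably_confidential:
        return [("bcc", officer)]
    return []
```

Returning actions keeps the rule independent of any particular email client's API.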
The example code in Table 5 defines a rule, according to one implementation of an embodiment of the present invention, which prevents the user from sending an e-mail message if it contains a set of keywords comprising the dirty words made famous by George Carlin.
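A blocking rule of this kind might be sketched as below. The word list is a placeholder (the actual words would be configured in the rules file), and the `("refuse", reason)` action tuple and the function name are hypothetical. The rule reports which words triggered the refusal so that the sender can edit the message.

```python
import re

# Placeholder list; the real blocked words are configured in the rules file.
BLOCKED_WORDS = {"blockedword1", "blockedword2", "blockedword3"}

_BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, sorted(BLOCKED_WORDS))) + r")\b",
    re.IGNORECASE,
)


def profanity_block_rule(subject, body):
    """Return a 'refuse' action naming the offending words, or [] to allow sending."""
    text = subject + "\n" + body
    hits = sorted({m.group(0).lower() for m in _BLOCKED_RE.finditer(text)})
    if hits:
        return [("refuse", "Message contains blocked words: " + ", ".join(hits))]
    return []
```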
These processes can be applied to a variety of messages including, but not limited to, email, instant messaging, SMS, IRC, and other forms of communication that involve composing a text message and then delivering it. These techniques can also be applied to image, video, and audio messaging systems, provided the system meets two conditions: (1) the message is recorded or composed before it is transmitted (as opposed to a live transmission), and (2) there is a process that extracts text or descriptive information from the image, video, or audio message. Examples of such processes include, but are not limited to, OCR for images and video and speech recognition for audio.
In the preferred embodiment, the interface to the client program is a class of type MessagePlugin instantiated by a plugin manager inside the client program. An instance of each outbound message is passed to the method outbound. A list of requested actions is passed back to the plugin manager, which uses the native facilities of the client email program to fulfill the requests. The latter part of the listing has test code suitable for testing the class and its dependent code outside the framework of the client program.
For each message handled by the outbound method, a set of rules is loaded by rulesRoot, any attachments to the message are made available for subsequent processing, and the message is processed by a call to runrules. Any requested actions are returned to the client plugin manager.
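The MessagePlugin interface and the outbound call sequence can be sketched as follows. This is a hypothetical, client-neutral illustration, not the listing in Table 6: `OutboundMessage`, the action tuples, and the trivial stand-ins named after `rulesRoot` and `runrules` are assumptions. A real plugin would receive the client's native message object, and its loader would parse the external XML rules file rather than return one hard-coded rule.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundMessage:
    """Hypothetical client-neutral view of a message about to be sent."""
    sender: str
    recipients: list
    subject: str
    body: str
    attachments: list = field(default_factory=list)  # e.g. (filename, text) pairs


def _external_recipient_rule(msg, attachments):
    """Example rule: warn when any recipient is outside example.com."""
    if any(not r.endswith("@example.com") for r in msg.recipients):
        return [("warn", "External recipient")]
    return []


def rulesRoot(path):
    """Stand-in loader; a real one would parse the XML rules file at `path`."""
    return [_external_recipient_rule]


def runrules(rules, message, attachments):
    """Apply every rule and concatenate the requested actions."""
    actions = []
    for rule in rules:
        actions.extend(rule(message, attachments))
    return actions


class MessagePlugin:
    """Sketch of the class instantiated by the client's plugin manager."""

    def __init__(self, rules_path="rules.xml"):
        self.rules_path = rules_path

    def outbound(self, message):
        rules = rulesRoot(self.rules_path)            # load the rules
        attachments = list(message.attachments)       # expose attachments to rules
        return runrules(rules, message, attachments)  # actions go back to the manager


if __name__ == "__main__":
    # Test harness outside the client program.
    plugin = MessagePlugin()
    msg = OutboundMessage("me@example.com", ["you@other.org"], "Hi", "Hello")
    print(plugin.outbound(msg))
```

Keeping the plugin's contract to "message in, list of actions out" is what lets the same rules code run inside the client and in a standalone test harness.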
Table 6 is an embodiment of code for an example definition of the top-level plugin class.
The code listing in Table 7 is an example implementation of a module that implements the loading, managing, and execution of the rules. Two exported procedures perform the core functionality used by the calling code: rulesRoot and runrules. Procedure rulesRoot loads definitions of classifiers, patterns, actions, and rules from an external file in XML format. Procedure runrules applies those rules to a specific message, generating interactive dialogs as needed, and returning a requested set of actions to the caller.
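The division of labor between loading and applying rules can be sketched with the standard `xml.etree.ElementTree` module. The XML vocabulary below (`policy`, `patterns`, `regex`, `keywords`, `actions`, `rules`, and the `if`/`then` attributes) is invented for illustration, since the actual grammar is defined by the DTD in Table 12; `load_rules` and `apply_rules` are hypothetical counterparts of the roles played by rulesRoot and runrules.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules file; element and attribute names are illustrative only.
RULES_XML = r"""
<policy>
  <patterns>
    <regex name="ssn" expr="\b\d{3}-\d{2}-\d{4}\b"/>
    <keywords name="competitors">Globex, Initech</keywords>
  </patterns>
  <actions>
    <action name="bcc_compliance" type="bcc" target="compliance@example.com"/>
    <action name="warn_competitor" type="warn" text="Competitor mentioned"/>
  </actions>
  <rules>
    <rule if="ssn" then="bcc_compliance"/>
    <rule if="competitors" then="warn_competitor"/>
  </rules>
</policy>
"""


def load_rules(xml_text):
    """Parse patterns, actions, and rules from the XML text."""
    root = ET.fromstring(xml_text.strip())
    patterns = {}
    for el in root.find("patterns"):
        if el.tag == "regex":
            patterns[el.get("name")] = re.compile(el.get("expr"))
        elif el.tag == "keywords":
            words = [w.strip() for w in (el.text or "").split(",") if w.strip()]
            patterns[el.get("name")] = re.compile(
                r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
        else:
            raise ValueError(f"unknown pattern type: {el.tag}")
    actions = {el.get("name"): dict(el.attrib) for el in root.find("actions")}
    rules = [(el.get("if"), el.get("then")) for el in root.find("rules")]
    return patterns, actions, rules


def apply_rules(loaded, text):
    """Return the action definitions requested by every rule whose pattern matches."""
    patterns, actions, rules = loaded
    return [actions[action] for pattern, action in rules
            if patterns[pattern].search(text)]
```

Interactive dialogs are omitted here; in the described embodiment the apply step may also prompt the user before returning actions.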
In the embodiment shown, objects listed in the external rules file are transformed into Python objects in a way that the rules implementor can reference naturally. This transformation is straightforward in languages such as Python, Perl, Lisp, and C#, and more difficult, but still a matter of ordinary programming, in languages such as C++, Visual Basic, and C. The external rules file comprises three kinds of lists: patterns, actions, and rules. Each list is loaded by a corresponding procedure, as shown in Table 8, which lists an example implementation of the module that loads and embodies the lists. Each list is returned as a first-class Python object.
In this embodiment, each element of a list is a first-class Python object derived from a definition in an external XML file. Although the current embodiment loads from a single file resident on the client's machine, it generalizes straightforwardly to the inclusion of secondary files on the user's machine and to referencing files in other locations including, but not limited to, remote file systems, databases, web servers, and other forms of referenceable storage. Table 9 shows an example implementation of the mapping between a parsed element of an XML file and a Python object.
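One generic way to map a parsed XML element onto a Python object is sketched below; it is an assumption for illustration, not the code of Table 9. `XmlObject` is a hypothetical wrapper: attributes become Python attributes (hyphens converted to underscores), the stripped element text becomes `.text`, and child elements are grouped into lists reachable by tag name.

```python
import xml.etree.ElementTree as ET


class XmlObject:
    """Hypothetical wrapper exposing an XML element as a Python object.

    Note: XML attributes named tag, text, or children would be shadowed by
    the wrapper's own fields; a production mapping would guard against that.
    """

    def __init__(self, element):
        self.tag = element.tag
        self.text = (element.text or "").strip()
        for key, value in element.attrib.items():
            setattr(self, key.replace("-", "_"), value)
        self.children = {}
        for child in element:
            self.children.setdefault(child.tag, []).append(XmlObject(child))

    def __getattr__(self, name):
        # Called only when normal lookup fails: expose child lists by tag.
        children = self.__dict__.get("children", {})
        if name in children:
            return children[name]
        raise AttributeError(name)
```

This gives the rules implementor natural references such as `policy.action[0].name`.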
Individual patterns in the system are used to identify messages that may require specific actions. Additional pattern types are straightforward to add; the ones shown here are essential to the operation of the system but can readily be extended. Probabilistic classifiers include an “unsure” state that can optionally display a dialog requiring the sender to decide which category the message actually belongs to. The preferred embodiment presents all such decisions in a single dialog, but alternate embodiments can present them sequentially or defer them until they are needed in the decision-making process. Care is taken to ensure that each classifier is executed only once per message. Table 10 shows an example implementation of the patterns included in the preferred embodiment.
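The two behaviors described above, running each classifier at most once per message and gathering every unsure decision into a single dialog, can be sketched as follows. `ProbabilisticPattern`, `MessageEvaluation`, the `ask_user_batch` callback (which receives the undecided category names and returns a name-to-bool mapping), and the `calls` counter are hypothetical names chosen for illustration.

```python
class ProbabilisticPattern:
    """Hypothetical pattern wrapping a scoring function with two thresholds."""

    def __init__(self, name, score_fn, low, high):
        self.name, self.score_fn, self.low, self.high = name, score_fn, low, high

    def verdict(self, score):
        """True/False when the score is decisive, None when unsure."""
        if score < self.low:
            return False
        if score > self.high:
            return True
        return None


class MessageEvaluation:
    """Evaluates patterns for ONE message, running each classifier at most once."""

    def __init__(self, text):
        self.text = text
        self._scores = {}
        self.calls = 0  # number of classifier executions, for illustration

    def score(self, pattern):
        if pattern.name not in self._scores:  # cache: classify only once
            self.calls += 1
            self._scores[pattern.name] = pattern.score_fn(self.text)
        return self._scores[pattern.name]

    def resolve_all(self, patterns, ask_user_batch):
        """Classify every pattern; gather all unsure ones into a single prompt."""
        results, unsure = {}, []
        for p in patterns:
            v = p.verdict(self.score(p))
            if v is None:
                unsure.append(p.name)
            else:
                results[p.name] = v
        if unsure:
            results.update(ask_user_batch(unsure))  # one dialog for all
        return results
```

Deferring the prompt until a rule actually needs the answer, as the alternate embodiments allow, would only require calling `ask_user_batch` lazily instead of at the end of `resolve_all`.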
The rules file represents the set of patterns, actions, and policies that are being implemented on behalf of the client. In a preferred embodiment, this file is an ordinary XML file and can be generated, manipulated, parsed, and managed using any set of XML tools. There is no preferred rules file, as the contents are entirely dependent on the requirements of the sender and the sender's organization. Table 11 is an example rules file, according to one embodiment of the present invention.
The rules file has a grammar that may be described in an ordinary DTD file, such as the example embodiment shown in Table 12. The grammar is an ordinary XML grammar and could be replaced with any comparable grammar that can be straightforwardly parsed with standard XML parsing tools.
While a preferred software embodiment is disclosed, many other implementations will occur to one of ordinary skill in the art and are all within the scope of the invention. The currently preferred implementation of the invention is a software component plug-in to an email client, but any other implementation known in the art would be suitable including, but not limited to: (1) a complete email client with integrated functionality; (2) a complete web application with integrated functionality; (3) a software component plug-in to other document generation programs, such as Microsoft Word; (4) an entire document generation program; and (5) a server service providing centralized handling, such as a central document comparison system.
Each of the various embodiments described above may be combined with other described embodiments in order to provide multiple features. Furthermore, while the foregoing describes a number of separate embodiments of the apparatus and method of the present invention, what has been described herein is merely illustrative of the application of the principles of the present invention. Other arrangements, methods, modifications, and substitutions by one of ordinary skill in the art are therefore also considered to be within the scope of the present invention, which is not to be limited except by the claims that follow.
This application claims the benefit of U.S. Provisional Application Ser. No. 60/652,569, filed Feb. 14, 2005, and claims the benefit under 35 U.S.C. 371 of PCT International Application Ser. No. PCT/US2006/005256, filed Feb. 14, 2006, the entire disclosures of which are herein incorporated by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2006/005256 | 2/14/2006 | WO | 00 | 8/14/2007 |
Number | Date | Country |
---|---|---|
60652569 | Feb 2005 | US |