Employees of enterprise organizations receive a variety of types of electronic messages. Some of these messages may be wanted (e.g., legitimate communications made among employees of a given enterprise, or made between employees and entities outside of the enterprise). Others of these messages may be malicious (e.g., attempting to compromise computing infrastructure or defraud the recipient) or otherwise unwanted. Unfortunately, differentiating between various types of messages can be a daunting task, particularly as the number of electronic messages an individual receives on a given day increases. Accordingly, there is an ongoing need for improvements to techniques for managing electronic messages.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
I. Introduction
The term “graymail” refers to solicited bulk email messages that do not fit the conventional definition of spam, typically because the recipient “opted into” receiving those bulk email messages. Recipient interest in this type of mailing tends to diminish over time, however, so the likelihood that a recipient will report graymail as spam often increases. In comparison to spam, graymail can be identified by one or more of the following traits (though a given message need not have all three):
In addition to the above traits, there is often a timeliness component to graymail. That is, the utility of an email message determined to be representative of graymail will normally expire or diminish after a period of time. Notifications of sales or upcoming events are examples of graymail with timeliness components, since these email messages are only relevant for a period of time, even though recipients may (and often do) read them after that period has expired.
While all recipients of graymail “opted in,” either knowingly or unknowingly, to receiving bulk email messages, these email messages have varying value to different recipients. Example categories of graymail include (1) promotional messages (e.g., sale notifications, new product notifications, etc., typically targeted to a topic/product the recipient previously indicated an interest in), (2) newsletters, (3) event invitations (e.g., to conferences, fireside chats, etc.), and (4) cold calls from potential vendors/service providers, typically directed to those with purchasing authority (e.g., generated based on the sender reviewing the recipient's website profile or a profile on LinkedIn). In enterprises (also referred to as “businesses” or “organizations”), graymail can be difficult to manage because some recipients will want to receive at least some of these email messages while other recipients will perceive these email messages as spam. For example, a recipient in the purchasing department may want to receive advertisements for product pricing while a recipient in the marketing department may view these advertisements as spam. Similarly, a recipient in the engineering department may want to receive invitations to conferences/speaking opportunities, while a person in the human resources department may view such types of messages as spam. And, while cold calls may appear to be spam to certain employees, employees engaged in external business relations may consider such messages to be of high value (e.g., in establishing new partnerships). Further, different graymail folders can be configured with different default expirations, e.g., to reflect the time sensitivity that such messages typically have. For example, any graymail moved into a “newsletter” folder of a user can have a default expiration of 90 days, while any graymail moved into a “promotions” folder of the user can have a different default expiration if desired (e.g., 30 days).
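As a minimal sketch of the per-folder expirations described above, the default retention periods could be kept in a simple mapping. The folder names and the 90-day and 30-day values follow the example in the text; `DEFAULT_EXPIRATIONS`, `is_expired`, the `overrides` parameter, and the choice that folders without a configured expiration never expire are illustrative assumptions rather than part of any described implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical default expirations per graymail folder (values from the example above).
DEFAULT_EXPIRATIONS: Dict[str, timedelta] = {
    "newsletter": timedelta(days=90),
    "promotions": timedelta(days=30),
}


def is_expired(folder: str, moved_at: datetime, now: datetime,
               overrides: Optional[Dict[str, timedelta]] = None) -> bool:
    """Return True if a message moved into `folder` at `moved_at` has expired by `now`.

    `overrides` lets a user or administrator replace a folder's default (assumed feature).
    Folders with no configured expiration are treated as never expiring (assumption).
    """
    settings = {**DEFAULT_EXPIRATIONS, **(overrides or {})}
    ttl = settings.get(folder)
    if ttl is None:
        return False
    return now - moved_at >= ttl


moved = datetime(2024, 1, 1, tzinfo=timezone.utc)
later = moved + timedelta(days=45)
print(is_expired("promotions", moved, later))  # True: 45 days >= 30 days
print(is_expired("newsletter", moved, later))  # False: 45 days < 90 days
```

Keeping the defaults in one table and merging per-user overrides on top is one simple way to make the expirations "differently configurable" without changing the lookup logic.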
Described herein are various computer programs and associated computer-implemented techniques for discovering graymail in the incoming email messages received by employees of an enterprise. For the purpose of illustration, assume that an enterprise receives an incoming email message (“email”) addressed to a recipient, such as an employee of the enterprise (or one or more accounts shared by multiple such employees, etc.). To establish the risk posed by the incoming email, its content or context can be analyzed by a scoring module. For example, the scoring module can examine the incoming email to identify the alleged identity of the sender, email address of the sender, content of the subject line, content of the body, attachments, etc. Further information on example ways incoming emails can be examined can be found in U.S. Pat. No. 10,911,489, which is incorporated by reference herein in its entirety. Generally, if the incoming email is determined to represent a risk to the security of the enterprise, then the incoming email is quarantined and not permitted to reach the inbox of the recipient. However, if the scoring module determines that the incoming email does not represent a risk, then the scoring module can determine what actions, if any, are appropriate for dealing with the incoming email.
As part of this risk determination process, the scoring module may determine the likelihood that the incoming email is representative of graymail. If the scoring module determines that the incoming email is not representative of graymail, then the incoming email can be permitted to proceed to the inbox of the recipient. However, if the incoming email is determined to represent graymail, then the scoring module may indicate as much to a remediation module that is responsible for handling the incoming email. As further discussed below, the remediation module can be responsible for implementing a graymail remediation service (“remediation service”) that automatically moves incoming emails representative of graymail into appropriate folders (e.g., using an application programming interface provided by the enterprise's mail system). These folders can be part of the native folder infrastructure of the user's mail account, whether they are created automatically on behalf of the user or at the user's direction. A benefit of this approach is that, irrespective of which mail client the user uses to access their electronic mail, graymail will be automatically sorted into an appropriate folder without, for example, requiring modification to the mail client. Thus, a user can choose to view messages using a native mail client (e.g., provided by a phone or computer manufacturer, or an operating system provider) or a third-party mail client (e.g., software provided by an entity other than a device or operating system manufacturer) and still receive the inbox decluttering benefits described herein.
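The decision flow described above (quarantine risky email, hand safe graymail to the remediation module, deliver everything else) can be sketched as follows. The `Scores` fields and both threshold values are hypothetical placeholders; the text does not specify how the scoring module expresses risk or graymail likelihood numerically.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    QUARANTINE = "quarantine"
    REMEDIATE_GRAYMAIL = "remediate_graymail"
    INBOX = "inbox"


@dataclass
class Scores:
    risk: float      # hypothetical: likelihood the email threatens the enterprise
    graymail: float  # hypothetical: likelihood the email is graymail


def route_email(scores: Scores, risk_threshold: float = 0.8,
                graymail_threshold: float = 0.5) -> Route:
    """Decide where an incoming email goes; thresholds are illustrative only."""
    # Risk is checked first: a risky email is quarantined even if it also looks like graymail.
    if scores.risk >= risk_threshold:
        return Route.QUARANTINE
    # A safe email that looks like graymail is handed to the remediation module.
    if scores.graymail >= graymail_threshold:
        return Route.REMEDIATE_GRAYMAIL
    # Everything else is delivered to the recipient's inbox.
    return Route.INBOX


print(route_email(Scores(risk=0.1, graymail=0.9)).value)  # remediate_graymail
```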
Referring again to the scenario described above, assume that the scoring module determines that the incoming email is representative of graymail but poses no risk to the security of the enterprise. In this situation, the scoring module can transmit a request to the remediation module to handle the incoming email. Initially, the remediation module may confirm that a graymail folder (also referred to herein as a “promotions folder”) has been created for the recipient. Generally, the graymail folder is accessible through the employee's mailbox in the same way as other folders, such as sent folders, spam folders, etc. If a graymail folder was previously created, then the remediation module can transfer the incoming email into the graymail folder. However, if a graymail folder was not previously created, then the remediation module can create a graymail folder into which the incoming email is transferred. In some embodiments, if the user subsequently moves a message from the promotions folder back to the user's inbox (e.g., as reported by a cloud-based email suite to a threat detection platform), the remediation module automatically generates a rule (e.g., based on sender domain, address, etc.) that prevents future messages from that sender to that user from being re-routed to the user's promotions folder. Similarly, if the user subsequently moves a message from the promotions folder to the user's spam folder, the remediation module can automatically generate a rule to route future messages from that sender to the user's spam folder instead of to the user's promotions folder (or inbox).
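A minimal sketch of the automatic rule generation described above, keyed on the sender's address. The folder names (`Promotions`, `Inbox`, `Spam`), the `RuleStore` class, and its methods are hypothetical; a real deployment would receive move events from the email suite and could key rules on the sender domain instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PROMOTIONS, INBOX, SPAM = "Promotions", "Inbox", "Spam"  # hypothetical folder names


@dataclass
class RuleStore:
    # Per-user rules: (recipient, sender address) -> folder that overrides graymail routing.
    rules: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def on_user_move(self, user: str, sender: str, src: str, dst: str) -> None:
        """Turn a user's manual move out of the promotions folder into a rule."""
        if src != PROMOTIONS:
            return
        if dst in (INBOX, SPAM):
            # Inbox: stop re-routing this sender for this user.
            # Spam: send this sender's future mail to the spam folder instead.
            self.rules[(user, sender.lower())] = dst

    def target_folder(self, user: str, sender: str, default: str = PROMOTIONS) -> str:
        """Folder for a new graymail message, honoring any rule for this user and sender."""
        return self.rules.get((user, sender.lower()), default)


store = RuleStore()
store.on_user_move("alice@example.com", "deals@shop.example", PROMOTIONS, INBOX)
print(store.target_folder("alice@example.com", "deals@shop.example"))  # Inbox
print(store.target_folder("bob@example.com", "deals@shop.example"))    # Promotions
```

Because rules are keyed per recipient, one user's correction does not change routing for other users, matching the per-user behavior described in the text.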
Note that more than one graymail folder can be created for a given recipient. For example, the remediation module can create separate folders for different types of graymail (either as top-level folders or as subfolders under a more general graymail folder). Thus, a single recipient may have different graymail folders for promotions, newsletters, invitations/speaking engagements, cold calls, etc. The remediation module can determine the appropriate graymail folder based on an output produced by the scoring module, for example, based on whether the scoring module has labeled an incoming email as representative of a promotion, advertisement, or newsletter after analyzing its content. The scoring module can use a set of heuristics/rules and/or machine learning models to identify graymail. In an example implementation, different types of graymail can be used as ground truth training data to develop a set of models that can collectively identify graymail and further classify the graymail into one of a variety of subcategories. For example, a set of invitations to conferences, other speaking opportunities, etc. can be used to train a model for a graymail subcategory related to such events. As another example, a set of newsletters can be used to train a model for a graymail subcategory related to such content. The set of models (e.g., including models specific to particular types of graymail and/or models trained on graymail generally) can be used to classify incoming messages as graymail (and, as applicable, into subcategories of graymail) and ultimately place such messages in an appropriate folder or subfolder.
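Mapping a scoring-module subcategory onto a folder or subfolder could look like the sketch below. The label strings, the folder names, and the use of `/` as a nesting separator are assumptions for illustration; mail systems differ in how nested folders are addressed.

```python
from typing import Optional

# Hypothetical subcategory labels from the scoring module, mapped to subfolders
# nested under a general graymail folder. The "/" separator is an assumption.
PARENT_FOLDER = "Graymail"
SUBFOLDERS = {
    "promotion": "Promotions",
    "newsletter": "Newsletters",
    "invitation": "Invitations",
    "cold_call": "Cold Calls",
}


def folder_for(label: Optional[str], use_subfolders: bool = True) -> str:
    """Choose the destination folder for a message already classified as graymail."""
    if not use_subfolders or label not in SUBFOLDERS:
        # No (or unrecognized) subcategory: fall back to the general graymail folder.
        return PARENT_FOLDER
    return f"{PARENT_FOLDER}/{SUBFOLDERS[label]}"


print(folder_for("newsletter"))  # Graymail/Newsletters
```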
Embodiments may be described herein with reference to certain types of graymail or certain features of incoming email. However, features of those embodiments may be similarly applicable to other types of graymail and other features of incoming email. As an example, while embodiments may be described in the context of a scoring module that determines risk based on the email address of the sender, the scoring module could consider other feature(s) of the incoming email instead of, or in addition to, the email address of the sender.
While embodiments may be described in the context of computer-executable instructions, aspects of the technologies described herein can be implemented via hardware, firmware, or software. As an example, the scoring module and remediation module may be embodied as instruction sets of a computer program that offers support for discovering, classifying, and then remediating security threats.
A. Terminology
References in this description to “an embodiment” or “one embodiment” mean that the feature, function, structure, or characteristic being described is included in at least one embodiment of the technology. Occurrences of such phrases do not necessarily refer to the same embodiment, nor are they necessarily referring to alternative embodiments that are mutually exclusive of one another.
Unless the context clearly requires otherwise, the terms “comprise,” “comprising,” and “comprised of” are to be construed in an inclusive sense rather than an exclusive or exhaustive sense (i.e., in the sense of “including but not limited to”). The term “based on” is also to be construed in an inclusive sense rather than an exclusive or exhaustive sense. Thus, unless otherwise noted, the term “based on” is intended to mean “based at least in part on.”
The terms “connected,” “coupled,” and any variants thereof are intended to include any connection or coupling between two or more elements, either direct or indirect. The connection/coupling can be physical, logical, or a combination thereof. For example, objects may be electrically or communicatively coupled to one another despite not sharing a physical connection.
The term “module” refers broadly to software components, firmware components, or hardware components. Modules are typically functional components that generate output(s) based on specified input(s). A computer program may include one or more modules. Thus, a computer program may include multiple modules responsible for completing different tasks or a single module responsible for completing all tasks.
When used in reference to a list of multiple items, the term “or” is intended to cover all of the following interpretations: any of the items in the list, all of the items in the list, and any combination of items in the list.
The sequences of steps performed in any of the processes described here are exemplary. However, unless contrary to physical possibility, the steps may be performed in various sequences and combinations. For example, steps could be added to, or removed from, the processes described here. Similarly, steps could be replaced or reordered. Thus, descriptions of any processes are intended to be open-ended.
B. Conventional Filtering Services
Basic filtering services are offered by most email platforms.
Generally, the anti-spam filter 104 is designed to quarantine malicious emails using blacklists of senders, sender email addresses, and Uniform Resource Locators (URLs) that have been detected in past unsolicited emails or defined in policy frameworks created by the enterprise. The term “anti-spam filter,” as used herein, can refer to any legacy email security mechanism capable of filtering incoming emails, including secure email gateways (SEGs) (also referred to as “gateways”). For example, the enterprise (or the email service) can maintain a list of sender email addresses from which malicious email has been received in the past. As another example, an enterprise may decide to implement a policy that prohibits employees from receiving emails originating from a given domain. Malicious emails that are caught by the anti-spam filter 104 can be quarantined so as to remain hidden from the intended recipients, while non-malicious emails may be stored on an email server 106 for subsequent access by the intended recipients. Email servers (also referred to as “mail servers”) facilitate the delivery of emails from senders to recipients. Normally, an email will be transferred amongst a series of email servers as it travels toward its intended destination. This series of email servers allows emails to be sent between dissimilar address domains.
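The list-based filtering described above can be sketched as a lookup against blocklists of sender addresses, sender domains, and URL hosts. The list contents and the `is_blocked` helper are hypothetical; real filters and SEGs combine many more signals and policy types.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical blocklists of the kind a legacy anti-spam filter or SEG might maintain.
BLOCKED_SENDERS = {"attacker@bad.example"}
BLOCKED_DOMAINS = {"bad-domain.example"}
BLOCKED_URL_HOSTS = {"phish.example"}


def is_blocked(sender: str, urls: Iterable[str]) -> bool:
    """Return True if the sender, its domain, or any linked host is blocklisted."""
    sender = sender.strip().lower()
    domain = sender.rsplit("@", 1)[-1]
    if sender in BLOCKED_SENDERS or domain in BLOCKED_DOMAINS:
        return True
    # urlparse(...).hostname is lowercased and may be None for malformed URLs.
    return any((urlparse(u).hostname or "") in BLOCKED_URL_HOSTS for u in urls)
```

Because graymail usually comes from legitimate senders that appear on none of these lists, a check of this kind will normally let it through, which is the limitation discussed in the next paragraph.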
Because of the manner in which anti-spam filters are deployed, however, these filters struggle to handle graymail appropriately. As discussed above, anti-spam filter 104 generally does not consider graymail to be spam (e.g., since those email messages are transmitted by a legitimate source, contain legitimate content, etc.). And, in contrast with malicious emails, which are almost universally unwanted, different users may ascribe varying degrees of value to a particular piece of graymail. Accordingly, new approaches are needed in order to appropriately handle graymail.
II. Threat Detection Platform
Threat detection platform 200 can acquire data related to digital activities performed with email accounts and then determine, based on an analysis of the data, how to handle graymail in a personalized manner. As shown in
Threat detection platform 200 can be implemented, partially or entirely, within an enterprise network 212, a remote computing environment (e.g., through which emails, or information related to those emails, can be routed for analysis), a gateway, or another suitable location. The remote computing environment can belong to, or be managed by, the enterprise or another entity. In some embodiments, threat detection platform 200 is integrated into the enterprise's email system (e.g., at the SEG) as part of an inline deployment. In other embodiments, threat detection platform 200 is integrated into the enterprise's email system via an application programming interface (API) such as the Microsoft Outlook® API. In such embodiments, threat detection platform 200 can obtain email data via the API. Thus, the threat detection platform 200 can supplement and/or supplant other security products employed by the enterprise.
In a first variation, threat detection platform 200 is maintained by a threat service (also referred to herein as a “security service”) that has access to multiple enterprises' data. In this variation, threat detection platform 200 can route data related to incoming email to a computing environment managed by the security service. The computing environment can be, for example, an instance on Amazon Web Services® (AWS). Threat detection platform 200 can maintain one or more databases for each enterprise it services that include, for example, organizational charts (and/or other user/group identifiers/memberships, indicating information such as “Alice is a member of the Engineering group” and “Bob is a member of the Marketing group”), attribute baselines, communication patterns, etc. Additionally or alternatively, threat detection platform 200 can maintain federated databases that are shared among multiple entities. Examples of federated databases include databases specifying vendors and conferences for which graymail may be transmitted. The security service can maintain different instances of threat detection platform 200 for different enterprises, or the security service can maintain a single instance of the threat detection platform 200 for multiple enterprises, as applicable. The data hosted in these instances can be obfuscated, encrypted, hashed, depersonalized (e.g., by removing personal identifying information), or otherwise secured or secreted as applicable. Accordingly, in various embodiments, each instance of threat detection platform 200 is only able to access/process data related to the incoming emails addressed to email accounts associated with the corresponding enterprise(s).
In a second variation, threat detection platform 200 is maintained by the enterprise whose emails are being monitored—either remotely or on premises. In this variation, all relevant data related to incoming emails may be hosted by the enterprise itself, and any information to be shared across multiple enterprises can be transmitted to a computing system maintained by the security service or a third party, as applicable.
As shown in
Enterprise network 212 can be a mobile network, wired network, wireless network, or some other communication network (or combination of networks) maintained by the enterprise or an operator on behalf of the enterprise. As noted above, the enterprise can use a security service to examine emails (among other things) to discover possible instances of graymail. The enterprise may grant permission to the security service to monitor the enterprise network 212 by examining emails (e.g., incoming emails or outgoing emails), identifying emails that are representative of graymail, and then performing appropriate remediation actions for those emails. In some embodiments, the enterprise further grants permission to the security service to obtain data regarding other digital activities involving the enterprise (and, more specifically, employees of the enterprise) in order to build a profile that specifies communication patterns, behavioral traits, normal content of emails, etc. For example, threat detection platform 200 may identify the filters created by each employee to infer which incoming emails are representative of graymail and/or which graymail is no longer desired (and thus should be diverted). Such filters may comprise rules manually specified by the user (e.g., by the user explicitly interacting with tools made available by cloud-based email suite 308). Filters may also be inferred from users' interactions with their mail (e.g., by obtaining log data from cloud-based email suite 308 indicating which messages the user has moved from an inbox to a promotions folder or spam folder, or vice versa), with rules then generated automatically to move messages on behalf of the user in the future (without the user having to manually create such rules).
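One possible way to infer rules from move logs is to count repeated moves per user and sender domain and emit a rule once a threshold is reached. The log-entry shape, the `infer_rules` helper, and the `min_moves` threshold are hypothetical; the text does not specify how inferred rules are derived.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical log entry: (user, sender domain, source folder, destination folder).
MoveEvent = Tuple[str, str, str, str]


def infer_rules(events: Iterable[MoveEvent], min_moves: int = 3) -> Dict[Tuple[str, str], str]:
    """Infer (user, sender domain) -> destination folder rules from repeated manual moves."""
    counts = Counter((user, domain, dst) for user, domain, _src, dst in events)
    rules: Dict[Tuple[str, str], str] = {}
    # most_common() yields the highest counts first, so each (user, domain) pair
    # is bound to its most frequent destination.
    for (user, domain, dst), n in counts.most_common():
        if n >= min_moves and (user, domain) not in rules:
            rules[(user, domain)] = dst
    return rules
```

Requiring several consistent moves before creating a rule is a design choice meant to avoid turning a single accidental drag-and-drop into a permanent filter.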
Threat detection platform 200 can manage one or more databases in which data can be stored. Examples of such data include enterprise data (e.g., email data and mail filter data), remediation policies, communication patterns, behavioral traits, and the like. The data stored in the database(s) can be determined by the threat detection platform 200 (e.g., learned from data available on the enterprise network 212), provided by the enterprise, and/or retrieved from an external database (e.g., associated with LinkedIn® or Microsoft Office 365®) as applicable. Threat detection platform 200 can also store outputs produced by the various modules, including machine- and human-readable information regarding discovered instances of graymail and any remediation actions that were taken.
As shown in
An example profile includes a number of behavioral traits associated with a given email account. For example, profile generator 202 can determine behavioral traits based on email data and mail filter data obtained from the enterprise network 212. The email data may include information on the senders of past emails received by a given email account, content of those past emails, frequency of those past emails, temporal patterns of those past emails, topics of those past emails, geographical location from which those past emails originated, formatting characteristics (e.g., usage of HTML, fonts, styles, etc.), and more. Thus, profile generator 202 can attempt to build a profile for each email account that represents a model of normal behavior of the corresponding employee. As further discussed below, the profiles can be helpful in identifying the emails that are likely representative of graymail, as well as establishing how each employee handles graymail (including different types of graymail).
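A small slice of such a profile, per-sender volume, arrival hours, and HTML usage, could be computed as sketched below. The `PastEmail` and `SenderStats` types and the chosen fields are hypothetical simplifications of the email data listed above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class PastEmail:
    sender: str
    received_at: datetime
    is_html: bool


@dataclass
class SenderStats:
    count: int = 0
    hours: List[int] = field(default_factory=list)  # hour of day each email arrived
    html_count: int = 0                              # how many used HTML formatting

    @property
    def html_ratio(self) -> float:
        return self.html_count / self.count if self.count else 0.0


def build_profile(emails: Iterable[PastEmail]) -> Dict[str, SenderStats]:
    """Summarize an account's past emails per sender: volume, arrival hours, HTML usage."""
    profile: Dict[str, SenderStats] = defaultdict(SenderStats)
    for email in emails:
        stats = profile[email.sender]
        stats.count += 1
        stats.hours.append(email.received_at.hour)
        stats.html_count += int(email.is_html)
    return dict(profile)
```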
Monitoring module 206 is responsible for monitoring emails handled by enterprise network 212. These emails can include both incoming emails (e.g., external and internal emails) received by email accounts associated with employees of the enterprise and outgoing emails (e.g., external and internal emails) transmitted by those email accounts. Monitoring module 206 is able to monitor incoming emails in near real time so that appropriate action can be taken, in a timely fashion, if graymail is discovered. For example, if an incoming email is determined to be representative of graymail (e.g., based on an output produced by scoring module 208), the incoming email can be transferred into a dedicated folder by remediation module 210. In some embodiments, monitoring module 206 is able to monitor emails only upon threat detection platform 200 being granted permission by the enterprise (and thus given access to enterprise network 212).
Scoring module 208 can be responsible for examining emails to determine the likelihood that each email is representative of graymail. For example, scoring module 208 can examine each incoming email to determine how its characteristics compare to past emails received by the intended recipient. In such embodiments, scoring module 208 may determine whether characteristics such as timing, formatting, and location of origination (e.g., in terms of sender email address or geographical location) match a pattern of past emails that have been determined to represent graymail. For example, scoring module 208 may determine that an email is highly likely to be graymail if its formatting and content are similar to those of past emails received on a consistent periodic basis (e.g., daily or weekly).
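The "consistent periodic basis" signal could be approximated with a simple heuristic: compute the gaps between a sender's emails and check that their coefficient of variation is small. This swapped-in heuristic, the `is_periodic` helper, and its `max_cv` and `min_emails` parameters are assumptions, not the scoring module's described method.

```python
from datetime import datetime
from statistics import mean, pstdev
from typing import List


def is_periodic(timestamps: List[datetime], max_cv: float = 0.1, min_emails: int = 4) -> bool:
    """Heuristic: a sender is 'periodic' if the gaps between its emails are nearly constant."""
    if len(timestamps) < min_emails:
        return False
    ts = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    # Coefficient of variation: standard deviation of the gaps relative to their mean.
    return pstdev(gaps) / avg <= max_cv


weekly = [datetime(2024, 1, day, 9) for day in (1, 8, 15, 22, 29)]
print(is_periodic(weekly))  # True: every gap is exactly 7 days
```

Using a ratio rather than an absolute spread lets the same threshold work for daily and weekly senders alike.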
Scoring module 208 can make use of heuristics, rules, neural networks, or other trained machine learning (ML) approaches such as decision trees (e.g., gradient-boosted decision trees), logistic regression, and linear regression. Accordingly, scoring module 208 can produce discrete or continuous outputs, such as a probability metric (e.g., specifying the likelihood that an incoming email is graymail), a binary output (e.g., graymail or not graymail), or a sub-classification (e.g., specifying the type of graymail, such as promotions, newsletters, events, or cold calls).
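To illustrate how one model can yield all three output types, the sketch below computes a logistic-regression style probability, thresholds it into a binary decision, and picks a subclass only for graymail. The feature names, weights, bias, threshold, and the `subclass_scores` input are all hypothetical; a trained model would learn its weights from labeled email.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical features and weights; a trained model would learn these from labeled email.
WEIGHTS = {"has_unsubscribe_link": 2.0, "periodic_sender": 1.5, "personal_greeting": -1.0}
BIAS = -1.5


def graymail_probability(features: Dict[str, float]) -> float:
    """Logistic-regression style probability that an email is graymail."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(features: Dict[str, float], subclass_scores: Dict[str, float],
             threshold: float = 0.5) -> Tuple[float, bool, Optional[str]]:
    """Return (probability, binary decision, graymail subclass or None)."""
    p = graymail_probability(features)
    is_graymail = p >= threshold
    # Only graymail gets a subclass; pick the highest-scoring one.
    subclass = max(subclass_scores, key=subclass_scores.get) if is_graymail else None
    return p, is_graymail, subclass
```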
Remediation module 210 can perform one or more remediation actions in response to scoring module 208 determining that an incoming email is likely representative of graymail. The remediation action(s) can be based on whether past instances of graymail have been handled for the same employee, the nature of the graymail, the policies implemented by the enterprise or employee, etc. These policies can be predefined or dynamically generated based on inference, analysis, or the data obtained from enterprise network 212. Additionally or alternatively, remediation action(s) may be based on the outputs produced by the models employed by the various modules, as further discussed below. Examples of remediation actions include creating a graymail folder into which emails that are representative of graymail can be transferred, transferring emails into a graymail folder, and/or transferring emails into another folder such as a quarantine folder. Generally, the graymail folder is accessible through a mail client in the same way as other folders, such as sent folders, draft folders, spam folders, etc. Accordingly, while remediation module 210 may redirect graymail before it would otherwise populate the inbox of the intended recipient, remediation module 210 generally does not make that graymail inaccessible to the recipient. Stated another way, the transfer of graymail into dedicated folders can be used to declutter the inboxes of employees of the enterprise. Conversely, some graymail (e.g., those emails that may represent a threat) may be transferred to a hidden folder (also referred to as a “quarantine folder”) for further analysis.
Emails transferred to the hidden folder may remain inaccessible until threat detection platform 200 has determined whether to release those emails (e.g., into the inbox or graymail folder), or until another applicable event (or set of events) has occurred that causes each such email either to be released from quarantine (e.g., into an inbox or other folder) or to be deleted (e.g., if it is determined to represent a threat).
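The quarantine lifecycle just described (held, then either released into a visible folder or deleted) can be modeled as a small state machine. The `QState` values and the `QuarantinedEmail` class are hypothetical; they only enforce that a held email is resolved exactly once.

```python
from enum import Enum
from typing import Optional


class QState(Enum):
    HELD = "held"          # in the hidden quarantine folder, invisible to the recipient
    RELEASED = "released"  # moved to a visible folder (e.g., inbox or graymail folder)
    DELETED = "deleted"    # removed, e.g., after being confirmed as a threat


class QuarantinedEmail:
    """Hypothetical lifecycle of one email held in the quarantine folder."""

    def __init__(self, message_id: str) -> None:
        self.message_id = message_id
        self.state = QState.HELD
        self.destination: Optional[str] = None

    def _require_held(self) -> None:
        if self.state is not QState.HELD:
            raise ValueError(f"{self.message_id} is already {self.state.value}")

    def release(self, folder: str) -> None:
        self._require_held()
        self.state = QState.RELEASED
        self.destination = folder

    def delete(self) -> None:
        self._require_held()
        self.state = QState.DELETED
```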
In some embodiments, remediation module 210 provides results produced by scoring module 208 or some other output (e.g., a notification summarizing the graymail that has been found) to an electronic device 214. Electronic device 214 may be managed by the employee associated with the email account under examination, an individual associated with the enterprise (e.g., a member of the information technology department), an individual associated with a security service, etc. In some embodiments, remediation module 210 sends the output in a human-readable format for display on an interface accessible via electronic device 214. As an example, remediation module 210 can generate a summary of emails that were transferred to the graymail folder. This summary can be provided to the employee to whom these emails were addressed. Through electronic device 214, the employee can specify whether the appropriate action was taken. For instance, the employee may indicate that an email should not have been classified as graymail, or that an email should instead have been classified as spam. Such indications can be used to improve the treatment of messages sent to that employee in the future (or, as applicable, messages sent to others, such as other members of the same organizational unit/group, enterprise, etc.).
Various embodiments of threat detection platform 200 include a training module 204 that operates to train the models employed by the other modules. As an example, training module 204 may train the models applied by scoring module 208 to the email data and mail filter data by feeding training data into those models. The training data could include emails that have been labeled as attacks or non-attacks, policies related to attributes of emails (e.g., specifying that emails originating from certain domains should not be considered graymail), etc. The training data may be employee-, group-, or enterprise-specific so that the model(s) are able to perform personalized analysis. In some embodiments, the training data ingested by the model(s) includes emails that are known to be representative of graymail. These emails may have been labeled as such during a training process or by other employees.
A. Graymail Discovery, Classification, and Remediation
Generally, remediation module 210 interacts with two forms of storage while implementing graymail remediation services. First, remediation module 210 may interact with an object-relational mapping (ORM) model 304 for recording actions performed by graymail remediation service 302. ORM model 304 can create objects that map to relational data defining the actions taken. Second, remediation module 210 may interact with a memory cache (also referred to herein as a “cache”) 306 that stores a mapping of employee identifiers to folders and accompanying metadata. As an example, cache 306 can associate email accounts of employees of an enterprise with the folders that can be found in those employees' email accounts. Remediation module 210 can read the cache in order to find each account's graymail folder(s). If no graymail folder exists and one is created by remediation module 210, then remediation module 210 can update the cached state to indicate that a graymail folder was created.
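The cache-then-create pattern described above can be sketched with an in-memory dictionary standing in for cache 306. `FolderCache`, `ensure_graymail_folder`, the default folder name, and the `create_folder` callback (a placeholder for whatever folder-creation API the mail system exposes) are hypothetical.

```python
from typing import Callable, Dict, List


class FolderCache:
    """In-memory stand-in for cache 306: employee identifier -> known folder names."""

    def __init__(self) -> None:
        self._folders: Dict[str, List[str]] = {}

    def folders(self, employee_id: str) -> List[str]:
        return self._folders.get(employee_id, [])

    def record(self, employee_id: str, folder: str) -> None:
        self._folders.setdefault(employee_id, []).append(folder)


def ensure_graymail_folder(cache: FolderCache, employee_id: str,
                           create_folder: Callable[[str, str], None],
                           name: str = "Graymail") -> bool:
    """Make sure the employee has a graymail folder; return True if one was created."""
    if name in cache.folders(employee_id):
        return False
    # `create_folder` stands in for the mail system's folder-creation API (hypothetical).
    create_folder(employee_id, name)
    # Update the cached state so later lookups skip the creation step.
    cache.record(employee_id, name)
    return True
```

Consulting the cache first avoids a round trip to the mail system for every incoming graymail message once the folder is known to exist.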
Though training of scoring module 208 and remediation module 210 may be supervised, graymail remediation service 302 can be implemented in an entirely automated manner. Thus, in various embodiments, remediation module 210 may not require any input from the employees or enterprise whose emails are being monitored.
In various embodiments, a small collection of email messages is labeled to measure the live performance of graymail remediation service 406. These email messages can be stored in full in ORM model 408 so that a complete analysis can be performed. In other embodiments, ORM model 408 maintains one or more data structures (e.g., tables) in which information regarding graymail can be stored. For example, remediation module 410 may transmit a log of email messages moved to graymail folder 412 (also referred to as a “promotions folder”) so that ORM model 408 includes a data structure that reflects the results of graymail remediation. In some embodiments, more detailed information regarding the graymail is stored in ORM model 408. For example, as shown in
Since it is dedicated to remediating graymail, graymail remediation service 406 can be implemented in a less resource-intensive manner than a service for addressing a broad variety of security threats. Nonetheless, graymail remediation service 406 can provide some or all of the following guarantees in various embodiments:
Note that the number of guarantees may depend on the amount of resources available to the remediation module (and the threat detection platform as a whole) and the amount of insight into incoming emails that is desired.
In some embodiments, threat detection platform 400 tracks how email messages moved to promotions folder 412 are subsequently handled by the recipient. For example, threat detection platform 400 can employ an ML approach that tracks whether email messages moved to the promotions folder by graymail remediation service 406 are subsequently deleted by the recipient or moved to another folder (e.g., the inbox, or a custom folder such as “online shopping” or “travel deals”). The insights gained by this ML approach can be used to further train the remediation module to identify graymail and to automatically handle future messages differently (e.g., when subsequent graymail of a particular type is received, moving it to the user's custom folder, such as “travel deals”).
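Before follow-up actions can feed an ML model, each action has to be turned into a training signal. The sketch below shows only that label-extraction step, under stated assumptions: the `UserAction` and `Feedback` types and the `interpret_action` helper are hypothetical, and the specific mapping (deletion treated as confirmation, a move back to the inbox treated as a likely false positive, a move to a custom folder treated as a routing preference) is an illustrative choice rather than the platform's documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UserAction(Enum):
    DELETED = "deleted"
    MOVED = "moved"
    NO_ACTION = "no_action"


@dataclass(frozen=True)
class Feedback:
    is_graymail: Optional[bool]      # training label; None means no usable signal
    preferred_folder: Optional[str]  # user's chosen folder for similar messages


def interpret_action(
    action: UserAction,
    destination: Optional[str] = None,
    inbox_name: str = "Inbox",  # hypothetical inbox folder name
) -> Feedback:
    """Translate what a recipient did with a remediated message into feedback."""
    if action is UserAction.DELETED:
        # Deleting the message suggests it really was unwanted bulk mail.
        return Feedback(is_graymail=True, preferred_folder=None)
    if action is UserAction.MOVED:
        if destination is None:
            raise ValueError("a move must specify a destination folder")
        if destination == inbox_name:
            # Moving it back to the inbox suggests a false positive.
            return Feedback(is_graymail=False, preferred_folder=None)
        # A move to a custom folder keeps the graymail label and records
        # where this user wants similar messages to go.
        return Feedback(is_graymail=True, preferred_folder=destination)
    return Feedback(is_graymail=None, preferred_folder=None)
```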
Moreover, threat detection platform 400 can maintain a list of email addresses corresponding to employees who have explicitly or implicitly opted out of graymail remediation services. For example, threat detection platform 400 can maintain a list of email addresses corresponding to employees who have deleted or renamed the promotions folders created for them by the remediation module. This information may be useful to the enterprise, for example, to identify those employees who have opted not to have graymail automatically filtered. Further follow-up (e.g., by the enterprise or a security service) may reveal that these employees found too many non-graymail emails being transferred to the promotions folder, or that they prefer mail filters they created manually to capture graymail originating from certain sources, etc.
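Detecting the implicit opt-outs described above amounts to comparing the folders the service created against a fresh listing of each mailbox. This is a minimal sketch, assuming the platform records each created folder's ID and original name and can obtain a current `{folder ID: name}` listing per employee; the `find_opted_out` helper and both input shapes are hypothetical.

```python
from typing import Dict, Tuple


def find_opted_out(
    created_folders: Dict[str, Tuple[str, str]],
    current_folders: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Return {employee: "deleted" | "renamed"} for implicit opt-outs.

    created_folders: employee -> (folder ID, name) recorded when the service
        created that employee's promotions folder.
    current_folders: employee -> {folder ID: current name} from the latest
        mailbox sync.
    """
    opted_out = {}
    for employee, (folder_id, original_name) in created_folders.items():
        if employee not in current_folders:
            continue  # no fresh listing for this mailbox; make no inference
        listing = current_folders[employee]
        if folder_id not in listing:
            opted_out[employee] = "deleted"
        elif listing[folder_id] != original_name:
            opted_out[employee] = "renamed"
    return opted_out
```

Keying on the folder ID rather than the folder name is what allows a rename to be distinguished from a deletion.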
B. Managing Graymail at Scale
Many individuals can receive upwards of one hundred times more graymail than emails related to sophisticated attacks (e.g., phishing). A threat detection platform could store the same amount of data per graymail message as per “attack” message discovered by monitoring inbound email. However, such an approach could result in too much data being stored. Because the storage system (also referred to herein as “storage infrastructure”) is shared across different services supported by the threat detection platform, overloading the storage system could degrade the production operation of the threat detection platform as a whole. As an example, making too many email messages available for labeling (and training/model creation) may overflow the review queue. To handle data at such scales, in various embodiments, threat detection platform 400 includes, or has access to, a tiered storage system (“tiered data persistence”) in which graymail occupies only a fraction of the total storage space. Moreover, graymail can be processed by distinct lightweight modules (e.g., those described above with reference to
An example way of implementing tiered data persistence is to control the percentage of graymail for which complete data is stored in a tiered storage system (e.g., for use as training data, for use in verification of system reliability, etc.). As an example, threat detection platform 400 can store minimal information (e.g., only the information needed for identification purposes) and metadata for most graymail, and can store complete information for the small remaining subset. Example goals include storing complete information for less than 1, 3, or 5 percent of all graymail. In some embodiments, an administrator or other appropriate individual is able to specify the applicable percentage through an interface provided by the threat detection platform (e.g., an administrative web frontend). Additionally or alternatively, threat detection platform 400 can automatically determine and/or manage the percentage based on predetermined parameters, such as the amount of available computing resources and/or the rate at which graymail is being received.
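The percentage-controlled tiering above can be sketched with deterministic, hash-based sampling. This is a minimal illustration under stated assumptions: the `MINIMAL_FIELDS` tuple, the helper names, the 5 percent cap, and the rule that derives the fraction from a per-hour storage budget divided by the graymail arrival rate are all hypothetical stand-ins for the "available computing resources" and "rate at which graymail is being received" parameters.

```python
import hashlib
from typing import Any, Dict

# Hypothetical identification fields kept for every graymail message.
MINIMAL_FIELDS = ("message_id", "sender", "recipient", "received_at")


def full_storage_fraction(
    full_records_per_hour: float, graymail_per_hour: float, cap: float = 0.05
) -> float:
    """Fraction of graymail to store completely: limited by a storage budget
    (full records per hour) and never above `cap` (here, 5 percent)."""
    if graymail_per_hour <= 0:
        return cap
    return min(cap, full_records_per_hour / graymail_per_hour)


def in_full_sample(message_id: str, fraction: float) -> bool:
    """Deterministically map a message ID to [0, 1) and compare to `fraction`."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is an exact float strictly below 1.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < fraction


def build_record(message: Dict[str, Any], fraction: float) -> Dict[str, Any]:
    """Return a complete record for sampled messages, a minimal one otherwise."""
    if in_full_sample(message["message_id"], fraction):
        return {"tier": "full", **message}
    minimal = {k: message[k] for k in MINIMAL_FIELDS if k in message}
    return {"tier": "minimal", **minimal}
```

Hashing the message ID, rather than drawing a random number, means that reprocessing the same message always yields the same tier decision, so retries do not gradually inflate the fully stored sample.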
In various embodiments, threat detection platform 400 implements a dedicated series of interconnected modules (referred to collectively as a “pipeline”) for handling only graymail. Assume, for example, that threat detection platform 400 includes a scoring module that is responsible for making an initial determination as to whether each incoming email message should be classified as safe, unsafe (i.e., representative of an attack), or graymail. Threat detection platform 400 can include logic to ensure that the module(s) responsible for subsequently handling emails determined to be attacks and the module(s) responsible for subsequently handling graymail do not operate on the same message. The former can be referred to as “attack modules,” while the latter can be referred to as “graymail modules.” While an initial verdict of whether a given email is in fact unsafe may change due to subsequent analysis (e.g., by a human or machine), an initial classification of a message as graymail can be treated (e.g., by graymail modules) as final, because graymail is, in large part, readily confirmable. As such, simplifications can be made that are not possible with the attack modules. Example benefits of using a dedicated graymail pipeline include:
C. Passive Mode for Graymail Discovery Service
An enterprise might desire to initially observe how threat detection platform 400 will handle messages in accordance with techniques described herein before fully implementing graymail handling services (e.g., during a trial period of days, weeks, or months). During that time, threat detection platform 400 can detect and report graymail (and actions that would have been otherwise taken) without actually moving those email messages or otherwise changing recipients' mailboxes. Because the threat detection platform is passively monitoring incoming email messages without impeding those email messages from reaching the intended destination, this mode can also be referred to as “passive mode” for the graymail services.
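Passive mode can be sketched as a single switch that decides whether a remediation decision is carried out or only reported. This is a minimal illustration, assuming a hypothetical `move_message` callback that wraps the email provider's API; the `GraymailRemediator` and `RemediationAction` names and the default `"Promotions"` destination are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class RemediationAction:
    message_id: str
    recipient: str
    destination: str


@dataclass
class GraymailRemediator:
    # Hypothetical wrapper around the mail provider's "move message" call,
    # invoked as move_message(recipient, message_id, destination).
    move_message: Callable[[str, str, str], None]
    passive: bool = True  # passive mode: report only, never touch mailboxes
    report: List[RemediationAction] = field(default_factory=list)

    def handle(self, message_id: str, recipient: str,
               destination: str = "Promotions") -> RemediationAction:
        action = RemediationAction(message_id, recipient, destination)
        # The decision is reported in both modes, so a passive-mode report
        # lists exactly what active mode would have done.
        self.report.append(action)
        if not self.passive:
            self.move_message(recipient, message_id, destination)
        return action
```

Because detection and reporting run identically in both modes, the trial report doubles as a preview of active-mode behavior, and switching modes changes only whether mailboxes are modified.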
One benefit of passive mode is that it permits entities to experience a risk-free trial during which confidence in the graymail discovery service can be established. Entities can observe and confirm whether graymail services are behaving as intended (e.g., correctly identifying graymail or, as applicable, particular subcategories of graymail that should be filtered from recipients' inboxes). A second benefit of passive mode is that it allows threat detection platform 400 to be adjusted (e.g., by changing settings/configurations of graymail remediation service 406) to tune performance as necessary. As an example, tuning can be performed for each entity whose email messages are being monitored to account for differences between entities (e.g., in the senders, content, or relevance of incoming messages). Since the email landscape is different for each entity, this trial period allows changes to be learned by, or implemented in, models employed by the threat detection platform. In some embodiments, during passive mode, employees are encouraged to forward examples of graymail that they receive to dedicated training-data collection email addresses (e.g., newsletters@examplecompany.com or events@examplecompany.com) to help customize/tune models/heuristics more specifically to that organization and its users. Further, during and/or after passive mode, employees can be encouraged to manually move graymail from their inboxes to a graymail folder, or from a spam folder to their inboxes (or a graymail folder), etc. Such user actions, observed by the threat detection platform (e.g., using API calls/log data provided by cloud-based email suite 308), can be used by the threat detection platform to fine-tune graymail handling based on individual preferences.
As an example, a first employee may wish to send all airline-related promotions to a graymail folder, while a second employee may wish to send the same messages to a spam folder (or some to a spam folder and some to an inbox, etc.). The threat detection platform can automatically generate different rules for future handling of such messages based on the two users' respective actions/preferences.
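Generating per-user rules from observed manual moves can be sketched as a majority vote per (user, message category) pair. This is a minimal illustration under stated assumptions: the `derive_user_rules` helper, the tuple input format, the category labels, and the `min_support` and `min_share` thresholds are hypothetical, and a real system would first need a classifier to assign categories such as airline promotions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def derive_user_rules(
    observations: Iterable[Tuple[str, str, str]],
    min_support: int = 3,    # hypothetical: minimum observed moves per pair
    min_share: float = 0.8,  # hypothetical: required consistency of the majority
) -> Dict[Tuple[str, str], str]:
    """Build {(user, category): destination} rules from observed manual moves.

    Each observation is a (user, category, destination) tuple, e.g.
    ("alice", "airline_promotions", "Graymail"). A rule is emitted only when a
    user moves a category to the same destination consistently enough.
    """
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for user, category, destination in observations:
        counts[(user, category)][destination] += 1
    rules = {}
    for key, destinations in counts.items():
        total = sum(destinations.values())
        destination, hits = destinations.most_common(1)[0]
        if total >= min_support and hits / total >= min_share:
            rules[key] = destination
    return rules
```

A user who splits the same category between folders (for example, half to spam and half to the inbox) gets no rule under this sketch; handling that case would require finer-grained features than the category alone.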
D. Example Processing System
Processing system 500 includes a processor 502, main memory 506, non-volatile memory 510, network adapter 512 (e.g., a network interface), video display 518, input/output device 520, control device 522 (e.g., a keyboard, pointing device, or mechanical input such as a button), drive unit 524 that includes a storage medium 526, and/or signal generation device 530 that are communicatively connected to a bus 516. Bus 516 is illustrated as an abstraction that represents one or more physical buses and/or point-to-point connections that are connected by appropriate bridges, adapters, or controllers. Bus 516, therefore, can include a system bus, Peripheral Component Interconnect (PCI) bus, PCI-Express bus, HyperTransport bus, Industry Standard Architecture (ISA) bus, Small Computer System Interface (SCSI) bus, Universal Serial Bus (USB), Inter-Integrated Circuit (I2C) bus, and/or a bus compliant with Institute of Electrical and Electronics Engineers (IEEE) Standard 1394, etc.
While main memory 506, non-volatile memory 510, and storage medium 526 are each shown as a single medium, the terms “storage medium” and “machine-readable medium” should be taken to include a single medium or multiple media that store one or more sets of instructions 528. The terms “storage medium” and “machine-readable medium” should also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by processing system 500. Further examples of machine- and computer-readable media include recordable-type media such as volatile and non-volatile memory devices 510, removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs) and Digital Versatile Disks (DVDs)), cloud-based storage, and transmission-type media such as digital and analog communication links.
In general, the routines executed to implement embodiments described herein can be implemented as part of an operating system or a specific application, component, program, object, module, or sequence of instructions (collectively referred to as “computer programs”). The computer programs typically comprise one or more instructions (e.g., instructions 504, 508, and/or 528) set at various times in various memories and storage devices in an electronic device. When read and executed by processor 502, the instructions cause processing system 500 to perform operations to execute various aspects of techniques described herein.
Network adapter 512 allows processing system 500 to mediate data in a network 514 with an entity that is external to the processing system 500 through any communication protocol supported by the processing system 500 and the external entity. Examples of network adapter 512 include a network adaptor card, a wireless network interface card, a switch, a protocol converter, a gateway, a bridge, a hub, a receiver, a repeater, and/or a transceiver that includes an integrated circuit (e.g., enabling communication over Bluetooth or Wi-Fi), etc.
Techniques introduced here can be implemented using software, firmware, hardware, or a combination of such forms. For example, various aspects can be implemented using special-purpose hardwired (i.e., non-programmable) circuitry in the form of application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and the like.
E. Example Process
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
This application claims priority to U.S. Provisional Patent Application No. 63/105,020 entitled DISCOVERING GRAYMAIL THROUGH REAL-TIME ANALYSIS OF INCOMING EMAIL filed Oct. 23, 2020 which is incorporated herein by reference for all purposes.
Prior Publication Data

Number | Date | Country |
---|---|---|
20220131821 A1 | Apr 2022 | US |

Provisional Application

Number | Date | Country |
---|---|---|
63105020 | Oct 2020 | US |