Aspects of this document relate generally to cybersecurity.
Systems and methods exist in the art to provide cybersecurity protections for computing systems and web domains. However, many cybersecurity threats persist. There exists a need to address ongoing and growing security threats to websites and to users of websites from various attacks including phishing attacks, malicious software attacks, and so forth.
Implementations of systems for categorizing and visualizing web domain details may include: one or more processors configured to automatically determine domain variants, using a provided seed domain, based on a level of similarity with the seed domain, the one or more processors further configured to categorize the domain variants into a plurality of categories; and one or more servers communicatively coupled with one or more computing devices and configured to provide one or more user interfaces for display on the one or more computing devices, the one or more user interfaces including: a visual display of the categories; and for each category, an indicator indicating a total number of the domain variants within that category.
Implementations of systems for categorizing and visualizing web domain details may include one or more or all of the following:
The domain variants may be associated with a plurality of top-level domains (TLDs).
The one or more processors may be configured to determine a registration status for each domain variant. The one or more user interfaces may include a visual display of the registration status of at least some of the domain variants.
The one or more processors may be configured to determine, for each of the domain variants, a score related to a potential maliciousness of the domain variant.
If the domain variant is registered, the score may be based on one or more of: a determined intended use for the domain variant; a number of malicious sites previously accessible using the domain variant; a number of malicious pages previously accessible using the domain variant; a number of malicious sites previously hosted on an internet protocol (IP) address of the domain variant; a number of malicious pages previously hosted on the IP address of the domain variant; Secure Sockets Layer (SSL) certificate details associated with the domain variant; a determined score for a top-level domain (TLD) of the domain variant; and a determination of likely deception related to a known brand name.
If the domain variant is unregistered, the score may be based on one or more of: an average domain registration price associated with a top-level domain (TLD); a price for registration of the domain variant; a determined TLD maliciousness; one or more terms in the domain variant determined to be suspicious; and the level of similarity with the seed domain.
The one or more processors may be configured to, based on the score, determine whether the domain variant should be recommended for acquisition and, if so, initiate display of an acquisition recommendation on the one or more user interfaces.
The one or more processors may be further configured to determine, for each of the domain variants which is registered, whether a website associated with the domain variant includes malicious content.
The one or more processors may be further configured to, in response to determining that the website includes malicious content, initiate display of a takedown recommendation on the one or more user interfaces.
The one or more processors may be further configured to monitor content of the website after a takedown and, in response to determining that the website again includes malicious content, initiate display of another takedown recommendation on the one or more user interfaces.
The categories may include at least: a category for unregistered domains recommended for acquisition; a category for registered domains recommended for monitoring; and a category for registered domains recommended for takedown.
The category for registered domains recommended for monitoring may include a plurality of subcategories including at least a category including parked domains.
The visual display of the categories may include, for each category, a displayed container.
Implementations of methods for categorizing and visualizing web domain details may include: using one or more processors: determining domain variants, using a provided seed domain, based on a level of similarity with the seed domain; determining a registration status for each domain variant; categorizing the domain variants into a plurality of categories; and using one or more servers communicatively coupled with one or more computing devices, providing one or more user interfaces for display on the one or more computing devices, the one or more user interfaces including: a visual display of the categories; a visual display of the registration status of at least some of the domain variants; and for each category, an indicator indicating a total number of the domain variants within that category.
Implementations of methods for categorizing and visualizing web domain details may include one or more or all of the following:
The method may include, using the one or more processors, determining, for each of the domain variants, a score related to a potential maliciousness of the domain variant, wherein the score is based on one or more of: a determined intended use for the domain variant; a number of malicious sites previously accessible using the domain variant; a number of malicious pages previously accessible using the domain variant; a number of malicious sites previously hosted on an internet protocol (IP) address of the domain variant; a number of malicious pages previously hosted on the IP address of the domain variant; Secure Sockets Layer (SSL) certificate details associated with the domain variant; a determined score for a top-level domain (TLD) of the domain variant; a determination of likely deception related to a known brand name; an average domain registration price associated with a top-level domain (TLD); a price for registration of the domain variant; a determined TLD maliciousness; one or more terms in the domain variant determined to be suspicious; and the level of similarity with the seed domain.
The categories may include at least: a category for unregistered domains recommended for acquisition; a category for registered domains recommended for monitoring; and a category for registered domains recommended for takedown.
The method may further include, using the one or more processors, determining, for each of the domain variants which is registered, whether a website associated with the domain variant includes malicious content and, in response to determining that the website includes malicious content, initiating display of a takedown recommendation on the one or more user interfaces.
Implementations of systems for categorizing and visualizing web domain details may include: one or more processors; one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to train one or more machine learning (ML) modules to: automatically determine domain variants, using a provided seed domain, based on a level of similarity with the seed domain; and categorize the domain variants into a plurality of categories, wherein the categories include at least: a category for unregistered domains recommended for acquisition; a category for registered domains recommended for monitoring; and a category for registered domains recommended for takedown; and one or more servers communicatively coupled with one or more computing devices and configured to provide one or more user interfaces for display on the one or more computing devices, the one or more user interfaces comprising: a visual display of the categories; and for each category, an indicator indicating a total number of the domain variants within that category.
Implementations of systems for categorizing and visualizing web domain details may include one or more or all of the following:
The instructions, when executed, may cause the system to train the one or more ML modules to determine, for each of the domain variants, a score related to a potential maliciousness of the domain variant, wherein the score is based on one or more of: a determined intended use for the domain variant; a number of malicious sites previously accessible using the domain variant; a number of malicious pages previously accessible using the domain variant; a number of malicious sites previously hosted on an internet protocol (IP) address of the domain variant; a number of malicious pages previously hosted on the IP address of the domain variant; Secure Sockets Layer (SSL) certificate details associated with the domain variant; a determined score for a top-level domain (TLD) of the domain variant; a determination of likely deception related to a known brand name; an average domain registration price associated with a top-level domain (TLD); a price for registration of the domain variant; a determined TLD maliciousness; one or more terms in the domain variant determined to be suspicious; and the level of similarity with the seed domain.
The instructions, when executed, may cause the system to train the one or more ML modules to determine, for each of the domain variants which is registered, whether a website associated with the domain variant includes malicious content.
General details of the above-described implementations, and other implementations, are given below in the DESCRIPTION, the DRAWINGS, the CLAIMS and the ABSTRACT.
Implementations will be discussed hereafter with reference to the included drawings, briefly described below, wherein like designations refer to like elements. The drawings are not necessarily drawn to scale.
Implementations/embodiments disclosed herein (including those not expressly discussed in detail) are not limited to the particular components or procedures described herein. Additional or alternative components, assembly procedures, and/or methods of use consistent with the intended systems and methods for categorizing and visualizing web domain details may be utilized in any implementation. This may include any materials, components, sub-components, methods, sub-methods, steps, and so forth.
Implementations of systems and methods disclosed herein relate to categorizing and visualizing web domain details, including the lifecycles of fraudulent web domains from before they are registered to after they are taken down. Systems described herein facilitate processes for automatically categorizing the complete lifecycles of suspicious and fraudulent web domains at large scale and visualizing them in a diagram that provides both high-level metrics and technical details of the domains. In implementations the system(s) generate a list of several or all possible variants of a given seed web domain, categorize them into lifecycle stages such as “Monitor for Acquisitions,” “Monitor Pre-malicious” and “Post-malicious” based on the content and Domain Name System (DNS) records of each website, and provide one or more user interfaces for a user to interact with the data using a lifecycle diagram (such as that seen in
Referring now to
One or more or all of the aforementioned elements of system 100 may also be communicatively coupled with one or more of the following: web server 114 for providing access to the systems and methods through one or more websites; one or more application servers 116 for allowing the admin or users to access elements and/or services of system 100 through one or more software applications, such as through one or more mobile applications; one or more other servers 118 for processing data and/or executing tasks; and one or more remote server racks 112 (or a portion thereof) for processing data and/or executing tasks (such as, by non-limiting example, AMAZON WEB SERVICES (AWS) servers). One or more end user computing devices, such as computing device (device) 120 (having display 122) and computing device (device) 124 (having display 126), may be communicatively coupled with any other elements of system 100. Device 120 is illustrated as a desktop computer and device 124 is illustrated as a mobile phone, but these are only representative examples. In implementations the computing devices 102, 120, 124 may be any type of device such as, by non-limiting example, a laptop, a personal computer (PC), a desktop computer, a tablet, a personal data assistant (PDA), a smart phone or mobile phone, a smart watch, smart glasses (such as GOOGLE GLASS), a smart speaker, and any other device capable of receiving a user input and providing information in visual and/or audio format.
One or more of the described servers could provide one or more user interfaces for display on one or more of the computing devices, such as by providing data and/or instructions configured to result in the display of the user interfaces on the one or more computing devices.
System 100 is only one representative example. In some simplified implementations many or all of the methods of system 100 could be carried out by a single server which includes one or more processors, data storage, one or more executables (code/instructions) stored in data storage or memory of the server for implementing the methods (including providing a website interface, software application and/or mobile application interface, etc.), and so forth. In other implementations multiple or many servers may be used to implement the methods. The system(s) may implement various tasks, including tasks not explicitly disclosed herein but which are inherent to accomplishing the methods and/or end goals described herein.
At any given time there may be any number of end user computing devices 120, 124 (and/or other end user computing devices) communicatively coupled with system 100, to allow for any number of end users. Likewise, there may be any number of administrators and associated administrator devices 102 coupled with system 100.
All of the method steps disclosed herein may be performed by one or more processors of one or more computing devices and/or servers of system 100 or 200 (or another system) (system 200 will be described further below). The one or more processors could include any combination of processors of any combination of computing devices/servers of system 100 or 200 or another system. For example, the methods could be implemented using one or more processors of a web server in conjunction with one or more processors of a remote data store server, in conjunction with one or more processors of another remote server, and so forth. The one or more processors could include processor 202 of system 200, shown in
Machine learning (ML) and/or artificial intelligence (AI) modules/engines may be included in any of the computing devices/servers of systems 100 or 200 or any other system. Although ML/AI modules/engines themselves are not explicitly shown in the drawings, computing devices and servers such as those shown in systems 100 and 200 are known to be capable of including ML/AI modules/engines, and the general abilities/functionalities of ML/AI modules/engines, and how to generally implement them, are understood by the practitioner of ordinary skill in the art, so that they do not need to be explicitly illustrated in the drawings, other than to say that they may be included in one or more of the computing devices/servers of the systems 100/200, to provide adequate disclosure to enable those skilled in the art to implement and use the systems and methods as claimed. ML/AI modules/engines, for example, could be included in instructions 204, 208 and/or 228 of processing system (system) 200 of
The ML/AI modules may be trained, using the system, to perform a variety of functions. For example, user input, selections, actions and/or feedback could be used to train a machine learning module to: categorize web domains; categorize and/or determine web domain lifecycles; determine variants of a domain name including those similar in sight and/or sound and including/among several or all possible top-level domains (TLDs); score/rank the unregistered variants to determine potential maliciousness and, based on the score/rank, recommend purchase of one or more of the unregistered variants; determine the registered variants which do not yet have malicious content; score/rank the registered variants to determine potential maliciousness; determine the registered variants which already have malicious content or likely malicious content, and recommend takedown thereof; determine suspicious keywords in a domain; determine similarity between a domain in question and a seed domain; determine an intended use related to a domain (such as e-commerce, parked domain, directory, etc.); determine a score for a TLD itself (for example a high score or a low score for TLDs which are more likely to host malicious websites); and so forth. An ML/AI module or engine may further be trained to perform any of the other actions/methods disclosed herein which an ML/AI engine could feasibly perform. Any of the methods disclosed herein may, accordingly, further include training an ML/AI engine/module to perform any tasks or subtasks, and/or training it to improve its effectiveness or accuracy in performing such tasks/subtasks.
One or more of the disclosed memories may include non-transitory computer readable media and may include instructions which, when executed, cause the system to train one or more machine learning (ML) modules to perform methods disclosed herein.
In various embodiments, the processing system 200 operates as part of a user device, although the processing system 200 may also be connected (e.g., wired or wirelessly) to the user device. In a networked deployment, the processing system 200 may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
The processing system 200 may be a server computer, a client computer, a personal computer, a tablet, a laptop computer, a personal digital assistant (PDA), a cellular phone, a processor, a web appliance, a network router, a switch or bridge, a console, a hand-held console, a gaming device, a music player, a network-connected (“smart”) television, a television-connected device, or any portable device or machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by the processing system 200.
While the main memory 206, non-volatile memory 210, and storage medium 226 (also called a “machine-readable medium”) are shown to be a single medium, the terms “machine-readable medium” and “storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store one or more sets of instructions 228. The terms “machine-readable medium” and “storage medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the computing system and that causes the computing system to perform any one or more of the methodologies of the presently disclosed embodiments.
In general, the routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions and may be referred to as one or more “computer programs.” The computer programs typically comprise one or more instructions (e.g., instructions 204, 208, 228) set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processing units or processors 202, cause the processing system 200 to perform operations to execute elements involving the various aspects of the disclosure.
Moreover, while embodiments have been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments are capable of being distributed as a program product in a variety of forms, and that the disclosure applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution. For example, the technology described herein could be implemented using virtual machines or cloud computing services.
Further examples of machine-readable storage media, machine-readable media, or computer-readable (storage) media include, but are not limited to, recordable type media such as volatile and non-volatile memory devices 210, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disc Read-Only Memory (CD-ROM) and Digital Versatile Discs (DVDs)), and transmission type media, such as digital and analog communication links.
The network adapter 212 enables the processing system 200 to mediate data in a network 214 with an entity that is external to the processing system 200 through any known and/or convenient communications protocol supported by the processing system 200 and the external entity. The network adapter 212 can include one or more of a network adaptor card, a wireless network interface card, a router, an access point, a wireless router, a switch, a multilayer switch, a protocol converter, a gateway, a bridge, a bridge router, a hub, a digital media receiver, a repeater, and any other network adapter type. The network 214 may or may not be part of the system 200, in implementations. In implementations network 214 and network 110 are one and the same, and in other implementations they have some overlap in that at least a portion of one network is also at least a portion of the other network.
The network adapter 212 can include a firewall which can, in some embodiments, govern and/or manage permission to access/proxy data in a computer network and track varying levels of trust between different machines and/or applications. The firewall can include any number of modules having any combination of hardware and/or software components able to enforce a predetermined set of access rights between a particular set of machines and applications, machines and machines, and/or applications and applications, such as to regulate the flow of traffic and resource sharing between these various entities. The firewall may additionally manage and/or have access to an access control list which details permissions including, for example, the access and operation rights of an object by an individual, a machine, and/or an application, and the circumstances under which the permission rights stand (or in other words the circumstances under which the permissions/rights exist).
Reference is now made to
In implementations the system(s) 100/200 perform automated domain categorization in the following four stages:
URL construction refers to different ways the domains can be constructed: for example adding an extra letter to the end of a domain name (addition) such as WELLSFARGOL.COM, replacing one of the valid characters with an invalid character whose binary representation differs by only one bit (bit squatting) such as well3fargo.com (where the American Standard Code for Information Interchange (ASCII) binary codes for 3 and for lowercase s differ by only one bit), exchanging a valid Latin character with an identical or nearly identical look-alike character, for example a digit or a non-Latin character (homoglyph), such as WE11SFARGO.COM, swapping a valid vowel with another character (vowel-swap) such as EWLLSFARGO.COM, using a subdomain to appear similar to another domain (subdomain) such as WELL.SFARGO.COM, subtracting a valid character (omission) such as WELLSFARO.COM, replacing a valid character with an invalid one (replacement) such as WELPSFARGO.COM, adding a hyphen (hyphenation) such as WELLS-FARGO.COM, repeating a valid character (repetition) such as WELLSSFARGO.COM, and so forth. These are only examples, and there may be other types of URL construction. If the URL construction states “scan,” this means that the URL construction was obtained from a third-party source and not by an algorithm or ML module or the like of the system 100. All of the URLs in
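The construction rules listed above can be illustrated with a short Python sketch. It is a hypothetical illustration, not the system's actual variant generator: the function names, the deliberately tiny homoglyph table, the `"tld-swap"` label for reusing the seed name under other TLDs, and the choice to label each variant with the first rule that produces it are all assumptions.

```python
import string

LETTERS = string.ascii_lowercase
# Characters allowed in a DNS label; used to discard unusable bit flips.
HOST_CHARS = set(LETTERS + string.digits + "-")
VOWELS = set("aeiou")
# Hypothetical, deliberately tiny homoglyph table (real tables are much larger).
HOMOGLYPHS = {"l": "1", "o": "0", "i": "1"}


def addition(name):
    """Append one letter: wellsfargo -> wellsfargol."""
    return {name + c for c in LETTERS}


def bitsquat(name):
    """Flip one of the 7 ASCII bits of one character: s (0x73) -> 3 (0x33)."""
    out = set()
    for i, ch in enumerate(name):
        for bit in range(7):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in HOST_CHARS:
                out.add(name[:i] + flipped + name[i + 1:])
    return out


def homoglyph(name):
    """Replace every occurrence of a look-alike character: wellsfargo -> we11sfargo."""
    return {name.replace(src, dst) for src, dst in HOMOGLYPHS.items() if src in name}


def vowel_swap(name):
    """Swap a vowel with a neighboring character: wellsfargo -> ewllsfargo."""
    return {name[:i] + name[i + 1] + name[i] + name[i + 2:]
            for i in range(len(name) - 1)
            if name[i] in VOWELS or name[i + 1] in VOWELS}


def omission(name):
    """Drop one character: wellsfargo -> wellsfaro."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def replacement(name):
    """Replace one character with a letter: wellsfargo -> welpsfargo."""
    return {name[:i] + c + name[i + 1:] for i in range(len(name)) for c in LETTERS}


def hyphenation(name):
    """Insert a hyphen: wellsfargo -> wells-fargo."""
    return {name[:i] + "-" + name[i:] for i in range(1, len(name))}


def repetition(name):
    """Repeat one character: wellsfargo -> wellssfargo."""
    return {name[:i + 1] + name[i:] for i in range(len(name))}


def subdomain(name):
    """Insert a dot so part of the name becomes a subdomain: wellsfargo -> well.sfargo."""
    return {name[:i] + "." + name[i:] for i in range(1, len(name))}


RULES = {
    "addition": addition, "bitsquatting": bitsquat, "homoglyph": homoglyph,
    "vowel-swap": vowel_swap, "omission": omission, "replacement": replacement,
    "hyphenation": hyphenation, "repetition": repetition, "subdomain": subdomain,
}


def generate_variants(seed, tlds=("com",)):
    """Map each variant domain to the first rule (in RULES order) that produced it."""
    name, _, seed_tld = seed.lower().partition(".")
    variants = {}
    for rule, make in RULES.items():
        for candidate in sorted(make(name)):
            if candidate == name:
                continue  # e.g. replacing a letter with itself
            for tld in tlds:
                variants.setdefault(f"{candidate}.{tld}", rule)
    for tld in tlds:
        if tld != seed_tld:
            variants.setdefault(f"{name}.{tld}", "tld-swap")
    return variants
```

For example, `generate_variants("wellsfargo.com")["well3fargo.com"]` returns `"bitsquatting"`, because flipping bit 6 of `s` (0x73) yields `3` (0x33). Passing several TLDs, such as `tlds=("com", "net")`, spreads every variant across those TLDs.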
In the example of
Stage 3 - In this stage the ML/AI system/engine automatically determines which of the registered domains already include (or likely already include) malicious content such as phishing or scam content. The system may determine this by using information from third party sources (such as third party lists of sites with known malicious content) or by using one or more ML modules to determine whether any given domain has, or likely has, malicious content. Such domains are candidates for mitigation such as through takedown notices (to the hosting providers) or automated takedown actions. This stage is referred to as “Takedown Malicious” in the lifecycle diagram of
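The Stage 3 decision can be sketched as a simple rule that combines a third-party list with the output of a content classifier. This is a hypothetical illustration only: `RegisteredDomain`, `malicious_probability`, and the 0.8 threshold are assumed names and values, and the ML module that would produce the probability is not shown.

```python
from dataclasses import dataclass


@dataclass
class RegisteredDomain:
    name: str
    # Hypothetical output of an ML content classifier, in [0, 1].
    malicious_probability: float


def takedown_candidates(domains, third_party_list, threshold=0.8):
    """Return (domain, reason) pairs for registered domains to recommend for takedown."""
    flagged = []
    for d in domains:
        if d.name in third_party_list:
            # A domain on a third-party list of known malicious sites is flagged outright.
            flagged.append((d.name, "third-party list"))
        elif d.malicious_probability >= threshold:
            # Otherwise rely on the classifier's estimate that malicious content is likely.
            flagged.append((d.name, "classifier"))
    return flagged
```

Checking the list first lets a confirmed third-party report take precedence over a model estimate; an implementation could instead weight both signals together.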
Stage 4 - In this stage, any domains that have been successfully taken down are put under continuous monitoring to ensure they do not start hosting malicious content again. This monitoring is performed automatically by the ML/AI system/engine. This stage is referred to as “Monitor Post-malicious” in the lifecycle diagram of
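The four stages, together with the repeated takedown recommendation for a domain that resumes hosting malicious content, can be modeled as a small state machine. The Python sketch below is hypothetical: the `Stage` values reuse the lifecycle labels from this description, but the boolean observations (`registered`, `malicious`, `taken_down`) and the transition rules are simplifying assumptions.

```python
from enum import Enum


class Stage(Enum):
    MONITOR_FOR_ACQUISITION = "Monitor for Acquisitions"
    MONITOR_PRE_MALICIOUS = "Monitor Pre-malicious"
    TAKEDOWN_MALICIOUS = "Takedown Malicious"
    MONITOR_POST_MALICIOUS = "Monitor Post-malicious"


def next_stage(stage, registered, malicious, taken_down):
    """Advance one domain's lifecycle stage given the latest observation."""
    if stage is Stage.MONITOR_FOR_ACQUISITION:
        if not registered:
            return stage
        return Stage.TAKEDOWN_MALICIOUS if malicious else Stage.MONITOR_PRE_MALICIOUS
    if stage is Stage.MONITOR_PRE_MALICIOUS:
        return Stage.TAKEDOWN_MALICIOUS if malicious else stage
    if stage is Stage.TAKEDOWN_MALICIOUS:
        return Stage.MONITOR_POST_MALICIOUS if taken_down else stage
    # Monitor Post-malicious: malicious content returned, so recommend another takedown.
    return Stage.TAKEDOWN_MALICIOUS if malicious else stage


def run(observations, stage=Stage.MONITOR_FOR_ACQUISITION):
    """Replay (registered, malicious, taken_down) observations and record each stage."""
    history = []
    for registered, malicious, taken_down in observations:
        stage = next_stage(stage, registered, malicious, taken_down)
        history.append(stage)
    return history
```

Replaying "unregistered, registered but clean, malicious, taken down, malicious again" walks a domain through all four stages and then back to Takedown Malicious.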
The different category quadrilaterals may also be clicked to open lists such as those given above, in some cases for the different category types. For example the “Uncategorized” quadrilateral may be clicked on to open a list of uncategorized domains, the “Parked Domains” quadrilateral may be clicked on to open a list of parked domains, and so forth. The “Directory, E-Comm, Other” quadrilateral could be clicked on to open a list of these categories separated by headers (such as one continuous list which breaks the domains into categories with list headers such as “Directory,” “E-Commerce,” and so forth), or clicking on this quadrilateral could open an interface which shows another visual graph similar to that of
The system may include lists in the datastores or databases of the system(s) 100/200. For example, a list of all determined variants of the seed domain (referenced in the top left box of
The different stages discussed above do not need to always be done in the order described, but rather any possible/feasible order of operations could be accomplished. Referring to
The different categories for the user interface of
Referring now to
Referring now to
The right graph of
The left graph of
The right graph of
The left image of
The right image of
Any of the user interfaces shown in the drawings may be shown on displays of computing devices shown in the system diagram of
Systems and methods disclosed herein may be used for: determining variants of a domain name including similarities in sight and/or sound and including/among several or all possible top-level domains (TLDs); determining which of the variants are registered and which are unregistered; scoring/ranking the unregistered variants to determine potential maliciousness and, based on the scoring/ranking, recommending purchase of one or more of the unregistered variants; determining the registered variants which do not yet have malicious content; scoring/ranking the registered variants to determine potential maliciousness; determining the registered variants which already have malicious content or likely malicious content, and recommending takedown; and monitoring domains which have previously been taken down.
The scoring/ranking of the unregistered variants may be based on one or more of: TLD price (or price of the actual domain name itself); a determined or input TLD maliciousness; suspicious keywords in the domain (such as determined using a list of known suspicious keywords or using an ML/AI engine); similarity with the seed domain (as determined by the fuzz algorithm and/or by an ML/AI engine); and so forth.
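A minimal sketch of how these factors might be combined into one score follows. Everything specific in it is an assumption: the keyword list, the weights, the price scaling, and the 0.6 acquisition threshold are illustrative, and normalized Levenshtein distance stands in for the unspecified similarity measure.

```python
# Hypothetical suspicious-keyword list; a deployed system might use a curated list or an ML model.
SUSPICIOUS_KEYWORDS = ("login", "secure", "verify", "account", "support")


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    return 1 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def unregistered_score(variant, seed, tld_price_usd, tld_maliciousness,
                       weights=(0.4, 0.25, 0.2, 0.15)):
    """Risk score in [0, 1], assuming tld_maliciousness is in [0, 1] and price >= 0."""
    w_sim, w_tld, w_kw, w_price = weights
    keyword_hit = 1.0 if any(k in variant for k in SUSPICIOUS_KEYWORDS) else 0.0
    cheapness = 1 / (1 + tld_price_usd / 10)  # cheaper to register -> higher risk
    return (w_sim * similarity(variant, seed) + w_tld * tld_maliciousness
            + w_kw * keyword_hit + w_price * cheapness)


def recommend_acquisition(score, threshold=0.6):
    """Recommend buying the variant when its score reaches the (hypothetical) threshold."""
    return score >= threshold
```

Because the weights sum to 1 and each factor lies in [0, 1], the score stays in [0, 1], which makes a single threshold easy to tune.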
The scoring/ranking of the registered variants may be based on one or more of: a determined intended use associated with the registered variants (such as Parked, Directory, e-commerce, etc.); number of past phishing sites/pages hosted on the domain and/or emanating from the IP address; domain rank obtained from Stage 1; SSL certificate details; a score of the TLD itself (for example a high or low score for TLDs which are more likely to host malicious websites); deceptive domain-name practices such as coupling a known brand name with another word in a domain name; and so forth.
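The registered-variant factors can be sketched the same way. This is a hypothetical weighting, not a disclosed formula: the field names, the per-use risk values, the 30-day certificate-age rule, the history cap of 5, and all weights are assumptions chosen only to show how the listed signals could feed one score.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-use risk values; the description names the uses, not their weights.
USE_RISK = {"parked": 0.3, "directory": 0.2, "e-commerce": 0.5, "other": 0.4}


@dataclass
class RegisteredFeatures:
    domain: str
    intended_use: str             # e.g. "Parked", "Directory", "E-Commerce"
    phishing_on_domain: int       # past phishing sites/pages hosted on the domain
    phishing_on_ip: int           # past phishing sites/pages from the domain's IP address
    cert_age_days: Optional[int]  # None when no SSL certificate is observed
    tld_score: float              # 0..1, higher for TLDs that more often host malicious sites
    stage1_rank: float            # 0..1 domain rank carried over from Stage 1


def couples_brand(domain, brand):
    """True if the brand is combined with other text, e.g. wellsfargo-secure.com."""
    host = domain.rsplit(".", 1)[0]
    return brand in host and host != brand


def registered_score(f, brand):
    """Weighted risk score in [0, 1]; the weights below sum to 1."""
    history = min(1.0, (f.phishing_on_domain + 0.5 * f.phishing_on_ip) / 5)
    # Assumption: a missing or very recently issued certificate is treated as riskier.
    if f.cert_age_days is None:
        cert = 0.5
    else:
        cert = 1.0 if f.cert_age_days < 30 else 0.0
    use = USE_RISK.get(f.intended_use.lower(), 0.4)
    deception = 1.0 if couples_brand(f.domain, brand) else 0.0
    return (0.25 * history + 0.15 * cert + 0.15 * f.tld_score
            + 0.2 * deception + 0.1 * use + 0.15 * f.stage1_rank)
```

Keeping each factor in [0, 1] lets registered and unregistered scores be ranked on comparable scales, although the two sets of weights would in practice be tuned separately.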
One or more user interfaces may visually display the variants grouped into visually separated categories such as: recommended for acquisition; monitor pre-malicious (which may be visually broken into sub-categories such as uncategorized, parked, directory, e-commerce, other, etc.); takedown malicious (which may be visually broken into sub-categories such as BEC, sensitive, etc.); monitor post-malicious; and so forth.
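The grouping and the per-category totals shown in the user interface can be sketched as a categorization function plus a counter. This is a hypothetical sketch: records are plain dicts with assumed keys (`registered`, `score`, `malicious`, `taken_down`, `use`, `threat_type`), and splitting unregistered variants into "Recommended" and "Not recommended" subcategories of "Monitor for Acquisitions" is an assumption.

```python
from collections import Counter


def categorize(rec, acquire_threshold=0.6):
    """Return (category, subcategory) for one variant record."""
    if not rec["registered"]:
        recommended = rec.get("score", 0.0) >= acquire_threshold
        return "Monitor for Acquisitions", "Recommended" if recommended else "Not recommended"
    if rec.get("malicious"):
        # Checked before taken_down so a domain that turns malicious again returns here.
        return "Takedown Malicious", rec.get("threat_type", "Other")  # e.g. BEC, Sensitive
    if rec.get("taken_down"):
        return "Monitor Post-malicious", None
    return "Monitor Pre-malicious", rec.get("use", "Uncategorized")  # e.g. Parked


def category_counts(records):
    """Total number of variants per category, i.e. the per-category indicator."""
    return Counter(categorize(r)[0] for r in records)


def subcategory_counts(records, category):
    """Counts per subcategory within one category, e.g. for a drill-down view."""
    return Counter(sub for cat, sub in map(categorize, records) if cat == category)
```

For example, five records (two unregistered variants, one parked domain, one BEC domain, and one taken-down domain) yield totals of 2, 1, 1, and 1 across the four categories.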
Any of the systems and methods disclosed herein may be at least partly implemented using one or more mobile devices—for example all of the user interfaces could be displayed, and interacted with, using one or more mobile devices, in implementations.
In implementations the systems and methods disclosed herein improve the functioning of one or more computers or computing systems by making the computers or systems more resistant to malicious attacks. For example, by automatically alerting brand owners to potential sites/domains that may be used for malicious purposes, the brand owners are enabled to purchase such sites (or initiate their takedown) and remove them from circulation. This then may reduce the overall number of malicious attacks to end users in a network, including the Internet in general. Automatically determining, using ML/AI modules/engines and the like, which web domains are most likely to host malicious content, and allowing their quick removal from circulation by purchase or takedown, makes computers and computing systems (including the Internet in general) more resistant to malicious attacks by systematically removing likely avenues of attack. This improves the overall functioning of the computers and systems—for instance decreasing overall down time and damage from successful attacks.
It is also pointed out that training a machine learning module, or the like, as discussed herein, inherently improves the functioning of a computer or system because it improves the computer’s or system’s ability to correctly identify malicious web domains so that the user may take action by removing web domains through purchase or takedown or the like.
In places where the phrase “one of A and B” is used herein, including in the claims, wherein A and B are elements, the phrase shall have the meaning “A and/or B.” This shall be extrapolated to as many elements as are recited in this manner, for example the phrase “one of A, B, and C” shall mean “A, B, and/or C,” and so forth. To further clarify, the phrase “one of A, B, and C” would include implementations having: A only; B only; C only; A and B but not C; A and C but not B; B and C but not A; and A and B and C.
In places where the description above refers to specific implementations of systems and methods for categorizing and visualizing web domain lifecycles, one or more or many modifications may be made without departing from the spirit and scope thereof. Details of any specific implementation/embodiment described herein may, wherever possible, be applied to any other specific implementation/embodiment described herein. The appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this disclosure.
Furthermore, in the claims, if a specific number of an element is intended, such will be explicitly recited, and in the absence of such explicit recitation no such limitation exists. For example, the claims may include phrases such as “at least one” and “one or more” to introduce claim elements. The use of such phrases should not be construed to imply that the introduction of any other claim element by the indefinite article “a” or “an” limits that claim to only one such element, and the same holds true for the use in the claims of definite articles.
Additionally, in places where a claim below uses the term “first” as applied to an element, this does not imply that the claim requires a second (or more) of that element— if the claim does not explicitly recite a “second” of that element, the claim does not require a “second” of that element. Furthermore, in some cases a claim may recite a “second” or “third” or “fourth” (or so on) of an element, and this does not necessarily imply that the claim requires a first (or so on) of that element—if the claim does not explicitly recite a “first” (or so on) of that element (or an element with the same name, such as “a widget” and “a second widget”), then the claim does not require a “first” (or so on) of that element.
As used herein, the term “of” may refer to “coupled with.” For example, in some cases displays are referred to as a display “of” a first computer or computing device, a display “of” a second computer or computing device, and so forth. These terms are meant to be interpreted broadly so that a display “of” a computing device may be a separate display that is, either by wired or a wireless connection, communicatively coupled with the computing device.
The phrase “computing device” as used herein is meant to include any type of device having one or more processors and capable of communicating information using one or more integrated or communicatively-coupled displays, such as a personal computer, a laptop, a tablet, a mobile phone, a smart phone, a personal data assistant (PDA), smart glasses, a smart watch, a smart speaker, a robot, any other human interaction device, and so forth.
It is pointed out that the provider of a software application to be installed on end user computing devices (such as, by non-limiting example, mobile devices) at least partially facilitates an at least intermittent communicative coupling between one or more servers (which host or otherwise facilitate features of the software application) and the end user computing devices. This is so even if the one or more servers are owned and/or operated by a party other than the provider of the software application.
Method steps disclosed anywhere herein, including in the claims, may be performed in any feasible/possible order. Recitation of method steps in any given order in the claims or elsewhere does not imply that the steps must be performed in that order (unless it is explicitly stated that they are required to be performed in that order)—such claims and descriptions are intended to cover the steps performed in any order except any orders which are technically impossible or not feasible. However, in some implementations method steps may be performed in the order(s) in which the steps are presented herein, including any order(s) presented in the claims.
This document claims the benefit of the filing date of U.S. Provisional Pat. Application No. 63/256,323, entitled “Systems And Methods For Categorizing And Visualizing Web Domain Lifecycles,” naming as first inventor Alain Mayer, which was filed on Oct. 15, 2021, the disclosure of which is hereby incorporated entirely herein by reference.
| Number | Date | Country |
|---|---|---|
| 63256323 | Oct 2021 | US |