The present disclosure generally relates to predictive cyber technologies; and in particular to systems and methods for automated generation of computing device importance rankings that improve and optimize cyber threat defense measures.
An increasing number of software (and hardware) vulnerabilities are discovered and publicly disclosed every year. In 2016 alone, more than 10,000 vulnerability identifiers were assigned and at least 6,000 were publicly disclosed by the National Institute of Standards and Technology (NIST). Once vulnerabilities are disclosed publicly, the likelihood of those vulnerabilities being exploited increases. With limited resources, organizations often look to prioritize which vulnerabilities to patch by assessing the impact each vulnerability would have on the organization if exploited. Standard risk assessment systems such as the Common Vulnerability Scoring System (CVSS), the Microsoft Exploitability Index, and the Adobe Priority Rating err on the side of caution, reporting many vulnerabilities as severe and likely to be exploited. This does not alleviate the problem much, since the majority of the flagged vulnerabilities will not be attacked.
NIST provides the National Vulnerability Database (NVD), which comprises a comprehensive list of disclosed vulnerabilities, but only a small fraction of those vulnerabilities (less than 3%) are found to be exploited in the wild, a result confirmed in the present disclosure. Further, it has been found that the CVSS score provided by NIST is not an effective predictor of vulnerabilities being exploited.
It is with these observations in mind, among others, that various aspects of the present disclosure were conceived and developed.
The application file contains at least one photograph executed in color. Copies of this patent application publication with color photographs will be provided by the Office upon request and payment of the necessary fee.
Corresponding reference characters indicate corresponding elements among the view of the drawings. The headings used in the figures do not limit the scope of the claims.
Aspects of the present disclosure relate to embodiments of a computer-implemented system (
In some embodiments, the system includes an “Input Data Processing Unit”, “Graph database”, and “Query engine” which is shown in
It should be appreciated that features of the present embodiments may be common to one or more other embodiments; i.e., features of the embodiments are not mutually exclusive, and different variations of the embodiments are contemplated.
Network devices: A network device as referenced herein refers to one or more hardware devices or elements used to connect computing devices to a larger network and can include, by non-limiting examples, routers, switches, hubs, wireless access points, repeaters, modems, and the like.
Vulnerability: The term vulnerability as used herein may include a piece of software, hardware, or a software/hardware combination that can be exploited by a hacking actor to perform unauthorized actions that violate the confidentiality, integrity, or availability policies of a computing system hosting or executing the technology (software and/or hardware) having the vulnerability. Further, the term “vulnerability” can also be used to refer to a class of vulnerabilities, and may include not only software flaws (or flaws in hardware or software/hardware combinations) but also other flaws including, but not limited to, misconfigurations, organizational practices, hardware, and physical security. It can also be used to describe a class of generalized computer issues that appeal to particular hackers or communities of hackers for purposes of compromising computer systems.
Vulnerability Exploitation: This term refers to an act of taking advantage of a software (and/or hardware) flaw within a computer system. Vulnerability exploitation is often performed using a piece of software, or a sequence of input data, known as an “exploit”.
Proof-Of-Concept (PoC) exploits: This term refers to non-malicious exploits that are developed only to demonstrate how hackers can take advantage of certain software (and/or hardware) flaws. Malicious hackers may leverage PoC exploits to craft weaponized, harmful exploits.
Hacking actors: This term refers to individuals who engage in activities related to software hacking, either with malicious (a.k.a., black-hat hackers) or non-malicious intent (a.k.a., white-hat hackers).
Online hacker communities: This term refers to online environments used by hackers around the globe, such as Chan sites, social media, paste sites, grey-hat communities, Tor, surface web, and even highly access-restricted sites.
Common Vulnerability and Exposure (CVE): This term refers to a unique identifier assigned to each software vulnerability in the National Vulnerability Database (NVD) maintained by the National Institute of Standards and Technology (NIST). The CVE numbering system associated with the NVD follows one of these two formats:
CVE-YYYY-NNNN; and
CVE-YYYY-NNNNNNN.
The “YYYY” portion of the identifier indicates the year in which the software flaw is reported, and the “N” portion is an integer that identifies a flaw (e.g., see CVE-2018-4917 related to https://nvd.nist.gov/vuln/detail/CVE-2018-4917, and CVE-2019-9896 related to https://nvd.nist.gov/vuln/detail/CVE-2019-9896).
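The identifier format described above can be checked mechanically. The following is a minimal sketch, not part of the disclosed system; the function name `parse_cve` is hypothetical, and the pattern assumes that any sequence of four or more digits after the year is acceptable, which covers both formats listed above.

```python
import re

# Assumption: four or more digits after the year, which covers both
# CVE-YYYY-NNNN and CVE-YYYY-NNNNNNN.
CVE_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def parse_cve(identifier):
    """Return (year, sequence_number) for a well-formed CVE identifier, else None."""
    match = CVE_PATTERN.match(identifier.strip().upper())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_cve("CVE-2018-4917"))
print(parse_cve("CVE-18-4917"))
```

Returning `None` for malformed input, rather than raising, is a design choice that suits bulk processing of identifiers scraped from free text.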
Common Platform Enumeration (CPE): A Common Platform Enumeration, or CPE, relates to a list of software/hardware products that are vulnerable to a given CVE. The CVE and the respective platforms that are affected, i.e., CPE data, can be obtained from the NVD. For example, the following CPEs are some of the CPEs vulnerable to CVE-2018-4917:
cpe:2.3:a:adobe:acrobat_2017:*:*:*:*:*:*:*:*
cpe:2.3:a:adobe:acrobat_reader_dc:15.006.30033:*:*:*:classic:*:*
cpe:2.3:a:adobe:acrobat_reader_dc:15.006.30060:*:*:*:classic:*:*
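CPE strings such as those listed above can be split into named attributes. The sketch below assumes the CPE 2.3 formatted-string attribute order (part, vendor, product, version, update, edition, language, software edition, target software, target hardware, other). The helper name `parse_cpe` is hypothetical, and padding short strings with the wildcard `*` is an assumption made for illustration. Escaped colons inside attribute values are not handled.

```python
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe(cpe):
    """Split a CPE 2.3 formatted string into a dict of named attributes."""
    pieces = cpe.split(":")
    if len(pieces) < 5 or pieces[0] != "cpe" or pieces[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    values = pieces[2:]
    if len(values) > len(CPE_FIELDS):
        raise ValueError(f"too many attributes in: {cpe!r}")
    # Assumption: attributes missing from the end are treated as the wildcard "*".
    values += ["*"] * (len(CPE_FIELDS) - len(values))
    return dict(zip(CPE_FIELDS, values))


parsed = parse_cpe("cpe:2.3:a:adobe:acrobat_2017:*:*:*:*:*:*:*:*")
print(parsed["vendor"], parsed["product"], parsed["version"])
```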
Common Vulnerability Scoring System (CVSS): This term refers to a scoring system that captures the severity level of software vulnerabilities based on technical characteristics such as the ease of exploitation and an approximation of the impact a vulnerability would have if exploited. CVSS scores range from 0 to 10 (the most severe score). The CVSS base score is computed from the CVSS base vector, which is composed of two sub-scores, the Exploitability metrics and the Impact metrics. Each sub-score measures different technical characteristics related to the vulnerability. For example, the Exploitability metrics include the Attack Vector metric, which describes how a vulnerability can be exploited. It can take one of the values: Network, Adjacent, Local, or Physical.
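As an illustration of the Attack Vector metric, the sketch below reads the `AV` component from a CVSS version 3 vector string. The function name `attack_vector` is hypothetical, and the sample vector is made up for illustration; it is not the published vector of any particular CVE.

```python
# Single-letter codes used for the Attack Vector metric in CVSS v3 vector strings.
ATTACK_VECTOR = {"N": "Network", "A": "Adjacent", "L": "Local", "P": "Physical"}


def attack_vector(vector):
    """Return the Attack Vector value named in a CVSS v3 vector string."""
    # Each "/"-separated component is a "NAME:VALUE" pair, including the
    # leading "CVSS:3.x" version prefix.
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    code = metrics.get("AV")
    if code not in ATTACK_VECTOR:
        raise ValueError(f"missing or unknown Attack Vector in: {vector!r}")
    return ATTACK_VECTOR[code]


# Illustrative vector only.
print(attack_vector("CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))
```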
Multi-modal interaction graph (or simply “graph”): A graphical structure representing a set of entities (in this case, computer systems) and interactions of different types between them.
Node: Symbolic representation in a graph of computer systems.
Edge: A symbolic representation of an interaction between two nodes.
Path: A set of edges spanning two nodes that connects them.
Nodal measurement (or “node measurement”): A scalar value computed for a given node, determined from the configuration of its adjacent edges and the edges of other nodes from which there is a path to the given node.
Graph database: A database in which a graph structure is stored.
Subgraph: A subset of a graph that includes certain nodes and edges from the full graph.
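The graph-related terms defined above (node, edge, path, subgraph) can be made concrete with a small in-memory structure. This is a minimal sketch under stated assumptions: the class names `Edge` and `Graph`, the edge-type labels `"ip"` and `"http"`, and the treatment of edges as directed are all illustrative choices, not the disclosed implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # interaction type; the labels used below are hypothetical


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, kind):
        self.nodes.update((source, target))
        self.edges.append(Edge(source, target, kind))

    def has_path(self, start, goal):
        """Breadth-first search along directed edges of any type."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                return True
            for edge in self.edges:
                if edge.source == current and edge.target not in seen:
                    seen.add(edge.target)
                    queue.append(edge.target)
        return False

    def subgraph(self, keep_nodes):
        """Induce a subgraph: keep only edges whose endpoints both survive."""
        keep = set(keep_nodes) & self.nodes
        sub = Graph(nodes=set(keep))
        sub.edges = [e for e in self.edges if e.source in keep and e.target in keep]
        return sub


graph = Graph()
graph.add_edge("114A", "114B", "ip")
graph.add_edge("114A", "114B", "http")  # second interaction type, same node pair
graph.add_edge("114B", "114C", "http")
print(graph.has_path("114A", "114C"))
print(len(graph.subgraph({"114A", "114B"}).edges))
```

Storing two edges of different kinds between the same pair of nodes is what makes the structure multi-modal in this sketch.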
Technical Challenges: Information technology (IT) administrators lack sufficient technical means for efficiently identifying and practically addressing possible vulnerabilities of a technology configuration such as determining how to approach a given vulnerability (versus another). A given IT network or environment may be potentially susceptible to thousands of security vulnerabilities (at least those identifiable via the NVD). While the NVD and CVSS provide baseline information about some threats, there is insufficient technology presently available that might allow IT administrators to actually make sense of and intelligently leverage such information to apply responsive measures and prioritize patches or other fixes, and predict actual attacks based on the specifics of a given technology configuration.
In addition, it is technologically problematic and cumbersome to determine what elements of a network should be prioritized or otherwise deemed to be critical or important with respect to possible cyber threats. A given network may include thousands or more devices—many of which may be susceptible to cyber threats, yet, without sufficient technology it is problematic and technically challenging to rank or prioritize each of the devices. In short, security specialists simply cannot address all possible vulnerabilities, such that prioritization is needed.
Referring to
As further described herein, the computing device 102 is adapted to access information about a (target) network 112 associated with a plurality of computing elements 114, designated, by non-limiting examples, computing element 114A, computing element 114B, and computing element 114C. The plurality of computing elements 114 or assets may include, without limitation, physical devices such as a desktop computer, server, mainframe, laptop, tablet, or any mobile device such as a smartphone. The plurality of computing elements 114 may further include systems of devices, virtualized devices, or combinations of virtual and physical devices associated with the network 112.
In general, via the network interface 108 or otherwise, the computing device 102 is adapted to access input data 120 from one or more sources 122 that is helpful for ranking the plurality of computing elements 114, and the input data 120 may be generally stored/aggregated within a storage device (not shown) or locally stored within the memory 106 for further processing. The input data 120 may include, without limitation, information about interactions between the plurality of computing elements 114, information specific to each of the plurality of computing elements 114 (e.g., specific configuration, type, identifier, etc.), and the like. As indicated in
In addition, the computing device 102 is adapted to access threat data 130 from any number of devices 132, systems, or networks. The threat data 130 includes any information about hacker communications, information about cybersecurity events across multiple technology platforms referenced herein, information about known vulnerabilities associated with hardware and software components, any information from the NVD including updates, and the like. As shown, the computing device 102 may further be adapted to access the threat data 130 directly and/or indirectly from various sources, such that the devices 132 may be associated with the deep or dark web (D2web), or the general Internet (including hacking actors, hacking communities, or any sources of information related to hacking). In some embodiments, the computing device 102 accesses the threat data 130 by engaging an application programming interface 134 to establish a temporary communication link with the device 132. Alternatively, or in combination, the computing device 102 may be configured to implement a crawler 136 (or spider or the like) to extract the threat data 130 from the devices 132. Further, the computing device 102 may access the threat data 130 from any number or type of devices associated with any number of threat data networks 138, e.g., the general Internet or World Wide Web, deep/dark web, as needed, with or without aid from a specific device.
In general, the threat data 130 may be leveraged by the computing device 102 to generate mappings between platform enumerations and vulnerabilities associated with such platform enumerations. For example, leveraging the threat data 130, the computing device 102 generates a database that links a particular piece of software or hardware device to a known vulnerability as discovered via the NVD, or otherwise discovered. Possible exploits may be linked to the same piece of software or hardware device. In this manner, the threat data 130 is informative as to what kinds of software and/or hardware configurations are susceptible to possible vulnerabilities and exploits thereof.
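The mapping between platform enumerations and vulnerabilities described above can be sketched as a simple lookup index. The function names `build_vulnerability_index` and `vulnerabilities_for` are hypothetical, as is the "example" vendor entry; the CVE and CPE values in the sample records are the ones given earlier in this disclosure.

```python
from collections import defaultdict


def build_vulnerability_index(records):
    """Map each CPE string to the sorted list of CVE identifiers affecting it.

    `records` is an iterable of (cve_id, cpe_string) pairs.
    """
    index = defaultdict(set)
    for cve_id, cpe in records:
        index[cpe].add(cve_id)
    return {cpe: sorted(cves) for cpe, cves in index.items()}


def vulnerabilities_for(installed, index):
    """Look up known vulnerabilities for each platform found on a computing element."""
    return {cpe: index.get(cpe, []) for cpe in installed}


records = [
    ("CVE-2018-4917", "cpe:2.3:a:adobe:acrobat_2017:*:*:*:*:*:*:*:*"),
    ("CVE-2018-4917", "cpe:2.3:a:adobe:acrobat_reader_dc:15.006.30033:*:*:*:classic:*:*"),
]
index = build_vulnerability_index(records)
installed = [
    "cpe:2.3:a:adobe:acrobat_2017:*:*:*:*:*:*:*:*",
    "cpe:2.3:a:example:unlisted_product:1.0:*:*:*:*:*:*:*",  # hypothetical entry
]
print(vulnerabilities_for(installed, index))
```

Exact string matching is used here for brevity; a practical matcher would need to account for wildcard attributes in CPE names.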
The input data 120 and the threat data 130 accessed may generally define or be organized into datasets or any predetermined data structures which may be aggregated or accessed by the computing device 102 and may be organized within a database 140 stored in the memory 106 or otherwise stored. Once this data is accessed and/or stored in the database 140, the processor 104 is operable to execute a plurality of services 142, encoded as instructions within the memory 106 and executable by the processor 104, to process the data so as to determine correlations and generate rules or predictive functions, as further described herein. The services 142 of the system 100 may generally include, without limitation, a filtering and preprocessing service 142A for, in general, preparing the input data 120 and/or threat data 130 for machine learning or further use; an artificial intelligence service 142B comprising any number or type of artificial intelligence functions for modeling information (e.g., natural language processing, classification, neural networks, linear regression, etc.) and/or feature extraction and any other related methods; and a predictive/ranking functions/logic service 142C that formulates ranking or predictive cyber functions and outputs, in view of the input data 120, one or more values suitable for reducing risk or ranking the computing elements 114. The plurality of services 142 may include any number of components or modules executed by the processor 104 or otherwise implemented. Accordingly, in some embodiments, one or more of the plurality of services 142 may be implemented as code and/or machine-executable instructions executable by the processor 104 that may represent one or more of a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, an object, a software package, a class, or any combination of instructions, data structures, or program statements, and the like.
In other words, one or more of the plurality of services 142 described herein may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium (e.g., the memory 106), and the processor 104 performs the tasks defined by the code.
Multi-modal graphical representation
Referring to
As further described, the input data 120 may include information about the interactions among the computing elements 114. This can be based on layer 3 level traffic between the plurality of computing elements 114 (e.g., IP packets sent between two computer devices in the network 112), application layer information (e.g., HTTP requests), or higher-level information (e.g., Application Programming Interface (API) requests). The interactions among the plurality of computing elements 114 can be specified in a variety of possible formats, but at a minimum the specification identifies which of the computing elements 114 have communicated with each other, and ideally also includes information concerning when the communication took place, the direction of the communication, the volume of the communication over a unit of time, applications involved, various pieces of metadata (e.g., header information), and even derived data (e.g., whether the interaction is suspected to be malicious). As depicted in
Network log data such as NETFLOW
System log data
Security Information and Event Management (SIEM) data
Logs from various applications
Data from various security tools such as packet sniffers or deep packet inspection
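The interaction attributes described above (communicating elements, time, direction, volume, application, metadata, derived data) can be captured in a single record type. The following sketch assumes a hypothetical comma-separated log layout with the columns `src`, `dst`, `epoch`, `bytes`, and `app`; real NETFLOW, SIEM, or application logs use their own formats and would need their own parsers.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Interaction:
    source: str           # element that initiated the communication
    target: str           # element that received it (direction is source to target)
    timestamp: datetime   # when the communication took place
    byte_count: int = 0   # volume of the communication
    application: str = ""
    suspected_malicious: bool = False  # derived data, if available
    metadata: dict = field(default_factory=dict)


def read_interactions(text):
    """Parse the hypothetical CSV layout into Interaction records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Interaction(
            source=row["src"],
            target=row["dst"],
            timestamp=datetime.fromtimestamp(int(row["epoch"]), tz=timezone.utc),
            byte_count=int(row["bytes"]),
            application=row["app"],
        )
        for row in reader
    ]


SAMPLE = """src,dst,epoch,bytes,app
10.0.0.1,10.0.0.2,1600000000,512,http
10.0.0.2,10.0.0.3,1600000060,2048,ssh
"""
records = read_interactions(SAMPLE)
print(len(records), records[0].source, records[0].timestamp.isoformat())
```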
As the input data 120 is collected via the input data processing unit 150, the various interactions between the plurality of computing elements 114 may be filtered based on predetermined criteria specified by the user (“Policy on input filter decision process” of
In some embodiments, interactions that meet the specified criteria are entered into the graph database 152 by means of object relational mapping (ORM), which maps each resulting interaction to the graph database 152. The graph database 152 may be embodied in multiple ways. For example, the database may be designed to store graphical interactions (e.g., Neo4j, Giraph, System G, etc.); may comprise a SQL database with relationship tables and optimizations for interactions (e.g., Postgres, Oracle, etc.); or may take the form of a document-based storage system (e.g., MongoDB). In any of these variations, the system 100 is configured in a suitable manner to store interactions and their associated metadata.
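The filter-then-store step can be sketched with a relational table, one of the storage variations mentioned above. This example uses Python's built-in `sqlite3` module with plain SQL in place of an ORM, purely to stay self-contained. The policy fields `min_bytes` and `allowed_apps`, the table layout, and the function names are assumptions made for illustration.

```python
import sqlite3


def keep(interaction, policy):
    """Hypothetical input filter: minimum volume and an application allow-list."""
    return (
        interaction["bytes"] >= policy["min_bytes"]
        and interaction["app"] in policy["allowed_apps"]
    )


def store(connection, interactions, policy):
    """Insert the interactions that pass the filter; return how many were stored."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS interaction ("
        "src TEXT NOT NULL, dst TEXT NOT NULL, app TEXT NOT NULL, bytes INTEGER NOT NULL)"
    )
    rows = [
        (i["src"], i["dst"], i["app"], i["bytes"])
        for i in interactions
        if keep(i, policy)
    ]
    connection.executemany("INSERT INTO interaction VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


interactions = [
    {"src": "114A", "dst": "114B", "app": "http", "bytes": 900},
    {"src": "114A", "dst": "114C", "app": "http", "bytes": 10},
    {"src": "114B", "dst": "114C", "app": "ssh", "bytes": 4000},
    {"src": "114C", "dst": "114A", "app": "telnet", "bytes": 5000},
]
policy = {"min_bytes": 100, "allowed_apps": {"http", "ssh"}}

connection = sqlite3.connect(":memory:")
stored = store(connection, interactions, policy)
print(stored)
```

Each stored row corresponds to one edge of the graph, with the `app` column acting as the edge type.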
An example of a resulting graphical interaction structure is shown in
In addition, the system 100 may include the query engine 154 implemented by the computing device 102 or separately implemented. The query engine 154 is designed to support queries that lead to the calculation of nodal measurements (performed by the system 100 as described herein). These queries may include the ability to induce subgraphs based on the graphical structure (thereby limiting the size of the graph for a nodal measurement to be computed), metrics to be pre-computed to ease the computation or re-computation of nodal measures, or in some cases the computation of nodal measures themselves. The queries to be calculated, and how they will be calculated may also be specified by the “Specification on database queries” in
The output from the embodiment of the system 100 in
Graphical-driven ranking
As further shown in
Degree-based metric: Given a graph constructed as described herein (
Betweenness-based metric: This can be calculated as a function of the number of paths in the graph that contain the node. Again, this can be adjusted not only based on criteria of the paths (e.g., the path length, only the shortest paths, etc.) but also adjusted based on weight of interactions, edge type, direction, etc. For the classical definition, see MacDonald et al., 2012 (section 3).
Closeness Centrality-based metric: This can be calculated as a function of the number of paths emanating from the node—again with the variations as described above. For the classical definition, see MacDonald et al., 2012 (section 3).
PageRank-based metric: As per section 3.6 of MacDonald et al., 2012 (and the references within)—adjusted per the notes in the above measurements.
Eigenvector Centrality-based metric: As per section 3.5 of MacDonald et al., 2012 (and the references within)—adjusted per the notes in the above measurements.
K-Shell Decomposition metric: As per section 3.2 of MacDonald et al., 2012 (and the references within)—adjusted per the notes in the above measurements.
Metric based on logical rules: As per the methodology described in Shakarian et al. (2013) and the papers cited within.
Combinatorial based measurements: As per the combinatorial measurements specified in works such as Moores et al. (2014) and the papers cited within—also considering the modifications of the other measurements.
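Two of the measurements listed above can be sketched in a few lines: a weighted degree score that can be restricted by edge type, and a PageRank score computed by power iteration. These are generic textbook-style variants written for illustration, not the specific formulations of the cited references. The tuple layout `(source, target, kind, weight)`, the function names, and the default damping factor of 0.85 are assumptions.

```python
def degree_scores(edges, kinds=None):
    """Weighted degree: sum of weights of edges touching each node.

    If `kinds` is given, only edges of those interaction types are counted.
    """
    scores = {}
    for source, target, kind, weight in edges:
        scores.setdefault(source, 0.0)
        scores.setdefault(target, 0.0)
        if kinds is not None and kind not in kinds:
            continue
        scores[source] += weight
        scores[target] += weight
    return scores


def pagerank_scores(edges, damping=0.85, iterations=50):
    """PageRank by power iteration over directed edges (weights ignored)."""
    nodes = sorted({n for s, t, _, _ in edges for n in (s, t)})
    out_links = {n: [] for n in nodes}
    for source, target, _, _ in edges:
        out_links[source].append(target)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


EDGES = [
    ("114A", "114B", "ip", 1.0),
    ("114C", "114B", "ip", 1.0),
    ("114B", "114A", "http", 2.0),
]
print(degree_scores(EDGES))
print(degree_scores(EDGES, kinds={"ip"}))
print(sorted(pagerank_scores(EDGES).items(), key=lambda item: -item[1]))
```

In this sample, element 114B scores highest under both measurements because it receives interactions from both other elements.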
Ultimately, the output is a ranking of the computing elements 114 based on node measurement computations, as indicated in
In an embodiment of this system 100 that would produce such sample output, the nodal measurement used was degree centrality and clearly identifies important ones of the plurality of computing elements 114 based on that measurement.
In some embodiments, the system 100 includes the output module 158 shown in
Referring to block 704, the computing device 102 generates a graphical structure of the interactions, the graphical structure being multi-modal and including nodes representing the plurality of computing elements and edges visualizing predetermined interactions between the plurality of computing elements, the graphical structure providing improved cyber threat prioritization. In some embodiments, the query engine 154 is implemented at this stage and supports queries that lead to graph query results and that further induce one or more subgraphs from the graphical structure.
Referring to block 706, a node measurement calculator of a graphical analysis processor implemented by the computing device 102 applies one or more nodal measurements to the graph query results or information associated with the graphical structure to output a ranking of the plurality of computing elements. As indicated in block 707, the rankings and graphical structure may be embodied within a report or visualization as desired.
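The flow of blocks 704 and 706 (build the graph, optionally induce a subgraph, apply a nodal measurement, output a ranking) can be sketched end to end. The sketch uses unweighted degree centrality, the measurement named for the sample output above, and treats interactions as undirected for simplicity. The function name `rank_elements` and the sample interactions are hypothetical.

```python
def rank_elements(interactions, keep_nodes=None):
    """Rank computing elements by degree centrality.

    `interactions` is an iterable of (source, target) pairs. If `keep_nodes`
    is given, the ranking is computed on the subgraph induced by those nodes.
    """
    keep = None if keep_nodes is None else set(keep_nodes)
    neighbors = {}
    for source, target in interactions:
        # Register every retained node, even if it ends up with no edges.
        for node in (source, target):
            if keep is None or node in keep:
                neighbors.setdefault(node, set())
        if keep is not None and not (source in keep and target in keep):
            continue
        if source != target:
            neighbors[source].add(target)
            neighbors[target].add(source)
    scores = {node: len(adjacent) for node, adjacent in neighbors.items()}
    # Highest score first; ties are broken by identifier for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


interactions = [
    ("114A", "114B"),
    ("114A", "114C"),
    ("114B", "114A"),
    ("114C", "114D"),
]
full_ranking = rank_elements(interactions)
sub_ranking = rank_elements(interactions, keep_nodes={"114A", "114B", "114C"})
print(full_ranking)
print(sub_ranking)
```

Inducing the subgraph before measuring limits the computation to the elements of interest, at the cost of ignoring interactions that cross the subgraph boundary.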
Referring to
The computing device 1200 may include various hardware components, such as a processor 1202, a main memory 1204 (e.g., a system memory), and a system bus 1201 that couples various components of the computing device 1200 to the processor 1202. The system bus 1201 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. For example, such architectures may include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
The computing device 1200 may further include a variety of memory devices and computer-readable media 1207 that includes removable/non-removable media and volatile/nonvolatile media and/or tangible media, but excludes transitory propagated signals. Computer-readable media 1207 may also include computer storage media and communication media. Computer storage media includes removable/non-removable media and volatile/nonvolatile media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules or other data, such as RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store the desired information/data and which may be accessed by the computing device 1200. Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media may include wired media such as a wired network or direct-wired connection and wireless media such as acoustic, RF, infrared, and/or other wireless media, or some combination thereof. Computer-readable media may be embodied as a computer program product, such as software stored on computer storage media.
The main memory 1204 includes computer storage media in the form of volatile/nonvolatile memory such as read only memory (ROM) and random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within the computing device 1200 (e.g., during start-up) is typically stored in ROM. RAM typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 1202. Further, data storage 1206 in the form of Read-Only Memory (ROM) or otherwise may store an operating system, application programs, and other program modules and program data.
The data storage 1206 may also include other removable/non-removable, volatile/nonvolatile computer storage media. For example, the data storage 1206 may be: a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media; a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk; a solid state drive; and/or an optical disk drive that reads from or writes to a removable, nonvolatile optical disk such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media may include magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The drives and their associated computer storage media provide storage of computer-readable instructions, data structures, program modules, and other data for the computing device 1200.
A user may enter commands and information through a user interface 1240 (displayed via a monitor 1260) by engaging input devices 1245 such as a tablet, electronic digitizer, a microphone, keyboard, and/or pointing device, commonly referred to as mouse, trackball or touch pad. Other input devices 1245 may include a joystick, game pad, satellite dish, scanner, or the like. Additionally, voice inputs, gesture inputs (e.g., via hands or fingers), or other natural user input methods may also be used with the appropriate input devices, such as a microphone, camera, tablet, touch pad, glove, or other sensor. These and other input devices 1245 are in operative connection to the processor 1202 and may be coupled to the system bus 1201, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). The monitor 1260 or other type of display device may also be connected to the system bus 1201. The monitor 1260 may also be integrated with a touch-screen panel or the like.
The computing device 1200 may be implemented in a networked or cloud-computing environment using logical connections of a network interface 1203 to one or more remote devices, such as a remote computer. The remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computing device 1200. The logical connection may include one or more local area networks (LAN) and one or more wide area networks (WAN), but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a networked or cloud-computing environment, the computing device 1200 may be connected to a public and/or private network through the network interface 1203. In such embodiments, a modem or other means for establishing communications over the network is connected to the system bus 1201 via the network interface 1203 or other appropriate mechanism. A wireless networking component including an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a network. In a networked environment, program modules depicted relative to the computing device 1200, or portions thereof, may be stored in the remote memory storage device.
Certain embodiments are described herein as including one or more modules. Such modules are hardware-implemented, and thus include at least one tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. For example, a hardware-implemented module may comprise dedicated circuitry that is permanently configured (e.g., as a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software or firmware to perform certain operations. In some example embodiments, one or more computer systems (e.g., a standalone system, a client and/or server computer system, or a peer-to-peer computer system) or one or more processors may be configured by software (e.g., an application or application portion) as a hardware-implemented module that operates to perform certain operations as described herein.
Accordingly, the term “hardware-implemented module” encompasses a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner and/or to perform certain operations described herein. Considering embodiments in which hardware-implemented modules are temporarily configured (e.g., programmed), each of the hardware-implemented modules need not be configured or instantiated at any one instance in time. For example, where the hardware-implemented modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware-implemented modules at different times. Software may accordingly configure the processor 1202, for example, to constitute a particular hardware-implemented module at one instance of time and to constitute a different hardware-implemented module at a different instance of time.
Hardware-implemented modules may provide information to, and/or receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled. Where multiple of such hardware-implemented modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware-implemented modules. In embodiments in which multiple hardware-implemented modules are configured or instantiated at different times, communications between such hardware-implemented modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware-implemented modules have access. For example, one hardware-implemented module may perform an operation, and may store the output of that operation in a memory device to which it is communicatively coupled. A further hardware-implemented module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware-implemented modules may also initiate communications with input or output devices.
Computing systems or devices referenced herein may include desktop computers, laptops, tablets, e-readers, personal digital assistants, smartphones, gaming devices, servers, and the like. The computing devices may access computer-readable media that include computer-readable storage media and data transmission media. In some embodiments, the computer-readable storage media are tangible storage devices that do not include a transitory propagating signal. Examples include memory such as primary memory, cache memory, and secondary memory (e.g., DVD) and other storage devices. The computer-readable storage media may have instructions recorded on them or may be encoded with computer-executable instructions or logic that implements aspects of the functionality described herein. The data transmission media may be used for transmitting data via transitory, propagating signals or carrier waves (e.g., electromagnetism) via a wired or wireless connection.
It should be understood from the foregoing that, while particular embodiments have been illustrated and described, various modifications can be made thereto without departing from the spirit and scope of the invention as will be apparent to those skilled in the art. Such changes and modifications are within the scope and teachings of this invention as defined in the claims appended hereto.
This is a U.S. Non-Provisional patent application that claims benefit to U.S. provisional patent application Ser. No. 63/111,890 filed on Nov. 10, 2020, which is herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63111890 | Nov 2020 | US