This disclosure relates generally to code representations and, more particularly, to methods and apparatus to construct program-derived semantic graphs.
In recent years, a desire to create graphical representations of computer programs has arisen. Programmers wish to graphically represent programs to convey the processes and/or methods performed by the program. These representations may allow Artificial Intelligence systems (e.g., deep learning systems) to perform various coding tasks such as automatic software bug detection or code structure suggestions. Some examples of prior graphical representations of programs include decision trees, abstract syntax trees, Kripke structures, and computational tree logic diagrams.
The figures are not to scale. Instead, the thickness of the layers or regions may be enlarged in the drawings. Although the figures show layers and regions with clean lines and boundaries, some or all of these lines and/or boundaries may be idealized. In reality, the boundaries and/or lines may be unobservable, blended, and/or irregular. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc. are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.
Machine Programming (MP) is concerned with the automation of software development. In recent years, the emergence of big data has facilitated technological advancements in the field of MP. One of the core challenges in MP is code similarity, which aims to determine whether two code snippets are semantically similar. An accurate code similarity system can enable various applications ranging from automatic software patching to code recommendation. Such systems can improve programmer productivity by assisting programmers in various programming stages (e.g., development, deployment, debugging, etc.). To build accurate code similarity systems, one core problem is to build an appropriate representation that can accurately capture the semantic fingerprint of a code snippet.
Some common representations include graph representations (e.g., trees) and sequences of program tokens. It has been demonstrated that a tree representation of code can effectively capture code semantic information that can aid a learning system in learning code semantics. However, one issue with this work is that the representation, named the context-aware semantic structure (CASS), although effective in capturing code semantics, may not provide direct code explanations that can assist programmers in understanding and comparing code. To provide better explanations for code, this application proposes the concept of program-derived semantic graphs, a graph representation of code that consists of different abstraction levels to accurately capture code semantics. Example approaches disclosed herein mix rule-based and learning-based approaches to identify and build the nodes of a program-derived semantic graph at various abstraction levels.
The second phase in these examples is Phase Two: Node Construction for First Abstraction Level 120. In these examples, a leaf node creator 124 accesses the syntactical nodes in the parse tree 116. The leaf node creator 124 sets the syntactical nodes in the parse tree 116 as leaf nodes 128 in the program-derived semantic graph.
The third and final phase in these examples is Phase Three: Node Construction for Higher Abstraction Levels 132. In these examples, Phase Three: Node Construction for Higher Abstraction Levels 132 determines one of three options to perform based on whether the current abstraction level is deterministic, and whether attention should be used for the current abstraction level. In these examples, the first option is a Rule-Based Construction for a Deterministic Abstraction Level 136. In the Rule-Based Construction for a Deterministic Abstraction Level 136, the program-derived semantic graph constructor determines that the current abstraction level is deterministic. For an abstraction level to be deterministic, the input nodes 137 to the current abstraction level have a single possible parent node in the set of possible nodes at the current abstraction level. The Rule-Based Mapper 138 accesses the set of input nodes 137 and determines a parent node for each input node from the set of possible nodes at the current abstraction level. The Rule-Based Mapper 138 saves the determined set of nodes at the current abstraction level 139 to the program-derived semantic graph.
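Purely for illustration, the rule-based mapping for a deterministic abstraction level described above may be sketched as follows. The mapping table, node names, and function names are hypothetical and are not part of the disclosed apparatus; they merely show that, when each input node has a single possible parent, a lookup table suffices.

```python
# Hypothetical rule table: each input node has exactly one possible
# parent node at the current (deterministic) abstraction level.
RULES = {
    "while": "loop",
    "for": "loop",
    "do_while": "loop",
    "if": "conditional",
    "else": "conditional",
}

def rule_based_map(input_nodes):
    """Return the set of nodes at the current abstraction level."""
    level_nodes = set()
    for node in input_nodes:
        parent = RULES.get(node)
        if parent is not None:  # nodes with no parent at this level are ignored
            level_nodes.add(parent)
    return level_nodes
```

Because the mapping is deterministic, no learned model is consulted; the parent node for each input node follows directly from the rule table.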
The second option in Phase Three: Node Construction for Higher Abstraction Levels 132 is a Learning-Based Construction for Non-Deterministic Abstraction Levels without Attention 140. In the Learning-Based Construction for Non-Deterministic Abstraction Levels without Attention 140, the Learning-Based Mapper 142 accesses the set of input nodes 137 and determines the set of nodes for the current abstraction level 139 to include in the program-derived semantic graph at the current abstraction level. For an abstraction level that is non-deterministic, at least one input node in the set of input nodes 137 has at least two possible parent nodes in the set of possible nodes at the current abstraction level. In these examples, the Learning-Based Mapper 142 uses a probabilistic model to determine one of the at least two possible parent nodes to include in the set of nodes at the current abstraction level 139.
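As an illustrative sketch of the non-deterministic case without attention, the example below stubs the probabilistic model with a fixed scoring function; the candidate table, node names, and scores are hypothetical and stand in for a trained model's probability estimates.

```python
# Hypothetical candidates: the "%" node has two possible parents,
# making the abstraction level non-deterministic.
CANDIDATES = {
    "%": ["arithmetic_operations", "string_formatting"],
    "+": ["arithmetic_operations"],
}

def score(node, parent):
    # Stand-in for a probabilistic model's estimate that `parent`
    # is the correct parent of `node` at this abstraction level.
    return 0.9 if parent == "arithmetic_operations" else 0.1

def learning_based_map(input_nodes):
    """Choose one parent per input node using the model's scores."""
    level_nodes = set()
    for node in input_nodes:
        parents = CANDIDATES.get(node, [])
        if parents:
            best = max(parents, key=lambda p: score(node, p))
            level_nodes.add(best)
    return level_nodes
```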
The third option in Phase Three: Node Construction for Higher Abstraction Levels 132 is a Learning-Based Construction for Non-Deterministic Levels with Attention 144. In the Learning-Based Construction for Non-Deterministic Levels with Attention 144, a Learning-Based Mapper 146 accesses a set of input nodes 137. The Learning-Based Mapper 146 determines a subset of input nodes 145 to utilize in determining the set of nodes to include at the current abstraction level 139. The Learning-Based Mapper 146 sets a weight for input nodes in the set of input nodes 137 based on the likelihood that a specified node has a parent in the current abstraction level. The Learning-Based Mapper 146 accesses the subset of input nodes 145 that meet a threshold value based on the weight of the input nodes. The Learning-Based Mapper 146 determines a set of nodes to include in the current abstraction level 139 from a set of possible nodes at the current abstraction level based on the subset of input nodes 145.
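The attention step described above, in which only input nodes whose weights meet a threshold value are passed to the mapper, may be sketched as follows; the weights and threshold value are hypothetical.

```python
def attention_subset(input_nodes, weights, threshold=0.8):
    """Keep only the input nodes whose attention weight meets the
    threshold; the surviving subset is then given to the mapper."""
    return [n for n in input_nodes if weights.get(n, 0.0) >= threshold]
```

For example, with weights {"%": 0.9, "x": 0.3, "+": 0.85} and a threshold of 0.8, only the "%" and "+" nodes would be considered when determining the set of nodes at the current abstraction level.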
The example parse tree constructor 208 of the program-derived graph constructor 204 of the illustrated example of
The example syntactical node determiner 212 of the program-derived semantic graph constructor 204 of the illustrated example of
The example abstraction level modifier 216 of the program-derived semantic graph constructor 204 of the illustrated example of
The example abstraction level determiner 224 of the program-derived semantic graph constructor 204 of the illustrated example of
The abstraction level determiner 224 determines whether the current abstraction level is deterministic. In some examples, a deterministic abstraction level describes an abstraction level where every node with a parent on the abstraction level points to only a single parent. For example, the nodes while, for, and do while map only to the single parent node loop. Also in these examples, a non-deterministic abstraction level describes an abstraction level where at least one node that points to a parent on the current abstraction level points to at least two parents on the current abstraction level.
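The deterministic/non-deterministic test described above may be sketched as a check over the candidate parents of each input node; the data structure and names below are hypothetical.

```python
def is_deterministic(possible_parents):
    """A level is deterministic if every input node has at most one
    candidate parent at that level.

    `possible_parents` maps each input node to its list of candidate
    parent nodes at the current abstraction level (hypothetical data).
    """
    return all(len(parents) <= 1 for parents in possible_parents.values())
```

Under this sketch, {while, for, do while} -> loop is deterministic, while a node such as % with two candidate parents makes the level non-deterministic.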
The example rule-based abstraction level creator 228 of the program-derived semantic graph constructor 204 of the illustrated example of
The example learning-based abstraction level creator 232 of the program-derived semantic graph constructor 204 of the illustrated example of
In some examples, the learning-based abstraction level creator 232 is a multi-label classification model (e.g., decision tree, deep neural network, etc.). In these examples, the learning-based abstraction level creator 232 determines which of the nodes in the set of possible nodes at the current abstraction level of the program-derived semantic graph to include in the set of nodes at the current abstraction level of the program-derived semantic graph. In these examples, the learning-based abstraction level creator 232 identifies which nodes could be included in the set of nodes at the current abstraction level of the program-derived semantic graph and determines which nodes to include in the set of nodes at the current abstraction level of the program-derived semantic graph.
In some examples, the input to the learning-based abstraction level creator 232 is the set of nodes at lower abstraction levels in the program-derived semantic graph. In other examples, a weight is applied to nodes at lower abstraction levels in the program-derived semantic graph. In these examples, the input to the learning-based abstraction level creator 232 is the set of nodes in lower abstraction levels of the program-derived semantic graph that satisfy a weight threshold. In some examples, the weight threshold is a value against which the weight of each node is compared. For example, if the weight threshold is set to 0.8, nodes in lower abstraction levels of the program-derived semantic graph with a weight greater than 0.8 would be in the input to the learning-based abstraction level creator 232.
In other examples, the input to the learning-based abstraction level creator 232 could be a percentage or number of the highest-weight nodes in the lower abstraction levels of the program-derived semantic graph. For example, the learning-based abstraction level creator 232 could retrieve the 30 nodes with the highest weight in the set of nodes in the lower abstraction levels. In another example, the learning-based abstraction level creator 232 could retrieve the heaviest 30% of nodes in the set of nodes in the lower abstraction levels. For example, if there are 50 nodes in the lower abstraction levels of the program-derived semantic graph, the learning-based abstraction level creator 232 could select the 15 nodes with the largest weights. After the learning-based abstraction level creator 232 creates the set of nodes to include in the program-derived semantic graph at the current abstraction level, the process proceeds to the next abstraction level.
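The three input-selection policies described above (weight threshold, top-N nodes, and top percentage) may be sketched in one hypothetical helper; the parameter names are illustrative only.

```python
def select_inputs(node_weights, threshold=None, top_k=None, top_percent=None):
    """Choose input nodes by weight under one of three example policies:
    a weight threshold, the top-k heaviest nodes, or a top percentage."""
    if threshold is not None:
        return [n for n, w in node_weights.items() if w > threshold]
    ranked = sorted(node_weights, key=node_weights.get, reverse=True)
    if top_k is not None:
        return ranked[:top_k]
    if top_percent is not None:
        k = int(len(ranked) * top_percent / 100)
        return ranked[:k]
    return ranked
```

With 50 weighted nodes and top_percent=30, this sketch returns the 15 heaviest nodes, matching the example above.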
The example node selector 304 of the rule-based abstraction level creator 228 of the illustrated example of
The example abstraction level node comparator 308 of the rule-based abstraction level creator 228 of the illustrated example of
If the abstraction level node comparator 308 identifies a node to include at the current abstraction level, the example abstraction level creator 312 of the rule-based abstraction level creator 228 adds the identified node to the current abstraction level of the program-derived semantic graph. In some examples, the abstraction level creator 312 adds the identified node to a data structure (e.g., set, array, etc.) containing nodes that have been identified to be included at the current abstraction level. The node selector 304 removes the selected input node from the data structure created by the rule-based abstraction level creator 228.
If the abstraction level node comparator 308 does not identify a node to include at the current abstraction level, the abstraction level creator 312 ignores the selected input node. The node selector 304 removes the selected input node from the data structure created by the rule-based abstraction level creator 228.
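The select/compare/add-or-ignore/remove cycle performed by the node selector 304, abstraction level node comparator 308, and abstraction level creator 312 may be sketched as a work-list loop; the names and rule table below are hypothetical.

```python
def build_level(input_nodes, parent_rules):
    """Process input nodes one at a time until the work list is empty,
    mirroring the select/compare/add-or-ignore/remove cycle."""
    work = list(input_nodes)        # data structure of remaining input nodes
    level = set()
    while work:                     # remaining input nodes?
        node = work.pop()           # select an input node and remove it
        parent = parent_rules.get(node)
        if parent is not None:      # comparator identified a node to include
            level.add(parent)
        # otherwise the selected input node is ignored
    return level
```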
The example node selector 404 of the learning-based abstraction level creator 232 creates an input set, array, or other data structure containing the nodes in the program-derived semantic graph. In some examples, the node selector 404 selects nodes to include in the input set based on a weight of the nodes. In these examples, the nodes satisfying a weight threshold are included in the input set and the nodes not satisfying the weight threshold are not included in the input set. In other examples, the node selector 404 selects nodes in previous abstraction levels of the program-derived semantic graph to include in the input set. The nodes in the input set are considered input nodes. The node selector 404 selects one of the input nodes to compare to a set of possible nodes to include at the current abstraction level.
The node selector 404 determines whether there are remaining input nodes in the data structure. In response to determining the data structure contains input nodes, the node selector 404 selects one of the input nodes from the data structure.
The example probabilistic abstraction level node comparator 412 of the learning-based abstraction level creator 232 determines whether the selected input node maps to any of the possible nodes at the current abstraction level. In some examples, the probabilistic abstraction level node comparator 412 contains sets for the abstraction levels containing possible nodes at the specified abstraction level. For example, if the set of the possible nodes at the current abstraction level contains the node “Arithmetic Operations” and the set of nodes in lower abstraction levels of the program-derived semantic graph contains the node %, the learning-based abstraction level creator 232 adds the node “Arithmetic Operations” to the program-derived semantic graph at the current abstraction level.
In other examples, the selected input node maps to more than one node at the currently selected abstraction level. In these examples, the probabilistic abstraction level node comparator 412 identifies possible parent nodes of the selected node. If the probabilistic abstraction level node comparator 412 determines that the selected input node maps to at least one identified node within the set of possible nodes at the current abstraction level, the probabilistic abstraction level node comparator 412 determines one of the at least one identified nodes to add to the current abstraction level. Otherwise, the input node is ignored.
If the probabilistic abstraction level node comparator 412 identifies at least one node to add to the current abstraction level, the example model executor 408 of the learning-based abstraction level creator 232 determines one of the at least one identified nodes to add to the current abstraction level of the program-derived semantic graph. In some examples, a machine learning classification model (e.g., decision tree, deep neural network, etc.) is used to determine which of the at least one identified nodes to add to the current abstraction level.
The example abstraction level creator 416 of the learning-based abstraction level creator 232 adds the identified node to the current abstraction level of the program-derived semantic graph. In some examples, the abstraction level creator 416 adds the identified node to a data structure (e.g., set, array, etc.) containing nodes that have been identified to be included at the current abstraction level. The node selector 404 removes the selected input node from the data structure created by the learning-based abstraction level creator 232.
If the probabilistic abstraction level node comparator 412 does not identify a node to include at the current abstraction level, the abstraction level creator 416 ignores the selected input node. The node selector 404 removes the selected input node from the data structure created by the learning-based abstraction level creator 232.
While an example manner of implementing the program-derived semantic graph constructor 204 of
A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the program-derived graph constructor 204 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement one or more functions that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. 
Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” entity, as used herein, refers to one or more of that entity. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
The parse tree constructor 208 converts the segment or snippet of program code into a parse tree. (Block 508). In some examples, the parse tree includes the words, mathematical operations, and/or formatting present in the segment or snippet of program code. In some examples, the parse tree includes nodes that are syntactical values (e.g., mathematical operations, integers, if-else statements, etc.).
The syntactical node determiner 212 iterates through the parse tree and determines the syntactical nodes present in the parse tree. (Block 512). The syntactical node determiner 212 saves the syntactical nodes to a temporary location. In some examples, the parse tree includes nodes that include syntactical values (e.g., mathematical operations, integers, if-else statements, etc.).
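The disclosure does not mandate a particular parser; purely as one illustrative possibility, the parse-tree construction of Block 508 and the syntactical-node extraction of Block 512 may be sketched using Python's standard `ast` module. The set of node types treated as "syntactical" below is a hypothetical choice.

```python
import ast

def syntactical_nodes(source):
    """Parse `source` into a tree (Block 508 analogue) and collect the
    kinds of nodes carrying syntactical values such as mathematical
    operations, constants, and if-else statements (Block 512 analogue)."""
    tree = ast.parse(source)
    kinds = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.BinOp,
                             ast.Constant, ast.Compare, ast.Return)):
            kinds.add(type(node).__name__)
    return kinds
```

For a snippet containing an addition, a modulo operation, and an if statement, the collected kinds would include BinOp, Compare, Constant, and If; these could then serve as the leaf nodes of the program-derived semantic graph.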
The abstraction level modifier 216 sets the abstraction level to a default starting value (e.g., zero, one, ten, etc.). (Block 516). In the following examples, the default starting value will be 0. The leaf node creator 220 sets the syntactical nodes identified by the syntactical node determiner 212 as leaf nodes in the program-derived semantic graph. (Block 520). The abstraction level modifier 216 increases the current value of the abstraction level. (Block 524).
The abstraction level determiner 224 determines whether abstraction levels have been defined in the program-derived semantic graph. (Block 528). In some examples, abstraction levels are defined when child nodes are connected to a common parent node. In other examples, abstraction levels are defined when the most abstract abstraction level defined includes the nodes “Operations for Handling Data” and “Code Structure and Flow”. In these examples, the node “Operations for Handling Data” points to child nodes such as algorithms, mathematical operations, integers, etc. Also in these examples, the node “Code Structure and Flow” points to child nodes such as conditional statements, return statements, comparisons, etc. If the abstraction level determiner 224 determines abstraction levels have been defined, the process ends. If the abstraction level determiner 224 determines abstraction levels are not defined, the process proceeds to determine whether the current abstraction level is deterministic.
The abstraction level determiner 224 determines whether the current abstraction level is deterministic. (Block 532). In some examples, a deterministic abstraction level describes an abstraction level where every node with a parent on the abstraction level points to only a single parent. For example, the nodes while, for, and do while map only to the single parent node loop. Also in these examples, a non-deterministic abstraction level describes an abstraction level where at least one node that points to a parent on the current abstraction level points to at least two parents on the current abstraction level. If the abstraction level determiner 224 determines the current abstraction level to be deterministic, a rule-based approach is utilized to create the current abstraction level. If the abstraction level determiner 224 determines the current abstraction level to be non-deterministic, a learning-based approach is utilized to create the current abstraction level.
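The overall level-by-level flow (leaf nodes at the default level, then per-level deterministic check and rule-based or learning-based construction) may be sketched as a driver loop. The per-level candidate tables are hypothetical, and the learning-based branch is stubbed (a trained model would choose among multiple candidates).

```python
def construct_graph(leaf_nodes, level_specs):
    """Hypothetical driver loop over abstraction levels.

    `level_specs` is a list with one dict per abstraction level, mapping
    each input node to its list of candidate parents (illustrative data).
    """
    graph = {0: set(leaf_nodes)}        # leaf nodes at the default level
    current = set(leaf_nodes)
    for level, candidates in enumerate(level_specs, start=1):
        deterministic = all(len(candidates.get(n, [])) <= 1 for n in current)
        if deterministic:
            # rule-based: the single candidate parent is taken directly
            nxt = {candidates[n][0] for n in current if candidates.get(n)}
        else:
            # learning-based: a trained model would score the candidates;
            # stubbed here as "take the first listed candidate"
            nxt = {candidates[n][0] for n in current if candidates.get(n)}
        graph[level] = nxt
        current = nxt
    return graph
```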
The rule-based abstraction level creator 228 creates a node set containing the nodes to be used at the current abstraction level of the program-derived semantic graph. (Block 536). In some examples, the rule-based abstraction level creator 228 accesses the nodes currently present in the program-derived semantic graph at lower abstraction levels and determines whether the nodes have parent nodes at the current abstraction level. In these examples, the rule-based abstraction level creator 228 has a set of the possible nodes at the current abstraction level and determines the nodes in lower abstraction levels in the program-derived semantic graph that have a parent in the set of the possible nodes at the current abstraction level. For example, if the set of the possible nodes at the current abstraction level contains the node “Arithmetic Operations” and the set of nodes in lower abstraction levels of the program-derived semantic graph contains the node %, the rule-based abstraction level creator 228 would add the node “Arithmetic Operations” to the program-derived semantic graph at the current abstraction level. Once the rule-based abstraction level creator 228 iterates through the set of nodes in lower abstraction levels of the program-derived semantic graph and determines the nodes to include at the current abstraction level, the process proceeds to the next abstraction level.
The learning-based abstraction level creator 232 creates a node set containing the nodes to be used at the current abstraction level of the program-derived semantic graph. (Block 540). In some examples, the learning-based abstraction level creator 232 accesses a set of possible nodes at the current abstraction level of the program-derived semantic graph. In these examples, a non-deterministic abstraction level indicates that at least one node in the set of nodes in lower abstraction levels of the program-derived semantic graph has multiple possible parent nodes in the set of possible nodes at the current abstraction level of the program-derived semantic graph.
In some examples, the learning-based abstraction level creator 232 is a multi-label classification model (e.g., decision tree, deep neural network, etc.). In these examples, the learning-based abstraction level creator 232 determines which of the nodes in the set of possible nodes at the current abstraction level of the program-derived semantic graph to include in the set of nodes at the current abstraction level of the program-derived semantic graph. In these examples, the learning-based abstraction level creator 232 identifies which nodes could be included in the set of nodes at the current abstraction level of the program-derived semantic graph and determines which nodes to include in the set of nodes at the current abstraction level of the program-derived semantic graph.
In some examples, the input to the learning-based abstraction level creator 232 is the set of nodes at lower abstraction levels in the program-derived semantic graph. In other examples, a weight is applied to nodes at lower abstraction levels in the program-derived semantic graph. In these examples, the input to the learning-based abstraction level creator 232 is the set of nodes in lower abstraction levels of the program-derived semantic graph that satisfy a weight threshold. In some examples, the weight threshold is a value against which the weight of each node is compared. For example, if the weight threshold is set to 0.8, nodes in lower abstraction levels of the program-derived semantic graph with a weight greater than 0.8 would be in the input to the learning-based abstraction level creator 232.
In other examples, the input to the learning-based abstraction level creator 232 could be a percentage or number of the highest-weight nodes in the lower abstraction levels of the program-derived semantic graph. For example, the learning-based abstraction level creator 232 could retrieve the 30 nodes with the highest weight in the set of nodes in the lower abstraction levels. In another example, the learning-based abstraction level creator 232 could retrieve the heaviest 30% of nodes in the set of nodes in the lower abstraction levels. For example, if there are 50 nodes in the lower abstraction levels of the program-derived semantic graph, the learning-based abstraction level creator 232 could select the 15 nodes with the largest weights. After the learning-based abstraction level creator 232 creates the set of nodes to include in the program-derived semantic graph at the current abstraction level, the process proceeds to the next abstraction level.
The node selector 304 determines whether there are remaining input nodes in the data structure. (Block 608). In response to determining that the data structure does not contain input nodes, the process ends. In response to determining the data structure contains input nodes, the node selector 304 selects one of the input nodes from the data structure. (Block 612).
The abstraction level node comparator 308 determines whether the selected input node maps to any of the possible nodes at the current abstraction level. (Block 616). In some examples, the abstraction level node comparator 308 contains sets for the abstraction levels containing possible nodes at the specified abstraction level. For example, if the set of the possible nodes at the current abstraction level contains the node “Arithmetic Operations” and the set of nodes in lower abstraction levels of the program-derived semantic graph contains the node %, the rule-based abstraction level creator 228 would add the node “Arithmetic Operations” to the program-derived semantic graph at the current abstraction level. If the abstraction level node comparator 308 determines that the selected input node maps to an identified node within the set of possible nodes at the current abstraction level, the identified node is added to the program-derived semantic graph. Otherwise, the input node is ignored.
If the abstraction level node comparator 308 identifies a node to include at the current abstraction level, the abstraction level creator 312 adds the identified node to the current abstraction level of the program-derived semantic graph. (Block 620). In some examples, the abstraction level creator 312 adds the identified node to a data structure (e.g., set, array, etc.) containing nodes that have been identified to be included at the current abstraction level. The node selector 304 removes the selected input node from the data structure created by the rule-based abstraction level creator 228.
If the abstraction level node comparator 308 does not identify a node to include at the current abstraction level, the abstraction level creator 312 ignores the selected input node. (Block 624). The node selector 304 removes the selected input node from the data structure created by the rule-based abstraction level creator 228.
In response to determining not to consider the weight of the nodes in the program-derived semantic graph, the learning-based abstraction level creator 232 accesses nodes in the program-derived semantic graph. (Block 708). The node selector 404 creates an input set, array, or other data structure containing the nodes in the program-derived semantic graph. The nodes in the input set are considered input nodes.
In response to determining to consider the weight of the nodes in the program-derived semantic graph, the learning-based abstraction level creator 232 accesses nodes in the program-derived semantic graph meeting a weight threshold. (Block 712). In some examples, the weight threshold is a value. For example, if the weight threshold is 0.7, then nodes in the program-derived semantic graph with a weight greater than 0.7 would be accessed. In other examples, the weight threshold selects the nodes in the program-derived semantic graph within a top predetermined percentage or number of weights. For example, if the weight threshold is thirty percent, in a situation with fifty nodes, the fifteen nodes with the largest weights would be the input nodes. As another example, if the weight threshold is the top thirty heaviest nodes, then the thirty nodes with the largest weights would be selected as input nodes. The node selector 404 creates an input set, array, or other data structure containing the accessed nodes meeting the weight threshold. The nodes in the input set are considered input nodes.
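The three weight-threshold variants described above could be sketched as follows; the node/weight pairs and function names are illustrative assumptions, not part of the disclosed apparatus.

```python
def by_value(nodes, threshold):
    """Keep nodes whose weight exceeds a fixed value (e.g., 0.7)."""
    return [n for n, w in nodes if w > threshold]

def by_top_percent(nodes, percent):
    """Keep the heaviest `percent` of the nodes (e.g., top thirty percent)."""
    k = int(len(nodes) * percent / 100)
    return [n for n, _ in sorted(nodes, key=lambda p: -p[1])[:k]]

def by_top_k(nodes, k):
    """Keep the k heaviest nodes (e.g., the top thirty)."""
    return [n for n, _ in sorted(nodes, key=lambda p: -p[1])[:k]]

weighted = [("a", 0.9), ("b", 0.5), ("c", 0.8), ("d", 0.1)]
print(by_value(weighted, 0.7))       # ['a', 'c']
print(by_top_percent(weighted, 50))  # ['a', 'c']
print(by_top_k(weighted, 1))         # ['a']
```

Any of the three selections could feed the input set the node selector 404 builds.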
The node selector 404 determines whether there are remaining input nodes in the data structure. (Block 716). In response to determining that the data structure does not contain input nodes, the process ends. In response to determining the data structure contains input nodes, the node selector 404 selects one of the input nodes from the data structure. (Block 720).
The probabilistic abstraction level node comparator 412 determines whether the selected input node maps to any of the possible nodes at the current abstraction level. (Block 724). In some examples, the probabilistic abstraction level node comparator 412 contains sets for the abstraction levels, each containing the possible nodes at the specified abstraction level. For example, if the set of possible nodes at the current abstraction level contains the node “Arithmetic Operations” and the set of nodes in lower abstraction levels of the program-derived semantic graph contains the node “%,” the learning-based abstraction level creator 232 would add the node “Arithmetic Operations” to the program-derived semantic graph at the current abstraction level.
In other examples, the selected input node could map to more than one node at the currently selected abstraction level. In these examples, the probabilistic abstraction level node comparator 412 identifies possible parent nodes of the selected node. If the probabilistic abstraction level node comparator 412 determines that the selected input node maps to at least one identified node within the set of possible nodes at the current abstraction level, the probabilistic abstraction level node comparator 412 determines one of the at least one identified nodes to add to the current abstraction level. Otherwise, the input node is ignored.
If the probabilistic abstraction level node comparator 412 identifies at least one node to add to the current abstraction level, the model executor 408 determines one of the at least one identified nodes to add to the current abstraction level of the program-derived semantic graph. (Block 728). In some examples, a machine learning classification model (e.g., decision tree, deep neural network, etc.) is used to determine which of the at least one identified nodes to add to the current abstraction level.
The abstraction level creator 416 adds the identified node to the current abstraction level of the program-derived semantic graph. (Block 732). In some examples, the abstraction level creator 416 adds the identified node to a data structure (e.g., set, array, etc.) containing nodes that have been identified to be included at the current abstraction level. The node selector 404 removes the selected input node from the data structure created by the learning-based abstraction level creator 232.
If the probabilistic abstraction level node comparator 412 does not identify a node to include at the current abstraction level, the abstraction level creator 416 ignores the selected input node. (Block 736). The node selector 404 removes the selected input node from the data structure created by the learning-based abstraction level creator 232.
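Blocks 716-736 might be sketched as follows, with a hard-coded scoring function standing in for the machine learning classification model executed by the model executor 408; a real system might use a decision tree or deep neural network. All names, mappings, and scores here are illustrative assumptions.

```python
# Assumed candidate-parent table: an input node may map to several
# possible nodes at the current abstraction level.
CANDIDATE_PARENTS = {
    "%": ["Arithmetic Operations", "Modular Arithmetic"],
    "+": ["Arithmetic Operations"],
}

def score(node, candidate):
    """Stand-in for the learned model: assumed scores, not real outputs."""
    assumed = {("%", "Modular Arithmetic"): 0.9,
               ("%", "Arithmetic Operations"): 0.6}
    return assumed.get((node, candidate), 0.5)

def build_level_learning_based(input_nodes):
    current_level = set()
    for node in input_nodes:                          # blocks 716/720
        candidates = CANDIDATE_PARENTS.get(node, [])  # block 724: map?
        if not candidates:
            continue                                  # block 736: ignore
        best = max(candidates, key=lambda c: score(node, c))  # block 728
        current_level.add(best)                       # block 732: add
    return current_level

result = build_level_learning_based(["%", "+", "foo"])
print(result)  # contains both 'Modular Arithmetic' and 'Arithmetic Operations'
```

Here the ambiguous node `"%"` resolves to its highest-scoring candidate, while the unmapped node `"foo"` is ignored.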
The processor platform 800 of the illustrated example includes a processor 812. The processor 812 of the illustrated example is hardware. For example, the processor 812 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example program-derived graph constructor 204, the example parse tree constructor 208, the example syntactical node determiner 212, the example abstraction level modifier 216, the example leaf node creator 220, the example abstraction level determiner 224, the example rule-based abstraction level creator 228, the example learning-based abstraction level creator 232, the example program-derived graph comparator 236, the example node selector 304, the example abstraction level node comparator 308, the example abstraction level creator 312, the example node selector 404, the example model executor 408, the example probabilistic abstraction level node comparator 412, and the example abstraction level creator 416.
The processor 812 of the illustrated example includes a local memory 813 (e.g., a cache). The processor 812 of the illustrated example is in communication with a main memory including a volatile memory 814 and a non-volatile memory 816 via a bus 818. The volatile memory 814 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 816 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 814, 816 is controlled by a memory controller.
The processor platform 800 of the illustrated example also includes an interface circuit 820. The interface circuit 820 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 822 are connected to the interface circuit 820. The input device(s) 822 permit(s) a user to enter data and/or commands into the processor 812. The input device(s) can be implemented by, for example, a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 824 are also connected to the interface circuit 820 of the illustrated example. The output devices 824 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 820 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 820 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 826. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-site wireless system, a cellular telephone system, etc.
The processor platform 800 of the illustrated example also includes one or more mass storage devices 828 for storing software and/or data. Examples of such mass storage devices 828 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 832 of
A block diagram illustrating an example software distribution platform 905 to distribute software such as the example computer readable instructions 832 of
From the foregoing, it will be appreciated that example methods, apparatus, and articles of manufacture have been disclosed that construct program-derived semantic graphs. The disclosed methods, apparatus, and articles of manufacture improve the efficiency of using a computing device by allowing for comparisons between code snippets based on program-derived semantic graphs, code suggestions for developers during the coding process, and protection against plagiarism of computer programs. The disclosed methods, apparatus, and articles of manufacture are accordingly directed to one or more improvement(s) in the functioning of a computer.
Example methods, apparatus, systems, and articles of manufacture to construct program-derived semantic graphs are disclosed herein. Further examples and combinations thereof include the following:
Example 1 includes an apparatus to construct and compare program-derived semantic graphs (PSGs), the apparatus comprising a leaf node creator to identify a first set of nodes within a parse tree, and set a first abstraction level of the PSG to include the first set of nodes, an abstraction level determiner to access a second set of nodes, wherein the second set of nodes is the set of nodes in the PSG, create a third set of nodes, the third set of nodes to include possible nodes at a current abstraction level, and determine whether the current abstraction level is deterministic, a rule-based abstraction level creator to, in response to determining the current abstraction level is deterministic, construct the current abstraction level, and a PSG comparator to access a first PSG and a second PSG, and determine if the first PSG and the second PSG satisfy a similarity threshold.
Example 2 includes the apparatus of example 1, wherein the first set of nodes is a set of syntactic nodes in the parse tree.
Example 3 includes the apparatus of example 1, wherein an abstraction level is not deterministic when at least one node in the second set of nodes has at least two possible parent nodes in the third set of nodes.
Example 4 includes the apparatus of example 1, wherein to construct the current abstraction level, the rule-based abstraction level creator is to access the second set of nodes and the third set of nodes, determine a fourth set of nodes within the third set of nodes that are parents of at least one node in the second set of nodes, and set the current abstraction level to include the fourth set of nodes.
Example 5 includes the apparatus of example 1, including a learning-based abstraction level creator to, in response to determining the current abstraction level is not deterministic, create a fourth set of nodes, wherein to create the fourth set of nodes, the learning-based abstraction level creator is to identify nodes within the second set of nodes with one possible parent node in the third set of nodes, add identified parent nodes to the fourth set of nodes, identify nodes within the second set of nodes with at least two possible parent nodes in the third set of nodes, and determine one of the at least two possible parent nodes to add to the fourth set of nodes, and set the fourth set of nodes as the current abstraction level in the PSG.
Example 6 includes the apparatus of example 1, wherein the second set of nodes is a set of nodes that satisfy a weight threshold.
Example 7 includes the apparatus of example 1, including a parse tree creator to access a code snippet, and construct a parse tree based on the code snippet.
Example 8 includes at least one non-transitory computer readable medium comprising instructions that, when executed, cause a computing device to identify a first set of nodes within a parse tree, set a first abstraction level of a program-derived semantic graph (PSG) to include the first set of nodes, access a second set of nodes, the second set of nodes to include the set of nodes in the PSG, create a third set of nodes, the third set of nodes to include possible nodes at a current abstraction level, determine whether the current abstraction level is deterministic, in response to determining the current abstraction level is deterministic, construct the current abstraction level, access a first PSG and a second PSG, and determine whether the first PSG and the second PSG satisfy a similarity threshold.
Example 9 includes the at least one non-transitory computer readable medium of example 8, wherein the first set of nodes is a set of syntactic nodes in the parse tree.
Example 10 includes the at least one non-transitory computer readable medium of example 8, wherein the current abstraction level is not deterministic when at least one node in the second set of nodes has at least two possible parent nodes in the third set of nodes.
Example 11 includes the at least one non-transitory computer readable medium of example 8, wherein the instructions, when executed, cause the computing device, in order to construct the current abstraction level, to access the second set of nodes and the third set of nodes and determine a fourth set of nodes within the third set of nodes that are parents of at least one node in the second set of nodes, and set the current abstraction level to include the fourth set of nodes.
Example 12 includes the at least one non-transitory computer readable medium of example 8, wherein the instructions, when executed, cause the computing device to, in response to determining the current abstraction level is not deterministic, create a fourth set of nodes, wherein to create the fourth set of nodes, the computing device is to identify nodes within the second set of nodes with one possible parent node in the third set of nodes, add identified parent nodes to the fourth set of nodes, identify nodes within the second set of nodes with at least two possible parent nodes in the third set of nodes, and determine one of the at least two possible parent nodes to add to the fourth set of nodes, and set the fourth set of nodes as the current abstraction level in the PSG.
Example 13 includes the at least one non-transitory computer readable medium of example 12, wherein the second set of nodes is a set of nodes that satisfy a weight threshold.
Example 14 includes the at least one non-transitory computer readable medium of example 8, wherein the instructions, when executed, cause the computing device to access a code snippet, and construct a parse tree based on the code snippet.
Example 15 includes a method for constructing a program-derived semantic graph (PSG), the method comprising identifying a first set of nodes within a parse tree, setting a first abstraction level of the PSG to contain the first set of nodes, accessing a second set of nodes, the second set of nodes to include the set of nodes in the PSG, creating a third set of nodes, the third set of nodes to include possible nodes at a current abstraction level, determining whether the current abstraction level is deterministic, in response to determining the current abstraction level is deterministic, constructing the current abstraction level, accessing a first PSG and a second PSG, and determining whether the first PSG and the second PSG satisfy a similarity threshold.
Example 16 includes the method of example 15, wherein the first set of nodes is a set of syntactic nodes in the parse tree.
Example 17 includes the method of example 15, wherein the current abstraction level is not deterministic when at least one node in the second set of nodes has at least two possible parent nodes in the third set of nodes.
Example 18 includes the method of example 15, wherein the construction of the current abstraction level includes accessing the second set of nodes and the third set of nodes, determining a fourth set of nodes within the third set of nodes that are parents of at least one node in the second set of nodes, and setting the current abstraction level to include the fourth set of nodes.
Example 19 includes the method of example 15, further including in response to determining the current abstraction level is not deterministic, creating a fourth set of nodes by identifying nodes within the second set of nodes with one possible parent node in the third set of nodes, adding identified parent nodes to the fourth set of nodes, identifying nodes within the second set of nodes with at least two possible parent nodes in the third set of nodes, and determining one of the at least two possible parent nodes to add to the fourth set of nodes, and setting the fourth set of nodes as the current abstraction level in the PSG.
Example 20 includes the method of example 19, wherein the second set of nodes is a set of nodes that satisfy a weight threshold.
Example 21 includes the method of example 15, further including accessing a code snippet, and constructing a parse tree based on the code snippet.
Example 22 includes a computer system to construct and compare program-derived semantic graphs (PSGs) comprising memory, and one or more processors to execute instructions to cause the one or more processors to identify a first set of nodes within a parse tree, set a first abstraction level of a program-derived semantic graph (PSG) to contain the first set of nodes, access a second set of nodes, the second set of nodes to include the set of nodes in the PSG, create a third set of nodes, the third set of nodes to include the possible nodes at a current abstraction level, determine whether a current abstraction level is deterministic, in response to determining the current abstraction level is deterministic, construct the current abstraction level, access a first PSG and a second PSG, and determine whether the first PSG and the second PSG satisfy a similarity threshold.
Example 23 includes the computer system of example 22, wherein the first set of nodes is a set of syntactic nodes in the parse tree.
Example 24 includes the computer system of example 22, wherein the current abstraction level is not deterministic when at least one node in the second set of nodes has at least two possible parent nodes in the third set of nodes.
Example 25 includes the computer system of example 22, wherein the construction of the current abstraction level includes accessing the second set of nodes and the third set of nodes, determining a fourth set of nodes within the third set of nodes that are parents of at least one node in the second set of nodes, and setting the current abstraction level to include the fourth set of nodes.
Example 26 includes the computer system of example 22, further including a learning-based abstraction level creator to, in response to determining the current abstraction level is not deterministic, create a fourth set of nodes, wherein the learning-based abstraction level creator is to identify nodes within the second set of nodes with one possible parent node in the third set of nodes, add identified parent nodes to the fourth set of nodes, identify nodes within the second set of nodes with at least two possible parent nodes in the third set of nodes, and determine one of the at least two possible parent nodes to add to the fourth set of nodes, and set the fourth set of nodes as the current abstraction level in the PSG.
Example 27 includes the computer system of example 26, wherein the second set of nodes is a set of nodes that satisfy a weight threshold.
Example 28 includes the computer system of example 22, including accessing a code snippet, and constructing a parse tree based on the code snippet.
Example 29 includes an apparatus for constructing a program-derived semantic graph (PSG), the apparatus comprising means for a leaf node creator to identify a first set of nodes within a parse tree, and set a first abstraction level of the PSG to contain the first set of nodes, means for an abstraction level determiner to access a second set of nodes, the second set of nodes to include the set of nodes in the PSG, create a third set of nodes, the third set of nodes to include the possible nodes at a current abstraction level, and determine whether the current abstraction level is deterministic, means for a rule-based abstraction level creator to, in response to determining the current abstraction level is deterministic, construct the current abstraction level, and means for a PSG comparator to access a first PSG and a second PSG, and determine whether the first PSG and the second PSG satisfy a similarity threshold.
Example 30 includes the apparatus of example 29, wherein the first set of nodes is a set of syntactic nodes in the parse tree.
Example 31 includes the apparatus of example 29, wherein the current abstraction level is not deterministic when at least one node in the second set of nodes has at least two possible parent nodes in the third set of nodes.
Example 32 includes the apparatus of example 29, wherein the construction of the current abstraction level includes means for the rule-based abstraction level creator to access the second set of nodes and the third set of nodes, determine a fourth set of nodes within the third set of nodes that are parents of at least one node in the second set of nodes, and set the current abstraction level to include the fourth set of nodes.
Example 33 includes the apparatus of example 29, including means for a learning-based abstraction level creator to, in response to determining the current abstraction level is not deterministic, create a fourth set of nodes, wherein to create the fourth set of nodes the learning-based abstraction level creator is to identify nodes within the second set of nodes with one possible parent node in the third set of nodes, add identified parent nodes to the fourth set of nodes, identify nodes within the second set of nodes with at least two possible parent nodes in the third set of nodes, and determine one of the at least two possible parent nodes to add to the fourth set of nodes, and means for setting the fourth set of nodes as the current abstraction level in the PSG.
Example 34 includes the apparatus of example 33, wherein the second set of nodes is a set of nodes that satisfy a weight threshold.
Example 35 includes the apparatus of example 29, including means for accessing a code snippet, and means for constructing a parse tree based on the code snippet.
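As an end-to-end illustration of the method of Example 15, the following sketch builds a toy PSG from parse-tree leaf nodes and compares two PSGs with a simple Jaccard similarity standing in for the PSG comparator, whose actual comparison is not specified here; the rule tables, node names, and threshold are all assumptions made for illustration.

```python
def build_psg(parse_tree_leaves, level_rules):
    """Stack abstraction levels bottom-up from parse-tree leaf nodes."""
    levels = [set(parse_tree_leaves)]   # first level: syntactic leaf nodes
    for rules in level_rules:           # one assumed rule table per level
        nxt = {rules[n] for n in levels[-1] if n in rules}
        if not nxt:
            break
        levels.append(nxt)
    return levels

def psg_similar(psg_a, psg_b, threshold=0.5):
    """Toy Jaccard similarity over all nodes of each PSG."""
    a = set().union(*psg_a)
    b = set().union(*psg_b)
    return len(a & b) / len(a | b) >= threshold

rules = [{"%": "Arithmetic Operations", "+": "Arithmetic Operations"}]
g1 = build_psg(["%", "+"], rules)
g2 = build_psg(["+"], rules)
print(psg_similar(g1, g2))  # True: shared '+' and 'Arithmetic Operations'
```

The two toy snippets are judged similar because they share most of their nodes across abstraction levels, which mirrors the similarity-threshold determination recited in Examples 1, 8, 15, 22, and 29.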
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.