This disclosure relates to complex computer system architectures for implementing enhancements to an existing knowledge graph, including the application of standalone or combined approaches for knowledge graph generation.
Traditional approaches for searching enterprise data typically entail string matching mechanisms. However, such approaches are limited in their ability to provide the queried data. Moreover, most of the data stored within an enterprise is dark, meaning it is not easily searchable or available for analytics. Accordingly, conventional knowledge query systems return results that do not provide a complete picture of the knowledge and data available in the enterprise, requiring extra consumption of computing resources as knowledge queries are repeated and return inaccurate or incomplete results.
Data may be stored in different data stores depending on factors including data structure, volatility, volume, or other measurable attributes. These data stores may be designed, managed, and operated by different units within an enterprise organization. It follows that such data stores in practice behave as data silos: disparate, isolated, and rendering data less accessible across the units. Enterprise organizations therefore desire more transparent and open data storage solutions that allow their different units to share and access information more efficiently and effectively.
To take advantage of the benefits offered by big data technologies, enterprise systems have access to large, and rapidly growing, volumes of information, both proprietary and public. Existing analytical applications and data warehousing systems have not been able to fully utilize this profound access to information. Oftentimes information is simply aggregated into large data lakes or data warehouses without an added layer of relationship data connecting the information. Such aggregation of large amounts of data without contextual or relational information produces data dumps that are not particularly useful. Information stored in data lakes and data warehouses is likely to be stored in its original format, thus expending large amounts of computing resources to extract, transform, and load (ETL) the information into a searchable data set to respond to a data query.
To address these technical problems, a knowledge graph is disclosed that offers an innovative data structure for presenting relevant information in response to a data query, as well as relationship information between the relevant information. The knowledge graph includes a knowledge base of relevant information that is structured in a graph presentation that captures entities (i.e., nodes), relationships (i.e., edges), and attributes (i.e., node properties or edge properties) with semantic meaning. This graph data structure model provides the semantic meaning of the included data by modeling the data with an ontology or taxonomy. Accordingly, technical improvements are realized when a computing device structures information into knowledge graphs and runs search queries on the knowledge graphs, which results in the retrieval of more relevant and accurate information in a shorter amount of time.
The present disclosure further utilizes the enhanced level of structured data offered by knowledge graphs to identify new and useful combinations of information extracted from the existing information in the knowledge graphs. To accomplish these results, the present disclosure describes embedding techniques for translating the knowledge graph to a plot of nodes within an embedding space, selecting an area of interest within the embedding space, identifying empty areas within the area of interest, identifying a center node for each empty area, and reconstructing relationships (i.e., edges or connections) of new nodes that represent the center nodes. The new nodes are then included in the knowledge graph, creating an updated knowledge graph. Those nodes are depictions of the center nodes from the embedding space, and represent new combinations, and/or recommendations, of information included in the original knowledge graph.
The features described herein are applicable to knowledge graphs of data representing various fields, and may represent information within a specific field such as, for example, food recipe data or pharmaceutical formulation data. In the example of the knowledge graph representing food recipe data, the new nodes in the updated knowledge graph may include a recipe for an existing dish that has been updated with new added ingredients or compounds, updated with new ingredients to replace existing ingredients or compounds, or updated to remove ingredients or compounds from the previously existing recipe. Similarly, in the example of the knowledge graph representing pharmaceutical formulations, the new nodes in the reconstructed knowledge graph may include a drug formulation that has been updated with new added ingredients or compounds, updated with new ingredients to replace existing ingredients or compounds, or updated to remove ingredients or compounds from the previously existing drug formulation.
According to the exemplary embodiments described herein, the knowledge graphs represent food recipes, where the system attempts to discover new recipes and determines the possible sets of ingredients that constitute the newly discovered recipes based on the techniques described herein. For example, the enhancement techniques include identifying a new recipe, and then updating the recipe with ingredients known to go well with it. The new recipes may be predicted by identifying "gaps" between known recipes and trying to fill in these gaps with new recipes. These "gaps" may represent areas where information is determined to be missing. By looking at a space consisting of known recipes, the enhancement solutions are able to build upon the known recipes (i.e., enhance the old recipes) by presenting new combinations of ingredients, previously not thought of, that are predicted to go well together. Accordingly, new recipe recommendations may be generated automatically. Although the example of recipe formulation is discussed, the knowledge graph enhancement techniques described herein are applicable to knowledge graphs built on data from other fields as well.
Initially, a knowledge graph generation circuitry 110 constructs a knowledge graph from received information. Constructing a knowledge graph may include at least two steps. First, a graph schema definition is obtained for the knowledge graph, and refinement is applied as the knowledge graph is being generated. This defines the types of vertices and edges that are generated into the knowledge graph. Second, the knowledge graph is hydrated with information by ingesting knowledge from one or more data sources and applying one or more knowledge extraction techniques (e.g., natural language processing (NLP), schema mapping, computer vision, or the like) to create the vertices and edges in the knowledge graph. Each data source may have its own data processing pipeline for extracting data to include in the knowledge graph being constructed. The resulting knowledge graph provides a specific format of structured data where each node includes information, and each connecting edge represents a relationship between nodes.
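For illustration, the hydration step amounts to building a graph out of extracted (head, relationship, tail) triples. The following is a minimal sketch, assuming the triples have already been produced by one of the extraction pipelines; the sample triples and the use of the networkx library are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch: hydrate a knowledge graph from extracted triples.
# The triples below are hypothetical examples for illustration only.
import networkx as nx

extracted_triples = [
    ("tiramisu", "hasCategory", "dessert"),
    ("tiramisu", "hasIngredient", "mascarpone"),
    ("tiramisu", "hasIngredient", "espresso"),
    ("espresso", "hasCategory", "beverage"),
]

# A multi-directed graph allows several differently-typed edges
# between the same pair of nodes.
kg = nx.MultiDiGraph()
for head, relation, tail in extracted_triples:
    kg.add_edge(head, tail, relation=relation)  # nodes are added implicitly

print(kg.number_of_nodes(), kg.number_of_edges())  # 5 nodes, 4 edges
```

In practice, each data source's pipeline would emit such triples into a shared graph store rather than an in-memory structure.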
To provide additional context of the technical field and the knowledge graph system disclosed herein, the contents of U.S. patent application Ser. No. 15/150,030, filed on May 9, 2016 (published as U.S. Patent Application Publication No. US 2017/0324759 on Nov. 9, 2017), are hereby incorporated by reference herein.
According to the KGE system 100, the structured data from the knowledge graph is received by a knowledge graph embedding circuitry 120. The knowledge graph embedding circuitry 120 is configured to convert the knowledge graph into an embedding space.
The KGE system 100 further includes a region identification circuitry 130 for selecting a region of interest within the embedding space. The selection may include selecting a concept-based first sub-set region within the embedding space that represents an area of interest, such as a region corresponding to specific categories of food (e.g., vegetarian recipes).
The region identification circuitry 130 may further determine a padding parameter that represents an extension distance extending out from the first sub-set region R by a predetermined padding distance k. The predetermined padding distance k may be selected so that it does not extend past the region of interest and into another adjacent region.
The KGE system 100 further includes computation circuitry 140 for implementing computations within the embedding space. For example, the computation circuitry 140 may identify gap regions (e.g., a second sub-set region) within the region of interest, and perform max-min multi-dimensional computations to determine a center for each gap region within the region of interest. The computation circuitry 140 is further configured to treat that center as the embedding of a newly discovered recipe that was not present in the original knowledge graph. According to some embodiments, the center location may be weighted to include certain predetermined ingredients. This may be technically implemented by generating a new node within the embedding space at the determined center, having the attributes of the newly discovered recipe.
The KGE system 100 further includes reconstruction circuitry 150 for reconstructing the structure (i.e., relationships) of the new node(s). This reconstruction circuitry 150 produces updates to the knowledge graph that contain the new node(s) added at the determined centers in the embedding space. The reconstruction process is performed for each center node X that is determined, identifying the relationships (i.e., edges) that connect X to existing nodes of the knowledge graph.
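As one illustration of what this per-center calculation could look like, the sketch below assumes translation-style (TransE-like) embeddings, under which a candidate triple <X, relationship, tail> is plausible when X plus the relationship vector lands near the tail vector. The scoring scheme, the top_k cutoff, and all names here are assumptions for illustration; the disclosure's exact reconstruction calculation is not reproduced.

```python
# Hedged sketch: reconstruct edges for a new center node x by scoring
# candidate (relation, tail) pairs under a TransE-style assumption,
# where x + relation_vector should land near tail_vector.
import numpy as np

def reconstruct_edges(x, entity_vecs, relation_vecs, top_k=3):
    """entity_vecs / relation_vecs map names to embedding vectors.
    Returns the top_k most plausible (relation, tail) pairs for x."""
    scored = []
    for r_name, r_vec in relation_vecs.items():
        for t_name, t_vec in entity_vecs.items():
            dist = np.linalg.norm(x + r_vec - t_vec)  # smaller = more plausible
            scored.append((dist, r_name, t_name))
    scored.sort()  # ascending by translation distance
    return [(r, t) for _, r, t in scored[:top_k]]
```

The selected pairs become the new node's edges in the updated knowledge graph, linking the newly discovered recipe to existing ingredient nodes.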
By adding the new node(s), the updated knowledge graph is enhanced with new nodes of information that depict new combinations of information previously not found in the original knowledge graph.
The GUIs 210 and the I/O interface circuitry 206 may include touch sensitive displays, voice or facial recognition inputs, buttons, switches, speakers and other user interface elements. Additional examples of the I/O interface circuitry 206 includes microphones, video and still image cameras, headset and microphone input/output jacks, Universal Serial Bus (USB) connectors, memory card slots, and other types of inputs. The I/O interface circuitry 206 may further include magnetic or optical media interfaces (e.g., a CDROM or DVD drive), serial and parallel bus interfaces, and keyboard and mouse interfaces.
The communication interfaces 202 may include wireless transmitters and receivers ("transceivers") 212 and any antennas 214 used by the transmit and receive circuitry of the transceivers 212. The transceivers 212 and antennas 214 may support WiFi network communications, for instance, under any version of IEEE 802.11, e.g., 802.11n or 802.11ac, or other wireless protocols such as Bluetooth, WLAN, or cellular (4G, LTE/A). The communication interfaces 202 may also include serial interfaces, such as universal serial bus (USB), serial ATA, IEEE 1394, Lightning port, I2C, SLIMbus, or other serial interfaces. The communication interfaces 202 may also include wireline transceivers 216 to support wired communication protocols. The wireline transceivers 216 may provide physical layer interfaces for any of a wide range of communication protocols, such as any type of Ethernet, Gigabit Ethernet, optical networking protocols, data over cable service interface specification (DOCSIS), digital subscriber line (DSL), Synchronous Optical Network (SONET), or other protocols.
The system circuitry 204 may include any combination of hardware, software, firmware, or other circuitry. The system circuitry 204 may be implemented, for example, with one or more systems on a chip (SoC), application specific integrated circuits (ASIC), microprocessors, discrete analog and digital circuits, and other circuitry. The system circuitry 204 may implement any desired functionality of the KGE system 100. As just one example, the system circuitry 204 may include one or more instruction processors 218 and memories 220.
The memory 220 stores, for example, control instructions 222 for executing the features of the KGE system 100, as well as an operating system 224. In one implementation, the processor 218 executes the control instructions 222 and the operating system 224 to carry out any desired functionality for the KGE system 100, including those attributed to the knowledge graph generation circuitry 110, the knowledge graph embedding circuitry 120, the region identification circuitry 130, the computation circuitry 140, or the reconstruction circuitry 150. The control parameters 226 provide and specify configuration and operating options for the control instructions 222, operating system 224, and other functionality of the computer device 200.
The computer device 200 may further include various data sources 230. Each of the databases that are included in the data sources 230 may be accessed by the KGE system to obtain data for consideration during any one or more of the processes described herein. For example, the knowledge graph generation circuitry 110 may access the data sources 230 to obtain the information for generating the knowledge graph 300.
The knowledge graph generation circuitry 110 constructs a knowledge graph based on received information (801). The knowledge graph includes nodes of information and connecting edges, each edge representing a relationship between the node at its head end and the node at its tail end.
The knowledge graph embedding circuitry 120 receives the knowledge graph 300 and converts it into an embedding space (802). The conversion may include first converting the structured data from the knowledge graph 300 into a specific data format, such as sets of vector triples. An exemplary vector triple may take the following format: <head entity, relationship, tail entity> (e.g., <tiramisu, hasCategory, dessert>). The vector triple conversion may be applied across the knowledge graph 300. The knowledge graph embedding circuitry 120 further implements the embedding space conversion by modeling the vector triples with a neural network architecture that learns representations of the knowledge graph 300. This way, the embedding space is constructed to be comprised of nodes (e.g., embedding vectors) representing the structured data of the knowledge graph 300, as shown by the embedding space 400.
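The sketch below illustrates one way such representations could be learned. The disclosure does not name a specific embedding model, so a TransE-style translation objective (head + relationship ≈ tail) is assumed here purely for illustration, with a toy vocabulary and a single gradient step standing in for a full training loop.

```python
# Hedged sketch: TransE-style embedding of vector triples (assumed model).
import numpy as np

rng = np.random.default_rng(0)
entities = {"tiramisu": 0, "dessert": 1, "mascarpone": 2}
relations = {"hasCategory": 0, "hasIngredient": 1}
dim = 16
E = rng.normal(size=(len(entities), dim))   # entity embedding vectors
R = rng.normal(size=(len(relations), dim))  # relation embedding vectors

def score(h, r, t):
    """Translation distance; smaller means the triple fits the model better."""
    return np.linalg.norm(E[entities[h]] + R[relations[r]] - E[entities[t]])

# One gradient step pulling an observed triple together
# (negative sampling and the full training loop are omitted).
h, r, t = "tiramisu", "hasCategory", "dessert"
residual = E[entities[h]] + R[relations[r]] - E[entities[t]]
lr = 0.01
E[entities[h]] -= lr * residual
R[relations[r]] -= lr * residual
E[entities[t]] += lr * residual
print(score(h, r, t))  # distance shrinks as training proceeds
```

After training, every node of the knowledge graph 300 corresponds to a point in the embedding space, which is what the later region and gap computations operate on.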
The region identification circuitry 130 selects a first sub-set region from within the embedding space (803). For example, the embedding space 400 may be comprised of one or more sub-set regions that correspond to areas including a specific type of recipe, such as chocolate-based recipes, vegetarian recipes, vegan recipes, or other categories of recipes. The boundary of the selected region may be expressed as a set of half-space constraints on points x in the embedding space:
$$w_j^\top x + b_j < 0, \quad j = 1, \ldots, N$$
Taken together, these constraints can be rewritten in matrix form:
$$A x \le b$$
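A candidate point can therefore be tested for membership in the region, including the padded region of interest, by checking the inequality row-wise. In the sketch below, the constraint matrix is a hypothetical example, and applying the padding distance k by loosening each half-space constraint is an assumption about how the padding parameter is realized.

```python
# Sketch: test whether a point x lies inside the region A x <= b,
# optionally padded outward by a distance k (assumed to loosen each
# half-space constraint; rows of A are assumed unit-normalized).
import numpy as np

def in_region(x, A, b, k=0.0):
    return bool(np.all(A @ x <= b + k))

# Hypothetical 2-D region of interest: the unit box [0, 1] x [0, 1].
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

print(in_region(np.array([0.5, 0.5]), A, b))         # True: inside R
print(in_region(np.array([1.2, 0.5]), A, b))         # False: outside R
print(in_region(np.array([1.2, 0.5]), A, b, k=0.3))  # True: inside padded region
```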
The region identification circuitry 130 selects a second sub-set region from within the region of interest R′ (804). The second sub-set region is otherwise referred to as a gap region. The region of interest R′ may include one or more gap regions where the region identification circuitry 130 determines there is a lack of information (i.e., nodes) in the embedding space.
The region identification circuitry 130 calculates a center for each gap region (805). The center calculation is implemented according to an execution of a max-min problem-solving calculation, where the center is determined to be the point at which the distance to the closest surrounding node is as large as possible (i.e., the point where the minimum distance to the surrounding nodes is maximized).
The center calculation is a process that is iteratively repeated to find the centers of the different gaps. Each iteration consists of solving an optimization problem to identify a center point $X_J$ within the embedding space. In a first step, to calculate a first center $X_1$ for a first gap region, the calculation may be represented as:
$$\min_i \| x - y_i \| \to \max_{x \in R'}, \quad \text{subject to } A x \le b$$
As a second step, to calculate a second center $X_2$ for a second gap region, the first center $X_1$ from the previous step is added to the embedding space (i.e., it joins the set of surrounding nodes $y_i$). The calculation for determining the second center $X_2$ may be represented as:
$$\min_i \| x - y_i \| \to \max_{x \in R'}, \quad \text{subject to } A x \le b$$
These calculations may be iterated for each gap region identified in the region of interest $R'$ until gap centers $X_J$ corresponding to each of the recognized gap regions are determined (806). For example, the calculation for identifying centers of gap regions may be iterated while:
$$\min_i \| X_J - y_i \| > \text{threshold}$$
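The sketch below shows one way this iteration could be implemented. Random candidate sampling inside the region stands in for the disclosure's optimization step, and the sampling bounds, candidate count, and threshold handling are all illustrative assumptions.

```python
# Hedged sketch of the iterative max-min gap-center search (805-806).
import numpy as np

def find_gap_centers(nodes, A, b, threshold, n_candidates=5000, seed=0):
    """nodes: list of embedding vectors y_i inside the region of interest.
    Returns centers X_1, X_2, ... while each new center's distance to its
    nearest node still exceeds the threshold."""
    rng = np.random.default_rng(seed)
    dim = len(nodes[0])
    centers = []
    while True:
        # Sample candidates and keep only those satisfying A x <= b.
        cands = rng.uniform(-1.0, 1.0, size=(n_candidates, dim))  # assumed bounds
        cands = cands[np.all(cands @ A.T <= b, axis=1)]
        if len(cands) == 0:
            break
        # Distance from each candidate to its closest node; previously
        # found centers count as nodes for later iterations.
        pts = np.asarray(list(nodes) + centers)
        d_min = np.linalg.norm(cands[:, None, :] - pts[None, :, :], axis=2).min(axis=1)
        best = int(d_min.argmax())  # max-min: candidate farthest from everything
        if d_min[best] <= threshold:
            break  # remaining gaps are too small to yield a new center
        centers.append(cands[best])
    return centers
```

Each returned center is then treated as the embedding of a newly discovered recipe, as described above for the computation circuitry 140.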
When all center locations $(X_1, X_2, \ldots, X_J)$ have been calculated, a node is created for each of the center locations, where each new center node includes information describing a new recipe composed of ingredients found in recipes from the original knowledge graph. It follows that the reconstruction procedure also finds relationships (i.e., links) between the new recipes and existing ingredients from the original knowledge graph. An updated knowledge graph 700 is constructed from the original knowledge graph 300, where the updated knowledge graph 700 includes the new recipes created to represent all the calculated centers $X_J$ of the recognized gap regions. For example, the updated knowledge graph 700 includes the new recipe 701, which was not present in the knowledge graph 300. The new recipe 701 includes a new combination of ingredients, where the ingredients themselves were existing ingredients from recipes included in the original knowledge graph.
Various implementations have been specifically described. However, other implementations that include a fewer, or greater, number of features and/or components for each of the apparatuses, methods, or other embodiments described herein are also possible.
This application claims benefit to U.S. Provisional Patent Application No. 62/741,928, filed on Oct. 5, 2018, the entirety of which is incorporated by reference herein.
References Cited: U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---|
6067093 | Grau | May 2000 | A |
6285992 | Kwasny | Sep 2001 | B1 |
7496597 | Rising, III | Feb 2009 | B2 |
7925599 | Koren | Apr 2011 | B2 |
9075824 | Gordo | Jul 2015 | B2 |
9275148 | Elassaad | Mar 2016 | B1 |
9483547 | Feller | Nov 2016 | B1 |
9489377 | Feller | Nov 2016 | B1 |
9632654 | Elassaad | Apr 2017 | B1 |
9852231 | Ravi | Dec 2017 | B1 |
9990687 | Kaufhold | Jun 2018 | B1 |
10095775 | Bull | Oct 2018 | B1 |
10157226 | Costabello | Dec 2018 | B1 |
10198491 | Semturs | Feb 2019 | B1 |
10430464 | Ravi | Oct 2019 | B1 |
20040093328 | Damle | May 2004 | A1 |
20050131924 | Jones | Jun 2005 | A1 |
20090012842 | Srinivasan | Jan 2009 | A1 |
20090138415 | Lancaster | May 2009 | A1 |
20100121792 | Yang | May 2010 | A1 |
20100185672 | Rising, III | Jul 2010 | A1 |
20100211927 | Cai | Aug 2010 | A1 |
20100332475 | Birdwell | Dec 2010 | A1 |
20110040711 | Perronnin | Feb 2011 | A1 |
20110191374 | Bengio | Aug 2011 | A1 |
20110302118 | Melvin | Dec 2011 | A1 |
20120158633 | Eder | Jun 2012 | A1 |
20130054603 | Birdwell | Feb 2013 | A1 |
20130149677 | Slone | Jun 2013 | A1 |
20130297617 | Roy | Nov 2013 | A1 |
20130325784 | Morara | Dec 2013 | A1 |
20140075004 | Van Dusen | Mar 2014 | A1 |
20140156733 | Goranson | Jun 2014 | A1 |
20140282219 | Haddock | Sep 2014 | A1 |
20150095303 | Sonmez | Apr 2015 | A1 |
20150161748 | Ratakonda | Jun 2015 | A1 |
20150169758 | Assom | Jun 2015 | A1 |
20150248478 | Skupin | Sep 2015 | A1 |
20160042298 | Liang | Feb 2016 | A1 |
20160042299 | Liang | Feb 2016 | A1 |
20160196587 | Eder | Jul 2016 | A1 |
20160239746 | Yu | Aug 2016 | A1 |
20170011091 | Chehreghani | Jan 2017 | A1 |
20170076206 | Lastras-Montano | Mar 2017 | A1 |
20170139902 | Byron | May 2017 | A1 |
20170161279 | Franceschini | Jun 2017 | A1 |
20170177681 | Potiagalov | Jun 2017 | A1 |
20170177744 | Potiagalov | Jun 2017 | A1 |
20170228641 | Sohn | Aug 2017 | A1 |
20170270245 | van Rooyen | Sep 2017 | A1 |
20170286397 | Gonzalez | Oct 2017 | A1 |
20170324759 | Puri | Nov 2017 | A1 |
20170357896 | Tsatsin | Dec 2017 | A1 |
20180033017 | Gopalakrishnan Iyer | Feb 2018 | A1 |
20180039696 | Zhai | Feb 2018 | A1 |
20180075359 | Brennan | Mar 2018 | A1 |
20180082197 | Aravamudan | Mar 2018 | A1 |
20180129941 | Gustafson | May 2018 | A1 |
20180129959 | Gustafson | May 2018 | A1 |
20180137424 | Gabaldon Royval | May 2018 | A1 |
20180144252 | Minervini | May 2018 | A1 |
20180189634 | Abdelaziz | Jul 2018 | A1 |
20180197104 | Marin | Jul 2018 | A1 |
20180210913 | Beller | Jul 2018 | A1 |
20180254101 | Gilmore | Sep 2018 | A1 |
20180260750 | Varshney | Sep 2018 | A1 |
20180336183 | Lee | Nov 2018 | A1 |
20180351971 | Chen | Dec 2018 | A1 |
20180365614 | Palmer | Dec 2018 | A1 |
20190005115 | Warner | Jan 2019 | A1 |
20190005374 | Shankar | Jan 2019 | A1 |
20190012405 | Contractor | Jan 2019 | A1 |
20190034780 | Marin | Jan 2019 | A1 |
20190042988 | Brown | Feb 2019 | A1 |
20190080245 | Hickman | Mar 2019 | A1 |
20190122111 | Min | Apr 2019 | A1 |
20190155926 | Scheideler | May 2019 | A1 |
20190155945 | Zhelezniak | May 2019 | A1 |
20190196436 | Nagarajan | Jun 2019 | A1 |
20190197396 | Rajkumar | Jun 2019 | A1 |
20190197398 | Jamali | Jun 2019 | A1 |
20190205964 | Onoro Rubio | Jul 2019 | A1 |
20190303535 | Fokoue-Nkoutche | Oct 2019 | A1 |
20190392330 | Martineau | Dec 2019 | A1 |
20200057946 | Singaraju | Feb 2020 | A1 |
20200081445 | Stetson | Mar 2020 | A1 |
20200242484 | Lecue | Jul 2020 | A1 |
20200311616 | Rajkumar | Oct 2020 | A1 |
Prior Publication Data

Number | Date | Country
---|---|---|---|
20200110746 A1 | Apr 2020 | US |
Related U.S. Application Data

Number | Date | Country
---|---|---|---|
62741928 | Oct 2018 | US |