This disclosure relates generally to graphical user interfaces, and more particularly to a graphical user interface to depict data lineage information in levels.
Many organizations (e.g., providers of web-based services) are collecting and storing increasing amounts of data. For example, to provide web services, a server system may utilize many data elements that have relationships with one another. In many instances, it may be desirable to know the data lineage of the data elements in the system and, due to the difficulty in interpreting this data lineage information, it may be desirable to use one or more graphical components to visually represent the data lineage associated with the data elements. Prior tools for visualizing data lineage information suffer from various technical shortcomings, however, particularly as the number of data elements in a data system increases, preventing users from gaining meaningful insight from the data lineage information.
Many organizations (e.g., providers of web-based services) are collecting and storing increasing amounts of data. For example, to provide web services, a server system may utilize many “data elements,” which, as used herein, refers to an item of structured or unstructured data in a data system. Non-limiting examples of data elements include databases, datasets, database objects (e.g., tables, indices, etc.), rules, models, variables, etc. In a data system (e.g., a server system used to provide a web service), data elements may have relationships with one another. As one non-limiting example, a dataset may include data pulled from multiple different databases, the dataset may be referenced by multiple different models, which, in turn, may utilize multiple rules and variables, etc.
In many instances, particularly as the number of data elements in a data system increases, it may be desirable to know the “data lineage” of the data elements in the system, for example to facilitate troubleshooting data analytics issues. As will be appreciated by one of skill in the art, “data lineage” refers to the data lifecycle and the relationship between data elements in a data system (or a portion thereof), such as the origin of data elements, the way(s) in which the data elements change over time, the referential relationships between data elements, etc. The data lineage associated with a set of interrelated data elements may include multiple “levels” (also referred to herein as “layers”) that correspond to the number of referential relationships between the data elements. To illustrate, consider the following four data elements: Data Element A, which is a first database; Data Element B, which is a second database; Data Element C, which is a dataset that pulls records from both of the first and second databases; and Data Element D, which is a statistical model that utilizes the dataset. In this example, Data Elements A and B may be said to be at a different level in the data lineage than Data Element C because there is a referential relationship from Data Elements A and B to Data Element C. Stated differently, Data Element C directly depends on both Data Elements A and B. Similarly, Data Element D may be said to be at a different level than both Data Elements A and B and Data Element C since there is another referential relationship from Data Element C to Data Element D. Further, throughout this disclosure, the terms “upstream” and “downstream” are used to describe the relative position of data elements within the various levels of the data lineage. Continuing with the present example, Data Elements A and B may be said to be “upstream” relative to Data Element C, which, in turn, may be considered “upstream” relative to Data Element D.
Note that a data element may be considered “upstream” or “downstream” relative to another data element even if there is not a direct link between those data elements. In the present example, for instance, Data Element D would be considered to be one level “downstream” relative to Data Element C and two levels “downstream” relative to Data Elements A and B.
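The four-element example above can be sketched in code. The following is a minimal, non-limiting Python sketch that assumes the lineage is stored as a mapping from each data element to the data elements that directly depend on it. The names `DOWNSTREAM` and `levels_downstream` are hypothetical and are used only for illustration.

```python
from collections import deque

# Hypothetical encoding of the example: each key maps to the data elements
# that directly depend on it (A and B feed dataset C; C feeds model D).
DOWNSTREAM = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}


def levels_downstream(start, target, downstream=DOWNSTREAM):
    """Return how many levels `target` is downstream of `start`, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        element, depth = queue.popleft()
        if element == target:
            return depth
        for child in downstream.get(element, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return None  # target is not downstream of start


print(levels_downstream("C", "D"))  # 1
print(levels_downstream("A", "D"))  # 2
print(levels_downstream("D", "A"))  # None
```

In this sketch, Data Element D is found two levels downstream of Data Element A even though no edge links the two directly, which mirrors the indirect relationship described above.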
The data lineage between a set of data elements may, in some instances, be provided as information specifying a graph that describes the manner in which the different data elements are related. In many instances, however, this data lineage information is complex, particularly as the number of data elements involved increases (e.g., to thousands, tens of thousands, etc.), making it difficult for a user to gain meaningful insight into the data lineage merely by inspecting the raw information. Accordingly, it is often desirable to visualize data lineage so as to gain a better understanding of the interrelationships between data elements in the data system. Prior tools for visualizing data lineage information suffer from various technical shortcomings. For example, one prior technique is to utilize a graph data visualization component in which each data element is represented visually as a node connected to other nodes based on the data lineage relationships. Such an attempt to visually depict the entirety of the data lineage simultaneously in a single visualization component has various disadvantages, however. For example, using such a data visualization technique, it quickly becomes difficult, if not impossible, for users to interpret the data lineage information, particularly as the number of data elements increases. In a data system that includes a large number (e.g., hundreds, thousands, tens of thousands, etc.) of data elements, such prior data visualization techniques fail to meaningfully convey the data lineage information to the user.
In various embodiments, however, the disclosed techniques provide a technical solution to visualizing data lineage information that overcomes the technical limitations of prior approaches. For example, various disclosed embodiments include generating a data lineage graphical user interface (“GUI”) that depicts the data lineage information in levels, allowing a user to quickly and easily navigate between the different (and potentially many) levels of the data lineage. In some embodiments, for example, the disclosed data lineage GUI is operable to receive a user selection of a particular data element and, for that selected data element, navigate the different levels of the data lineage in an upstream and downstream direction relative to the level of the selected data element. According to various embodiments, the disclosed techniques enable a user to investigate the data lineage associated with a selected data element using a GUI that remains legible and intuitive even in instances in which there are a large number of data elements to visualize, improving the ease and efficiency of using the data lineage information and enabling more meaningful insight into the relationship between data elements in the system.
Referring now to
Data lineage information 150, in various embodiments, may specify the data lineage associated with various data elements in a system. For example, data lineage information 150 may be generated based on the various interrelated data elements utilized by a server system to provide one or more web services. Data lineage information 150, according to one non-limiting example, is described in more detail below with reference to
As noted above, in various embodiments the interface generation module 102 is operable to generate the data lineage GUI 110 based on the data lineage information 150. That is, in various embodiments, the interface generation module 102 is operable both to generate various components of the data lineage GUI 110 and, based on the user input 106, generate and update UI data 108 used to populate the various components of the data lineage GUI 110. For example, in some embodiments, the interface generation module 102 may receive user input 106 selecting one of the data elements depicted via the data lineage GUI 110. In response to the user input 106, the interface generation module 102 may generate the data lineage GUI 110 such that, for the selected data element 104 of the plurality of data elements, the GUI 110 is usable to navigate the different levels of the data lineage in both an upstream and downstream direction relative to the level of the selected data element 104. In some such embodiments, for example, the interface generation module 102 may generate the data lineage GUI 110 by analyzing the data lineage information 150 to identify one or more edges associated with a particular node that corresponds to the selected data element 104 and, based on this analysis, identify one or more data elements 120 that are upstream relative to the selected data element and one or more data elements 120 that are downstream relative to the selected data element 104. For example, the interface generation module 102 may determine, based on the one or more edges, a first subset of nodes that are in the upstream direction, in the directed graph specified by the data lineage information 150, relative to the node associated with the selected data element 104. In such an instance, the first subset of nodes may correspond to a first subset of data elements 120 that are at levels that are upstream relative to the selected data element. 
Similarly, the interface generation module 102 may determine, based on the one or more edges, a second subset of nodes that are in the downstream direction, in the directed graph, relative to the node associated with the selected data element 104, where the second subset of nodes corresponds to a second subset of data elements 120 that are at levels that are downstream relative to the selected data element 104. The interface generation module 102 may generate UI data 108 based on this analysis, which is used to populate the data lineage GUI 110.
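As one non-limiting illustration of determining the first and second subsets of nodes, the following Python sketch walks the edges of a directed graph in each direction from the selected node. It assumes the edges are available as (source, target) pairs; the function name `split_lineage` and the sample node names are hypothetical, and the breadth-first traversal is merely one suitable technique.

```python
from collections import defaultdict, deque


def split_lineage(edges, selected):
    """Return (upstream, downstream) node sets relative to `selected`.

    `edges` is an iterable of (source, target) pairs of a directed graph.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)
        predecessors[target].add(source)

    def reachable(adjacency):
        # Breadth-first walk that follows edges in a single direction.
        found = set()
        queue = deque([selected])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in found and neighbor != selected:
                    found.add(neighbor)
                    queue.append(neighbor)
        return found

    return reachable(predecessors), reachable(successors)


# Hypothetical sample data, for illustration only.
sample_edges = [("raw", "selected"), ("selected", "model"), ("model", "report")]
upstream, downstream = split_lineage(sample_edges, "selected")
print(sorted(upstream))    # ['raw']
print(sorted(downstream))  # ['model', 'report']
```

The two returned sets correspond to the upstream and downstream subsets discussed above and could be used, in some embodiments, as an input when generating UI data.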
In the depicted embodiment, the data lineage GUI 110 includes two main components: a slider navigation element 116 and a “detailed” view that includes upstream panel 112, central panel 113, and downstream panel 114. In various embodiments, the slider navigation element 116 may be used to navigate between the various levels of the data lineage, allowing a user to select one or more levels for which to view additional information in the data lineage GUI 110 (e.g., via the detailed view). The operation of slider navigation element 116, according to some embodiments, is described in detail below with reference to
In various embodiments, the data lineage GUI 110 allows the user to quickly determine which data elements are upstream and downstream relative to the selected data element. In the depicted embodiment, for example, the detailed view of data lineage GUI 110 includes a central panel 113 that depicts information corresponding to the selected data element 104, an upstream panel 112 that depicts information corresponding to one or more upstream data elements 120 (e.g., data element 120A), and a downstream panel 114 that depicts information corresponding to one or more downstream data elements 120 (e.g., data elements 120B-120E). As shown in
Turning now to
In various embodiments, the data lineage information 150 represents data elements using a set of nodes. In the depicted embodiment, the data lineage information 150 includes information for eight nodes, including an identifier and a name for each. For example, in
Further, in various embodiments, the data lineage information 150 may specify a graph (e.g., a directed graph) in which the various nodes are connected by one or more edges, where the connections are indicative of the data lineage between the data elements. For example, referring now to
As noted above, in various embodiments, the interface generation module 102 is operable to generate UI data 108 to populate the data lineage GUI 110 using the data lineage information 150. For example, consider an instance in which a user provides user input 106 that selects one of the data elements depicted via the data lineage GUI 110. In such an embodiment, the interface generation module 102 may use the data lineage information 150 to generate the UI data 108 to populate the data lineage GUI 110 so as to depict the various levels of the data lineage and the relative position of the selected data element within those levels.
In various embodiments, to generate this UI data 108 to populate the data lineage GUI 110, the interface generation module 102 may analyze the data lineage information 150 to identify one or more edges that are connected to a particular node that corresponds to the selected data element and, based on those edges, determine a first subset of nodes that are in an upstream direction, in the graph, relative to the particular node and a second subset of nodes that are in a downstream direction, in the graph, relative to the particular node. For example, consider an instance in which the user input 106 indicates a selection of Data Element 0 via the data lineage GUI 110. In this example, the interface generation module 102 may analyze the data lineage information 150 and determine that Data Element 1 and Data Element 2 are in a downstream direction relative to Data Element 0 because, in the data lineage information 150, edge1 and edge2 start at the node id0 corresponding to Data Element 0 and end at nodes id1 and id2 corresponding to Data Element 1 and Data Element 2, respectively. Similarly, the interface generation module 102 may analyze the data lineage information 150 and determine that Data Element 3 is in an upstream direction relative to Data Element 0 because, in the data lineage information 150, edge5 starts at node id3 corresponding to Data Element 3 and ends at node id0 corresponding to Data Element 0.
Referring now to
Note that, in generating the UI data 108 to populate the data lineage GUI 110 based on a selected data element (e.g., as described above with reference to Data Element 0), the interface generation module 102 may analyze the data lineage information 150 to create a graph data structure in memory that corresponds to the nodes and edges specified in the data lineage information 150. For example, the interface generation module 102 may iterate through the nodes and edges specified in the data lineage information 150, creating corresponding nodes (also referred to as “vertices”) and edges in the graph data structure as specified by the data lineage information 150. By iterating through the dependencies specified by the edges in the data lineage information 150, the interface generation module 102 can determine the manner in which the nodes are linked and, accordingly, the data lineage between the corresponding data elements represented by those nodes.
The interface generation module 102 may use any of various suitable techniques to generate the graph data structure based on the data lineage information 150. In one non-limiting embodiment, for example, the interface generation module 102 may use the JGraphT Java class library to perform various operations in generating the data lineage GUI 110 or the UI data 108 to populate the data lineage GUI 110. This embodiment is provided merely as an example, however, and any of various other suitable techniques, including various software libraries utilizing any suitable programming or scripting language, may be used.
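The iteration described above may be sketched as follows. This non-limiting Python sketch uses plain dictionaries in place of a graph library such as JGraphT. The layout of `lineage_info` (lists of node and edge records with `id`, `name`, `source`, and `target` fields) is an assumption made for illustration, as are the names `LineageGraph` and `build_graph`; only the three edges described above for Data Element 0 are included.

```python
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    names: dict = field(default_factory=dict)         # node id -> display name
    successors: dict = field(default_factory=dict)    # node id -> downstream ids
    predecessors: dict = field(default_factory=dict)  # node id -> upstream ids

    def add_node(self, node_id, name):
        self.names[node_id] = name
        self.successors.setdefault(node_id, set())
        self.predecessors.setdefault(node_id, set())

    def add_edge(self, source, target):
        if source not in self.names or target not in self.names:
            raise KeyError(f"edge references unknown node: {source} -> {target}")
        self.successors[source].add(target)
        self.predecessors[target].add(source)


def build_graph(lineage_info):
    """Create vertices first, then edges, from the lineage records."""
    graph = LineageGraph()
    for node in lineage_info["nodes"]:
        graph.add_node(node["id"], node["name"])
    for edge in lineage_info["edges"]:
        graph.add_edge(edge["source"], edge["target"])
    return graph


# Assumed record layout; the edge endpoints follow the Data Element 0 example.
lineage_info = {
    "nodes": [
        {"id": "id0", "name": "Data Element 0"},
        {"id": "id1", "name": "Data Element 1"},
        {"id": "id2", "name": "Data Element 2"},
        {"id": "id3", "name": "Data Element 3"},
    ],
    "edges": [
        {"id": "edge1", "source": "id0", "target": "id1"},
        {"id": "edge2", "source": "id0", "target": "id2"},
        {"id": "edge5", "source": "id3", "target": "id0"},
    ],
}
graph = build_graph(lineage_info)
print(sorted(graph.successors["id0"]))    # ['id1', 'id2']
print(sorted(graph.predecessors["id0"]))  # ['id3']
```

Storing both successor and predecessor sets is one possible design choice; it allows the upstream and downstream directions to be traversed with the same logic.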
Turning now to
In the depicted embodiment, the detailed view 402 includes a central panel 113, an upstream panel 112, and a downstream panel 114. Note, however, that this embodiment is provided merely as one non-limiting example and, in other embodiments, the detailed view 402 may include any suitable combination and arrangement of display elements. In various embodiments, the central panel 113 is used to represent, and graphically depict one or more items of information for, the selected data element (e.g., Data Element 0, in the depicted embodiment). The upstream panel 112, in various embodiments, depicts information corresponding to the data elements that are in the upstream direction relative to the selected data element. For example, in the embodiment of
In the depicted embodiment, detailed view 402 of the data lineage GUI 110 further includes a slider navigation element 116, which is described in detail below with reference to
As shown in
In some instances, however, the user may wish to investigate the different levels of the data lineage at a more granular level. Accordingly, in various embodiments, the disclosed data lineage GUI 110 is usable to graphically depict the various data elements that reside at a given level of the data lineage. For example, in the embodiment depicted in
For example, referring now to
More specifically, level-specific panel 512A corresponds to one level upstream relative to the selected Data Element 0, level-specific panel 512B corresponds to two levels upstream relative to the selected Data Element 0, and level-specific panel 512C corresponds to three levels upstream relative to the selected Data Element 0. Similarly, level-specific panel 512D corresponds to one level downstream relative to the selected Data Element 0 and level-specific panel 512E corresponds to two levels downstream relative to the selected Data Element 0. In the depicted embodiment, level-specific panel 512A includes a table that lists Data Element 3 as being one level upstream relative to Data Element 0, level-specific panel 512B includes a table that lists Data Elements 4 and 7 as being two levels upstream relative to Data Element 0, and level-specific panel 512C includes a table that lists Data Element 6 as being three levels upstream relative to Data Element 0. Similarly, level-specific panel 512D includes a table that lists Data Elements 1 and 2 as being one level downstream relative to Data Element 0 and level-specific panel 512E includes a table that lists Data Element 5 as being two levels downstream relative to Data Element 0.
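The grouping of data elements into level-specific panels may be sketched as follows. This non-limiting Python sketch assigns each data element to a level by its breadth-first distance from the selected data element, which is an assumption about how levels might be computed when multiple paths exist. Only the edges 0→1, 0→2, and 3→0 are described above; the remaining edges in `EDGES` are hypothetical and were chosen so that the result matches the panel contents just described. The name `levels_from` is likewise hypothetical.

```python
from collections import defaultdict, deque


def levels_from(selected, adjacency):
    """Group nodes by their BFS distance from `selected` along `adjacency`."""
    levels = defaultdict(list)
    distance = {selected: 0}
    queue = deque([selected])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                levels[distance[neighbor]].append(neighbor)
                queue.append(neighbor)
    return {level: sorted(nodes) for level, nodes in sorted(levels.items())}


# (source, target) pairs; the last four edges are assumed for illustration.
EDGES = [(0, 1), (0, 2), (3, 0), (4, 3), (7, 3), (6, 4), (1, 5)]

downstream = defaultdict(list)
upstream = defaultdict(list)
for source, target in EDGES:
    downstream[source].append(target)
    upstream[target].append(source)

print(levels_from(0, upstream))    # {1: [3], 2: [4, 7], 3: [6]}
print(levels_from(0, downstream))  # {1: [1, 2], 2: [5]}
```

Each key in the returned mapping corresponds to one level-specific panel, and each value lists the data elements that the table in that panel would contain.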
Note that, in some embodiments, the data lineage GUI 110 uses the selected data element (Data Element 0, in the current example) as its “focal point,” computing the levels of the upstream and downstream data elements based on the selected data element. Accordingly, in some embodiments, data elements that are located at the same level in the data lineage as the selected data element may be omitted from the data lineage GUI 110. Referring to the embodiment depicted in
Slider navigation element 116, in some embodiments, may include a level-selection window that is usable to select one or more levels of the data lineage for which to view additional information via the data lineage GUI 110 (e.g., using the detailed view 402 of
In various embodiments, the slider navigation element 116 depicts the various levels of the data lineage and is usable to select one or more levels for which to view additional information via one or more components of the data lineage GUI 110. For example, in
In
Referring now to
In various embodiments, the data lineage GUI 110 is operable to graphically depict the direct relationship between the various data elements presented via the GUI 110. For example, in some embodiments, the data lineage GUI 110 allows a user to indicate a data element shown in the expanded detailed view 702 and, in response to this indication, is operable to visually emphasize (e.g., highlight) those data elements shown in the expanded detailed view 702 that are directly related (also referred to herein as “linked”) to this data element. In the depicted embodiment, for example, a user may select (e.g., using any suitable input technique, such as a cursor, keyboard, touchscreen, etc.) data element 720B in the level-specific panel 712A and the data lineage GUI 110 will visually emphasize the data elements in the level-specific panel 712B that are directly related to data element 720B (data elements 720D and 720G, in the present example). For example, in response to the user selecting the data element 720B, the interface generation module 102 may use the graph data structure described above to identify the node corresponding to data element 720B and any nodes that are directly related to that node (e.g., nodes having edges that begin or terminate at that node). In such embodiments, the data lineage GUI 110 may then highlight (or otherwise graphically emphasize) those data elements 720 that are directly related to the data element 720B (chosen by the user, in the current example).
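Identifying the directly related data elements may be sketched as follows. This non-limiting Python sketch collects every node that shares an edge with the chosen node, regardless of edge direction. The function name `linked_elements`, the edge directions, and the edge between 720A and 720E are hypothetical and are included only for illustration.

```python
def linked_elements(edges, chosen):
    """Return every element sharing an edge with `chosen`, in either direction."""
    linked = set()
    for source, target in edges:
        if source == chosen:
            linked.add(target)
        elif target == chosen:
            linked.add(source)
    linked.discard(chosen)  # ignore any self-referential edge
    return linked


# Assumed (source, target) pairs; directions are chosen for illustration.
sample_edges = [("720B", "720D"), ("720B", "720G"), ("720A", "720E")]
print(sorted(linked_elements(sample_edges, "720B")))  # ['720D', '720G']
```

The returned set identifies the data elements that the GUI could visually emphasize in response to the user's selection.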
Referring now to
At 802, in the illustrated embodiment, the interface generation module 102 accesses data lineage information that specifies a directed graph indicative of a data lineage associated with a plurality of data elements, where the plurality of data elements are represented in the data lineage information as a plurality of nodes and where, in the directed graph, the plurality of nodes are connected by a plurality of edges indicative of data lineage relationships between the plurality of data elements. For example, as described above with reference to
At 804, in the illustrated embodiment, the interface generation module 102 generates a data lineage GUI that, for a selected data element of the plurality of data elements, is usable to navigate the different levels of the data lineage associated with the plurality of data elements in an upstream and downstream direction relative to a particular level of the selected data element. For example, referring to the non-limiting example described above with reference to
In various embodiments, the interface generation module 102 generates the UI data 108 used to populate the data lineage GUI 110 based on the data lineage information 150. For example, in some embodiments, method 800 may include analyzing the data lineage information 150 to identify one or more edges associated with a particular node that corresponds to the selected data element. Based on these one or more edges, the interface generation module 102 may determine a first subset of the plurality of nodes that are in an upstream direction, in the directed graph, relative to the particular node, where the first subset of nodes correspond to a first subset of data elements located at upstream levels in the data lineage relative to the selected data element. Similarly, in some embodiments, the interface generation module 102 may determine, based on the one or more edges, a second subset of the plurality of nodes that are in a downstream direction, in the directed graph, relative to the particular node, where the second subset of nodes correspond to a second subset of data elements located at downstream levels in the data lineage relative to the selected data element.
As noted above, in some embodiments the data lineage GUI 110 may be used to present a “detailed” view, as described above in reference to
Referring now to
Processor subsystem 920 may include one or more processors or processing units. In various embodiments of computer system 900, multiple instances of processor subsystem 920 may be coupled to interconnect 980. In various embodiments, processor subsystem 920 (or each processor unit within 920) may contain a cache or other form of on-board memory.
System memory 940 is usable to store program instructions executable by processor subsystem 920 to cause system 900 to perform various operations described herein. System memory 940 may be implemented using different physical, non-transitory memory media, such as hard disk storage, floppy disk storage, removable disk storage, flash memory, random access memory (RAM, e.g., SRAM, EDO RAM, SDRAM, DDR SDRAM, RAMBUS RAM, etc.), read only memory (PROM, EEPROM, etc.), and so on. Memory in computer system 900 is not limited to primary storage such as system memory 940. Rather, computer system 900 may also include other forms of storage such as cache memory in processor subsystem 920 and secondary storage on I/O devices 970 (e.g., a hard drive, storage array, etc.). In some embodiments, these other forms of storage may also store program instructions executable by processor subsystem 920.
I/O interfaces 960 may be any of various types of interfaces configured to couple to and communicate with other devices, according to various embodiments. In one embodiment, I/O interface 960 is a bridge chip (e.g., Southbridge) from a front-side to one or more back-side buses. I/O interfaces 960 may be coupled to one or more I/O devices 970 via one or more corresponding buses or other interfaces. Examples of I/O devices 970 include storage devices (hard drive, optical drive, removable flash drive, storage array, SAN, or their associated controller), network interface devices (e.g., to a local or wide-area network), or other devices (e.g., graphics, user interface devices, etc.). In one embodiment, I/O devices 970 include a network interface device (e.g., configured to communicate over WiFi, Bluetooth, Ethernet, etc.), and computer system 900 is coupled to a network via the network interface device.
The present disclosure includes references to “embodiments,” which are non-limiting implementations of the disclosed concepts. References to “an embodiment,” “one embodiment,” “a particular embodiment,” “some embodiments,” “various embodiments,” and the like do not necessarily refer to the same embodiment. A large number of possible embodiments are contemplated, including specific embodiments described in detail, as well as modifications or alternatives that fall within the spirit or scope of the disclosure. Not all embodiments will necessarily manifest any or all of the potential advantages described herein.
Unless stated otherwise, the specific embodiments described herein are not intended to limit the scope of claims that are drafted based on this disclosure to the disclosed forms, even where only a single example is described with respect to a particular feature. The disclosed embodiments are thus intended to be illustrative rather than restrictive, absent any statements to the contrary. The application is intended to cover such alternatives, modifications, and equivalents that would be apparent to a person skilled in the art having the benefit of this disclosure.
Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure. The disclosure is thus intended to include any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.
For example, while the appended dependent claims are drafted such that each depends on a single other claim, additional dependencies are also contemplated, including the following: Claim 3 (could depend from any of claims 1-2); claim 4 (any preceding claim); claim 5 (claim 4), etc. Where appropriate, it is also contemplated that claims drafted in one statutory type (e.g., apparatus) suggest corresponding claims of another statutory type (e.g., method).
Because this disclosure is a legal document, various terms and phrases may be subject to administrative and judicial interpretation. Public notice is hereby given that the following paragraphs, as well as definitions provided throughout the disclosure, are to be used in determining how to interpret claims that are drafted based on this disclosure.
References to the singular forms such as “a,” “an,” and “the” are intended to mean “one or more” unless the context clearly dictates otherwise. Reference to “an item” in a claim thus does not preclude additional instances of the item.
The word “may” is used herein in a permissive sense (i.e., having the potential to, being able to) and not in a mandatory sense (i.e., must).
The terms “comprising” and “including,” and forms thereof, are open-ended and mean “including, but not limited to.”
When the term “or” is used in this disclosure with respect to a list of options, it will generally be understood to be used in the inclusive sense unless the context provides otherwise. Thus, a recitation of “x or y” is equivalent to “x or y, or both,” covering x but not y, y but not x, and both x and y. On the other hand, a phrase such as “either x or y, but not both” makes clear that “or” is being used in the exclusive sense.
A recitation of “w, x, y, or z, or any combination thereof” or “at least one of . . . w, x, y, and z” is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrasings cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x, but not y or z), any three elements (e.g., w, x, and y, but not z), and all four elements. The phrase “at least one of . . . w, x, y, and z” thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of options. This phrase is not to be interpreted to require that there is at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.
Various “labels” may precede nouns in this disclosure. Unless context provides otherwise, different labels used for a feature (e.g., “first circuit,” “second circuit,” “particular circuit,” “given circuit,” etc.) refer to different instances of the feature. The labels “first,” “second,” and “third” when applied to a particular feature do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise.
Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation, “[entity] configured to [perform one or more tasks],” is used herein to refer to structure (i.e., something physical). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “memory device configured to store data” is intended to cover, for example, an integrated circuit that has circuitry that performs this function during operation, even if the integrated circuit in question is not currently being used (e.g., a power supply is not connected to it). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible.
The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform some specific function. This unprogrammed FPGA may be “configurable to” perform that function, however.
Reciting in the appended claims that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for [performing a function]” construct.
The phrase “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”
The phrase “in response to” describes one or more factors that trigger an effect. This phrase does not foreclose the possibility that additional factors may affect or otherwise trigger the effect. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase “perform A in response to B.” This phrase specifies that B is a factor that triggers the performance of A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B.
In this disclosure, various “modules” operable to perform designated functions are shown in the figures and described in detail (e.g., interface generation module 102). As used herein, a “module” refers to software or hardware that is operable to perform a specified set of operations. A module may refer to a set of software instructions that are executable by a computer system to perform the set of operations. A module may also refer to hardware that is configured to perform the set of operations. A hardware module may constitute general-purpose hardware as well as a non-transitory computer-readable medium that stores program instructions, or specialized hardware such as a customized ASIC. Accordingly, a module that is described as being “executable” to perform operations refers to a software module, while a module that is described as being “configured” to perform operations refers to a hardware module. A module that is described as “operable” to perform operations refers to a software module, a hardware module, or some combination thereof. Further, for any discussion herein that refers to a module that is “executable” to perform certain operations, it is to be understood that those operations may be implemented, in other embodiments, by a hardware module “configured” to perform the operations, and vice versa.
Number | Date | Country | Kind |
---|---|---|---|
PCT/CN2021/074652 | Feb 2021 | CN | national |
The present application claims priority to PCT Appl. No. PCT/CN2021/074652, filed Feb. 1, 2021, which is incorporated by reference herein in its entirety.