Embodiments of the present invention generally relate to data confidence fabrics (DCF). More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods, for data confidence operations and configurations as they relate to resource utilization in a DCF.
In a computing environment such as a DCF, edge devices consume a certain amount of computing resources, such as power, CPU (central processing unit) cycles, RAM, and other resources. At present, however, there is a lack of understanding with respect to various questions relating to resource consumption and utilization, particularly in a DCF context. Such questions may include, for example: (i) can the current state of an overly consumed edge device X, that is, a device X that is consuming too much of one or more resources, be improved?; (ii) are there edge devices that are faulty insofar as they are consuming too much of a particular resource, such as power?; (iii) are there edge devices which are constantly busy, such that their consumption of resources may be shortening the life of those resources?; and (iv) to what extent, and how, does overutilization of a resource, or resources, by an edge device impact data confidence?
In order to describe the manner in which at least some of the advantages and features of the invention may be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.
In one example embodiment, data may flow from a device such as an edge device into a DCF. The DCF may annotate the data with annotations, possibly in the form of DCF metadata, indicating, for example, that a TPM (trusted platform module) has added a device signature to the data and has verified a secure boot of that device, and further indicating that AuthZ/N software is currently in place, protecting the device and data against unauthorized access. A profiler deployed in the DCF may be monitoring resource utilization by the device from which the data is received. The profiler may determine, and report, that a ‘consumption’ DCF parameter, or simply a ‘consumption parameter,’ indicates that one or more aspects of resource consumption by the device are outside of established boundaries, and/or are otherwise problematic. As a result of having made this determination, the profiler may generate a corresponding report identifying the problems. The report from the profiler may cause generation of a DCF annotation indicating that the ‘consumption’ DCF parameter has thus failed to meet its confidence criteria. In an embodiment, the confidence score of the ‘consumption’ DCF parameter may thus be set to zero (0) when resource consumption by the device is outside of a limit established for that device. When resource consumption by the device is within established limits for the device, the confidence score of the ‘consumption’ DCF parameter may be set to one (1). In an embodiment then, the confidence score may be Boolean, that is, binary, rather than being expressed as an actual numerical score. Finally, the addition of the confidence score concerning the consumption parameter may increase an overall confidence score for the data, or leave the overall confidence score for the data unchanged.
In an embodiment, the confidence score concerning the consumption parameter may have a negative value so that when added to the overall confidence score for the data, the negative value will decrease the overall confidence score.
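By way of illustration only, and not limitation, the scoring just described might be sketched as follows. The function names, the additive combination with two other parameter scores, and the numeric values are assumptions made for this sketch rather than features of any particular embodiment.

```python
from typing import Iterable


def consumption_score(consumed: float, limit: float, penalty: float = 0.0) -> float:
    """Score the 'consumption' parameter for one device.

    Returns 1.0 when consumption is within the limit. When the limit is
    exceeded, returns 0.0, or -penalty if a positive penalty is configured.
    """
    if consumed <= limit:
        return 1.0
    return -penalty if penalty > 0 else 0.0


def overall_confidence(other_scores: Iterable[float], consumption: float) -> float:
    """Add the consumption score to the sum of the other parameter scores."""
    return sum(other_scores) + consumption


# Hypothetical device whose two other parameters (e.g. TPM, AuthZ/N) each score 1.0.
within = overall_confidence([1.0, 1.0], consumption_score(80.0, 100.0))
over = overall_confidence([1.0, 1.0], consumption_score(120.0, 100.0))
penalized = overall_confidence([1.0, 1.0], consumption_score(120.0, 100.0, penalty=0.5))
```

In this sketch, a score of one (1) raises the overall score, a score of zero (0) leaves it unchanged, and a negative score lowers it.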
Embodiments of the invention, such as the examples disclosed herein, may be beneficial in a variety of respects. For example, and as will be apparent from the present disclosure, one or more embodiments of the invention may provide one or more advantageous and unexpected effects, in any combination, some examples of which are set forth below. It should be noted that such effects are neither intended, nor should be construed, to limit the scope of the claimed invention in any way. It should further be noted that nothing herein should be construed as constituting an essential or indispensable element of any invention or embodiment. Rather, various aspects of the disclosed embodiments may be combined in a variety of ways so as to define yet further embodiments. For example, any element(s) of any embodiment may be combined with any element(s) of any other embodiment, to define still further embodiments. Such further embodiments are considered as being within the scope of this disclosure. As well, none of the embodiments embraced within the scope of this disclosure should be construed as resolving, or being limited to the resolution of, any particular problem(s). Nor should any such embodiments be construed to implement, or be limited to implementation of, any particular technical effect(s) or solution(s). Finally, it is not required that any embodiment implement any of the advantageous and unexpected effects disclosed herein.
In particular, one advantageous aspect of an embodiment of the invention is that resource utilization, including overutilization, by a DCF element may be taken into consideration when determining a confidence score for data associated with that DCF element. An embodiment may enable an admin to identify and resolve problems related to resource utilization in a DCF. An embodiment may enable an enterprise to reduce capital and operating expenses relating to the implementation and operation of a DCF. Various other advantages of one or more example embodiments will be apparent from this disclosure.
It is noted that embodiments of the invention, whether claimed or not, cannot be performed, practically or otherwise, in the mind of a human. Accordingly, nothing herein should be construed as teaching or suggesting that any aspect of any embodiment of the invention could or would be performed, practically or otherwise, in the mind of a human. Further, and unless explicitly indicated otherwise herein, the disclosed methods, processes, and operations, are contemplated as being implemented by computing systems that may comprise hardware and/or software. That is, such methods, processes, and operations, are defined as being computer-implemented.
The following is a discussion of aspects of example operating environments for various embodiments of the invention. This discussion is not intended to limit the scope of the invention, or the applicability of the embodiments, in any way. In general, embodiments of the invention may be implemented in connection with systems, software, and components, that individually and/or collectively implement, and/or cause the implementation of, a data confidence fabric (DCF).
With reference now to
As shown in
The trust metadata 112a, 112b, and 112c, may comprise, for example, respective confidence scores associated with trust insertion processes performed by the nodes with respect to the data 102. The trust metadata 112a, 112b, and 112c may be associated with the data 102 by respective node APIs (Application Program Interfaces) 104a, 106a, and 108a that communicate with an interface 114 such as an Alvarium SDK (Software Development Kit). After the data 102 has transited the various nodes, the final, comprehensive trust metadata 112c may be entered into a ledger 116 which may make the trust metadata 112c available for use by the applications 110. Note that, in this example, the trust metadata 112c is an accumulation of all the trust metadata respectively added by the gateway 104, edge server 106, and cloud ecosystem 108.
To illustrate with reference to the specific example of
As noted earlier, the DCF metadata, that is, the trust metadata 112a, ultimately arrives at the ledger 116, where a ledger entry may be created that permanently records the contents of the trust metadata 112a table as well as an overall Confidence Score, which is 6.0 in this illustrative example. Note that the equation used to calculate the Confidence Score in the example of
A useful aspect of the example DCF 100 is that, as a result of the annotation of trust metadata 112a, 112b, and 112c, the application 110 may have access to additional context about the trustworthiness of the data 102, addressing the problem of potentially untrustworthy or malicious data sources. The problems presented by such data sources are increasingly faced by enterprise customers as they move their business logic closer to non-enterprise, and potentially untrustworthy, data sources at the edge and/or elsewhere. In the example DCF 100, the path of the data 102 may be largely software-dependent, in the sense that data path handling software, which may comprise a respective instance at each of the gateway 104, edge server 106, and cloud ecosystem 108, may call an annotation/scoring API 104a, 106a, and 108a, respectively, and routing software may be provided at these nodes that forwards the annotations along the data path. However, such software dependencies in a DCF, such as the DCF 100 for example, may lead to vulnerabilities in the trustworthiness of the actual DCF metadata, that is, the trust metadata 112a, 112b, and/or 112c, for example. Examples of such potential vulnerabilities are described below.
It is noted that as used herein, the term ‘data’ is intended to be broad in scope. Thus, that term embraces, by way of example and not limitation, data segments such as may be produced by data stream segmentation processes, data chunks, data blocks, atomic data, emails, objects of any type, files of any type including media files, word processing files, spreadsheet files, and database files, as well as contacts, directories, sub-directories, volumes, and any group of one or more of the foregoing. Example embodiments of the invention may be applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, segment, block, or object may be used by way of example, the principles of the disclosure are not limited to any particular form of representing and storing data or other information. Rather, such principles are equally applicable to any object capable of representing information.
In an embodiment, a Data Confidence Fabric (DCF) “Resource Consumption Score” (DCF-RCS) may be defined that may be used as the basis for generation of one or more confidence scores. The DCF-RCS may be calculated and stored in any number of ways. Thus, one example embodiment may define and implement two confidence scores, namely, a first confidence score that pertains to data traffic that is flowing over the DCF, and a second confidence score that pertains to a DCF node that is handling the data. In both cases, respective confidence scores may be generated and used that apply to one, some, or all, nodes of a DCF.
In the case of the latter confidence score, pertaining to node resource consumption, that confidence score may be indicative of various statuses of a node such as, for example, (1) whether the node is, or is becoming, overburdened, that is, the node lacks the resources needed to carry out its workload within parameters, such as may be specified in an SLA for example, and (2) whether the node is trending towards consuming, or is consuming, more resources than have been allocated to that node for performance of its tasks. Depending on the circumstances, the latter confidence score may indicate, for example, that a node is simply over-subscribed in terms of its resource consumption, and/or that there is a problem with the hardware/software of the node causing the node to behave in a particular way with respect to its resource consumption.
Thus, in an embodiment, a DCF-RCS measurement for a node may, for example, be 100% if that node is experiencing normal traffic and is not overburdened in any way. But if the data traffic coming into the node is measured to be excessive, or far out of the normal, or historical, range, the DCF-RCS measurement may be reduced. Similarly, if the life span of the parts on the node is observed to be reaching its limit, the DCF-RCS measurement may be reduced accordingly. In an embodiment, the DCF-RCS measurements, which may comprise direct measurements of resource consumption, may be mapped to particular confidence values, which may be expressed in binary form. To illustrate, if a measurement indicates that a node is overconsuming a resource by 10 percent over an established limit, the corresponding confidence score may be zero (0), which may indicate that resource usage by the node is over the limit. Similarly, if a measurement indicates that a node is consuming 90 percent of available resources, as expressed by a resource-appropriate metric, such as ‘X’ 1000 IOs for a processor, the corresponding confidence score may be one (1), which may indicate that resource usage by the node is within an acceptable range. Note that significant underconsumption of a resource, such as 5 percent of available resources, may indicate a problem with the node as well, such as a hardware/software problem, and the corresponding confidence score may be set to zero (0) to reflect that a problem may exist.
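As a minimal sketch of such a mapping, and assuming that a measurement is expressed as a fraction of the established limit, the three illustrative cases above might be reproduced as follows. The function name and the 10 percent lower bound are hypothetical; the disclosure does not fix a particular lower threshold.

```python
def rcs_to_confidence(utilization: float, lower: float = 0.10, upper: float = 1.00) -> int:
    """Map a utilization fraction (consumption / established limit) to 0 or 1.

    Values above `upper` (overconsumption) and below `lower` (significant
    underconsumption) both map to 0; anything in between maps to 1.
    """
    return 1 if lower <= utilization <= upper else 0


over_limit = rcs_to_confidence(1.10)   # 10 percent over the established limit
in_range = rcs_to_confidence(0.90)     # 90 percent of available resources
under_used = rcs_to_confidence(0.05)   # 5 percent of available resources
```

Treating the acceptable range as a closed band, rather than a single upper limit, is what allows significant underconsumption to be flagged in the same way as overconsumption.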
DCF-RCS measurements may be calculated and stored in various ways, and these measurements may be mapped to, or used to determine, one or more corresponding confidence scores. In one embodiment, for example, DCF-RCS measurements may be calculated occasionally and reported to a centralized service. Over time, a DCF capabilities graph, which may comprise information and metadata about computing resources deployed in a DCF, may be generated/updated to contain pointers to time-series databases keeping these DCF-RCS measurements. In another embodiment, DCF-RCS measurements may be stored as implicit DCF annotations, as in the case where a DCF-RCS measurement maps to a confidence score, or may be stored as explicit DCF annotations where the measurement itself comprises the confidence score. In either case, the DCF-RCS measurements may indicate indirectly (implicit) or directly (explicit) the resource consumption health of the node that is performing annotation as the data is flowing through that node. A DCF-RCS measurement may fall within, or outside, a specified range and designate a ‘healthy’ (within range) or ‘over-subscribed’ (out of range) condition depending upon where the DCF-RCS measurement falls with respect to the range. Alternatively, a DCF-RCS measurement may be a Boolean that indicates, for a node, either ‘healthy’ or ‘over-subscribed.’
With reference now to
In an embodiment, as the data 204 flows through a DCF, including node 202, the DCF may perform various annotations with respect to the data 204. For example, the DCF may annotate the data 204 to indicate that a ‘TPM’ has added a device signature and verified a secure boot, and to indicate that the ‘AuthZ/N’ software is currently in place, protecting against unauthorized access. However, a report from the profiler 206 may cause the generation and application, to the data 204, of a DCF annotation, which may be expressed as a Boolean (‘success’ or ‘failure’) rather than as an actual value of a resource consumption measurement, indicating that a consumption parameter has failed to meet its confidence criteria. That is, for example, overuse of a computing resource by the node 202 may result in a confidence score of zero (0) for a ‘consumption’ confidence parameter 210 that may be one of a group of confidence parameters 212, and associated values, that may be used to annotate the data 204. The confidence parameters 212 and their respective values may collectively form a set of trust metadata. As noted elsewhere herein, overuse of a computing resource may indicate a problem of some kind concerning the hardware/software configuration, and/or operation, of the node 202, and the existence of such a problem may thus be reflected in the zero (0) value for the ‘consumption’ confidence parameter 210.
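The annotation described above might be sketched, purely by way of example, as follows. The `ProfilerReport` type, the `annotate` function, the dictionary layout of the trust metadata, and the sample payload are all hypothetical and are not drawn from any particular SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilerReport:
    node_id: str
    within_limits: bool  # Boolean result: True = 'success', False = 'failure'


def annotate(data: dict, report: ProfilerReport) -> dict:
    """Return a copy of the data with a set of confidence parameters attached."""
    parameters = {
        "TPM": 1,        # assumed: device signature added, secure boot verified
        "AuthZ/N": 1,    # assumed: access-control software in place
        "consumption": 1 if report.within_limits else 0,
    }
    return {**data, "trust_metadata": parameters}


# Hypothetical report of overconsumption by a node.
annotated = annotate({"payload": "reading-17"}, ProfilerReport("node-202", within_limits=False))
```

Here the annotation carries only the Boolean outcome for the consumption parameter, not the underlying resource consumption measurement.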
In the illustrative example of
In any case, a confidence score of zero (0) may indicate low/no confidence that the node 202 will be able to handle current/future workloads in a timely and effective manner. This information may be used to steer workloads away from node 202 to other nodes that are not oversubscribed in terms of their resource consumption. On the other hand, a confidence score of one (1) may indicate confidence that the node 202 is able to at least handle its current workloads, and possibly additional workloads. Thus, in an embodiment, DCF-RCS measurements, and the associated confidence scores, may be used to enable workload allocation and workload balancing in a DCF.
Where DCF-RCS measurements indicate overconsumption of one or more computing resources by a node, an admin or other operator may use this information to begin identifying and solving a variety of problems. Thus, the DCF-RCS measurements are not limited to use for confidence scoring, but may also be used for other purposes and applications as well. Some examples of questions that may be addressed through the use of DCF-RCS measurements are set forth hereafter.
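One simple way such steering might be sketched is shown below. The round-robin policy, the function name `steer`, and the node and workload identifiers are assumptions chosen for illustration; any other allocation policy could be substituted.

```python
from itertools import cycle


def steer(workloads: list[str], consumption_scores: dict[str, int]) -> dict[str, str]:
    """Assign each workload, round-robin, to a node whose consumption score is 1."""
    eligible = sorted(node for node, score in consumption_scores.items() if score == 1)
    if not eligible:
        raise RuntimeError("no node currently has a consumption score of 1")
    targets = cycle(eligible)
    return {workload: next(targets) for workload in workloads}


# Hypothetical scores: node-202 is oversubscribed, the other two nodes are not.
scores = {"node-202": 0, "node-203": 1, "node-204": 1}
placement = steer(["w1", "w2", "w3"], scores)
```

In this sketch, no workload is placed on the node whose consumption confidence score is zero (0).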
For example, is traffic that requires the use of node resources coming from the right resource, that is, is the node processing traffic that should have been directed to one or more other nodes instead? As another example, given that a node has a zero (0) confidence score for its resource consumption parameter, should one or more other nodes instead be used to service some or all of the traffic directed to the node with the zero (0) confidence score? Further, the use of DCF-RCS measurements may help to determine whether or not a DCF environment, or a portion of it, is overly busy in terms of the amount of traffic/operations that are consuming computing resources. Related to that, an embodiment may also help determine whether new or replacement hardware is needed at the node. In an embodiment, a ‘heat map’ may be created that combines data confidence information with resource consumption scores. Such a heat map may be used to identify trouble spots in a DCF where resources are being overconsumed, or trending toward overconsumption, and confidence scores are correspondingly low. When the trouble spots are identified, adjustments to workload allocations, and hardware configurations, may be implemented to resolve any problems.
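A heat map of this kind might, as one hedged example, be reduced to a per-node label computed from a utilization fraction and a data confidence score. The thresholds, the three labels, the function name, and the sample values below are hypothetical and are offered only as a sketch.

```python
def heat_map(nodes: dict[str, tuple[float, float]],
             hot_utilization: float = 0.90,
             low_confidence: float = 0.5) -> dict[str, str]:
    """Label each node from its (utilization fraction, data confidence score)."""
    labels = {}
    for name, (utilization, confidence) in nodes.items():
        hot = utilization >= hot_utilization   # at or trending toward overconsumption
        weak = confidence <= low_confidence    # correspondingly low confidence
        if hot and weak:
            labels[name] = "trouble spot"
        elif hot or weak:
            labels[name] = "watch"
        else:
            labels[name] = "ok"
    return labels


# Hypothetical readings for three nodes of a DCF.
labels = heat_map({
    "gateway": (0.97, 0.2),
    "edge-server": (0.95, 0.9),
    "cloud": (0.40, 0.8),
})
```

Nodes labeled as trouble spots in such a sketch would be the first candidates for workload reallocation or hardware changes.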
It is noted with respect to the disclosed methods, including the example method of
Directing attention now to
Periodically and/or on an ad hoc basis, or any other basis, a check 304 may be performed to determine if resource consumption is within a defined range. At this point, various alternatives may be possible, for any of which a report may be generated:
Finally, one or more actions 310 may be taken based on values of and/or changes to a confidence score for resource consumption and/or an overall confidence score for the node. Such actions 310 may include, but are not limited to, adding or replacing hardware of the node, and balancing a workload in a DCF where the node is located.
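The check and the resulting actions might be sketched as follows, assuming that each check yields a single utilization fraction and that actions are proposed only when the confidence score newly drops to zero (0). The function name, the report layout, and the trigger condition are assumptions of this sketch and are not a statement of the method itself.

```python
def run_checks(measurements: list[float], low: float, high: float) -> list[dict]:
    """Check each measurement against [low, high] and report score changes."""
    reports = []
    previous = None
    for measured in measurements:
        score = 1 if low <= measured <= high else 0
        actions = []
        if score == 0 and previous != 0:
            # Score newly dropped to 0: propose the actions named in the text.
            actions = ["balance workload", "add or replace hardware"]
        reports.append({"measured": measured, "score": score, "actions": actions})
        previous = score
    return reports


# Hypothetical series of periodic checks for one node.
reports = run_checks([0.6, 0.95, 1.2, 1.3, 0.7], low=0.1, high=1.0)
```

Keying the actions to a change in the score, rather than to every out-of-range check, avoids proposing the same remediation repeatedly while a node remains out of range.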
Following are some further example embodiments of the invention. These are presented only by way of example and are not intended to limit the scope of the invention in any way.
The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein, or any part(s) of any method disclosed.
As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media may be any available physical media that may be accessed by a general purpose or special purpose computer.
By way of example, and not limitation, such computer storage media may comprise hardware storage such as solid state disk/device (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which may be used to store program code in the form of computer-executable instructions or data structures, which may be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.
Computer-executable instructions comprise, for example, instructions and data which, when executed, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. As such, some embodiments of the invention may be downloadable to one or more systems or devices, for example, from a website, mesh topology, or other source. As well, the scope of the invention embraces any hardware system or device that comprises an instance of an application that comprises the disclosed executable instructions.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.
As used herein, the term ‘module’ or ‘component’ may refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein may be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.
In terms of computing environments, embodiments of the invention may be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or other machine may reside and operate in a cloud environment.
With reference briefly now to
In the example of
Such executable instructions may take various forms including, for example, instructions executable to perform any method or portion thereof disclosed herein, and/or executable by/at any of a storage site, whether on-premises at an enterprise, or a cloud computing site, client, datacenter, data protection site including a cloud storage site, or backup server, to perform any of the functions disclosed herein. As well, such instructions may be executable to perform any of the other operations and methods, and any portions thereof, disclosed herein.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.