The present disclosure relates generally to distributed-computing systems and, more specifically, to methods and systems for providing entity data services for virtualized computing and data systems.
Modern computing systems provide distributed and virtualized entities for computing and data services. Such services may be provided by a software-defined data center (SDDC) that may implement one or more virtual storage area networks (e.g., a vSAN) and a virtual disk file system (e.g., a vDFS). These distributed systems provide virtualized entities (e.g., virtual machines, virtual storage disks, virtual network components, and the like) to users within a cloud-computing environment. For instance, cloud providers may provide virtualized entities to an organization to generate and operate the organization's computing “cloud.” Often, a user of the organization may be responsible for managing and maintaining compliance of these entities with policies of the organization and/or cloud provider. For example, a group of users may be allotted a maximum number of entities of a particular type, a particular entity may not consume more than an allotted amount of resources (e.g., CPU cycles, storage volume, network bandwidth), and the like. The user tasked with managing and maintaining the organization's cloud may routinely query the entities to determine the entities' status and/or compliance with respect to the policies. Some of these systems may include hundreds or even thousands of such entities. Thus, the manual operations required to manage and maintain an organization's cloud may be cumbersome and/or complex.
These entities may be provided by more than one cloud provider. For example, a first cloud provider may provide some of the virtual machines, while a second cloud provider may provide other virtual machines. Each cloud provider may have its own query system and/or syntax. The burden of maintaining such a cloud therefore increases, because the user may have to target their queries across multiple inconsistent systems. Accordingly, there is a need for an entity-related data service that is harmonized and unified across multiple platforms.
Overview
Described herein are techniques for entity data services for virtualized computing and data systems. In one embodiment, a method for providing virtualized entity-related data services to a group of users is performed. The method may include receiving a data stream. The data stream encodes current state-information of a plurality of virtualized entities. Each virtualized entity of the plurality of entities may be provided by one or more virtualized-entity providers of a set of virtualized-entity providers. A graph database may be updated based on the received data stream. The graph database stores current graph data that indicates a plurality of current relationships between virtualized entities of the plurality of virtualized entities. A key-value database may be updated based on the received data stream. The key-value database may persistently store historical virtualized entity data for the group of users. A reverse-indexed database may be updated based on the received data stream. The reverse-indexed database stores globally-searchable current entity data for the plurality of virtualized entities. In response to receiving a query, one or more databases of the graph, key-value, and reverse-indexed databases may be identified based on content of the query. In response to providing the query to the one or more identified databases, search results may be received from each of the one or more identified databases. The search results received from each database of the one or more identified databases may be aggregated. The aggregated search results may encode a status of at least a first virtualized entity of the plurality of virtualized entities. An indication of the status of the first virtualized entity may be provided.
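The query-handling steps summarized above (identify databases from the content of a query, provide the query to each, then aggregate the results into a per-entity status) can be sketched as follows. This is a minimal, hypothetical Python illustration: the `Query` fields, the routing rules in `identify_databases`, and the stub backends are assumptions made for the example, not part of the described embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Query:
    text: Optional[str] = None     # free-text search over current entity data
    relationships: bool = False    # asks how entities relate to one another
    history: bool = False          # asks about past (possibly deleted) entities

Backend = Callable[[Query], List[dict]]

def identify_databases(q: Query) -> List[str]:
    """Choose target stores from the query's content (illustrative rules only)."""
    targets = []
    if q.relationships:
        targets.append("graph")
    if q.history:
        targets.append("key_value")
    if q.text is not None or not targets:
        targets.append("reverse_index")  # default to global search
    return targets

def run_query(q: Query, backends: Dict[str, Backend]) -> Dict[str, dict]:
    """Send the query to each identified store and merge hits into one status per entity."""
    aggregated: Dict[str, dict] = {}
    for name in identify_databases(q):
        for hit in backends[name](q):
            merged = aggregated.setdefault(hit["entity_id"], {"sources": []})
            merged.update(hit)
            merged["sources"].append(name)  # record which store contributed
    return aggregated

# Usage with stub backends standing in for the three databases.
backends = {
    "graph": lambda q: [{"entity_id": "vm-1", "parents": ["subnet-1"]}],
    "key_value": lambda q: [{"entity_id": "vm-1", "previous_status": "stopped"}],
    "reverse_index": lambda q: [{"entity_id": "vm-1", "status": "running"}],
}
status = run_query(Query(text="vm-1", relationships=True), backends)
print(status["vm-1"])
```

Because the query above asks about relationships and carries search text, only the graph and reverse-indexed stubs are consulted; the key-value stub is skipped.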
In one embodiment, a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors is provided. The one or more programs stored by the non-transitory computer-readable storage medium include instructions for performing operations that are executable by a distributed computing system that provides virtualized entity-related data services to a group of users. The operations may include receiving a data stream. The data stream encodes current state-information of a plurality of virtualized entities. Each virtualized entity of the plurality of entities may be provided by one or more virtualized-entity providers of a set of virtualized-entity providers. A graph database may be updated based on the received data stream. The graph database stores current graph data that indicates a plurality of current relationships between virtualized entities of the plurality of virtualized entities. A key-value database may be updated based on the received data stream. The key-value database may persistently store historical virtualized entity data for the group of users. A reverse-indexed database may be updated based on the received data stream. The reverse-indexed database stores globally-searchable current entity data for the plurality of virtualized entities. In response to receiving a query, one or more databases of the graph, key-value, and reverse-indexed databases may be identified based on content of the query. In response to providing the query to the one or more identified databases, search results may be received from each of the one or more identified databases. The search results received from each database of the one or more identified databases may be aggregated. The aggregated search results may encode a status of at least a first virtualized entity of the plurality of virtualized entities. An indication of the status of the first virtualized entity may be provided.
In one embodiment, a distributed computing system for providing virtualized entity-related data services to a group of users may include one or more processors and memory. The memory may store one or more programs configured to be executed by the one or more processors. The one or more programs include instructions for performing operations comprising receiving a data stream. The data stream encodes current state-information of a plurality of virtualized entities. Each virtualized entity of the plurality of entities may be provided by one or more virtualized-entity providers of a set of virtualized-entity providers. A graph database may be updated based on the received data stream. The graph database stores current graph data that indicates a plurality of current relationships between virtualized entities of the plurality of virtualized entities. A key-value database may be updated based on the received data stream. The key-value database may persistently store historical virtualized entity data for the group of users. A reverse-indexed database may be updated based on the received data stream. The reverse-indexed database stores globally-searchable current entity data for the plurality of virtualized entities. In response to receiving a query, one or more databases of the graph, key-value, and reverse-indexed databases may be identified based on content of the query. In response to providing the query to the one or more identified databases, search results may be received from each of the one or more identified databases. The search results received from each database of the one or more identified databases may be aggregated. The aggregated search results may encode a status of at least a first virtualized entity of the plurality of virtualized entities. An indication of the status of the first virtualized entity may be provided.
In the following description of embodiments, reference is made to the accompanying drawings in which are shown by way of illustration specific embodiments that can be practiced. It is to be understood that other embodiments can be used and structural changes can be made without departing from the scope of the various embodiments.
Distributed computing systems, such as software-defined data centers (SDDCs), may implement one or more virtual storage area networks (vSANs), which provide virtualized entities such as virtual machines (VMs), virtualized data storage components (e.g., virtualized disks), virtualized network components, and the like. Such a virtualized computing component and/or service may be referred to herein as an entity. An entity may be a resource or asset. Thus, the terms “entity,” “asset,” and “resource” may be used interchangeably throughout. Entities are not limited to VMs and virtualized storage disks, and may include other virtualized components that offer various computing and/or data services, such as virtualized load balancers, gateways, user authentication components, security components, and the like. Such entities may be provided to a group of users (e.g., one or more employees of an organization) by one or more cloud-based computing service providers (e.g., cloud providers). A group of users may be associated with an organization, company, or the like. For example, a technology company may use multiple cloud providers to provide a set of virtualized entities that comprise the company's “cloud.” The virtualized entities of the technology company's cloud provide computing and/or data services to customers of the company. As used herein, the term “cloud” may refer to any set of physical and/or virtual entities that are associated with a group. One or more cloud providers may provide the entities of any cloud. For example, the group may be a company, and the group's cloud may include one or more vSANs, SDDCs, and the like. A first subset of the entities of the cloud may be provided by a first cloud provider, a second subset of the entities of the cloud may be provided by a second cloud provider, and a third subset of the entities of the cloud may be provided by a third cloud provider.
A cloud may be created, monitored, and maintained such that the cloud remains in compliance with the needs and/or restrictions of the associated group of users and/or the cloud providers. That is, a cloud may be managed to remain in compliance with one or more policies associated with the group, their entities, or a cloud provider. A subset of the group of users may be tasked with managing the cloud for the group of users. Each user of the group of users may have a cloud account. Managing a cloud may include allocating new entities, deallocating (or terminating) stale or expired entities, resynchronizing stale entities, and ensuring that each of the entities, each of the users, and the group as a whole maintain compliance with the one or more associated policies. One or more entity policies may be associated with one or more entities. One or more user policies may be associated with one or more users. One or more group policies may be associated with one or more groups. One or more provider policies may be associated with one or more cloud providers. Non-limiting examples of such policies include: a given group may be allowed to have a maximum number of VMs deployed at any one time; a specific VM may only be allowed to be deployed for a predetermined time period; an entity, a user, or a group may only be allocated a maximum volume of a virtualized computing resource (e.g., network bandwidth); a virtual disk may only be enabled to access a certain storage volume; and the like. Managing a cloud may include monitoring the entities and the one or more policies to ensure that each entity and the group are compliant with their associated policies.
Managing a cloud may include routinely querying the group's currently deployed entities to determine various information (or data) about the entities. Such information may be employed to ensure that the group, the group's users, and the group's entities stay in compliance with all applicable policies. Such queries may include, but are not limited to: how many entities of each entity type are currently deployed; which user (or group of users) has the most entities currently deployed; what the relationships and/or dependencies are between a subset of the entities; which entities have “child” or “parent” relationships and/or dependencies with other entities; which entities are provided by which cloud providers; and the like. Such queries may also include queries regarding historical entities that no longer exist.
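Several of the example queries above (entities per type, the user with the most deployed entities, and which entities come from which provider) reduce to simple aggregations once the entity data has been collected in one place. The sketch below assumes a hypothetical in-memory snapshot with illustrative field names; it is not tied to any particular database.

```python
from collections import Counter

# Hypothetical snapshot of currently deployed entities (field names illustrative).
entities = [
    {"id": "vm-1", "type": "vm", "owner": "alice", "provider": "provider-a"},
    {"id": "vm-2", "type": "vm", "owner": "bob", "provider": "provider-b"},
    {"id": "disk-1", "type": "disk", "owner": "alice", "provider": "provider-a"},
    {"id": "lb-1", "type": "load_balancer", "owner": "alice", "provider": "provider-c"},
]

# How many entities of each type are currently deployed?
per_type = Counter(e["type"] for e in entities)

# Which user has the most entities currently deployed?
top_owner, top_count = Counter(e["owner"] for e in entities).most_common(1)[0]

# Which entities are provided by which cloud provider?
by_provider = {}
for e in entities:
    by_provider.setdefault(e["provider"], []).append(e["id"])

print(dict(per_type), top_owner, top_count, by_provider)
```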
In modern systems, a group of users may have hundreds, thousands, or even tens of thousands of currently deployed entities in its “cloud.” Thus, managing such a modern cloud may be difficult, complex, and/or cumbersome. The dynamic nature of modern clouds increases this difficulty. For example, based on fluctuations in user demand, virtualized entities are routinely allocated and/or deallocated, and thus the set of currently extant entities may vary significantly over time. The difficulty of managing a cloud may be further increased when the entities are provided by multiple cloud providers, each with separate mechanisms (e.g., syntaxes for allocating and deallocating entities, querying the entities, and the like). For instance, a first subset of the group's entities may be provided by a first cloud provider, a second subset of the group's entities may be provided by a second cloud provider, and a third subset of the group's entities may be provided by a third cloud provider. Each cloud provider may provide separate and/or inconsistent mechanisms to query state information of the entities that it provides.
As such, in some embodiments, entity-related data services are provided to groups of users. The various embodiments herein enable providing users of a group with data that indicates various information on any set of virtualized entities. That is, in some embodiments, users are enabled to query and receive search results for current and/or historical information relating to any of their entities. An integrated platform is provided that accepts queries for entity information that are specific neither to a cloud provider nor to a query type. That is, users of a group are not required to employ separate systems, separate search engines, and/or separate queries to request information about their entities and/or to manage their cloud. In some embodiments, automated management services are provided. For instance, in one embodiment, the current state of the group's cloud may be monitored via the entity data services, and the current state may be compared to each of the policies associated with the group. If one or more entities fall out of compliance with (i.e., violate) one or more of the policies, an automated warning message may be provided to one or more users of the group, such that a user may take action to bring the cloud entities back into compliance. In at least one embodiment, one or more automated actions may be taken to update one or more entities to bring the cloud back into compliance with the one or more violated policies.
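The automated compliance check described above can be sketched by comparing the current entity state against two of the example policies listed earlier: a maximum number of VMs per group, and a maximum deployment period per VM. This is a hypothetical Python illustration; the `Entity` fields, policy limits, and warning text are assumptions, and a real system would load policies per entity, user, group, and provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class Entity:
    entity_id: str
    entity_type: str
    group: str
    created: datetime  # allocation time

def max_count_warnings(entities: List[Entity], group: str, entity_type: str, limit: int) -> List[str]:
    """Group policy: at most `limit` entities of `entity_type` deployed at once."""
    deployed = [e for e in entities if e.group == group and e.entity_type == entity_type]
    if len(deployed) > limit:
        return [f"{group}: {len(deployed)} {entity_type} deployed, limit is {limit}"]
    return []

def lifetime_warnings(entities: List[Entity], entity_type: str, max_age: timedelta, now: datetime) -> List[str]:
    """Entity policy: an entity may stay deployed for at most `max_age`."""
    return [f"{e.entity_id}: deployed longer than {max_age}"
            for e in entities
            if e.entity_type == entity_type and now - e.created > max_age]

def check_compliance(entities: List[Entity], now: datetime) -> List[str]:
    # Hypothetical policy set for a group named "eng".
    return (max_count_warnings(entities, "eng", "vm", limit=2)
            + lifetime_warnings(entities, "vm", timedelta(days=7), now))

now = datetime(2024, 1, 10)
entities = [
    Entity("vm-1", "vm", "eng", datetime(2024, 1, 1)),
    Entity("vm-2", "vm", "eng", datetime(2024, 1, 9)),
    Entity("vm-3", "vm", "eng", datetime(2024, 1, 9)),
]
warnings = check_compliance(entities, now)
for warning in warnings:
    print("WARNING", warning)  # would be sent to users of the group
```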
More specifically, in some embodiments, an entity data service system collects and integrates entity-related data from each of the cloud providers that provides one or more of the entities comprising the group's cloud. The entity-related data provided by the cloud providers may be in the form of a real-time data stream. The data stream from each of the cloud providers may encode state-information of the respective entities. State-information of a particular entity may include any current information (or data) related to the particular entity. For example, the state-information may include a current status of an entity, a current bandwidth of the entity, a current utilization of the entity, a timestamp associated with the entity (e.g., a timestamp indicating its creation or allocation, or its expected expiration or deallocation), its current relationships to one or more other entities, its current size, its current owner, and the like.
In some embodiments, the data streams from the one or more cloud providers are collected, aggregated, and ingested via a streaming service (e.g., a data stream integrator). More specifically, a data stream collector (e.g., Amazon Kinesis) may collect, aggregate, process, and/or at least partially analyze the data streams from the cloud providers. Upon collecting and aggregating the data streams, the data stream may be ingested, and at least portions of the entity-related data encoded in the data stream may be stored in one or more databases. Some embodiments may include at least three separate databases: a graph database configured to store relationships and/or dependencies of currently deployed entities; a key-value based database configured to store the current and historical state information of the entities; and an inverse-indexed (or Lucene-indexed) database configured to be searchable via a dynamic and distributed search engine. In some embodiments, the first database may be a graph database (e.g., Amazon Neptune), the second database may be a NoSQL database that includes a key-value store (e.g., Amazon DynamoDB), and the third database may be a database that supports unstructured and/or document-type data, such as, but not limited to, Elasticsearch or an Elasticsearch-type database. Some embodiments may include one or more search and analytics engines that are enabled to search each of the one or more databases for requested data, as well as to analyze the data in the databases to generate analytics and/or metrics.
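The division of labor among the three databases can be illustrated by an ingest step that fans each stream record out to three toy in-memory stores: a relationship map holding only current edges, an append-only history that survives deletion, and a token index over current entities. This is a hypothetical sketch; the record layout (`entity_id`, `timestamp`, `state`, `related`, `name`, `deleted`) is assumed, and the classes do not model Neptune, DynamoDB, or Elasticsearch APIs.

```python
from collections import defaultdict

class EntityStores:
    """Toy in-memory stand-ins for the graph, key-value, and reverse-indexed stores."""

    def __init__(self):
        self.graph = {}                    # entity_id -> ids it relates to (current only)
        self.history = defaultdict(list)   # entity_id -> [(timestamp, state), ...] (never pruned)
        self.index = defaultdict(set)      # search token -> entity ids (current only)
        self.current = {}                  # entity_id -> latest state

    def ingest(self, record):
        eid, state = record["entity_id"], record["state"]
        self.history[eid].append((record["timestamp"], state))
        for ids in self.index.values():    # drop stale tokens for this entity
            ids.discard(eid)
        if state.get("deleted"):
            self.current.pop(eid, None)
            self.graph.pop(eid, None)
            for targets in self.graph.values():
                targets.discard(eid)       # remove edges pointing at the deleted entity
            return
        self.current[eid] = state
        self.graph[eid] = set(state.get("related", []))  # replace outgoing edges
        for token in str(state.get("name", "")).lower().split():
            self.index[token].add(eid)

stores = EntityStores()
stores.ingest({"entity_id": "vm-1", "timestamp": 1, "state": {"name": "web server", "related": ["disk-1"]}})
stores.ingest({"entity_id": "disk-1", "timestamp": 2, "state": {"name": "web data"}})
stores.ingest({"entity_id": "vm-1", "timestamp": 3, "state": {"deleted": True}})
```

After the deletion, the graph and index no longer mention `vm-1`, while its two state records remain in the history store, mirroring the current-versus-historical split described above.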
Each of the databases may be intelligently sharded to ensure efficient lookups when servicing a query. That is, each of the databases may be partitioned into a plurality of shards (or database slices), based on the group of users and the data, such that queried data may be efficiently located and retrieved from the databases. One or more policies of the group may indicate a sharding strategy for the databases. A policy may define one or more heuristics that indicate the portions of the data content and/or data structures along which to partition the databases. For instance, a policy may indicate a key along which to shard a database. The sharding may be performed at the group (or organizational) level, the cloud account level, or the like. Some embodiments may be based on a representational state transfer (REST) architecture. Thus, these embodiments may support RESTful application programming interfaces (APIs) to implement at least some of their functionality. These embodiments may include RESTful APIs for querying the databases and for ingesting the data stream (e.g., updating each of the databases to include new entity-related data within the data stream). Some of the APIs may include SQL-style commands, operations, and/or syntax, while other APIs may include NoSQL-style commands, operations, and/or syntax. The APIs may include create, read, update, and delete (CRUD)-style commands, operations, and/or syntax.
The automated out-of-compliance warnings, queries, and query results may be provided to the user and/or the system via a query user interface (UI). Some embodiments support dynamic schemas, which may be configured at runtime. At least due to the intelligent sharding, these embodiments remain highly scalable as a group adds entities to its cloud. These embodiments provide high concurrency for both data ingestion and query servicing. Some embodiments may encrypt the data stored in each of the databases to ensure privacy. For additional privacy, the databases for a particular group of users may be isolated from the databases of other groups.
Virtualization layer 110 is installed on top of hardware platform 120. Virtualization layer 110, also referred to as a hypervisor, is a software layer that provides an execution environment within which multiple VMs 102 are concurrently instantiated and executed. The execution environment of each VM 102 includes virtualized components analogous to those comprising hardware platform 120 (e.g., one or more virtualized processors, virtualized memory, etc.). In this manner, virtualization layer 110 abstracts VMs 102 from physical hardware while enabling VMs 102 to share the physical resources of hardware platform 120. As a result of this abstraction, each VM 102 operates as though it has its own dedicated computing resources.
Each VM 102 includes operating system (OS) 106, also referred to as a guest operating system, and one or more applications (Apps) 104 running on or within OS 106. OS 106 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, iOS, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components. As in a traditional computing environment, OS 106 provides the interface between Apps 104 (i.e., programs containing software code) and the hardware resources used to execute or run applications. However, in this case, the “hardware” is virtualized or emulated by virtualization layer 110. Consequently, Apps 104 generally operate as though they are in a traditional computing environment. That is, from the perspective of Apps 104, OS 106 appears to have access to dedicated hardware analogous to components of hardware platform 120.
It should be appreciated that applications (Apps) implementing aspects of the present disclosure are, in some embodiments, implemented as applications running within traditional computing environments (e.g., applications run on an operating system with dedicated physical hardware), virtualized computing environments (e.g., applications run on a guest operating system on virtualized hardware), containerized environments (e.g., applications packaged with dependencies and run within their own runtime environment), distributed-computing environments (e.g., applications run on or across multiple physical hosts) or any combination thereof. Furthermore, while specific implementations of virtualization and containerization are discussed, it should be recognized that other implementations of virtualization and containers could be used without departing from the scope of the various described embodiments.
As illustrated in
As noted throughout, the entities comprising the group's cloud may be provided by a set of cloud-based service providers 302. The entities may be virtualized entities (e.g., VMs, virtualized storage disks, virtualized network components, and the like). In the non-limiting example of
System 300 may include a data stream integrator 310. As shown in
System 300 may include one or more databases. In the non-limiting embodiment of
Graph database 340 may encode a graph representation of the current entities, as well as various relationships between the current entities. In various embodiments, each of the group's entities may be represented by a corresponding node in the graph. Relationships between the entities may be represented by directed or undirected edges between the corresponding nodes. System 300 may employ a data model (discussed below) that supports dynamic schema modeling, where the schema may be configured at runtime. The data model may include at least a first data structure (e.g., an entity data structure) and a second data structure (e.g., an entity relationship data structure). The nodes of the graph may be encoded in the entity data structure, and the edges of the graph may be encoded in the entity relationship data structure. In various embodiments, graph database 340 may be implemented via a cloud-based database provider. For instance, graph database 340 may be implemented on Amazon Neptune or another such cloud-based graph database provider. In some embodiments, graph database 340 may support a graph traversal query language, such as, but not limited to, the Gremlin language. In some embodiments, graph database 340 may primarily store current data (via the entity and entity relationship data structures) regarding the group's current entities.
Key-value database 342 may include a key-value store that persistently stores information relating to the group's entities. In contrast to graph database 340, which is directed at storing the group's current entities and the current relationships amongst them, key-value database 342 may persistently store historical information regarding the group's current and historical entities, and may preserve both the historical and current relationships between the entities. Key-value database 342 may be a NoSQL database that supports structuring data as key-value pairs. In some embodiments, the keys and corresponding values stored in key-value database 342 may be the keys and values of the entity and entity relationship data structures. In various embodiments, key-value database 342 may be implemented by a cloud-based database provider. For instance, key-value database 342 may be implemented by Amazon DynamoDB or another such cloud-based database provider.
Inverse-indexed database 344 may be a globally reverse-indexed (i.e., inverted-index) database that stores at least portions of the data stream in a document data structure, such that the group's entity data may be globally searched, aggregated, and analyzed. In some embodiments, inverse-indexed database 344 may be a Lucene-indexed database. The entity and entity relationship data stored in a document data structure in inverse-indexed database 344 may be globally searchable via a search engine. In at least one embodiment, inverse-indexed database 344 may be implemented via a search and analytics provider. For example, inverse-indexed database 344 may be implemented via Elasticsearch. In such embodiments, inverse-indexed database 344 may be searched, and the search results may be aggregated and analyzed, using the capabilities that an Elasticsearch-based system provides.
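To illustrate how relationship records become graph edges that can be traversed, the sketch below builds a directed adjacency map from relationship records (source entity to foreign entity) and walks it breadth-first to find every entity that depends on a given parent. It is a hypothetical, in-memory Python stand-in for the kind of traversal a graph query language such as Gremlin would express; it does not use Gremlin or Amazon Neptune, and the key names follow the relationship fields described later in this section.

```python
from collections import defaultdict, deque

def build_graph(relationships):
    """Directed edges from source entity to foreign entity (e.g., parent -> child)."""
    children = defaultdict(set)
    for rel in relationships:
        children[rel["source_entity_id"]].add(rel["foreign_entity_id"])
    return children

def descendants(children, root):
    """Breadth-first traversal: every entity reachable from `root`."""
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:      # guard against cycles and repeated visits
                seen.add(child)
                queue.append(child)
    return seen

rels = [
    {"source_entity_id": "vpc-1", "foreign_entity_id": "subnet-1"},
    {"source_entity_id": "subnet-1", "foreign_entity_id": "vm-1"},
    {"source_entity_id": "subnet-1", "foreign_entity_id": "vm-2"},
]
g = build_graph(rels)
print(sorted(descendants(g, "vpc-1")))
```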
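The persistent, historical role of the key-value database can be sketched as an append-only store whose keys pair an entity identifier with a timestamp, so that both the latest state and the state at any past time remain retrievable after an entity is deleted. This loosely resembles a partition-key/sort-key layout, but the class below is a hypothetical in-memory illustration, not the DynamoDB API.

```python
from bisect import bisect_right, insort

class HistoricalKV:
    """Append-only store keyed by (entity_id, timestamp); deleted entities keep their history."""

    def __init__(self):
        self._data = {}    # (entity_id, timestamp) -> state snapshot
        self._times = {}   # entity_id -> sorted list of timestamps

    def put(self, entity_id, ts, state):
        if (entity_id, ts) not in self._data:
            insort(self._times.setdefault(entity_id, []), ts)
        self._data[(entity_id, ts)] = state

    def as_of(self, entity_id, ts):
        """State in effect at time `ts`, or None if the entity did not exist yet."""
        times = self._times.get(entity_id, [])
        i = bisect_right(times, ts)
        return self._data[(entity_id, times[i - 1])] if i else None

    def history(self, entity_id):
        return [(t, self._data[(entity_id, t)]) for t in self._times.get(entity_id, [])]

kv = HistoricalKV()
kv.put("vm-1", 100, {"status": "running"})
kv.put("vm-1", 200, {"status": "terminated"})
print(kv.as_of("vm-1", 150), kv.history("vm-1"))
```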
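The reverse (inverted) index that makes entity documents globally searchable maps each term to the set of documents containing it, so a multi-term search is an intersection of those sets. The sketch below is a deliberately simplified, hypothetical illustration of that idea; Lucene and Elasticsearch add analyzers, relevance scoring, and updates that this toy omits (re-adding a document here does not remove its old terms).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> ids of documents containing it
        self.docs = {}                     # doc id -> original document

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc
        for value in doc.values():
            for term in tokenize(str(value)):
                self.postings[term].add(doc_id)

    def search(self, query):
        """Ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

idx = InvertedIndex()
idx.add("vm-1", {"type": "vm", "region": "us-west-2", "owner": "alice"})
idx.add("vm-2", {"type": "vm", "region": "eu-central-1", "owner": "bob"})
print(idx.search("vm alice"))
```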
Some embodiments may be based on a representational state transfer (REST) architecture. Thus, these embodiments may support RESTful application programming interfaces (APIs) to implement at least some of their functionality. As such, data service provider 320 may include a REST API server 322. REST API server 322 may include an ingest API module 324, a create, read, update, and delete (CRUD) API module 326, and a query API module 328. REST API server 322 is generally responsible for receiving and servicing RESTful API calls pertaining to the various functions of system 300. Ingest API module 324 is generally responsible for servicing ingest API calls, which format and ingest the received data stream into the various databases. In some embodiments, a single ingest API call may enable bulk updating of the databases with the data received from the data stream. More specifically, via ingest APIs, system 300 may accept updates to the databases for multiple entities, and for multiple relationships between the entities, from multiple cloud-based service providers 302. Thus, system 300 provides for updating multiple databases with data, from multiple cloud providers, about multiple entities, including relationships between entities provided by separate cloud providers. CRUD API module 326 is generally responsible for receiving and servicing CRUD API calls that enable the creation, reading (or access), updating, and deletion of entities. Such CRUD APIs need not be specific to any particular cloud provider. Thus, user 352 may employ a single set of CRUD API calls to perform CRUD-type operations on entities from multiple providers.
Query API module 328 is generally responsible for receiving and servicing query API calls. For example, user 352 may provide a query API call to search one or more of the databases. Such query API calls include, but are not limited to, function calls for querying one or more of the databases, and may support rich filtering, pagination, and sorting of the query results. Note that these query APIs need not be cloud-provider specific. That is, the query APIs, as well as the ingest and CRUD APIs, may be agnostic as to the provider of the entities. Thus, user 352 may employ a single uniform set of query APIs to query entities from separate cloud providers. Some query API calls may enable traversing and exploring the entities and the relationships between entities via traversing and exploring graph database 340. Query APIs specific to graph database 340 may be coded in a graph traversal language, such as, but not limited to, the Gremlin language. Graph-related APIs may additionally enable querying the graph to determine a degree of centrality and/or connectedness for a set of the entities. Query APIs may also support aggregations and sub-aggregations of the entity data, as well as filtering of the data.
As discussed throughout, data service provider 320 may format and store at least portions of the data stream received from data stream integrator 310 in each of the databases included in system 300. In some embodiments, separate subsets of the data stream are stored in the separate databases, and there may be at least some overlap among the subsets stored in the separate databases. Data service provider 320 may include a data ingestor 334 that is generally responsible for ingesting the data stream, formatting the data stream, and/or generating the subsets of the formatted data stream to insert into the databases. Data ingestor 334 may receive ingest API calls from ingest API module 324. Via ingest API calls, system 300 may ingest entity-related data that encodes various state information of the entities and the relationships between the entities. Such ingest API calls may enable bulk updating of the databases with regard to entities from multiple cloud providers.
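The split between ingest, CRUD, and query calls, all operating on provider-agnostic entity records, can be sketched as a small request dispatcher. The paths (`/ingest`, `/entities/<id>`, `/query`), payload shapes, and the module-level `ENTITIES` store are all hypothetical choices for illustration; no HTTP server is started, and the function only models how REST API server 322 might route calls to its modules.

```python
import json

ENTITIES = {}  # hypothetical provider-agnostic entity store: entity_id -> entity

def dispatch(method, path, body=None):
    """Route a REST-style call to ingest, CRUD, or query logic (illustrative paths)."""
    parts = path.strip("/").split("/")
    payload = json.loads(body) if body else None
    if method == "POST" and parts == ["ingest"]:
        # Bulk ingest: many entities, possibly from different providers, in one call.
        for entity in payload["entities"]:
            ENTITIES[entity["entity_id"]] = entity
        return 200, {"ingested": len(payload["entities"])}
    if method == "POST" and parts == ["query"]:
        filters = payload.get("filter", {})
        hits = [e for e in ENTITIES.values()
                if all(e.get(k) == v for k, v in filters.items())]
        return 200, {"results": hits}
    if len(parts) == 2 and parts[0] == "entities":
        eid = parts[1]
        if method == "PUT":  # create or update
            ENTITIES[eid] = dict(payload or {}, entity_id=eid)
            return 200, ENTITIES[eid]
        if method == "GET":
            return (200, ENTITIES[eid]) if eid in ENTITIES else (404, {"error": "not found"})
        if method == "DELETE":
            found = ENTITIES.pop(eid, None) is not None
            return (204, None) if found else (404, {"error": "not found"})
    return 404, {"error": "no such route"}

code, resp = dispatch("POST", "/ingest", json.dumps({"entities": [
    {"entity_id": "vm-1", "type": "vm", "provider": "provider-a"},
    {"entity_id": "vm-2", "type": "vm", "provider": "provider-b"},
]}))
```

A single query with `{"filter": {"type": "vm"}}` then returns VMs from both providers, which is the provider-agnostic behavior described above.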
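The filtering, sorting, and pagination that the query APIs support can be sketched as a single provider-agnostic function over entity records. The parameter names and the result envelope (`total`, `page`, `items`) are hypothetical; the sketch assumes every matching record has the sort field.

```python
def query(entities, filters=None, sort_by=None, descending=False, page=1, page_size=10):
    """Filter -> sort -> paginate over entity dicts, regardless of provider."""
    filters = filters or {}
    hits = [e for e in entities if all(e.get(k) == v for k, v in filters.items())]
    if sort_by is not None:
        hits.sort(key=lambda e: e[sort_by], reverse=descending)  # assumes field present
    start = (page - 1) * page_size
    return {"total": len(hits), "page": page, "items": hits[start:start + page_size]}

entities = [
    {"id": "vm-1", "type": "vm", "provider": "provider-a", "size_gb": 40},
    {"id": "vm-2", "type": "vm", "provider": "provider-b", "size_gb": 80},
    {"id": "disk-1", "type": "disk", "provider": "provider-a", "size_gb": 500},
    {"id": "vm-3", "type": "vm", "provider": "provider-c", "size_gb": 20},
]
first = query(entities, {"type": "vm"}, sort_by="size_gb", descending=True, page=1, page_size=2)
second = query(entities, {"type": "vm"}, sort_by="size_gb", descending=True, page=2, page_size=2)
print([e["id"] for e in first["items"]], [e["id"] for e in second["items"]])
```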
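The formatting performed by data ingestor 334 can be sketched in two steps: map each provider's raw record onto one common set of entity fields, then carve out the (partially overlapping) subsets destined for each database. The provider record layouts (`InstanceId`, `Kind`, `resource_type`, and so on) and the choice of fields per store are hypothetical, chosen only to show the shape of the technique.

```python
# Each adapter maps one provider's (hypothetical) record layout onto common entity fields.
ADAPTERS = {
    "provider-a": lambda r: {"entity_id": r["InstanceId"], "entity_type": r["Kind"].lower(), "owner": r["Owner"]},
    "provider-b": lambda r: {"entity_id": r["id"], "entity_type": r["resource_type"], "owner": r["account"]},
}

def normalize(provider_id, raw):
    """Convert a provider-specific record into the common entity format."""
    entity = ADAPTERS[provider_id](raw)
    entity["provider_id"] = provider_id
    return entity

def split_for_stores(entity):
    """Subsets written to each database; they overlap on entity_id by design."""
    return {
        "graph": {k: entity[k] for k in ("entity_id", "entity_type")},
        "key_value": dict(entity),
        "reverse_index": {k: entity[k] for k in ("entity_id", "entity_type", "owner", "provider_id")},
    }

a = normalize("provider-a", {"InstanceId": "i-123", "Kind": "VM", "Owner": "alice"})
b = normalize("provider-b", {"id": "d-9", "resource_type": "disk", "account": "bob"})
print(split_for_stores(a))
```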
The data ingestor 334 (or the data stream integrator 310) may format the data into one or more data structures for insertion into the databases. In at least one embodiment, the data structure may be a JavaScript object encoded in JavaScript Object Notation (JSON). A JSON object encodes its data as key-value pairs. In some embodiments, at least two data structures may be employed: a first data structure may encode an entity, while a second data structure may encode relationships between the entities. The entity data structure may encode an entity as a JSON object. Keys in an entity data structure may include, but are not limited to, a graph identifier, an entity identifier, an entity type, a user identifier, a region identifier, a service identifier, a provider identifier, one or more tags, one or more properties, a creation timestamp, and a last updated timestamp. The value for the graph identifier key may indicate which graph the entity is located in, the value for the entity identifier key may indicate which entity is encoded, and the value for the entity type key may indicate an entity type for the entity. The value for the user identifier key may indicate which user owns the entity. In some embodiments, the user identifier key may alternatively be an identifier for a cloud account. A cloud account may indicate a user or a container associated with the entity. The value for the region identifier key may indicate a geographical region in which the entity is located, the value for the service identifier key may indicate a service provided by the entity, and the value for the provider identifier key may indicate a cloud-based provider that is providing the entity. The tags key may be paired with one or more values providing descriptive tags for the entity. The properties key may be paired with one or more values describing one or more properties of the entity. The value for the creation timestamp key may indicate the time and date that the entity was created, while the value for the last updated timestamp key may indicate the time and date that the entity was last updated.
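As an illustration of the entity data structure described above, the sketch below models an entity as a Python dataclass and serializes it to a JSON object of key-value pairs. The snake_case key names (e.g., `graph_id`, `created_at`) and the ISO 8601 timestamp format are assumptions chosen for readability; the disclosure does not fix a concrete schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Entity:
    # Key names below are illustrative assumptions, not a fixed schema.
    graph_id: str
    entity_id: str
    entity_type: str
    user_id: str  # may instead hold a cloud-account identifier
    region: str
    service: str
    provider: str
    tags: list[str] = field(default_factory=list)
    properties: dict[str, Any] = field(default_factory=dict)
    created_at: str = field(default_factory=_utc_now)
    updated_at: str = ""

    def __post_init__(self) -> None:
        # A newly created entity has not been updated yet, so both timestamps match.
        if not self.updated_at:
            self.updated_at = self.created_at

    def to_json(self) -> str:
        """Encode the entity as a JSON object of key-value pairs."""
        return json.dumps(asdict(self))
```

For example, `Entity("graph-1", "vm-42", "virtual_machine", "acct-7", "us-west", "compute", "provider-a", tags=["prod"], properties={"cpu_count": 4}).to_json()` yields a single JSON object carrying every key listed above.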
In some embodiments, the relationship data structure may encode a relationship between entities as a JSON object. Keys in a relationship data structure may include, but are not limited to, a graph identifier, a relationship identifier, a relationship type, a source entity identifier, a foreign entity identifier, relationship properties, a creation timestamp, and a last updated timestamp. The value for the graph identifier key may indicate which graph the relationship is located in, the value for the relationship identifier key may indicate which relationship is encoded, and the value for the relationship type key may indicate a relationship type for the relationship. The values for the source entity identifier and foreign entity identifier keys may indicate which entities the relationship is between. In some embodiments, the entities identified by these keys may indicate a direction of the relationship (e.g., parent vs. child entities). Thus, for directed relationships, the edge may be directed from the source entity to the foreign entity. The properties key may be paired with one or more values describing one or more properties of the relationship. The value for the creation timestamp key may indicate the time and date that the relationship was created, while the value for the last updated timestamp key may indicate the time and date that the relationship was last updated.
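A corresponding sketch for the relationship data structure follows, again assuming snake_case key names and ISO 8601 timestamps that the disclosure does not prescribe. The hypothetical `edge()` helper makes the directionality explicit: for a directed relationship, the edge runs from the source entity to the foreign entity.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Relationship:
    # Key names below are illustrative assumptions, not a fixed schema.
    graph_id: str
    relationship_id: str
    relationship_type: str
    source_entity_id: str   # e.g., the parent entity
    foreign_entity_id: str  # e.g., the child entity
    properties: dict[str, Any] = field(default_factory=dict)
    created_at: str = field(default_factory=_utc_now)
    updated_at: str = ""

    def __post_init__(self) -> None:
        # A newly created relationship has not been updated yet.
        if not self.updated_at:
            self.updated_at = self.created_at

    def edge(self) -> tuple[str, str]:
        """Directed edge (source -> foreign) for a directed relationship."""
        return (self.source_entity_id, self.foreign_entity_id)

    def to_json(self) -> str:
        """Encode the relationship as a JSON object of key-value pairs."""
        return json.dumps(asdict(self))
```

Keeping relationships as separate records, rather than nesting them inside entities, lets a graph database load them directly as edges.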
In some embodiments, data service provider 320 may include a sharding engine 330. The sharding engine 330 may be generally responsible for sharding each of the databases. Each of the databases may be intelligently sharded to ensure efficient lookups when servicing a query or another API call. That is, each of the databases may be partitioned into a plurality of shards (or database slices) based on the group of users and the data, such that queried data may be efficiently located and retrieved from the databases. One or more policies of the group may indicate a sharding strategy for the databases. A policy may define one or more heuristics that indicate the portions of the data content and/or data structures along which to partition the databases. For instance, a policy may indicate a key along which to shard the database. The sharding may be performed at the group or organizational level, the cloud account level, or the like.
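One simple way a policy that names a shard key could be applied is hash partitioning, sketched below. Hashing is a swapped-in technique chosen for illustration; the disclosure says only that a policy may indicate a key along which to shard, not how values map to shards. `ShardingPolicy`, `shard_for`, and `partition` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ShardingPolicy:
    # Hypothetical policy: the key to shard along and the number of shards.
    shard_key: str  # e.g., "cloud_account_id" or "org_id"
    num_shards: int


def shard_for(record: dict[str, Any], policy: ShardingPolicy) -> int:
    """Map a record to a shard by hashing the value of the policy's shard key.

    SHA-256 is used instead of Python's built-in hash() because string hashing
    is randomized per process, which would break stable shard placement.
    """
    value = str(record[policy.shard_key]).encode("utf-8")
    digest = hashlib.sha256(value).digest()
    return int.from_bytes(digest[:8], "big") % policy.num_shards


def partition(
    records: list[dict[str, Any]], policy: ShardingPolicy
) -> dict[int, list[dict[str, Any]]]:
    """Split records into shards; records sharing a key value share a shard."""
    shards: dict[int, list[dict[str, Any]]] = {i: [] for i in range(policy.num_shards)}
    for record in records:
        shards[shard_for(record, policy)].append(record)
    return shards
```

Sharding on a cloud-account key, for example, keeps all of one account's entities on one shard, so a query scoped to that account touches a single slice.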
In some embodiments, data service provider 320 may include a search engine 336. The search engine 336 may be employed to perform searches on the databases. In some embodiments, the search engine 336 may be an Elasticsearch engine. Data service provider 320 may also include a user interface (UI) component 332. The UI component 332 may provide an interface through which user 352 may interact with the entity data services.
At block 406, a graph database (e.g., graph database 340 of
At block 412, and in response to receiving a search query, one or more of the three databases may be identified based on the content of the query. At block 414, the search query may be provided to each of the identified databases. At block 416, search results may be received from each of the identified databases. At block 418, the search results may be aggregated. The aggregated search results may encode a status of at least a first entity of the group's entities. At block 420, an indication of the search results may be provided. For example, an indication of the status of the first entity may be provided to the user who provided the search query.
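The query flow of blocks 412 through 420 can be sketched as a fan-out followed by a per-entity merge. This is a minimal illustration under stated assumptions: each database is wrapped in a hypothetical `Backend` with a `can_serve` predicate that inspects the query content, and results are assumed to carry an `entity_id` key used to merge partial views of the same entity.

```python
from dataclasses import dataclass
from typing import Any, Callable

Query = dict[str, Any]
Result = dict[str, Any]


@dataclass
class Backend:
    # Hypothetical wrapper around one of the databases.
    name: str
    can_serve: Callable[[Query], bool]  # does the query's content concern this database?
    search: Callable[[Query], list[Result]]


def run_query(query: Query, backends: list[Backend]) -> dict[str, Result]:
    """Route a query to matching databases and aggregate results per entity."""
    # Block 412: identify the relevant databases from the query content.
    targets = [b for b in backends if b.can_serve(query)]
    aggregated: dict[str, Result] = {}
    for backend in targets:
        # Blocks 414 and 416: provide the query and receive the results.
        for hit in backend.search(query):
            # Block 418: merge partial results that describe the same entity.
            aggregated.setdefault(hit["entity_id"], {}).update(hit)
    # Block 420: the caller surfaces the aggregated status to the user.
    return aggregated
```

In this sketch, a query naming both an entity and a related entity would reach, say, a graph store and a document store, and the merged record would combine relationship data from one with status data from the other. Databases whose predicate does not match are never queried.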
In accordance with some implementations, a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) is provided, the computer-readable storage medium storing one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing any of the methods or processes described herein.
The foregoing descriptions of specific embodiments have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the scope of the claims to the precise forms disclosed, and it should be understood that many modifications and variations are possible in light of the above teaching.