This disclosure relates to the field of cloud technologies, and in particular, to a metadata management method, related apparatus and device, and a storage medium.
A distributed data warehouse refers to use of a high-speed computer network to connect multiple physically dispersed data storage units and constitute a logically unified data warehouse. In recent years, with the rapid growth of data volume, the distributed data warehouse technology has also been rapidly developed. The distributed data warehouse distributes data to multiple data nodes connected through the network to obtain a larger storage capacity and a higher concurrent access amount.
Metadata management is an important part of the distributed data warehouse. Typically, metadata is persisted in a relational database. To use the metadata, the metadata is first read to obtain a library table structure and a data storage position. Then, Structured Query Language (SQL) statements are executed to perform operations such as adding, deleting, modifying, and querying on the metadata.
However, in this metadata management mode, the metadata resources of different tenants affect one another. For example, after tenant A creates a metadatabase named “DB.01”, tenant B can no longer create a metadatabase named “DB.01”. Tenant B may have to make multiple attempts before finding a name that is still available.
The embodiments of this disclosure provide a metadata management method, a related apparatus and device, and a non-transitory storage medium. The embodiments not only facilitate expanding the metadata management boundary of a cloud account, but also realize isolation of metadata resources (for example, a metadatabase and a metadata table), preventing the metadata resources of different tenants from affecting one another and thereby achieving a better metadata management effect.
In view of the above, a first aspect of this disclosure provides a metadata management method, performed by a server, and including:
Another aspect of this disclosure provides a metadata management apparatus, deployed on a server and including:
Another aspect of this disclosure provides a computer device, including: a memory, a processor, and a bus system;
Another aspect of this disclosure provides a non-transitory computer-readable storage medium. The computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is enabled to execute the method of the foregoing aspects.
Another aspect of this disclosure provides a computer program product, including a computer program stored in a computer-readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium and executes the computer program, so that the computer device performs the method provided in the foregoing aspects.
Another aspect of this disclosure provides a non-transitory computer-readable medium, storing one or more instructions, the one or more instructions, when executed by at least one processor, being configured to cause an electronic device to perform steps including:
According to the foregoing technical solutions, it can be learned that the embodiments of this disclosure have the following advantages:
The embodiments of this disclosure provide a metadata management method. The server receives an account authentication request transmitted by a client and, when the account authentication request is passed, transmits a metadata tenant set to the client, the metadata tenant set having a binding relation with the cloud account information. On this basis, the client can trigger a tenant selection request, and the server transmits a metadatabase set to the client in response to the tenant selection request. The client can then trigger a database query request, and the server transmits a metadata table set to the client in response to the database query request; the metadata table set has a mapping relation with the to-be-requested metadatabase. In this way, the concept of the metadata tenant is designed on an upper layer of the metadatabase, the metadata tenant is taken as the minimum granularity of isolation among tenants, and a mode in which one cloud account is bound to multiple metadata tenants is supported. Therefore, when the number of tenants supported by a cloud account needs to be expanded, more metadata tenants can be bound to the cloud account, which expands the metadata management boundary of the cloud account. Each metadata tenant has an independent metadata management space. Across different metadata tenants, metadata resources (for example, a metadatabase and a metadata table) are isolated, preventing the metadata resources of different tenants from affecting one another and thereby achieving a better metadata management effect.
With the wide use of big data and cloud computing technology, the importance of data directories and data governance is increasingly recognized. Data governance requires a clear understanding of what data exists, and manual sorting can no longer keep up with the speed of data growth and change. As data operations mature and data pipelines become more complex, a traditional data directory often cannot meet these requirements. Therefore, it is of great significance to implement metadata management.
The metadata management process may involve big data technology, public cloud applications, etc., which are respectively described below. Big data refers to a data set that cannot be captured, managed, and processed by conventional software tools within a certain time range. It is a massive, high-growth, and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight and discovery ability, and process optimization ability. With the advent of the cloud era, big data has attracted more and more attention. Big data requires special technology to effectively process a large amount of data within a tolerable elapsed time. Technologies suitable for big data include large-scale parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
A public cloud normally refers to a cloud provided by a third-party provider for users to use. A public cloud can generally be used through the Internet for free or at a low cost. The core attribute of the public cloud is shared resource services. There are many instances of this kind of cloud, and its services are available throughout today's open public network.
In order to achieve better results of metadata management, this disclosure provides a metadata management method, which is applied to the metadata management system. The physical architecture of the metadata management system is introduced below in combination with
An overall architecture of the multi-tenant metadata online data directory management is introduced by combining with
Professional terms involved in this disclosure are introduced in the following.
By combining with the introduction above, the metadata management method in this disclosure is introduced. Referring to
In one or more embodiments, the metadata management apparatus receives the account authentication request transmitted by the client. The account authentication request carries the cloud account information. The cloud account may be an account registered by an enterprise, and the cloud account information may include the enterprise account, password, etc. The cloud account can also be an account registered by an individual. The cloud account information can include a personal account (or mobile phone, email address, etc.) and password.
The metadata management apparatus can be deployed on one or multiple servers. It supports not only physical servers (for example, a server cluster or distributed system consisting of multiple physical servers) but also containerized deployment.
The client can run on the terminal device in the form of a browser or in the form of an independent application (APP). The specific presentation form of the client is not limited herein. The terminal device may be a smart phone, tablet, laptop, palmtop, PC, smart TV, smart watch, in-vehicle device, wearable device, etc., but is not limited thereto.
In one or more embodiments, the metadata management apparatus verifies the cloud account information carried in the account authentication request. After the verification is successful, the metadata tenant set can be fed back to the client. Cloud account information is bound to a metadata tenant set, and the metadata tenant set includes at least one exemplary metadata tenant.
The metadata tenant set can be presented on the client in a form of a list. Table 1 is a diagram of the relationship between the cloud account information and the metadata tenant set.
The correspondence shown in Table 1 is merely an example, and should not be understood as a limitation on this disclosure.
In one or more embodiments, the user selects a metadata tenant from the metadata tenant set and triggers a tenant selection request for this metadata tenant. The tenant selection request carries the identifier of the to-be-requested metadata tenant, and the to-be-requested metadata tenant belongs to the metadata tenant set. The metadata management apparatus feeds back to the client the metadatabase set based on a tenant selection request, where one metadata tenant is associated with one metadatabase set and the metadatabase set includes at least one metadatabase.
The metadatabase set can be presented on the client in the form of a list. Combined with Table 1, it is assumed that the user selects “metadata tenant A” from the metadata tenant set as the to-be-requested metadata tenant. On this basis, Table 2 is a diagram of the relationship between the to-be-requested metadata tenant and the metadatabase set.
The correspondence shown in Table 2 is merely an example, and should not be understood as a limitation on this disclosure.
In one or more embodiments, the user selects a metadatabase from the metadatabase set and triggers a database query request for this metadatabase. The database query request carries the identifier of the to-be-requested metadatabase, and the to-be-requested metadatabase belongs to the metadatabase set. The metadata management apparatus feeds back to the client the metadata table set based on a database query request, where one metadatabase is associated with one metadata table set and the metadata table set includes at least one metadata table.
The metadata table set can be presented on the client in the form of a list. Combined with Table 2, it is assumed that the user selects “metadatabase A” from the metadatabase set as the to-be-requested metadatabase. On this basis, Table 3 is a diagram of the relationship between the to-be-requested metadatabase and the metadata table set.
The correspondence shown in Table 3 is merely an example, and should not be understood as a limitation on this disclosure.
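The three-step exchange described above (account authentication, tenant selection, database query) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class `MetadataServer`, its method names, the in-memory dictionaries, and all account, tenant, database, and table names are hypothetical, and the disclosure does not prescribe how the bindings are stored.

```python
# Hypothetical in-memory bindings; all names and values are illustrative only.
ACCOUNTS = {"cloud_account_01": "password_01"}
ACCOUNT_TENANTS = {"cloud_account_01": ["metadata tenant A", "metadata tenant B"]}
TENANT_DATABASES = {
    "metadata tenant A": ["metadatabase A", "metadatabase B"],
    "metadata tenant B": ["metadatabase C"],
}
DATABASE_TABLES = {
    ("metadata tenant A", "metadatabase A"): ["metadata table A", "metadata table B"],
}


class MetadataServer:
    """Sketch of the server side of the three requests."""

    def authenticate(self, account, password):
        # Account authentication request: verify the cloud account information
        # and return the metadata tenant set bound to it.
        if account not in ACCOUNTS or ACCOUNTS[account] != password:
            raise PermissionError("account authentication failed")
        return list(ACCOUNT_TENANTS.get(account, []))

    def select_tenant(self, account, tenant):
        # Tenant selection request: the requested tenant must belong to the
        # tenant set of the account; return its metadatabase set.
        if tenant not in ACCOUNT_TENANTS.get(account, []):
            raise PermissionError("tenant is not bound to this cloud account")
        return list(TENANT_DATABASES.get(tenant, []))

    def query_database(self, account, tenant, database):
        # Database query request: the requested metadatabase must belong to
        # the metadatabase set of the tenant; return its metadata table set.
        if database not in self.select_tenant(account, tenant):
            raise KeyError("metadatabase does not belong to this tenant")
        return list(DATABASE_TABLES.get((tenant, database), []))


server = MetadataServer()
tenants = server.authenticate("cloud_account_01", "password_01")
databases = server.select_tenant("cloud_account_01", tenants[0])
tables = server.query_database("cloud_account_01", tenants[0], databases[0])
```

Each step narrows the scope of the next one, so a client never sees a metadatabase or metadata table outside the tenant it selected.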
The embodiments of this disclosure provide a metadata management method. In this method, the concept of the metadata tenant is designed on an upper layer of the metadatabase, the metadata tenant is taken as the minimum granularity of isolation among tenants, and a mode in which one cloud account is bound to multiple metadata tenants is supported. Therefore, when the number of tenants supported by a cloud account needs to be expanded, more metadata tenants can be bound to the cloud account, which expands the metadata management boundary of the cloud account. Each metadata tenant has an independent metadata management space. Across different metadata tenants, metadata resources (for example, a metadatabase and a metadata table) are isolated, preventing the metadata resources of different tenants from affecting one another and thereby achieving a better metadata management effect.
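The isolation effect can be sketched as follows: if metadatabase names are scoped by metadata tenant, two tenants can each create a metadatabase named “DB.01” without conflict, while a duplicate name inside one tenant is still rejected. The class `TenantScopedCatalog` and the tenant identifiers are hypothetical; keying by a (tenant, name) pair is one possible way to realize the isolation, not necessarily the one used by the disclosure.

```python
class TenantScopedCatalog:
    """Hypothetical catalog whose namespace is scoped by metadata tenant."""

    def __init__(self):
        # (tenant_id, db_name) -> dict of metadata tables in that metadatabase
        self._databases = {}

    def create_database(self, tenant_id, db_name):
        key = (tenant_id, db_name)
        if key in self._databases:
            # A name only conflicts inside the same metadata tenant.
            raise ValueError(f"{db_name!r} already exists in tenant {tenant_id!r}")
        self._databases[key] = {}

    def list_databases(self, tenant_id):
        # Only the metadatabases of the given tenant are visible.
        return sorted(name for (tenant, name) in self._databases if tenant == tenant_id)


catalog = TenantScopedCatalog()
catalog.create_database("tenant_a", "DB.01")
catalog.create_database("tenant_b", "DB.01")  # no conflict: different tenant
```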
Based on the embodiment corresponding to
In one or more embodiments, a metadata table query mode based on metadata tenant is introduced. As can be known from the embodiments above, the user selects a metadata table from the metadata table set and triggers a data table query request for this metadata table. The data table query request carries the identifier of the to-be-requested metadata table, and the to-be-requested metadata table belongs to the metadata table set. Based on a data table query request, the to-be-requested metadata is fed back to the client.
The metadata tenant and the metadatabases are in a one-to-many mapping relationship (i.e., 1:0 . . . *). On this basis, multiple metadatabases can be created under one metadata tenant.
The metadatabase and the metadata tables are in a one-to-many mapping relationship (i.e., 1:0 . . . *). On this basis, multiple metadata tables can be created under one metadatabase.
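The two one-to-many relationships, together with the data table query described above, can be sketched with nested containers. The dataclass names, the `query_table` helper, and the sample table are hypothetical and serve only to illustrate the 1:0..* structure; an empty dictionary represents the "zero" case.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataTable:
    name: str
    columns: List[str] = field(default_factory=list)


@dataclass
class Metadatabase:
    name: str
    # One metadatabase holds zero or more metadata tables (1:0..*).
    tables: Dict[str, MetadataTable] = field(default_factory=dict)


@dataclass
class MetadataTenant:
    tenant_id: str
    # One metadata tenant holds zero or more metadatabases (1:0..*).
    databases: Dict[str, Metadatabase] = field(default_factory=dict)


def query_table(tenant: MetadataTenant, db_name: str, table_name: str) -> MetadataTable:
    """Resolve a data table query inside a single metadata tenant."""
    database = tenant.databases.get(db_name)
    if database is None:
        raise KeyError(f"metadatabase {db_name!r} not found in tenant {tenant.tenant_id!r}")
    table = database.tables.get(table_name)
    if table is None:
        raise KeyError(f"metadata table {table_name!r} not found in {db_name!r}")
    return table


tenant = MetadataTenant("metadata tenant A")
tenant.databases["metadatabase A"] = Metadatabase("metadatabase A")
tenant.databases["metadatabase B"] = Metadatabase("metadatabase B")  # no tables yet
tenant.databases["metadatabase A"].tables["orders"] = MetadataTable("orders", ["id", "amount"])
result = query_table(tenant, "metadatabase A", "orders")
```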
Secondly, the embodiment of this disclosure provides a mode of realizing metadata table query based on the metadata tenant. Through the mode above, the concept of the metadata tenant is designed for online data directory management, so that the metadata can be divided and the metadata tenant can be taken as the minimum granularity of multi-tenant isolation; metadata under different metadata tenants can be isolated from each other without affecting each other. Therefore, different metadata tenants can implement operations such as querying the metadata table when the metadata is isolated, so as to improve the flexibility and feasibility of the solution.
Based on the foregoing embodiment corresponding to
In one or more embodiments, a mode for metadata management in a multi-dimensional tenant system is introduced. As can be seen from the preceding embodiments, this disclosure also defines a service tenant. The service tenant is an abstraction of a specific service scene and a tenant resource is isolated based on common service division. Through the design of the service tenant, different personalized specific service scenes can be generally adapted. By designing the service tenants, the strong association relationship between the metadata tenants and specific service scenes can be decoupled, so that the underlying metadata tenant is irrelevant to the specific service, while the service tenants are linked to the specific service scenes.
The metadata management apparatus verifies the cloud account information carried in the account authentication request. After the verification is successful, the service tenant set can be fed back to the client. Cloud account information is bound to a service tenant set, and the service tenant set includes at least one exemplary service tenant. The service tenant set can be presented on the client in a form of a list. Table 4 is a diagram of the relationship between the cloud account information and the service tenant set.
The correspondence shown in Table 4 is merely an example, and should not be understood as a limitation on this disclosure.
The user selects a service tenant from the service tenant set and triggers a service selection request for this service tenant. The service selection request carries the identifier of the to-be-requested service tenant, and the to-be-requested service tenant belongs to the service tenant set. In response to the service selection request, service processing information generated based on the to-be-requested service tenant is fed back to the client, and the client may display the service processing information. One service tenant is associated with one metadata tenant set, and the metadata tenant set includes at least one metadata tenant. It can be understood that the service tenants and the metadata tenants can be in a one-to-one mapping relationship, a one-to-many mapping relationship, a many-to-one mapping relationship, or a many-to-many mapping relationship.
Combined with Table 4, it is assumed that the user selects “service tenant_01” from the service tenant set as the to-be-requested service tenant. On this basis, Table 5 is a diagram of the relationship between the to-be-requested service tenant and the to-be-requested metadata tenant set.
The correspondence shown in Table 5 is merely an example, and should not be understood as a limitation on this disclosure.
Secondly, the embodiment of this disclosure provides a metadata management mode under the multi-tenant system. In order to support unified management of multi-tenant metadata in the public cloud scene, this disclosure abstractly designs a multi-tenant domain model, i.e., the metadata tenant and the service tenant. In this way, the demand for unified metadata across different service scenes can be met, and a multi-tenant online data directory management function for the public cloud can be provided.
Based on the embodiment corresponding to
In one or more embodiments, a mode for service processing in different service scenes is introduced. From the above embodiment, it can be seen that the service tenants are associated with the metadata tenants through the tenant dimension mapping; the tenant dimension mapping can be expressed in the form of a mapping table. Based on this, the corresponding to-be-requested metadata tenant set can be determined according to the identifier of the to-be-requested service tenant carried by the service selection request. Hence, the to-be-requested metadata table set that has a mapping relationship with the to-be-requested metadata tenant set is obtained, and the relevant service data is obtained combined with the to-be-requested metadata table set. In this way, the service data is accordingly processed according to the to-be-requested service type, to obtain service processing information, so as to transmit the service processing information to the client.
Exemplarily,
Exemplarily,
Exemplarily,
Again, the embodiment of this disclosure provides a mode of conducting service processing in different service scenes. In this mode, the association between the two tenant dimensions is realized based on the tenant dimension mapping. That is, the mapping relationship between the metadata tenants and the service tenants is defined through the tenant dimension mapping, and the mapping relationship depends on the requirements of the specific service logic. Mapping is carried out according to the specific service scene, so as to realize a general, multi-scene, centralized online metadata data directory management system. The online data directory management system has the advantages of high scalability, high performance, and high fault tolerance, and supports rapid adaptation to and interconnection with multiple compute engines.
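The tenant dimension mapping can be sketched as a simple mapping table that is consulted when a service selection request arrives. Everything named below is hypothetical: the mapping rows, the table names, the two service types `"count"` and `"list"`, and the helper functions are assumptions introduced only to show the lookup order (service tenant, then metadata tenant set, then metadata table set, then processing by service type).

```python
# Hypothetical tenant dimension mapping: (service tenant, metadata tenant) rows.
# The relationship may be one-to-one, one-to-many, many-to-one, or many-to-many.
TENANT_DIMENSION_MAPPING = [
    ("service tenant_01", "metadata tenant A"),
    ("service tenant_01", "metadata tenant B"),
    ("service tenant_02", "metadata tenant B"),
]

# Hypothetical metadata table sets per metadata tenant.
TENANT_TABLES = {
    "metadata tenant A": ["orders"],
    "metadata tenant B": ["users", "events"],
}


def metadata_tenants_for(service_tenant):
    """Determine the metadata tenant set mapped to a service tenant."""
    return [meta for service, meta in TENANT_DIMENSION_MAPPING if service == service_tenant]


def handle_service_selection(service_tenant, service_type):
    """Build service processing information for a service selection request."""
    tenants = metadata_tenants_for(service_tenant)
    if not tenants:
        raise KeyError(f"unknown service tenant: {service_tenant!r}")
    # Collect the metadata table set of every mapped metadata tenant.
    tables = [table for tenant in tenants for table in TENANT_TABLES.get(tenant, [])]
    # Process according to the (assumed) service type.
    if service_type == "count":
        return {"service_tenant": service_tenant, "table_count": len(tables)}
    if service_type == "list":
        return {"service_tenant": service_tenant, "tables": tables}
    raise ValueError(f"unsupported service type: {service_type!r}")


info = handle_service_selection("service tenant_01", "count")
```

Because the mapping lives in its own table, a metadata tenant can be reassigned to another service scene without touching the metadata stored under it.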
On the basis of the embodiment corresponding to
On this basis, in another exemplary embodiment provided by an embodiment of this disclosure, receiving the account authentication request transmitted by the client specifically may include:
In one or more embodiments, a mode for enhancing security authentication in the case of multi-compute engine compatibility is introduced. As can be known from the preceding embodiment, some original online metadata management services (for example, Hive Metastore) are general and widely recognized online data directory management components, so many big data components are adapted to and connected with such general data directory management component services (i.e., the Hive Metastore) to manage data directories. To reduce the cost of switching for existing components and clients and to support rapid and efficient metadata system switching, this disclosure designs a set of RPC interface services compatible with the general data directory management component services (i.e., the Hive Metastore), so that metadata switching and connection can be implemented at a relatively low cost. In addition to providing RPC interface calls for big data computing and analysis engines, this disclosure also provides data directory management operations through an HTTP interface, meeting the diversified usage requirements of upper-layer service products.
When the RPC interface is compatible, security authentication reinforcement can also be performed on the RPC interface. For ease of understanding,
Next, in the embodiment of this disclosure, a mode of enhancing security authentication when implementing multi-compute engine compatibility is provided. In this way, it creates and implements a customized Handler type. This Handler type inherits the IHMSHandler interface and re-implements the metadata management logic. The customized Handler type mainly implements authentication and data encapsulation processing for request parameters. Finally, the service layer of the underlying service of the metadata is called through the RPC interface inside the metadata to perform a persistence operation. In addition, existing interfaces can be directly reused to enhance security authentication of the RPC interfaces, thus improving data security.
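The handler pattern described above can be sketched in Python as follows. This is not the Hive API: `BaseHandler` is a hypothetical stand-in for the handler interface, `MetadataService` is a hypothetical stand-in for the underlying service layer, and the session and credential dictionaries are assumptions. The sketch only shows the order of operations: authenticate, encapsulate the request parameters, then delegate to the service layer for persistence.

```python
class BaseHandler:
    """Hypothetical stand-in for the metastore handler interface (not the Hive API)."""

    def create_table(self, table):
        raise NotImplementedError

    def get_table(self, db_name, table_name):
        raise NotImplementedError


class MetadataService:
    """Hypothetical underlying service layer that performs persistence."""

    def __init__(self):
        self.store = {}

    def save_table(self, tenant, record):
        self.store[(tenant, record["db"], record["name"])] = record

    def load_table(self, tenant, db_name, table_name):
        return self.store[(tenant, db_name, table_name)]


class AuthenticatingHandler(BaseHandler):
    """Customized handler: authenticate, encapsulate parameters, then delegate."""

    def __init__(self, service, credentials, session):
        self.service = service
        self.credentials = credentials  # account -> secret (assumed format)
        self.session = session          # assumed to carry account, secret, tenant

    def _authenticate(self):
        account = self.session.get("account")
        if account not in self.credentials or self.credentials[account] != self.session.get("secret"):
            raise PermissionError("authentication failed")
        return self.session["tenant"]

    def create_table(self, table):
        tenant = self._authenticate()
        # Data encapsulation: keep only the fields the service layer expects.
        record = {"db": table["db"], "name": table["name"],
                  "columns": list(table.get("columns", []))}
        self.service.save_table(tenant, record)

    def get_table(self, db_name, table_name):
        tenant = self._authenticate()
        return self.service.load_table(tenant, db_name, table_name)


service = MetadataService()
credentials = {"cloud_account_01": "secret_01"}
session = {"account": "cloud_account_01", "secret": "secret_01", "tenant": "tenant_a"}
handler = AuthenticatingHandler(service, credentials, session)
handler.create_table({"db": "db_a", "name": "orders", "columns": ["id"], "extra": "ignored"})
fetched = handler.get_table("db_a", "orders")
```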
When the cloud account information can be obtained from the to-be-requested session through an RPC request, in another exemplary embodiment provided by an embodiment of this disclosure, receiving the account authentication request transmitted by the client specifically may include:
In one or more embodiments, another mode for enhancing security authentication in the case of multi-compute engine compatibility is introduced. As can be seen from the embodiment above, in order to reduce the cost of switching for existing components and clients and to support rapid and efficient metadata system switching, this disclosure not only designs a set of RPC interface services compatible with the original online metadata management services, but also provides data directory management operations through an HTTP interface to meet the diversified usage requirements of upper-layer service products.
Exemplarily, the design of multi-compute engine compatibility can be seen in
In the extended RPC authentication framework, corresponding modules are added to both the RPC server and the RPC client.
Next, in the embodiment of this disclosure, another mode of enhancing security authentication when implementing multi-compute engine compatibility is provided. In this mode, a customized Handler type is created and implemented. This Handler type inherits the IHMSHandler interface and re-implements the metadata management logic. The customized Handler type mainly implements authentication and data encapsulation processing for request parameters. Finally, the service layer of the underlying service of the metadata is called through the RPC interface inside the metadata to perform a persistence operation. In addition, authentication can be performed on each request, which helps improve authentication security.
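Per-request authentication with one module on the RPC client and one on the RPC server can be sketched as below. The disclosure does not specify the authentication mechanism; the HMAC-SHA256 signature used here is an assumption chosen for illustration, and the class names, the shared secret, and the request layout are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_request(secret: bytes, payload: dict) -> str:
    """Sign a canonical JSON encoding of the payload (assumed scheme: HMAC-SHA256)."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class RpcClientAuthModule:
    """Hypothetical client-side module: attaches credentials to every request."""

    def __init__(self, account: str, secret: bytes):
        self.account = account
        self.secret = secret

    def wrap(self, method: str, params: dict) -> dict:
        payload = {"account": self.account, "method": method, "params": params}
        return {"payload": payload, "signature": sign_request(self.secret, payload)}


class RpcServerAuthModule:
    """Hypothetical server-side module: verifies every incoming request."""

    def __init__(self, secrets: dict):
        self.secrets = secrets  # account -> shared secret

    def verify(self, request: dict) -> bool:
        payload = request["payload"]
        secret = self.secrets.get(payload.get("account"))
        if secret is None:
            return False
        expected = sign_request(secret, payload)
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected, request["signature"])


client = RpcClientAuthModule("cloud_account_01", b"shared-secret")
server_auth = RpcServerAuthModule({"cloud_account_01": b"shared-secret"})
request = client.wrap("get_table", {"db_name": "db_a", "table_name": "orders"})
accepted = server_auth.verify(request)
```

Because the signature covers the method name and parameters, a request that is altered in transit fails verification even if the account field is left intact.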
Based on the foregoing embodiment corresponding to
In one or more embodiments, a mode for creating the metadata table is introduced. As can be known from the preceding embodiment, the online service provides the RPC interface method. On this basis, after the account authentication request is successfully verified, the create_table method can be called to create the metadata table according to the metadata table creation request transmitted by the client. The metadata table creation request carries a first object parameter, and the first object parameter includes metadata category information. The metadata category information is used for indicating the data type, for example, the Hive type. Parameter verification is performed on the first object parameter, and if the verification succeeds, the corresponding metadata table is created.
The creation process of the metadata table would be introduced in combination with the drawing below.
Secondly, the embodiment of this disclosure provides a mode of creating a metadata table. Through the mode above, the metadata table can be created based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
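The verify-then-create sequence can be sketched as follows. The function below only borrows the name create_table from the text; its signature, the parameter field names (`category`, `db_name`, `table_name`, `columns`), the set of supported categories, and the dictionary used as a catalog are assumptions, since the disclosure states only that the first object parameter includes metadata category information.

```python
SUPPORTED_CATEGORIES = {"HIVE"}  # assumed set of metadata categories


def verify_table_params(params):
    """Return a list of verification errors for the object parameter (empty if valid)."""
    errors = []
    if params.get("category") not in SUPPORTED_CATEGORIES:
        errors.append("unsupported metadata category")
    for name in ("db_name", "table_name"):
        if not params.get(name):
            errors.append(f"missing {name}")
    if not params.get("columns"):
        errors.append("at least one column is required")
    return errors


def create_table(catalog, tenant_id, params):
    """Create a metadata table only after parameter verification succeeds."""
    errors = verify_table_params(params)
    if errors:
        raise ValueError("; ".join(errors))
    key = (tenant_id, params["db_name"], params["table_name"])
    if key in catalog:
        raise ValueError("metadata table already exists")
    catalog[key] = {"category": params["category"], "columns": list(params["columns"])}
    return catalog[key]


catalog = {}
created = create_table(catalog, "tenant_a", {
    "category": "HIVE",
    "db_name": "db_a",
    "table_name": "orders",
    "columns": ["id", "amount"],
})
```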
Based on the foregoing embodiment corresponding to
In one or more embodiments, a mode for updating the metadata table is introduced. As can be known from the preceding embodiment, the online service provides the RPC interface method. On this basis, after the account authentication request is successfully verified, the alter_table method can be called to update the metadata table according to the metadata table update request transmitted by the client. The metadata table update request carries a second object parameter, and the second object parameter includes metadata category information, table name information, etc. The metadata category information is used for indicating the data type, for example, the Hive type. After parameter verification is performed on the second object parameter, if verification is successful, the metadata table is obtained according to the table name information, and then the column information in the metadata table is deleted and the column is re-created according to the metadata category information to update the metadata table.
The update process of the metadata table would be introduced in combination with the drawing below.
Secondly, the embodiment of this disclosure provides a mode of updating a metadata table. Through the mode above, the metadata table can be changed based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
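The update steps (verify the second object parameter, obtain the metadata table by table name, delete the existing column information, and re-create the columns) can be sketched as follows. The function shares only its name with the alter_table method mentioned in the text; its signature, the parameter field names, and the dictionary catalog are hypothetical.

```python
def alter_table(catalog, tenant_id, params):
    """Update a metadata table by dropping and re-creating its column information."""
    # Parameter verification on the (assumed) fields of the object parameter.
    for required in ("category", "db_name", "table_name", "columns"):
        if not params.get(required):
            raise ValueError(f"missing parameter: {required}")
    # Obtain the metadata table according to the table name information.
    key = (tenant_id, params["db_name"], params["table_name"])
    table = catalog.get(key)
    if table is None:
        raise KeyError("metadata table not found")
    # Delete the existing column information, then re-create the columns.
    table["columns"].clear()
    table["columns"].extend(params["columns"])
    table["category"] = params["category"]
    return table


# Hypothetical catalog that already contains one metadata table.
catalog = {
    ("tenant_a", "db_a", "orders"): {"category": "HIVE", "columns": ["id", "amount"]},
}
updated = alter_table(catalog, "tenant_a", {
    "category": "HIVE",
    "db_name": "db_a",
    "table_name": "orders",
    "columns": ["id", "amount", "currency"],
})
```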
Based on the foregoing embodiment corresponding to
In one or more embodiments, a mode for deleting the metadata table is introduced. As can be known from the preceding embodiment, the online service provides the RPC interface method. On this basis, after the account authentication request is successfully verified, the alter_table method can be called to delete the metadata table according to the metadata table deletion request transmitted by the client. The metadata table deletion request carries a third object parameter, and the third object parameter includes metadata category information, table name information, etc. The metadata category information is used for indicating the data type, for example, the Hive type. After parameter verification is performed on the third object parameter, if verification is successful, the metadata table is obtained according to the table name information, and then the column information in the metadata table is deleted to delete the metadata table.
The deletion process of the metadata table would be introduced in combination with the drawing below.
Secondly, the embodiment of this disclosure provides a mode of deleting a metadata table. Through the mode above, the metadata table can be deleted based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
Based on the embodiment corresponding to
In response to a data table query request transmitted by the client, transmitting a to-be-requested metadata table to the client may specifically include:
In one or more embodiments, a mode for querying the metadata table is introduced. As can be known from the preceding embodiment, the online service provides the RPC interface method. On this basis, after the account authentication request is successfully verified, the metadata table can be queried according to the data table query request transmitted by the client. The data table query request carries a fourth object parameter, and the fourth object parameter includes metadata category information, table name information, etc. Parameter verification is performed on the fourth object parameter, and if the verification succeeds, the corresponding metadata table is queried.
The query process of the metadata table would be introduced in combination with the drawing below.
Secondly, the embodiment of this disclosure provides a mode of querying a metadata table. Through the mode above, the metadata table can be queried based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
Based on the foregoing embodiment corresponding to
In one or more embodiments, another general metadata data model is introduced. As can be seen from the embodiments above, the design of the original data model for the data module is relatively complicated, and association operations among multiple tables are required, which slows metadata reading and writing. In addition, the original data model cannot support the multi-tenant design. Therefore, this disclosure transforms and simplifies the original data model, which can not only realize logical division of the metadata under multi-tenancy, but also improve the metadata read and write performance.
For example, the TBLS maintains the association between a table and a database through a metadatabase foreign key (FK) (i.e., DB_ID), and the record of the table can be associated with the corresponding database record in the DBS. For example, when the first query request is received, based on the table identifier (i.e., TBL_ID) carried in the first query request and the association between TBL_ID and DB_ID, the metadatabase (DBS) corresponding to DB_ID can be found. Thus, the data query is realized.
For example, the COLUMNS maintain the association between a column and a table through a metadata table FK (i.e., TBL_ID), and can be associated with a corresponding table record on the TBLS through the record of the column. For example, when the second query request is received, based on the column carried in the second query request and the association between the column and TBL_ID, the metadata table (TBLS) corresponding to TBL_ID can be found. Thus, the data query is realized.
For example, the PARTITIONS maintain the association between a partition and a table through a metadata table FK (i.e., TBL_ID), and can be associated with a corresponding table record on the TBLS through the record of the partition. For example, when the third query request is received, based on the partition identifier (i.e., PART_ID) carried in the third query request and the association between PART_ID and TBL_ID, the metadata table (TBLS) corresponding to TBL_ID can be found. Thus, the data query is realized.
For example, the PARTITIONS maintain the association between a partition and a storage descriptor through a storage table FK (i.e., SD_ID), and can be associated with a corresponding record on the SDS through the record of the partition. For example, when the fourth query request is received, based on the partition identifier (i.e., PART_ID) carried in the fourth query request and the association between PART_ID and SD_ID, the storage descriptor (SDS) corresponding to SD_ID can be found. Thus, the data query is realized.
For example, the TBLS maintains the association between a table and a storage descriptor through a storage table FK (i.e., SD_ID), and can be associated with a corresponding record on the SDS through the record of the table. For example, when the fifth query request is received, based on the table identifier (i.e., TBL_ID) carried in the fifth query request and the association between TBL_ID and SD_ID, the storage descriptor (SDS) corresponding to SD_ID can be found. Thus, the data query is realized.
For example, the UDF maintains the association between a function and a base through a metadatabase FK (i.e., DB_ID), and can be associated with a corresponding base record on the DBS through the record of the function. For example, when the sixth query request is received, based on the function identifier (i.e., func_ID) carried in the sixth query request and the association between func_ID and DB_ID, the metadatabase (DBS) corresponding to DB_ID can be found. Thus, the data query is realized.
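The six foreign-key associations above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the tables are modeled as in-memory dictionaries, every row value is made up, and the helper `follow_fk` and the column key `COL_ID` are hypothetical names that do not appear in this disclosure.

```python
# Illustrative in-memory stand-ins for the metadata tables.
# All row values are invented for the example; COL_ID is a hypothetical key.
DBS = {1: {"DB_ID": 1, "NAME": "sales"}}
SDS = {
    10: {"SD_ID": 10, "LOCATION": "/warehouse/sales/orders"},
    11: {"SD_ID": 11, "LOCATION": "/warehouse/sales/orders/dt=2021-11-04"},
}
TBLS = {100: {"TBL_ID": 100, "DB_ID": 1, "SD_ID": 10, "NAME": "orders"}}
COLUMNS = {1000: {"COL_ID": 1000, "TBL_ID": 100, "NAME": "order_id"}}
PARTITIONS = {500: {"PART_ID": 500, "TBL_ID": 100, "SD_ID": 11}}
UDF = {7: {"func_ID": 7, "DB_ID": 1, "NAME": "to_cents"}}


def follow_fk(child_table, child_id, fk_name, parent_table):
    """Return the parent record referenced by the child's foreign key."""
    child = child_table[child_id]          # record identified in the request
    return parent_table[child[fk_name]]    # record the foreign key points to


# First to sixth query requests, in the order described above.
db_of_table = follow_fk(TBLS, 100, "DB_ID", DBS)
table_of_column = follow_fk(COLUMNS, 1000, "TBL_ID", TBLS)
table_of_partition = follow_fk(PARTITIONS, 500, "TBL_ID", TBLS)
sd_of_partition = follow_fk(PARTITIONS, 500, "SD_ID", SDS)
sd_of_table = follow_fk(TBLS, 100, "SD_ID", SDS)
db_of_function = follow_fk(UDF, 7, "DB_ID", DBS)
```

Each lookup reads one child record and then one parent record, which mirrors how a query request carrying an identifier is resolved through the associated foreign key.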
Secondly, the embodiment of this disclosure provides a general metadata data model. For Hive type data, a more simplified general data model is designed to logically divide metadata resources while supporting multi-tenant metadata. The design and optimization of the underlying data model can improve the performance of metadata management, accelerate metadata read and write performances, remove multi-table dependency of the database, and implement the dependency relationship through logic. In addition, the distributed storage system can support the storage and management of massive metadata.
Based on the foregoing embodiment corresponding to
In one or more embodiments, another general metadata data model is introduced. As can be seen from the embodiments above, the design of the original data model for the data module is relatively complicated, and association operations across multiple tables are required, which slows metadata reading and writing. In addition, the original data model cannot support the multi-tenant design. Therefore, this disclosure transforms and simplifies the original data model, which not only realizes logical division of the metadata under multi-tenancy but also improves metadata read and write performance.
For example, the TBLS maintains the association between a table and a base through a metadatabase FK (i.e., DB_ID), and can be associated with a corresponding base record on the DBS through the record of the table. For example, when the first query request is received, based on the table identifier (i.e., TBL_ID) carried in the first query request and the association between TBL_ID and DB_ID, the metadatabase (DBS) corresponding to DB_ID can be found. Thus, the data query is realized.
For example, the COLUMNS maintain the association between a column and a table through a metadata table FK (i.e., TBL_ID), and can be associated with a corresponding table record on the TBLS through the record of the column. For example, when the second query request is received, based on the column carried in the second query request and the association between the column and TBL_ID, the metadata table (TBLS) corresponding to TBL_ID can be found. Thus, the data query is realized.
Secondly, the embodiment of this disclosure provides another general metadata data model. For non-Hive type data, a more simplified general data model is designed. For example, metadata in a storage system database management system can adopt this data model and only focus on metadata for bases, tables, and columns. Logic division is performed on the metadata resources when metadata multi-tenant is supported. The design and optimization of the underlying data model can improve the performance of metadata management, accelerate metadata read and write performances, remove multi-table dependency of the database, and implement the dependency relationship through logic. In addition, the distributed storage system can support the storage and management of massive metadata.
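One way to picture a model that covers only bases, tables, and columns, and that enforces dependencies through logic rather than through database-level multi-table constraints, is the following Python sketch. It is a sketch under assumptions, not the disclosed implementation: the composite-key layout, the class name `SimpleCatalog`, and its method names are all hypothetical.

```python
class SimpleCatalog:
    """Hypothetical catalog holding only bases, tables, and columns.

    Records live in flat maps keyed by composite tuples. Parent/child
    dependencies are checked in code instead of by foreign-key constraints.
    """

    def __init__(self):
        self.databases = {}  # (tenant, db) -> record
        self.tables = {}     # (tenant, db, table) -> record
        self.columns = {}    # (tenant, db, table, column) -> record

    def create_database(self, tenant, db):
        self.databases[(tenant, db)] = {"name": db}

    def create_table(self, tenant, db, table):
        # Dependency implemented through logic: the base must already exist.
        if (tenant, db) not in self.databases:
            raise KeyError(f"database {db!r} not found for tenant {tenant!r}")
        self.tables[(tenant, db, table)] = {"name": table}

    def add_column(self, tenant, db, table, column, col_type):
        # Dependency implemented through logic: the table must already exist.
        if (tenant, db, table) not in self.tables:
            raise KeyError(f"table {table!r} not found in database {db!r}")
        self.columns[(tenant, db, table, column)] = {
            "name": column,
            "type": col_type,
        }

    def list_columns(self, tenant, db, table):
        # A key-prefix scan replaces a join between tables.
        prefix = (tenant, db, table)
        return sorted(key[3] for key in self.columns if key[:3] == prefix)


catalog = SimpleCatalog()
catalog.create_database("tenant-a", "analytics")
catalog.create_table("tenant-a", "analytics", "events")
catalog.add_column("tenant-a", "analytics", "events", "event_id", "bigint")
catalog.add_column("tenant-a", "analytics", "events", "ts", "timestamp")
column_names = catalog.list_columns("tenant-a", "analytics", "events")
```

Because every key starts with the tenant, records of different tenants never share a key, which is one possible reading of the logical division described above.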
Based on the introduction above, the performance of data directory management provided by this disclosure will be evaluated below. Compared with the original data directory management (for example, Hive Metastore's data directory management), this disclosure implements the general public cloud multi-tenant metadata online data directory management. It can provide services for different accounts on the cloud through a Software-as-a-Service (SaaS) metadata management service, and support extendable, highly scalable, and low-cost metadata management.
In addition, the unified metadata online directory management performance has been greatly improved. For the convenience of explanation,
The following describes the metadata management apparatus in this disclosure in detail.
The term module (and other similar terms such as unit, submodule, etc.) in this disclosure may refer to a software module, a hardware module, or a combination thereof. A software module (e.g., computer program) may be developed using a computer programming language. A hardware module may be implemented using processing circuitry and/or memory. Each module can be implemented using one or more processors (or processors and memory). Likewise, a processor (or processors and memory) can be used to implement one or more modules. Moreover, each module can be part of an overall module that includes the functionalities of the module.
The embodiments of this disclosure provide a metadata management apparatus. Using the apparatus above, the concept of the metadata tenant is designed on an upper layer of the metadatabase, which takes the metadata tenant as the minimum granularity of isolation among tenants and supports a mode in which one cloud account is bound to multiple metadata tenants. Therefore, when the number of tenants supported by a cloud account needs to be expanded, more metadata tenants can be bound to the cloud account, which expands the metadata management boundary of the cloud account. Each metadata tenant has an independent metadata management space. Across different metadata tenants, metadata resources (for example, a metadatabase and a metadata table) are isolated, which prevents the metadata resources of one tenant from being affected by another and achieves a better metadata management effect.
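The binding of one cloud account to multiple metadata tenants, and the resulting name isolation, can be sketched in Python as follows. This is a minimal sketch under assumptions: the class `TenantRegistry`, its methods, and the identifiers `acct-1`, `mt-a`, and `mt-b` are hypothetical and serve only to show that two metadata tenants can each hold a metadatabase named "DB.01".

```python
class TenantRegistry:
    """Hypothetical registry: one cloud account, many metadata tenants."""

    def __init__(self):
        self.tenants_by_account = {}  # account_id -> set of tenant ids
        self.databases = {}           # (tenant_id, db_name) -> record

    def bind_tenant(self, account_id, tenant_id):
        # Binding another tenant expands what the account can manage.
        self.tenants_by_account.setdefault(account_id, set()).add(tenant_id)

    def create_database(self, account_id, tenant_id, db_name):
        if tenant_id not in self.tenants_by_account.get(account_id, set()):
            raise PermissionError("tenant is not bound to this account")
        key = (tenant_id, db_name)
        # Names must be unique only inside a single metadata tenant.
        if key in self.databases:
            raise ValueError(f"{db_name!r} already exists in {tenant_id!r}")
        self.databases[key] = {"tenant": tenant_id, "name": db_name}
        return self.databases[key]


registry = TenantRegistry()
registry.bind_tenant("acct-1", "mt-a")
registry.bind_tenant("acct-1", "mt-b")
db_a = registry.create_database("acct-1", "mt-a", "DB.01")
db_b = registry.create_database("acct-1", "mt-b", "DB.01")  # no name clash
```

Keying each metadatabase by the pair of tenant identifier and name is what lets the second call succeed; a repeated name inside the same tenant is still rejected.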
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, the concept of the metadata tenant is designed for online data directory management, so that the metadata can be divided and the metadata tenant can be taken as the minimum granularity of multi-tenant isolation, so that metadata under different metadata tenants can be isolated from each other without affecting each other. Therefore, different metadata tenants can implement operations such as querying the metadata table when the metadata is isolated, so as to improve the flexibility and feasibility of the solution.
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, in order to support unified management of multi-tenant metadata in the public cloud scene, this disclosure abstractly designs a multi-tenant domain model, i.e., the metadata tenant and the service tenant. In this way, the need for unified metadata across different service scenes can be met, and the multi-tenant online data directory management function of the public cloud can be provided.
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, the association between two tenant dimensions is realized based on the tenant dimension mapping. That is, the mapping relationship between the metadata tenants and service tenants is defined through the tenant dimension mapping, and the mapping relationship depends on the specific service logic requirements. Mapping is carried out according to the specific service scene, so as to realize a general, multi-scene central metadata online data directory management system. The online data directory management system has the advantages of high scalability, high performance, and high fault tolerance, and supports rapid adaptation and interconnection of multiple compute engines.
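A tenant dimension mapping of this kind could be sketched in Python as below. The direction of the mapping (service tenant to metadata tenant), the use of a service scene as part of the key, and the names `TenantDimensionMapping`, `map_tenant`, and `resolve` are assumptions made for illustration; the disclosure leaves the concrete mapping to the service logic.

```python
class TenantDimensionMapping:
    """Hypothetical mapping from service tenants to metadata tenants."""

    def __init__(self):
        # (service_scene, service_tenant_id) -> metadata_tenant_id
        self._mapping = {}

    def map_tenant(self, scene, service_tenant, metadata_tenant):
        self._mapping[(scene, service_tenant)] = metadata_tenant

    def resolve(self, scene, service_tenant):
        try:
            return self._mapping[(scene, service_tenant)]
        except KeyError:
            raise LookupError(
                f"no metadata tenant mapped for {service_tenant!r} "
                f"in scene {scene!r}"
            ) from None


mapping = TenantDimensionMapping()
# Two service scenes map their own tenants onto one shared metadata tenant.
mapping.map_tenant("data-lake", "project-42", "mt-a")
mapping.map_tenant("warehouse", "workspace-7", "mt-a")
mapping.map_tenant("warehouse", "workspace-8", "mt-b")
resolved = mapping.resolve("data-lake", "project-42")
```

In this sketch, each service scene defines its own entries, so different scenes can share one metadata tenant or keep their metadata apart.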
Based on the embodiment corresponding to
The processing module 230 is configured to, when the account authentication request is successfully verified, store the cloud account information in a to-be-requested session, the to-be-requested session being created based on the account authentication request; and
On the basis of the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Using the apparatus above, a customized Handler type is created and implemented. This Handler type inherits the IHMSHandler interface and re-implements the metadata management logic. The customized Handler mainly implements authentication and data encapsulation processing for request parameters. Finally, the service layer of the underlying service of the metadata is called through the RPC interface inside the metadata to implement a persistence operation. In addition, existing interfaces can be directly reused to enhance security authentication of the RPC interfaces, thus improving data security.
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Using the apparatus above, a customized Handler type is created and implemented. This Handler type inherits the IHMSHandler interface and re-implements the metadata management logic. The customized Handler mainly implements authentication and data encapsulation processing for request parameters. Finally, the service layer of the underlying service of the metadata is called through the RPC interface inside the metadata to implement a persistence operation. In addition, authentication can be performed on each request, which helps improve authentication security.
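The flow of authenticating each request, encapsulating its parameters, and then calling the underlying service can be sketched as follows. This is a language-neutral illustration written in Python and does not reproduce the IHMSHandler interface: the classes `MetadataBackend` and `AuthenticatingHandler`, the shared-secret check, and the request fields are all hypothetical, and the backend object merely stands in for the service reached over RPC.

```python
import hmac


class MetadataBackend:
    """Stand-in for the underlying metadata service reached over RPC."""

    def __init__(self):
        self.calls = []

    def create_table(self, request):
        self.calls.append(("create_table", request))
        return "ok"


class AuthenticatingHandler:
    """Hypothetical handler that authenticates every request it receives."""

    def __init__(self, backend, secrets):
        self._backend = backend
        self._secrets = secrets  # account_id -> shared secret (assumption)

    def _authenticate(self, account_id, secret):
        expected = self._secrets.get(account_id)
        # compare_digest avoids leaking the match position through timing.
        if expected is None or not hmac.compare_digest(
            expected.encode(), secret.encode()
        ):
            raise PermissionError("authentication failed")

    def create_table(self, account_id, secret, tenant_id, db_name, table_name):
        self._authenticate(account_id, secret)   # per-request authentication
        request = {                               # parameter encapsulation
            "tenant": tenant_id,
            "database": db_name,
            "table": table_name,
        }
        return self._backend.create_table(request)  # persistence call


backend = MetadataBackend()
handler = AuthenticatingHandler(backend, {"acct-1": "s3cret"})
result = handler.create_table("acct-1", "s3cret", "mt-a", "DB.01", "orders")
```

A request with a wrong or unknown credential raises before the backend is touched, so nothing is persisted for unauthenticated callers.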
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, the metadata table can be created based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, the metadata table can be changed based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, the metadata table can be deleted based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
Based on the embodiment corresponding to
The transmitting module 220 is specifically configured to perform parameter verification on the fourth object parameter carried in the data table query request; and
The embodiments of this disclosure provide a metadata management apparatus. Through the apparatus above, the metadata table can be queried based on the RPC interface method provided by the online service. Therefore, in the case of compatibility with multi-compute engines, the RPC interface inside the metadata can be used for calling the underlying service of the metadata for the persistent operation, so as to improve the feasibility and operability of the solution.
Based on the embodiment corresponding to
The processing module 230 is further configured to, when receiving a third query request, determine a metadata table corresponding to a metadata table foreign key from a third metadata table according to the third query request, the third query request carrying a subregion identifier, and the subregion identifier being associated with the metadata table foreign key.
The processing module 230 is further configured to, when receiving a fourth query request, determine a storage descriptor corresponding to a storage table foreign key from a fourth metadata table according to the fourth query request, the fourth query request carrying a subregion identifier, and the subregion identifier being associated with the storage table foreign key.
The processing module 230 is further configured to, when receiving a fifth query request, determine a storage descriptor corresponding to a storage table foreign key from a fifth metadata table according to the fifth query request, the fifth query request carrying a table identifier, and the table identifier being associated with the storage table foreign key.
The processing module 230 is further configured to, when receiving a sixth query request, determine a metadatabase corresponding to a metadatabase foreign key from a sixth metadata table according to the sixth query request, the sixth query request carrying a function identifier, and the function identifier being associated with the metadatabase foreign key.
The embodiments of this disclosure provide a metadata management apparatus. Using the apparatus above, for Hive type data, a more simplified general data model is designed to logically divide metadata resources while supporting multi-tenant metadata. The design and optimization of the underlying data model can improve the performance of metadata management, accelerate metadata read and write performances, remove multi-table dependency of the database, and implement the dependency relationship through logic. In addition, the distributed storage system can support the storage and management of massive metadata.
Based on the embodiment corresponding to
The embodiments of this disclosure provide a metadata management apparatus. Using the apparatus above, for non-Hive type data, a more simplified general data model is designed. For example, metadata in a storage system database management system can adopt this data model and only focus on metadata for bases, tables, and columns. Logic division is performed on the metadata resources when metadata multi-tenant is supported. The design and optimization of the underlying data model can improve the performance of metadata management, accelerate metadata read and write performances, remove multi-table dependency of the database, and implement the dependency relationship through logic. In addition, the distributed storage system can support the storage and management of massive metadata.
The computer device 300 may further include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
The steps performed by the computer device in the foregoing embodiment may be based on the computer device structure shown in
In the embodiment of this disclosure, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer programs, and when the computer programs are run on a computer, the computer is enabled to perform the method described according to the foregoing embodiments.
An embodiment of this disclosure further provides a computer program product including a program that, when run on a computer, enables the computer to perform the method described according to the foregoing embodiments.
A person skilled in the art can clearly understand that for convenience and conciseness of description, for specific working processes of the foregoing systems, devices and units, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described herein again.
In the several embodiments provided in this disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely exemplary. For example, the division of the units is merely the division of logic functions, and may use other division manners during actual implementation. For example, a plurality of units or components may be combined, or may be integrated into another system, or some features may be omitted or not performed. In addition, the coupling, or direct coupling, or communication connection between the displayed or discussed components may be the indirect coupling or communication connection through some interfaces, apparatus, or units, and may be electrical, mechanical or of other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, and may be located in one place or may be distributed over a plurality of network units. Some or all of the units may be selected based on actual needs to achieve the objectives of the solutions of the embodiments of the disclosure.
In addition, functional units in the embodiments of this disclosure may be integrated into one processing unit, or each of the units may be physically separated, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in a form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this disclosure essentially, or the part contributing to the related technology, or all or some of the technical solutions may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this disclosure. The foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
As stated above, the embodiments are merely used for describing the technical solutions of this disclosure, but are not intended to limit the same. Although this disclosure is described in detail with reference to the foregoing embodiments, it should be understood by a person skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features; moreover, such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202111302438.1 | Nov 2021 | CN | national |
This disclosure is a continuation of International Patent Application No. PCT/CN2022/118865, filed on Sep. 15, 2022, which claims priority to Chinese Patent Application No. 202111302438.1, filed with the Chinese Patent Office on Nov. 4, 2021 and entitled “METADATA MANAGEMENT METHOD, RELATED APPARATUS AND DEVICE, AND STORAGE MEDIUM.” Both applications above are incorporated herein by reference in their entireties.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2022/118865 | Sep 2022 | US |
Child | 18360103 | | US |