This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2018-126130, filed Jul. 2, 2018, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a database management apparatus and a query dividing method.
In recent years, as network environments have improved, the amount of data that companies need to accumulate for their operations has been increasing rapidly. Hence, a distributed database that can collectively handle data held by each of a plurality of servers, for example, is becoming increasingly important.
A general architecture that implements the various features of the embodiments will now be described with reference to the drawings. The drawings and the associated descriptions are provided to illustrate the embodiments and not to limit the scope of the invention.
In general, according to one embodiment, a database management apparatus capable of operating as one of a plurality of servers constituting a distributed database in a tree structure is provided. The database management apparatus includes a processor. The processor is configured to manage server information of an own server and a subordinate server, analyze an input query, and decide a table used for the query, determine a generation number of query executing modules configured to execute the query, based on the server information of the own server and the subordinate server, and divide the query according to the generation number if a plurality of query executing modules is generated for a subordinate server and accumulate a result of the query executed by the query executing modules of the determined generation number.
Various embodiments will be described hereinafter with reference to the accompanying drawings.
As illustrated in
For example, as illustrated in
Furthermore, as described above, the database management apparatus 1 can operate as either the host server or the subordinate server. This means not only that the database management apparatus 1 can selectively operate as exclusively one of the host server and the subordinate server, but also that it can operate as the host server in relation to one database management apparatus 1 while operating as the subordinate server in relation to another database management apparatus 1. Hence, by connecting a plurality of database management apparatuses 1 in a tree shape as illustrated in, for example,
As illustrated in
In addition,
The DBMS program 100 roughly includes a processing module 110 which operates as the host server, and a processing module 120 which operates as a subordinate server. The DBMS program 100 includes a subordinate server information management module 111, a subordinate server information obtaining module 112, a query analyzing module 113, a query dividing module 114, a query executing module 115 and a result accumulating module 116 as components of the former processing module 110. Furthermore, the DBMS program 100 includes a table management module 121, an own server information management module 122, an own server information transmitting module 123 and a query executing module 124 as components of the latter processing module 120. These components need not be realized as modules of the DBMS program 100, and may instead be realized as, for example, electronic circuits.
Furthermore, information stored in the storage region 200 is also roughly classified into information 210 used for an operation of the host server, and information 220 used for an operation of the subordinate server. Elements of the former information 210 include processor count information 211 and subordinate table record count information 212, and elements of the latter information 220 include a table 221.
The subordinate server information management module 111 stores the subordinate server information obtained by the subordinate server information obtaining module 112 in the storage region 200 as the processor count information 211 and the subordinate table record count information 212 described below, and manages this information. The subordinate server information obtaining module 112 receives the subordinate server information transmitted from a subordinate server and transfers it to the subordinate server information management module 111. This subordinate server information corresponds to the own server information that the own server information transmitting module 123 on the subordinate server side transmits to the host server.
The query analyzing module 113 analyzes a query accepted from the user, and decides a table related to this query and, more specifically, decides which table this query uses. The query dividing module 114 firstly determines the generation number of the query executing modules 115 which execute this query based on the processor count information 211 and the subordinate table record count information 212 managed by the subordinate server information management module 111 in response to a decision result of the query analyzing module 113, and generates the determined generation number of the query executing modules 115. The generation number of the query executing modules 115 corresponds to a parallel number for executing a query in parallel. Furthermore, the query dividing module 114 secondly divides a query according to the generation number when, for example, a plurality of query executing modules 115 is generated for one subordinate server. This query dividing module 114 works to allow this database management apparatus 1 to efficiently allocate limited resources, which will be described below.
The query executing module 115 is dynamically generated by the query dividing module 114 and executes a query passed from the query dividing module 114. More specifically, the query executing module 115 transmits the query passed from the query dividing module 114 to the subordinate server to which this query executing module 115 is allocated, and transfers the result of the query received from that subordinate server to the result accumulating module 116.
The result accumulating module 116 accumulates the results of the queries received from the query executing modules 115, and transfers the accumulated result to the user who issued the query.
The table management module 121 manages the table 221. The table 221 is a data structure of a table format including rows (records) and columns, and a plurality of tables 221 can be held.
The own server information management module 122 manages the number of processors of the database management apparatus 1 and the number of holding records of the table 221 as own server information. In addition, the number of processors of the database management apparatus 1 is stored as the processor count information 211 in the storage region 200. The own server information transmitting module 123 transmits the own server information managed by the own server information management module 122 to the host server. This own server information corresponds to subordinate server information obtained by the subordinate server information obtaining module 112 on the host server side. The own server information is transmitted from the own server information transmitting module 123 to the host server when, for example, the number of holding records of the table 221 is updated.
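As a rough illustration of how the host side might hold the server information it receives, the following Python sketch keeps processor counts and per-table numbers of holding records keyed by subordinate server, and replaces a server's entry whenever that server re-sends its own server information. The class, field and method names (ServerInfo, SubordinateInfoManager, holders_of) and the dictionary layout are hypothetical; the text does not specify a data format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ServerInfo:
    """Own server information as a subordinate server might report it (hypothetical format)."""
    server_id: str
    processor_count: int
    # table name -> number of holding records of that table on the reporting server
    record_counts: Dict[str, int] = field(default_factory=dict)


class SubordinateInfoManager:
    """Host-side store loosely corresponding to the processor count
    information 211 and the subordinate table record count information 212."""

    def __init__(self) -> None:
        self.processor_counts: Dict[str, int] = {}
        self.record_counts: Dict[str, Dict[str, int]] = {}

    def receive(self, info: ServerInfo) -> None:
        # A new report replaces whatever was previously known about that server.
        self.processor_counts[info.server_id] = info.processor_count
        self.record_counts[info.server_id] = dict(info.record_counts)

    def holders_of(self, table: str) -> Dict[str, int]:
        # Number of holding records of `table` on each subordinate server that holds it.
        return {sid: counts[table]
                for sid, counts in self.record_counts.items()
                if table in counts}


mgr = SubordinateInfoManager()
mgr.receive(ServerInfo("server1", 4, {"table1": 80}))
mgr.receive(ServerInfo("server3", 4, {"table1": 20}))
mgr.receive(ServerInfo("server3", 4, {"table1": 25}))  # record count updated, so re-sent
print(mgr.holders_of("table1"))  # {'server1': 80, 'server3': 25}
```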
The query executing module 124 obtains data from the table held by the own server based on the query transmitted from the host server, more specifically, from the query executing module 115 on the host server side, and transfers the obtained data to that query executing module 115 of the host server. While the query executing modules 115 are dynamically generated by the query dividing module 114, query executing modules 124 are statically generated, as many as the number of processors of the database management apparatus 1. Alternatively, a query executing module 124 may be dynamically generated every time a query is received from the host server.
In addition, when the analysis by the query analyzing module 113 shows that a query accepted from the user does not use tables dispersed in a plurality of subordinate servers but uses a table held by the own server, the query is transferred from the query analyzing module 113 to the query executing module 124. On receiving the query from the query analyzing module 113, the query executing module 124 executes it and transfers the result to the user who issued the query. In this case, too, the query dividing module 114 may divide the query according to, for example, the number of query executing modules 124, which equals the number of processors of the own server, and transfer the divided queries to the query executing modules 124.
As illustrated in
As illustrated in
As illustrated in
As described above, the host server and the subordinate server first cooperate in such a way that the own server information transmitting module 123 on the subordinate server side transfers own server information to the subordinate server information obtaining module 112 on the host server side (
In this regard, for deeper understanding of the database management apparatus 1 according to the present embodiment, a typical method for executing a query which uses tables dispersed in a plurality of subordinate servers will be described as one comparative example.
When, for example, the tables used by a query are dispersed across and held by three subordinate servers, three processes that execute the query (corresponding to the query executing modules 115 of the database management apparatus 1 according to the present embodiment) are generated and allocated one to each subordinate server, and each subordinate server is made to execute the query. However, this method does not take into account the number of holding records in each subordinate server, and therefore resources are not necessarily allocated efficiently.
By contrast with this, the database management apparatus 1 according to the present embodiment realizes efficient resource allocation by taking into account not only the number of holding records of the table in each subordinate server which holds the table used for a query, but also the number of processors of the own server and the number of processors of each subordinate server. This point will be described in detail below.
First, a basic rule of the database management apparatus 1 according to the present embodiment for executing a query which uses tables dispersed in a plurality of subordinate servers will be described.
Firstly, in this database management apparatus 1, the total generation number (parallel number) of the query executing modules 115 generated by the query dividing module 114 is at most the number of processors of the own server. However, when, for example, the number of processors of the own server is less than the number of subordinate servers, the total generation number of the query executing modules 115 may exceed the number of processors of the own server.
Secondly, in this database management apparatus 1, the number of the query executing modules 115 generated for each subordinate server is at most the number of processors of that subordinate server.
In view of the above basic rule, this database management apparatus 1, more specifically, the query dividing module 114 generates an appropriate number of the query executing modules 115 as follows.
The query dividing module 114 first calculates, by using the subordinate table record count information 212 managed by the subordinate server information management module 111, the ratio ε of the number of holding records of each subordinate server in the table that the query analyzing module 113 has decided the query uses. The ratio ε is calculated as, for example, "ratio εserver 1 of number of holding records of subordinate server 1 = number of holding records of subordinate server 1 / number of holding records of overall subordinate servers".
Next, the query dividing module 114 provisionally calculates the number of query executing modules 115 to be generated for each subordinate server by using the ratio ε calculated for each subordinate server and the number of processors N of the own server included in the processor count information 211 managed by the subordinate server information management module 111. The generation number is calculated, for example, as "generation number of query executing modules 115 for subordinate server 1 = number of processors N of own server × ratio εserver 1 of number of holding records of subordinate server 1". The query dividing module 114 sets the calculated value to one when it is less than one (a subordinate server holding no records of the table, for which the value is zero, is given no query executing module), and converts it into an integer by, for example, rounding when it is one or more and has a fractional part.
Lastly, the query dividing module 114 determines the number of query executing modules 115 to be generated for each subordinate server, and hence the total generation number of the query executing modules 115, by using the provisionally calculated number for each subordinate server, the number of processors N of the own server included in the processor count information 211 managed by the subordinate server information management module 111, and the number of processors M of each subordinate server. More specifically, the query dividing module 114 applies the basic rule of the database management apparatus 1 according to the present embodiment described above: it determines the generation numbers such that the total generation number of the query executing modules 115 does not exceed the number of processors of the own server, and the number of query executing modules 115 generated for each subordinate server does not exceed the number of processors of that subordinate server.
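The three steps above, together with the basic rule, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: rounding is taken as round-half-up, a server holding no records of the table gets no module, and, because the text does not say how an excess over N caused by rounding is removed, the trimming loop at the end is a hypothetical policy. The function name and parameters are illustrative.

```python
from typing import Dict


def allocate(own_procs: int,
             records: Dict[str, int],
             sub_procs: Dict[str, int]) -> Dict[str, int]:
    """Generation number of query executing modules per subordinate server.

    own_procs -- number of processors N of the own (host) server
    records   -- number of holding records of the queried table per subordinate server
    sub_procs -- number of processors M of each subordinate server
    """
    total = sum(records.values())
    counts: Dict[str, int] = {}
    for sid, n_rec in records.items():
        eps = n_rec / total if total else 0.0   # ratio epsilon of holding records
        value = own_procs * eps                  # provisional generation number N * epsilon
        if value == 0:
            n = 0                                # no records of the table: no module
        elif value < 1:
            n = 1                                # values below one are raised to one
        else:
            n = int(value + 0.5)                 # round half up (assumed rounding mode)
        counts[sid] = min(n, sub_procs[sid])     # rule 2: at most M modules per server
    # Rule 1: the total is at most N, except that every server needing a module keeps
    # at least one. Removing the excess from the largest count is a hypothetical policy.
    limit = max(own_procs, sum(1 for n in counts.values() if n > 0))
    while sum(counts.values()) > limit:
        sid = max(counts, key=lambda s: counts[s])
        if counts[sid] <= 1:
            break
        counts[sid] -= 1
    return counts
```

For instance, with N = 8, hypothetical record counts of 80, 0 and 20 (chosen so that N × ε gives 6.4, 0 and 1.6) and four processors per subordinate server, `allocate(8, {"s1": 80, "s2": 0, "s3": 20}, {"s1": 4, "s2": 4, "s3": 4})` returns `{"s1": 4, "s2": 0, "s3": 2}`.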
Next, a specific operation example where the query dividing module 114 determines the generation number of the query executing modules 115 will be described by citing some model cases.
First, the operation example of the query dividing module 114 in a case where the number of processors of the host server (own server) is larger than the number of processors of the subordinate server (which holds a table used for a query) will be described with reference to
Hereinafter, it is assumed that there are subordinate servers 1 to 3, and a query which uses the tables 1 dispersed in these subordinate servers is executed. Furthermore, as illustrated in
As illustrated in
At this point, the query dividing module 114 calculates 6.4 as the number of query executing modules 115 for the server 1, 0 for the server 2, and 1.6 for the server 3. Because the server 1 has four processors, the query dividing module 114 determines four, rather than six obtained by rounding 6.4, as the number of query executing modules 115 generated for the server 1. It determines 0 as the number generated for the server 2. For the server 3, it determines two, obtained by rounding 1.6 (which is within the number of processors of the server 3).
That is, as illustrated in
Thus, the query dividing module 114 operates to allocate more processors to processing related to a subordinate server with a larger number of holding records (a larger processing amount) and hence a higher load. Furthermore, allocating to processing related to one subordinate server more processors of the own server than that subordinate server has would, for example, leave some of those processors on standby, which is wasteful. The query dividing module 114 therefore operates so as to avoid such waste.
That is, compared to a case where a process which executes a query is uniformly generated according to the number of subordinate servers as in the one aforementioned comparative example, the database management apparatus 1 according to the present embodiment realizes efficient resource allocation.
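The arithmetic of the first model case can be checked using only the values stated above. Because the ratios ε of the three subordinate servers sum to one, the calculated values imply N = 6.4 + 0 + 1.6 = 8; M₁ = 4 is the number of processors of the server 1, and M₃ is only known to be at least two.

```latex
\begin{aligned}
N &= N(\varepsilon_1 + \varepsilon_2 + \varepsilon_3) = 6.4 + 0 + 1.6 = 8,\\
n_1 &= \min\bigl(\operatorname{round}(6.4),\, M_1\bigr) = \min(6,\, 4) = 4,\\
n_2 &= N\varepsilon_2 = 0 \quad (\text{the server 2 holds no records of the table}),\\
n_3 &= \min\bigl(\operatorname{round}(1.6),\, M_3\bigr) = 2 \quad (M_3 \ge 2),\\
n_1 + n_2 + n_3 &= 6 \le N = 8.
\end{aligned}
```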
Next, an operation example of the query dividing module 114 in a case where the processing amounts of all the subordinate servers (which hold the tables used for a query) are of the same degree will be described with reference to
Hereinafter, it is assumed that there are the servers 1 to 3 which are the subordinate servers, and a query which uses the tables 1 dispersed in these subordinate servers is executed. Furthermore, as illustrated in
In this case, too, the query dividing module 114 first calculates the ratio ε of the numbers of holding records of the servers 1 to 3 in the tables 1 as illustrated in
At this point, the query dividing module 114 calculates 2.4 as the number of query executing modules 115 for each of the servers 1 to 3, and determines two, obtained by rounding 2.4 (which is within the numbers of processors of the servers 1 to 3), as the number of query executing modules 115 generated for each of these servers.
That is, as illustrated in
Thus, when the processing amounts of the subordinate servers are all of the same degree, the query dividing module 114 operates to allocate the processors equally. Note that while the own server has eight processors, the total generation number of the query executing modules 115 is six, and while the server 1 has four processors, the number of query executing modules 115 generated for the server 1 is two. There is therefore room to allocate two more processors to processing related to the server 1, i.e., to generate two more query executing modules 115. In this case, however, the processors of the own server are not allocated beyond the number calculated from the number of processors of the own server and the ratio of the number of holding records of the subordinate servers.
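The figures of the second model case, using only the values stated above (N = 8, M₁ = 4, and 2.4 calculated for every server), work out as follows.

```latex
\begin{aligned}
n_i &= \min\bigl(\operatorname{round}(2.4),\, M_i\bigr) = 2 \quad (i = 1, 2, 3),\\
n_1 + n_2 + n_3 &= 6 \le N = 8,\\
M_1 - n_1 &= 4 - 2 = 2 \quad (\text{headroom left unused on the server 1}).
\end{aligned}
```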
Next, an operation example of the query dividing module 114 in a case where the number of processors of the host server (own server) is smaller than the number of subordinate servers (which hold the table used for the query) will be described with reference to
Hereinafter, it is assumed that there are the servers 1 to 3 which are the subordinate servers, and a query which uses the tables 1 dispersed in these subordinate servers is executed. Furthermore, as illustrated in
In this case, too, as illustrated in
At this point, the query dividing module 114 calculates 0.5 as the number of query executing modules 115 for the server 1, 1 for the server 2, and 0.5 for the server 3. Because the values calculated for the server 1 and the server 3 are less than one, the query dividing module 114 determines one as the number of query executing modules 115 generated for each of them. Because one is calculated for the server 2, it also determines one as the number generated for the server 2.
That is, as illustrated in
In addition, when there are a plurality of subordinate servers whose generation number of query executing modules 115 has been set to one because the value calculated from the number of processors of the own server and the ratio of the number of holding records of the subordinate server is less than one, the query dividing module 114 may perform control to allocate the processing for several of these subordinate servers to one processor. When, for example, 0.5 is calculated from the number of processors of the own server and the ratio of the number of holding records of the subordinate server, and the generation number of query executing modules 115 for the server 1 and the server 3 is therefore determined as one as illustrated in
More specifically, among the subordinate servers for which a value less than one has been calculated from the number of processors of the own server and the ratio of the number of holding records, the query executing modules 115 generated for those subordinate servers may be allocated to one processor in ascending order of processing amount, adding them one at a time until adding the next would make the sum of their calculated values exceed one.
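The grouping rule just described can be sketched as a greedy pass over the subordinate servers whose calculated value N × ε lies between zero and one, smallest first. The function name and its input (a mapping from a hypothetical server id to its calculated value) are illustrative assumptions; because the calculated value is proportional to the number of holding records, sorting by it is taken here as sorting by processing amount.

```python
from typing import Dict, List, Tuple


def share_processors(values: Dict[str, float]) -> List[List[str]]:
    """Group subordinate servers whose calculated value N * epsilon lies between
    zero and one, so that the query executing modules of each group share one
    processor of the own server.

    values -- calculated value N * epsilon per subordinate server (hypothetical ids)
    """
    # Smallest processing amount first; ties are broken by server id.
    small: List[Tuple[float, str]] = sorted(
        (v, sid) for sid, v in values.items() if 0 < v < 1)
    groups: List[List[str]] = []
    current: List[str] = []
    load = 0.0
    for v, sid in small:
        if current and load + v > 1:   # adding this server would exceed one processor
            groups.append(current)
            current, load = [], 0.0
        current.append(sid)
        load += v
    if current:
        groups.append(current)
    return groups
```

For the third model case above, `share_processors({"server1": 0.5, "server2": 1.0, "server3": 0.5})` returns `[["server1", "server3"]]`: the modules for the servers 1 and 3 share one processor, and the server 2 keeps a processor of its own.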
Next, the query dividing method of the query dividing module 114 will be described with reference to
The query dividing module 114 divides a query by using, for example, a LIMIT clause (
Here, assume that the query dividing module 114 has determined to generate three query executing modules 115 for a subordinate server whose number of holding records in the tables 1 (Tables 1) used for the query is 100, and therefore divides this query into three.
In this case, as illustrated in
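Because the figure showing the divided queries is not reproduced here, the following Python sketch only illustrates one plausible way to split a query into LIMIT/OFFSET ranges; the function name, the table name table1, the key column id and the 34/33/33 split are assumptions. LIMIT with OFFSET partitions rows reliably only when the query returns them in a deterministic order, hence the ORDER BY in the example.

```python
from typing import List


def divide_by_limit(base_query: str, record_count: int, parts: int) -> List[str]:
    """Split one query into `parts` sub-queries with LIMIT/OFFSET ranges.

    `base_query` must return rows in a deterministic order (e.g. ORDER BY a key);
    otherwise the LIMIT/OFFSET ranges could overlap or skip rows.
    """
    size, extra = divmod(record_count, parts)
    queries: List[str] = []
    offset = 0
    for i in range(parts):
        limit = size + (1 if i < extra else 0)   # spread the remainder over the first ranges
        queries.append(f"{base_query} LIMIT {limit} OFFSET {offset}")
        offset += limit
    return queries


for q in divide_by_limit("SELECT * FROM table1 ORDER BY id", 100, 3):
    print(q)
# SELECT * FROM table1 ORDER BY id LIMIT 34 OFFSET 0
# SELECT * FROM table1 ORDER BY id LIMIT 33 OFFSET 34
# SELECT * FROM table1 ORDER BY id LIMIT 33 OFFSET 67
```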
The database management apparatus 1 first analyzes a query, and decides which table to use (step A1). Next, the database management apparatus 1 calculates a ratio of the number of holding records of each subordinate server in this table (step A2).
Subsequently, the database management apparatus 1 determines the number of the query executing modules 115 to generate, i.e., the number of processors (of the own server) to be allocated to each subordinate server based on the ratio of the number of the holding records and the number of processors (of the own server and each subordinate server) (step A3). Furthermore, the database management apparatus 1 divides the query based on the number of the query executing modules 115 to generate per subordinate server (step A4).
After determining the generation number of the query executing modules 115 and dividing the query, the database management apparatus 1 executes the query in each query executing module 115 (step A5), and accumulates the results obtained by the respective query executing modules 115 (step A6).
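Steps A1 to A6 can be strung together in a small, self-contained Python simulation. Everything here is hypothetical: the subordinate servers are in-memory lists, a thread pool stands in for the dynamically generated query executing modules 115, sub-queries are (offset, limit) slices rather than SQL, and trimming of an excess over N is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical in-memory subordinate servers: server id -> table name -> rows.
SUBORDINATES: Dict[str, Dict[str, List[int]]] = {
    "server1": {"table1": list(range(0, 80))},
    "server2": {"table2": list(range(0, 50))},   # does not hold table1
    "server3": {"table1": list(range(80, 100))},
}
SUB_PROCS = {"server1": 4, "server2": 4, "server3": 4}
OWN_PROCS = 8


def plan(records: Dict[str, int]) -> Dict[str, int]:
    # Steps A2-A3: ratio of holding records, then generation number per server
    # (rounded half up, at least one, at most that server's processor count).
    total = sum(records.values())
    counts: Dict[str, int] = {}
    for sid, n in records.items():
        value = OWN_PROCS * n / total if total else 0.0
        counts[sid] = 0 if value == 0 else min(max(1, int(value + 0.5)), SUB_PROCS[sid])
    return counts


def divide(n_records: int, parts: int) -> List[Tuple[int, int]]:
    # Step A4: (offset, limit) pairs standing in for LIMIT/OFFSET sub-queries.
    size, extra = divmod(n_records, parts)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(parts):
        limit = size + (1 if i < extra else 0)
        ranges.append((offset, limit))
        offset += limit
    return ranges


def execute(sid: str, table: str, offset: int, limit: int) -> List[int]:
    # Step A5: one query executing module runs its sub-query on one subordinate server.
    return SUBORDINATES[sid][table][offset:offset + limit]


def run_query(table: str) -> Tuple[Dict[str, int], List[int]]:
    # Step A1 is assumed done: query analysis decided that the query uses `table`.
    records = {sid: len(tables[table])
               for sid, tables in SUBORDINATES.items() if table in tables}
    counts = plan(records)
    tasks = [(sid, table, off, lim)
             for sid, k in counts.items() if k > 0
             for off, lim in divide(records[sid], k)]
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        parts = list(pool.map(lambda t: execute(*t), tasks))
    # Step A6: accumulate the partial results in task order.
    return counts, [row for part in parts for row in part]


counts, rows = run_query("table1")
print(counts)      # {'server1': 4, 'server3': 2}
print(len(rows))   # 100
```

Because `ThreadPoolExecutor.map` yields results in the order of its inputs, the accumulated rows come back as 0 to 99 in order even though the six sub-queries run concurrently.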
In the above configuration, the host server and the subordinate servers employ the same configuration: own server information transmitted from the own server information transmitting module 123 on the subordinate server side to the host server is received by the subordinate server information obtaining module 112 on the host server side, and the subordinate server information management module 111 manages it as subordinate server information. On the other hand, as illustrated in, for example,
To meet this request, the database management apparatus 1 may further include a mechanism which absorbs a difference between the own server and the subordinate servers when there are subordinate servers of different configurations.
In the first example, for a subordinate server that is the different configuration data source 2, the subordinate server information obtaining module 112 first collects subordinate server information related to that subordinate server from the subordinate server itself, for example, on a regular basis (
Furthermore, the query executing module 115 includes a query converting module 115A, which converts the query passed from the query dividing module 114 into a format executable by the subordinate server and transmits the converted query to the subordinate server (
For example, information indicating which subordinate server is the different configuration data source 2, and in which format that subordinate server holds its tables, may be given to the database management apparatus 1 in advance or may be actively obtained by the database management apparatus 1. When, for example, a subordinate server is newly connected and no subordinate server information is transmitted from it to the own server within a certain period, this subordinate server may be determined to be the different configuration data source 2, and a query inquiring about the tables it holds and the number of holding records of each table may be transmitted to it.
Consequently, the database management apparatus 1 can absorb the difference between the own server and the subordinate servers. Note that when subordinate server information is collected on a regular basis, the collected number of holding records may differ from the actual number at the time a query is executed, for example. However, the query is divided as illustrated in
Furthermore,
In this second example, instead of collecting subordinate server information on a regular basis, the query executing module 115 collects the subordinate server information at a timing (
In the second example, the subordinate server information may be collected every time the query executing module 115 transmits the query to the subordinate server, or may be collected when the query executing module 115 transmits the query to the subordinate server after a certain period or more passes since previous collection.
The database management apparatus 1 checks whether or not the subordinate server employs the same configuration as that of the own server (host server) (step B1). In a case of the same configuration (step B1: YES), the subordinate server side transmits subordinate server information as own server information. Consequently, the database management apparatus 1 does not actively collect the subordinate server information from the subordinate server.
On the other hand, in a case of a different configuration (step B1: NO), the database management apparatus 1 obtains table information, such as the number of holding records of each table, from the subordinate server at a predetermined timing, and updates the subordinate server information related to this subordinate server (step B2).
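Steps B1 and B2, combined with the collection timing of the second example, can be sketched as a small cache. The fetch callable, the max_age threshold and all class and method names are hypothetical; a max_age of 0 corresponds to collecting the information every time a query is transmitted, and a positive value to collecting it only when a certain period has passed since the previous collection.

```python
import time
from typing import Callable, Dict, Optional


class SubordinateInfoCollector:
    """Sketch of steps B1-B2 plus the staleness rule of the second example.

    same_config -- server id -> True if the server runs the same DBMS and pushes its info
    fetch       -- hypothetical callable asking a different-configuration server for its
                   per-table numbers of holding records (e.g. by sending a count query)
    max_age     -- seconds after which collected information is treated as stale
    """

    def __init__(self, same_config: Dict[str, bool],
                 fetch: Callable[[str], Dict[str, int]],
                 max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.same_config = same_config
        self.fetch = fetch
        self.max_age = max_age
        self.clock = clock
        self.info: Dict[str, Dict[str, int]] = {}
        self.fetched_at: Dict[str, float] = {}

    def on_push(self, sid: str, record_counts: Dict[str, int]) -> None:
        # Same-configuration servers transmit their own server information themselves.
        self.info[sid] = dict(record_counts)

    def before_query(self, sid: str) -> Optional[Dict[str, int]]:
        # Step B1: same configuration, so nothing is collected actively.
        if self.same_config.get(sid, False):
            return self.info.get(sid)
        # Step B2: different configuration, so refresh if never collected or stale.
        now = self.clock()
        last = self.fetched_at.get(sid)
        if last is None or now - last >= self.max_age:
            self.info[sid] = self.fetch(sid)
            self.fetched_at[sid] = now
        return self.info[sid]
```

Passing a fake clock (any zero-argument callable returning seconds) makes the refresh behaviour easy to test without waiting in real time.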
As described above, the database management apparatus 1 according to the present embodiment realizes efficient resource allocation by taking into account the number of holding records of the table in each subordinate server, the number of processors of the host server (own server), and the number of processors of each subordinate server.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2018-126130 | Jul 2018 | JP | national |