The present invention relates to an information processing system and an information processing method, and is suitable for application to an analysis system that analyzes big data, for example.
In recent years, the use of big data has been expanding. Analysis is required in order to make use of big data, and in the field of big data analysis it is considered that the application of scale-out distributed databases such as Hadoop and Spark will become mainstream in the future. Further, the need for self-service analysis with an interactive, short Turn Around Time (TAT) using big data is also increasing for quick decision making.
PTL 1 discloses a technique in which a coordinator server, connected to a plurality of distributed database servers each including a database that stores XML data, generates each query based on the processing capability of each database server.
PTL 1: JP-A-2009-110052
Here, in a distributed database system, a large number of nodes are required to secure the performance for processing a large amount of data at high speed; as a result, there is a problem that the system scale increases and the introduction cost and maintenance cost increase.
One method for solving this problem is to install an accelerator in each node of the distributed database system and improve the per-node performance, thereby reducing the number of nodes and suppressing the increase in system scale. In practice, many accelerators having the same functions as an Open-Source Software (OSS) database engine have been announced at the research level, and it is considered that the performance of a node can be improved by using such accelerators.
However, accelerators of this kind are premised on some system alteration, and so far no accelerator has been available that can be used without altering a general database engine.
Here, in recent years, there is a movement (Apache Arrow) to extend the user-defined function (UDF) of OSS Apache distributed database engines (Spark, Impala, and the like), and an environment that achieves an OSS distributed database accelerator without alteration of the database engine is being established. Meanwhile, when the user-defined function is used, there still remains a problem that alteration of the application that generates the Structured Query Language (SQL) query is necessary.
The invention has been made in view of the above points, and an object thereof is to propose an information processing technique that can prevent an increase in system scale for high-speed processing of large-capacity data without requiring alteration of an application, and prevent increases in introduction cost and maintenance cost.
In order to solve such a problem, in one embodiment of the invention, an accelerator is installed in each server serving as a worker node of a distributed DB system. A query generated by an application of an application server is divided into a first task that should be executed by the accelerator and a second task that should be executed by software, and the tasks are distributed to the servers of the distributed DB system. Each server causes the accelerator to execute the first task and executes the second task in software.
According to one embodiment of the invention, it is possible to provide a technique for high-speed processing of large-volume data.
Hereinafter, one embodiment of the invention is described in detail with reference to the drawings.
(1) First Embodiment

(1-1) Configuration of Information Processing System according to the Present Embodiment
Reference numeral 1 denotes an information processing system according to the present embodiment as a whole.
In practice, the information processing system 1 includes one or a plurality of clients 2, an application server 3, and a distributed database system 4. Further, each client 2 is connected to the application server 3 via a first network 5 such as a Local Area Network (LAN) or the Internet.
Further, the distributed database system 4 includes a master node server 6 and a plurality of worker node servers 7. The master node server 6 and the worker node servers 7 are respectively connected to the application server 3 via a second network 8 such as a LAN or a Storage Area Network (SAN).
The client 2 is a general-purpose computer device used by a user. The client 2 transmits, to the application server 3 via the first network 5, a big data analysis request including an analysis condition specified based on a user operation or a request from an application mounted on the client 2. Further, the client 2 displays the analysis result transmitted from the application server 3 via the first network 5.
The application server 3 is a server device having a function of generating an SQL query used for acquiring the data necessary for executing the analysis processing requested from the client 2, transmitting the SQL query to the master node server 6 of the distributed database system 4, executing the analysis processing based on the SQL query result transmitted from the master node server 6, and displaying the analysis result on the client 2.
The application server 3 includes a Central Processing Unit (CPU) 10, a memory 11, a local drive 12, and a communication device 13.
The CPU 10 is a processor that governs overall operation control of the application server 3. Further, the memory 11 includes, for example, a volatile semiconductor memory and is used as a work memory of the CPU 10. The local drive 12 includes, for example, a large-capacity nonvolatile storage device such as a hard disk device or Solid State Drive (SSD) and is used for holding various programs and data for a long period.
The communication device 13 includes, for example, a Network Interface Card (NIC), and performs protocol control at the time of communication with the client 2 via the first network 5 and at the time of communication with the master node server 6 or the worker node server 7 via the second network 8.
The master node server 6 is a general-purpose server device (an open system) that functions as a master node in, for example, Hadoop. In practice, the master node server 6 analyzes the SQL query transmitted from the application server 3 via the second network 8 and divides the processing based on the SQL query into tasks such as Map processing and Reduce processing. Further, the master node server 6 creates an execution plan of these tasks of the Map processing (hereinafter referred to as Map processing tasks) and tasks of the Reduce processing (hereinafter referred to as Reduce processing tasks), and transmits execution requests of these Map processing tasks and Reduce processing tasks to the respective worker node servers 7 according to the created execution plan. Further, the master node server 6 transmits, as the processing result of the SQL query, the processing result of the Reduce processing task transmitted from the worker node server 7 to which the Reduce processing task was distributed to the application server 3.
Similar to the application server 3, the master node server 6 includes a CPU 20, a memory 21, a local drive 22, and a communication device 23. Since functions and configurations of the CPU 20, the memory 21, the local drive 22, and the communication device 23 are the same as corresponding portions (the CPU 10, the memory 11, the local drive 12, and the communication device 13) of the application server 3, detailed descriptions of these are omitted.
The worker node server 7 is a general-purpose server device (an open system) that functions as a worker node in, for example, Hadoop. In practice, the worker node server 7 holds a part of the distributed big data in a local drive 32, which will be described later, executes the Map processing and the Reduce processing according to the execution requests of the Map processing task and the Reduce processing task (hereinafter referred to as task execution requests) given from the master node server 6, and transmits the processing results to the other worker node servers 7 and the master node server 6.
The worker node server 7 includes an accelerator 34 and a Dynamic Random Access Memory (DRAM) 35 in addition to a CPU 30, a memory 31, a local drive 32, and a communication device 33. Since functions and configurations of the CPU 30, the memory 31, the local drive 32, and the communication device 33 are the same as corresponding portions (the CPU 10, the memory 11, the local drive 12, and the communication device 13) of the application server 3, detailed descriptions of these are omitted. Communication between the master node server 6 and the worker node server 7 and communication between the worker node servers 7 are all performed via the second network 8 in the present embodiment.
The accelerator 34 includes a Field Programmable Gate Array (FPGA) and executes the Map processing task and the Reduce processing task defined by a user-defined function in a prescribed format included in the task execution request given from the master node server 6. Further, the DRAM 35 is used as a work memory of the accelerator 34. In the following description, it is assumed that all the accelerators installed in the worker node servers have the same performance and functions.
Further, an analysis Business Intelligence (BI) tool 41, a Java (registered trademark) Database Connectivity/Open Database Connectivity (JDBC/ODBC) driver 42, and a query conversion unit 43 are mounted on the application server 3. The analysis BI tool 41, the JDBC/ODBC driver 42, and the query conversion unit 43 are functional units that are embodied by the CPU 10 executing a program (not shown) stored in the memory 11.
The analysis BI tool 41 is an application having a function of generating an SQL query used for acquiring, from the distributed database system 4, the database data necessary for analysis processing according to the analysis condition set by the user on the analysis condition setting screen displayed on the client 2. The analysis BI tool 41 executes the analysis processing in accordance with the analysis condition based on the acquired database data and causes the client 2 to display an analysis result screen including the processing result.
Further, the JDBC/ODBC driver 42 functions as an interface (API: Application Programming Interface) for the analysis BI tool 41 to access the distributed database system 4.
The query conversion unit 43 inherits a class of the JDBC/ODBC driver 42 and is implemented as a child class to which a query conversion function is added. The query conversion unit 43 has a function of converting the SQL query generated by the analysis BI tool 41 into an SQL query explicitly divided into tasks that should be executed by the accelerator 34 of the worker node server 7 and tasks that should be executed by software.
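As a purely illustrative sketch of this child-class arrangement — BaseStatement and QueryConverter below are hypothetical stand-ins for the actual driver class and conversion logic, not names from the embodiment — the query conversion function can be added by overriding query execution:

    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical parent class standing in for the statement class provided
    // by the JDBC/ODBC driver 42.
    abstract class BaseStatement {
        public ResultSet executeQuery(String sql) throws SQLException {
            // ... driver's normal query transmission logic ...
            return null;
        }
    }

    // Hypothetical interface standing in for the conversion logic of the
    // query conversion unit 43 (refers to the accelerator information table 44).
    interface QueryConverter {
        String convert(String sql);
    }

    // Child class to which the query conversion function is added: the SQL
    // query is rewritten before being handed to the parent implementation.
    class ConvertingStatement extends BaseStatement {
        private final QueryConverter converter;

        ConvertingStatement(QueryConverter converter) {
            this.converter = converter;
        }

        @Override
        public ResultSet executeQuery(String sql) throws SQLException {
            // Tasks executable by the accelerator become user-defined
            // function calls; the remaining tasks stay as plain SQL.
            return super.executeQuery(converter.convert(sql));
        }
    }

Because the conversion is confined to the child class, the analysis BI tool 41 continues to call the standard JDBC interface unchanged.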
In practice, in the present embodiment, the accelerator information table 44, in which hardware specification information of the accelerators 34 mounted on the worker node servers 7 of the distributed database system 4 is stored in advance by a system administrator or the like, is held in the local drive 12 of the application server 3.
The accelerator information table 44 stores information such as the functions and performance of the accelerators 34 installed in the worker node servers 7.
Further, the query conversion unit 43 divides the SQL query generated by the analysis BI tool 41 into Map processing tasks and Reduce processing tasks with reference to the accelerator information table 44, and generates an SQL query in which, among these tasks, those executable by the accelerator 34 are defined (described) by the user-defined function while the other tasks are defined (described) in a format (that is, SQL) that can be recognized by the software mounted on the worker node servers 7 of the distributed database system 4 (that is, the SQL query generated by the analysis BI tool 41 is converted into such an SQL query).
For example, when the SQL query generated by the analysis BI tool 41 includes only a Map processing (filter processing) task, the query conversion unit 43 converts the SQL query into an SQL query in which that Map processing task is defined by the user-defined function.

Further, when the SQL query generated by the analysis BI tool 41 includes both a Map processing task and a Reduce processing task, the query conversion unit 43 converts the SQL query into an SQL query in which, of these tasks, those executable by the accelerator 34 are defined by the user-defined function and the other tasks are defined by SQL.
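As a hypothetical illustration of the two cases above — the UDF name accel_map and the table and column names below are assumptions for explanation, not the actual interface — the conversion might look as follows:

    // Hypothetical before/after pairs illustrating the conversion. The UDF
    // name accel_map and the schema are assumptions, not the embodiment's.
    public final class QueryConversionExamples {
        // Case 1: only a Map (filter) task, so the entire query is rewritten
        // as a user-defined function call executed by the accelerator.
        static final String FILTER_ONLY =
            "SELECT * FROM sales WHERE price > 100";
        static final String FILTER_ONLY_CONVERTED =
            "SELECT accel_map('sales', 'price > 100')";

        // Case 2: Map and Reduce tasks; only the accelerator-executable part
        // becomes a UDF, and the aggregation remains plain SQL.
        static final String MAP_AND_REDUCE =
            "SELECT category, SUM(price) FROM sales "
                + "WHERE price > 100 GROUP BY category";
        static final String MAP_AND_REDUCE_CONVERTED =
            "SELECT category, SUM(price) FROM accel_map('sales', 'price > 100') "
                + "GROUP BY category";
    }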
Meanwhile, a Thrift server unit 45, a query parser unit 46, a query planner unit 47, a resource management unit 48, and a task management unit 49 are mounted on the master node server 6 of the distributed database system 4.
The Thrift server unit 45 has a function of receiving the SQL query transmitted from the application server 3 and transmitting an execution result of the SQL query to the application server 3. Further, the query parser unit 46 has a function of analyzing the SQL query received from the application server 3 by the Thrift server unit 45 and converting the SQL query into an aggregate of data structures handled by the query planner unit 47.
The query planner unit 47 has a function of dividing the content of the processing specified by the SQL query into respective Map processing tasks and Reduce processing tasks and creating an execution plan of these Map processing tasks and Reduce processing tasks based on the analysis result of the query parser unit 46.
Further, the resource management unit 48 has a function of managing the specification information of the hardware resources of each worker node server 7, information relating to the current usage status of those hardware resources collected from each worker node server 7, and the like, and of determining, for each task, the worker node server 7 that executes the Map processing task and the Reduce processing task according to the execution plan created by the query planner unit 47.
The task management unit 49 has a function of transmitting a task execution request that requests the execution of such Map processing task and Reduce processing task to the corresponding worker node server 7 based on the determination result of the resource management unit 48.
On the other hand, a scan processing unit 50, an aggregate processing unit 51, a join processing unit 52, a filter processing unit 53, a processing switching unit 54, and an accelerator control unit 55 are mounted on each worker node server 7 of the distributed database system 4. The scan processing unit 50, the aggregate processing unit 51, the join processing unit 52, the filter processing unit 53, the processing switching unit 54, and the accelerator control unit 55 are functional units that are embodied by the CPU 30 executing corresponding programs (not shown) stored in the memory 31.
The scan processing unit 50 has a function of reading the necessary database data 58 from the local drive 32 and loading the database data 58 into the memory 31. The aggregate processing unit 51, the join processing unit 52, and the filter processing unit 53 have functions of executing, in software, the aggregate processing, the join processing, and the filter processing, respectively.
The processing switching unit 54 has a function of determining whether the Map processing task and the Reduce processing task included in the task execution request given from the master node server 6 should be executed by software processing using the aggregate processing unit 51, the join processing unit 52 and/or the filter processing unit 53 or should be executed by hardware processing using the accelerator 34. When a plurality of tasks are included in the task execution request, the processing switching unit 54 determines whether each task should be executed by software processing or should be executed by hardware processing.
In practice, when the task is described by the SQL in the task execution request, the processing switching unit 54 determines that the task should be executed by the software processing and causes the task to be executed in a necessary processing unit among the aggregate processing unit 51, the join processing unit 52 and the filter processing unit 53. Further, when the task is described by the user-defined function in the task execution request, the processing switching unit 54 determines that the task should be executed by the hardware processing, calls the accelerator control unit 55, and gives the user-defined function to the accelerator control unit 55.
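A minimal sketch of this switching logic, under assumed types (the task representation and engine interfaces below are illustrative, not the actual design):

    import java.util.List;

    // Per-task switching between software processing and the accelerator,
    // keyed on how each task is described in the task execution request.
    final class ProcessingSwitchingUnit {
        // A task arrives either as plain SQL or as a user-defined function
        // string in the prescribed format.
        record Task(String body, boolean describedByUdf) {}

        interface SoftwareEngine { void execute(String sql); }     // aggregate/join/filter units
        interface AcceleratorControl { void execute(String udf); } // drives the accelerator

        private final SoftwareEngine software;
        private final AcceleratorControl accelerator;

        ProcessingSwitchingUnit(SoftwareEngine software, AcceleratorControl accelerator) {
            this.software = software;
            this.accelerator = accelerator;
        }

        void dispatch(List<Task> tasks) {
            for (Task task : tasks) {
                if (task.describedByUdf()) {
                    accelerator.execute(task.body()); // hardware processing
                } else {
                    software.execute(task.body());    // software processing
                }
            }
        }
    }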
The accelerator control unit 55 has a function of controlling the accelerator 34. When called from the processing switching unit 54, the accelerator control unit 55 generates one or a plurality of commands (hereinafter referred to as accelerator commands) necessary for causing the accelerator 34 to execute the task (the Map processing task or the Reduce processing task) defined by the user-defined function, based on the user-defined function given from the processing switching unit 54 at that time. Then, the accelerator control unit 55 sequentially outputs the generated accelerator commands to the accelerator 34 and causes the accelerator 34 to execute the task.
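The command generation itself might be sketched as follows; the opcodes and UDF accessors are assumptions, since the actual accelerator command format is not specified here:

    import java.util.ArrayList;
    import java.util.List;

    // Expands one user-defined function into the ordered sequence of
    // accelerator commands issued to the FPGA. Fields are illustrative only.
    final class AcceleratorCommandBuilder {
        record Command(String opcode, String operand) {}
        record Udf(String table, String filterCondition, String aggregateCondition) {}

        List<Command> build(Udf udf) {
            List<Command> commands = new ArrayList<>();
            commands.add(new Command("SCAN", udf.table()));
            if (udf.filterCondition() != null) {
                commands.add(new Command("FILTER", udf.filterCondition()));
            }
            if (udf.aggregateCondition() != null) {
                commands.add(new Command("AGGREGATE", udf.aggregateCondition()));
            }
            return commands; // output to the accelerator sequentially
        }
    }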
The accelerator 34 has various functions for executing the Map processing task and the Reduce processing task.
Thus, the accelerator control unit 55 executes summary processing that summarizes the processing results of the respective accelerator commands output from the accelerator 34. When the task executed by the accelerator 34 is the Map processing task, the worker node server 7 transmits the processing result to the other worker node server 7 to which the Reduce processing is allocated, and when the task executed by the accelerator 34 is the Reduce processing task, the worker node server 7 transmits the processing result to the master node server 6.
Next, processing contents of various processing executed in the information processing system 1 will be described.
When the SQL query is given from the analysis BI tool 41, the query conversion unit 43 starts the query conversion processing, first analyzes the given SQL query, and converts the content of the SQL query into an aggregate of data structures handled by the query conversion unit 43 (S1).
Then, the query conversion unit 43 divides the content of the processing specified by the SQL query into respective Map processing tasks and Reduce processing tasks based on the analysis result, and creates an execution plan of these Map processing tasks and Reduce processing tasks (S2). Further, the query conversion unit 43 refers to the accelerator information table 44 and determines whether or not a task executable by the accelerator 34 of the worker node server 7 is included among these Map processing tasks and Reduce processing tasks (S3 and S4).
When obtaining a negative result in this determination, the query conversion unit 43 transmits the SQL query given from the analysis BI tool 41 as it is to the master node server 6 of the distributed database system 4 (S5), and thereafter, ends this query conversion processing.
In contrast, when obtaining a positive result in the determination of step S4, the query conversion unit 43 converts the SQL query into an SQL query in which the tasks (the Map processing tasks or the Reduce processing tasks) executable by the accelerator 34 of the worker node server 7 are defined by the user-defined function (S6) and the other tasks are defined by SQL (S7).
Then, the query conversion unit 43 transmits the converted SQL query to the master node server 6 of the distributed database system 4 (S8), and thereafter ends the query conversion processing.
Meanwhile, the flow of processing executed in the master node server 6 that receives the SQL query is as follows.

When the SQL query is transmitted from the application server 3, processing is started in the master node server 6: the Thrift server unit 45 receives the SQL query, and the query parser unit 46 analyzes the received SQL query and converts it into an aggregate of data structures handled by the query planner unit 47.

The query planner unit 47 divides the content of the processing specified by the SQL query into respective Map processing tasks and Reduce processing tasks based on the analysis result of the query parser unit 46, and creates an execution plan of these Map processing tasks and Reduce processing tasks.

Thereafter, the resource management unit 48 determines, for each task, the worker node server 7 that executes the Map processing task and the Reduce processing task according to the execution plan created by the query planner unit 47.

Next, the task management unit 49 transmits the task execution requests of the Map processing task and the Reduce processing task to the corresponding worker node servers 7 based on the determination result of the resource management unit 48. Thus, the processing in the master node server 6 is ended.
When the task execution request of the Map processing task is given from the master node server 6 to the worker node server 7, processing is started in the worker node server 7, and the necessary database data 58 is read from the local drive 32 into the memory 31 (S20).

Then, the processing switching unit 54 determines whether or not the user-defined function is included in the task execution request given from the master node server 6 (S21).

When obtaining a negative result in this determination, the processing switching unit 54 activates a necessary processing unit among the aggregate processing unit 51, the join processing unit 52, and the filter processing unit 53 to execute the Map processing task (S22). Further, the processing unit that executes the Map processing task transmits the processing result to the worker node server 7 to which the Reduce processing is allocated (S25). Thus, the processing in the worker node server 7 is ended.

In contrast, when obtaining a positive result in the determination of step S21, the processing switching unit 54 causes the aggregate processing unit 51, the join processing unit 52, and the filter processing unit 53 to execute the Map processing tasks and the Reduce processing tasks which are not defined by the user-defined function, and meanwhile, in parallel with this, calls the accelerator control unit 55.
Further, the accelerator control unit 55 called by the processing switching unit 54 generates one or a plurality of necessary accelerator commands based on the user-defined function included in the task execution request, and causes the accelerator 34 to execute the Map processing task defined by the user-defined function by sequentially giving the generated accelerator commands to the accelerator 34 (S23).
Further, when the Map processing task is completed by the accelerator 34, the accelerator control unit 55 executes the summary processing summarizing the processing results (S24), and thereafter transmits the processing result of the summary processing, together with the processing results of the Map processing tasks that underwent software processing, to the worker node server 7 to which the Reduce processing is allocated (S25). Thus, the processing in the worker node server 7 is ended.
Meanwhile, the flow of processing in the worker node server 7 to which the Reduce processing task is allocated is as follows.

When the task execution request of the Reduce processing task is given from the master node server 6 to the worker node server 7, processing is started in the worker node server 7, which first waits to receive the processing results of the related Map processing tasks transmitted from the respective worker node servers 7 (S30).
Further, when receiving all necessary processing results of the Map processing task, the processing switching unit 54 determines whether or not the user-defined function is included in the task execution request given from the master node server 6 (S31).
When obtaining a negative result in this determination, the processing switching unit 54 activates the necessary processing unit among the aggregate processing unit 51, the join processing unit 52, and the filter processing unit 53 to execute the Reduce processing task (S32). Further, the processing unit that executes the Reduce processing task transmits the processing result to the master node server 6 (S35). Thus, the processing in the worker node server 7 is ended.
In contrast, when obtaining a positive result in the determination of step S31, the processing switching unit 54 calls the accelerator control unit 55. Further, the accelerator control unit 55 called by the processing switching unit 54 generates one or a plurality of necessary accelerator commands based on the user-defined function included in the task execution request, and causes the accelerator 34 to execute the Reduce processing task defined by the user-defined function by sequentially giving the generated accelerator commands to the accelerator 34 (S33).
Further, when the Reduce processing task is completed by the accelerator 34, the accelerator control unit 55 executes a summary processing summarizing the processing result (S34), and thereafter transmits the processing result of the summary processing to the master node server 6 (S35). Thus, the processing in the worker node server 7 is ended.
When the analysis instruction is given and the SQL query is generated based on the analysis instruction, the application server 3 converts the generated SQL query into an SQL query in which the tasks executable by the accelerator 34 of the worker node server 7 are defined by the user-defined function and the other tasks are defined by SQL (S41). Further, the application server 3 transmits the converted SQL query to the master node server 6 (S42).
When the SQL query is given from the application server 3, the master node server 6 creates a query execution plan and divides the SQL query into the Map processing task and the Reduce processing task. Further, the master node server 6 determines the worker node server 7 to which these divided Map processing task and Reduce processing task are distributed (S43).
Further, the master node server 6 transmits the task execution requests of the Map processing task and the Reduce processing task to the corresponding worker node server 7 respectively based on such determination result (S44 to S46).
The worker node server 7 to which the task execution request of the Map processing task is given exchanges the database data 58 with the other worker node servers 7 as necessary, executes the Map processing task specified in the task execution request, and transmits the processing result to the worker node server 7 to which the Reduce processing task is allocated (S47 to S49).
Further, when the processing result of the Map processing task is given from all the worker node servers 7 to which the related Map processing task is allocated, the worker node server 7 to which the task execution request of the Reduce processing task is given executes the Reduce processing task specified in the task execution request (S50). Further, when the Reduce processing task is completed, such worker node server 7 transmits the processing result to the master node server 6 (S51).
The processing result of the Reduce processing task received by the master node server 6 at this time is the processing result of the SQL query given from the application server 3. Thus, the master node server 6 transmits the received processing result of the Reduce processing task to the application server 3 (S52).
When the processing result of the SQL query is given from the master node server 6, the application server 3 executes the analysis processing based on the processing result and displays the analysis result on the client 2 (S53).
Meanwhile, the flow of data within the worker node server 7 when the Map processing task is executed is as follows.

Since the various processing executed by the scan processing unit 50, the aggregate processing unit 51, the join processing unit 52, the filter processing unit 53, the processing switching unit 54, and the accelerator control unit 55 is eventually executed by the CPU 30, the description here is given in terms of the processing of the CPU 30.
When receiving the task execution request of the Map processing task transmitted from the master node server 6, the communication device 33 stores the task execution request in the memory 31 (S60). Then, the task execution request is read from the memory 31 by the CPU 30 (S61).
When reading the task execution request from the memory 31, the CPU 30 instructs the local drive 32 to transfer the necessary database data 58 to the memory 31, and instructs the accelerator 34 to execute the Map processing task specified in the task execution request (S62 to S65).
The accelerator 34 starts the Map processing task according to an instruction from the CPU 30, and executes necessary filter processing and aggregate processing (S66) while appropriately reading the necessary database data 58 from the memory 31. Then, the accelerator 34 appropriately stores the processing result of the Map processing task in the memory 31 (S67).
Thereafter, the processing result of such Map processing task stored in the memory 31 is read by the CPU 30 (S68). Further, the CPU 30 executes the summary processing summarizing the read processing results (S69), and stores the processing result in the memory 31 (S70). Thereafter, the CPU 30 gives an instruction to the communication device 33 to transmit the processing result of such result summary processing to the worker node server 7 to which the Reduce processing is allocated (S71).
Thus, the communication device 33 to which such instruction is given reads the processing result of the result summary processing from the memory 31 (S72), and transmits the processing result to the worker node server 7 to which the Reduce processing is allocated (S73).
In the information processing system 1 according to the present embodiment as described above, the application server 3 converts the SQL query generated by the analysis BI tool 41, which is the application, into an SQL query in which the tasks executable by the accelerator 34 of the worker node server 7 of the distributed database system 4 are defined by the user-defined function and the other tasks are defined by SQL; the master node server 6 divides the processing of the SQL query into the respective tasks and allocates these tasks to the worker node servers 7; and each worker node server 7 executes the tasks defined by the user-defined function on the accelerator 34 and executes the tasks defined by SQL in software.
Therefore, according to the information processing system 1, it is possible to improve the performance per worker node server 7 by causing the accelerator 34 to execute some of the tasks, without requiring any alteration of the analysis BI tool 41. Accordingly, it is possible to prevent an increase in system scale for high-speed processing of large-capacity data without altering the application, and to prevent increases in introduction cost and maintenance cost.
(2) Second Embodiment

Reference numeral 60 denotes an information processing system according to the second embodiment as a whole.
In practice, in the information processing system 1 according to the first embodiment, the transfer of the database data 58 from the other worker node servers 7 or the local drive 32 to the accelerator 34 is performed via the memory 31 as described above. In contrast, in the information processing system 60 according to the present embodiment, the accelerator 63 of the worker node server 62 directly acquires the necessary database data from the local drive 32 or another worker node server 62 without going through the memory 31.
When the task execution request of the Map processing task is given from the master node server 6 to the worker node server 62, processing is started in the worker node server 62, and first, the processing switching unit 54 determines whether or not the user-defined function is included in the task execution request given from the master node server 6 (S80).
When obtaining a negative result in this determination, the processing switching unit 54 activates a necessary processing unit among the aggregate processing unit 51, the join processing unit 52, and the filter processing unit 53 to execute the Map processing task (S81). Further, the processing unit that executes the Map processing task transmits the processing result to the worker node server 62 to which the Reduce processing task is allocated (S85). Thus, the processing in the worker node server 62 is ended.
In contrast, when obtaining a positive result in the determination of step S80, the processing switching unit 54 causes the aggregate processing unit 51, the join processing unit 52 and the filter processing unit 53 to execute the Map processing task and the Reduce processing task which are not defined by the user-defined function, and meanwhile, in parallel with this, calls the accelerator control unit 55.
Further, the accelerator control unit 55 called by the processing switching unit 54 converts the user-defined function included in the task execution request into a command used for the accelerator and instructs the accelerator 63 to execute the Map processing task by giving the command to the accelerator 63 (S82).
When such instruction is given, the accelerator 63 gives the instruction to the local drive 32 or other worker node server 62 to directly transfer the necessary database data (S83). Thus, the accelerator 63 executes the Map processing task specified in the task execution request by using the database data transferred directly from the local drive 32 or the other worker node server 62.
Then, when the Map processing is completed by the accelerator 63, the accelerator control unit 55 executes the result summary processing summarizing the processing results (S84), and thereafter, transmits the processing result of the result summary processing and the processing result of the Map processing task that undergoes the software processing to the worker node server 62 to which the Reduce processing is allocated (S85). Thus, the processing in the worker node server 62 is ended.
As in the case of the first embodiment, the flow of data within the worker node server 62 when the Map processing task is executed is as follows.
When receiving the task execution request of the Map processing task transmitted from the master node server 6, the communication device 33 stores the task execution request in the memory 31 (S90). Thereafter, the task execution request is read from the memory 31 by the CPU 30 (S91).
When reading the task execution request from the memory 31, the CPU 30 gives the instruction to the accelerator 63 to execute the Map processing task according to the task execution request (S92). Further, the accelerator 63 receiving the instruction requests the local drive 32 (or the other worker node server 62) to transfer the necessary database data. As a result, the necessary database data is given directly from the local drive 32 (or the other worker node server 62) to the accelerator 63 (S93).
Further, the accelerator 63 stores the database data transferred from the local drive 32 (or the other worker node server 62) in the DRAM 35, and executes the Map processing task specified in the task execution request while appropriately reading the database data from the DRAM 35 (S94).

Thereafter, processing similar to that of step S68 to step S71 described above is executed, and the processing result of the summary processing is transmitted to the worker node server 62 to which the Reduce processing is allocated.
As described above, in the information processing system 60 according to the present embodiment, the accelerator 63 directly acquires the database data 58 from the local drive 32 without going through the memory 31. This eliminates both the transfer of the database data from the local drive 32 to the memory 31 and the transfer from the memory 31 to the accelerator 63, reduces the data transfer bandwidth required of the CPU 30, and enables data transfer with low delay; as a result, the performance of the worker node server 62 can be improved.
(3) Other Embodiments

Although a case where the hardware specification information of the accelerators 34, 63 stored in the accelerator information table 44 is registered in advance by a system administrator or the like is described in the first and second embodiments, the invention is not limited to this, and an accelerator information acquisition unit 72 that acquires the hardware specification information of the accelerators 34, 63 from the worker node servers 7, 62 and registers the acquired information in the accelerator information table 44 may be provided in an application server 71.
The accelerator information acquisition unit 72 may have a software configuration embodied by executing the program stored in the memory 11 by the CPU 10 of the application server 3 or a hardware configuration including dedicated hardware.
Although a case where communication between the worker node servers 7, 62 is performed via the second network 8 is described in the first embodiment and the second embodiment, the invention is not limited to this, and, for example, the worker node servers 7, 62 may be connected to each other via a dedicated network provided separately from the second network 8 and may communicate with each other via that dedicated network.
Further, although a case where the application (a program) mounted on the application server 3 is the analysis BI tool 41 is described in the first and second embodiments, the invention is not limited to this, and the invention can be widely applied even when the application is other than the analysis BI tool 41.
(4) Third Embodiment

Reference numeral 90 denotes an information processing system according to the third embodiment as a whole.
The worker node server 92 has a combined function of the master node server 6 and the worker node server 7 (62) in the first and second embodiments described above.
The query received from the application server 91 is first analyzed by the query parser unit 46. The query planner unit 93 cooperates with an accelerator optimization rule unit 95 to generate the query plan suitable for accelerator processing by using the query analyzed by the query parser unit 46.
The accelerator optimization rule unit 95 applies a query plan generation rule optimized for accelerator processing, taking account of the constraint conditions of the accelerator, using the accelerator information table 44.
A file path resolution unit 96 searches for and holds conversion information between the storage location information of a database file on a distributed file system 100 (a distributed file system path) and its storage location information on a local file system 101 (a local file system path), and responds to file path inquiries.
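A minimal sketch of such a correspondence table, assuming a hypothetical event-notification callback from the distributed file system:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Holds the conversion information between distributed file system paths
    // and local file system paths, and answers file path inquiries.
    final class FilePathResolutionUnit {
        private final Map<String, String> dfsToLocal = new ConcurrentHashMap<>();

        // Called on an event notification that a block of a database file has
        // been stored on this worker node (the callback name is an assumption).
        void onBlockStored(String dfsPath, String localPath) {
            dfsToLocal.put(dfsPath, localPath);
        }

        // Responds to a file path inquiry; null means the block is not local.
        String resolve(String dfsPath) {
            return dfsToLocal.get(dfsPath);
        }
    }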
An execution engine unit 94 includes the join processing unit 52, the aggregate processing unit 51, the filter processing unit 53, the scan processing unit 50, and an exchange processing unit 102, and executes the query plan (so-called software processing) in cooperation with an accelerator control unit 97 and an accelerator 98.
The distributed file system 100 is configured as a single file system by connecting a plurality of server groups via a network. An example of the distributed file system is the Hadoop Distributed File System (HDFS).
A file system 101 is one of the functions of an operating system (OS); it manages the logical location information (Logical Block Address (LBA) and size) and the like of the files stored in the drive, and provides a function of reading data on the drive from the location information of a file in response to a read request based on a file name from an application or the like.
A standard query plan 110 is the query plan first generated by the query planner unit 93 from an input query. The standard query plan 110 may be converted into a converted query plan 124 as described later, or may be executed by the execution engine unit 94 without conversion. The standard query plan 110 shows that processing is executed in the order of scan processing S122, filter processing S119, aggregate processing S116, exchange processing S113, and aggregate processing S111, starting from the processing in the lower part of the drawing.
The scan processing S122 is performed by the scan processing unit 50, and includes: reading the database data from the distributed file system 100 (S123); converting the database data into an in-memory format for the execution engine unit; and storing the converted database data in a main storage (the memory 31) (S121).
The filter processing S119 is performed by the filter processing unit 53, and includes: reading the scan processing result data from the main storage (S120); determining whether or not each line data matches the filter condition; making a hit determination on the matching line data; and storing the result in the main storage (S118) (filter processing).
The first aggregate processing (the aggregate processing) S116 is performed by the aggregate processing unit 51, and includes: reading the hit-determined line data from the main storage (S117); executing the processing according to the aggregate condition; and storing the aggregate result data in the main storage (S115).
The exchange processing S113 is performed by the exchange processing unit 102, and includes: reading the aggregate result data from the main storage (S114); and transferring the aggregate result data via the network (S112) to the worker node server 92 that executes the second aggregate processing (the summary processing) S111 described later.
In the second aggregate processing (the summary processing) S111, the worker node server 92 in charge of the summary executes summary aggregate processing of the aggregate result data collected from each worker node server 92, and transmits the aggregate result data to the application server 91.
The converted query plan 124 is converted and generated by the accelerator optimization rule unit 95 based on the standard query plan 110. The portions of the query plan to be processed by the accelerator 98 are converted, and the portions processed by the execution engine unit are not converted. The specification information of the accelerator and the like is referred to in order to determine which processing is appropriate and to decide whether conversion is necessary. The converted query plan 124 shows that processing is executed in the order of FPGA parallel processing S130, exchange processing S113, and aggregate processing S111, starting from the processing in the lower part of the drawing.
The FPGA parallel processing S130 is performed by the accelerator 98 (the scan processing unit 99, the filter processing unit 57, and the aggregate processing unit 56), and includes: reading the database data from the local drive 32 (S135); performing the scan processing, the filter processing, and the aggregate processing according to an aggregate condition 131, a filter condition 132, a scan condition 133, and a data locality utilization condition 134; and thereafter format-converting the processing result of the accelerator 98 and storing it in the main storage (S129). The accelerator optimization rule unit 95 detects the scan processing S122, the filter processing S119, and the aggregate processing S116 that exist in the standard query plan, collects the conditions of these processing, and sets them as the aggregate condition, the filter condition, and the scan condition of the FPGA parallel processing S130. The aggregate condition 131 is information necessary for the aggregate processing, such as the aggregate operation type (SUM/MAX/MIN), the grouping target columns, and the aggregate operation target columns; the filter condition 132 is information necessary for the filter processing, such as comparison conditions (=, >, < and the like) and comparison target columns; and the scan condition 133 is information necessary for the scan processing, such as the location information, on the distributed file system, of the database data file to be read (a distributed file system path). The data locality utilization condition 134 is a condition for restricting the scan processing targets to the database data files that exist in the file system 101 on the own worker node server 92. The FPGA parallel processing S130 is executed by the accelerator 98 according to an instruction from the accelerator control unit 97.
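The conversion can be pictured as a plan-rewrite rule of the following shape; the plan node types below are illustrative assumptions, and only the aggregate-filter-scan chain described above is fused:

    // Detects a scan -> filter -> aggregate chain in the standard query plan
    // and fuses it into one FPGA parallel processing node carrying the
    // collected conditions; any other node is left unchanged for the
    // execution engine.
    final class AcceleratorOptimizationRule {
        sealed interface PlanNode permits Scan, Filter, Aggregate, FpgaParallel {}
        record Scan(String dfsPath) implements PlanNode {}
        record Filter(String condition, PlanNode child) implements PlanNode {}
        record Aggregate(String condition, PlanNode child) implements PlanNode {}
        record FpgaParallel(String aggregateCondition, String filterCondition,
                            String scanCondition) implements PlanNode {}

        PlanNode convert(PlanNode node) {
            if (node instanceof Aggregate(String agg, Filter(String flt, Scan(String path)))) {
                return new FpgaParallel(agg, flt, path);
            }
            return node; // not accelerator-eligible: runs on the execution engine
        }
    }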
The exchange processing S113 and the second aggregate processing S111 are performed by the exchange processing unit 102 and the aggregate processing unit 51 in the execution engine unit 94, similarly to the standard query plan. These processing units may also be provided in the accelerator 98.
Since the standard query plan 110 is assumed to be processed by the CPU, the basic operation in each of the scan, filter, and aggregate processing is to read data from the main storage at the start of the processing and place data in the main storage at its completion. Such input/output to the main storage causes data movement between the CPU and the memory, which lowers processing efficiency. In the query plan conversion method according to the invention, these processing stages are converted into the single integrated FPGA parallel processing S130, so that they can undergo pipeline parallel processing within the accelerator and data movement between the FPGA and the memory becomes unnecessary, thereby improving processing efficiency.
Further, since the scan processing S122 in the standard query plan acquires the database data from the distributed file system 100, the database data may be acquired from another worker node server 92 via the network depending on the data distribution situation of the distributed file system 100. In the query plan conversion according to the invention, the accelerator can be operated efficiently because it is ensured that the accelerator 98 can reliably acquire the database data from the neighboring local drive.
The client 2 first issues a database data storage instruction to the distributed file system 100 (S140). The distributed file system 100 of the worker node server #0, which is in charge of summary, divides the database data into blocks of a prescribed size and transmits copies of the data to the other worker node servers for replication (S141 and S142). In each worker node, the file path resolution unit 96 detects that a block of the database data has been stored according to an event notification from the distributed file system 100, and creates a correspondence table between the distributed file system path and the local file system path by searching for the block on the local file system 101 of each server 92 (S143, S144, and S145). The correspondence table may be updated each time a block is updated, or may be stored and saved in a file as a cache.
Next, the client 2 transmits the analysis instruction to the application server 91 (S146). The application server 91 transmits the SQL query to the distributed database system 103 (S148). The worker node server #0 that received the SQL query converts the query plan as described above and transmits the converted query plan (and the non-converted standard query plan) to the other worker node servers #1 and #2 (S150 and S151).
Each of the worker node servers #0, #1, and #2 offloads the scan processing, the filter processing, and the aggregate processing of the FPGA parallel processing to the accelerator 98 for execution (S152, S153, and S154). The non-converted standard query plan is executed by the execution engine unit 94. Then, the worker node servers #1 and #2 transmit the result data output by the accelerator 98 or the execution engine unit 94 to the worker node server #0 for summary processing (S155 and S156).
The worker node server #0 executes the summary processing of the result data (S157) and transmits the summary result data to the application server (S158). The application server transmits the result to the client 2, where it is displayed to the user (S159).
Although the query conversion is performed by the worker node server #0 in the present embodiment, the query conversion may be performed by the application server or by the individual worker node servers #1 and #2.
The accelerator control unit 97 determines whether the filter condition is in a normal form (S170). If the filter condition is not in a normal form, it is converted into a normal form by the distribution rule and De Morgan's laws (S171). Then, the normal-form filter condition expression is set in the parallel execution command of the accelerator (S172). Here, the normal form is the conjunctive normal form (multiplicative normal form) or the disjunctive normal form (additive normal form).
Further, as an example of the conversion of the filter condition, a condition of the form NOT (A AND B) is converted into (NOT A) OR (NOT B) by De Morgan's law.
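A minimal sketch of this De Morgan step, assuming a simple tree representation of filter conditions (the types are illustrative, and the distribution rule that completes the conversion to conjunctive or disjunctive normal form is omitted for brevity):

    // Pushes NOT inward over AND/OR using De Morgan's laws, one step of
    // normalizing a filter condition before it is set in the parallel
    // execution command of the accelerator.
    final class NormalFormConverter {
        sealed interface Cond permits And, Or, Not, Leaf {}
        record And(Cond left, Cond right) implements Cond {}
        record Or(Cond left, Cond right) implements Cond {}
        record Not(Cond inner) implements Cond {}
        record Leaf(String comparison) implements Cond {} // e.g. "price > 100"

        static Cond pushNotInward(Cond c) {
            return switch (c) {
                // NOT (a AND b) -> (NOT a) OR (NOT b)
                case Not(And(Cond l, Cond r)) ->
                    new Or(pushNotInward(new Not(l)), pushNotInward(new Not(r)));
                // NOT (a OR b) -> (NOT a) AND (NOT b)
                case Not(Or(Cond l, Cond r)) ->
                    new And(pushNotInward(new Not(l)), pushNotInward(new Not(r)));
                // Double negation elimination.
                case Not(Not(Cond inner)) -> pushNotInward(inner);
                case Not(Leaf leaf) -> new Not(leaf);
                case And(Cond l, Cond r) -> new And(pushNotInward(l), pushNotInward(r));
                case Or(Cond l, Cond r) -> new Or(pushNotInward(l), pushNotInward(r));
                case Leaf leaf -> leaf;
            };
        }
    }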
As a first conversion, the accelerator control unit 97 converts the distributed file system path included in the scan condition into the local file system path by inquiring of the file path resolution unit 96. Then, as a second conversion, the accelerator control unit 97 converts the local file system path into the LBA (for example: 0x0124abcd . . . ) and size information, which is the logical location information of the file on the drive, by inquiring of the file system of the OS (S191). Finally, the scan condition is set in the parallel execution command together with the LBA and size information (S192).
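A sketch of this two-stage resolution under assumed interfaces — on Linux, file extent information of this kind can be obtained through mechanisms such as the FIEMAP ioctl, abstracted below as FileSystemService:

    import java.util.List;

    // Converts the scan condition's distributed file system path first into a
    // local file system path, then into on-drive logical location information
    // (LBA and size) by inquiring of the OS file system.
    final class ScanConditionResolver {
        record Extent(long lba, long sizeInBlocks) {}
        interface FilePathResolver { String resolve(String dfsPath); }           // first conversion
        interface FileSystemService { List<Extent> extentsOf(String localPath); } // second conversion

        private final FilePathResolver pathResolver;
        private final FileSystemService fileSystem;

        ScanConditionResolver(FilePathResolver pathResolver, FileSystemService fileSystem) {
            this.pathResolver = pathResolver;
            this.fileSystem = fileSystem;
        }

        // Returns the LBA/size list to be set in the parallel execution command.
        List<Extent> toScanCondition(String dfsPath) {
            String localPath = pathResolver.resolve(dfsPath);
            return fileSystem.extentsOf(localPath);
        }
    }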
According to this method, the accelerator does not need to interpret a complicated distributed file system or file system, and can directly access the database data on the drive using the LBA and size information in the parallel execution command.
The invention can be widely applied to an information processing system of various configurations that executes processing instructed from a client based on information acquired from a distributed database system.
1, 60, 70, 80, 90 . . . information processing system; 2 . . . client; 3, 71, 91 . . . application server; 4, 61, 103 . . . distributed database system; 6 . . . master node server; 7, 62, 92 . . . worker node server; 10, 20, 30 . . . CPU; 11, 21, 31 . . . memory; 12, 22, 32 . . . local drive; 34, 63, 98 . . . accelerator; 41 . . . analysis BI tool; 43 . . . query conversion unit; 44 . . . accelerator information table; 45 . . . Thrift server unit; 46 . . . query parser unit; 47 . . . query planner unit; 48 . . . resource management unit; 49 . . . task management unit; 50 . . . scan processing unit; 51, 56 . . . aggregate processing unit; 52 . . . join processing unit; 53, 57 . . . filter processing unit; 54 . . . processing switching unit; 55, 97 . . . accelerator control unit; 58 . . . database data; 72 . . . accelerator information acquisition unit; 81 . . . code; 95 . . . accelerator optimization rule unit; 96 . . . file path resolution unit; 99 . . . scan processing unit; 100 . . . distributed file system; 101 . . . file system.
Number | Date | Country | Kind
---|---|---|---
PCT/JP2017/004083 | Feb 2017 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2018/003703 | 2/2/2018 | WO | 00