Large-scale or massive-scale statistical analysis, sometimes referred to as MaSSA, may involve examining large amounts of data at once. For example, scientific instruments used in astronomy, physics, remote sensing, oceanography, and biology can produce large data volumes. Efficiently processing such large amounts of data may be challenging.
Traditional database systems may encounter difficulties when processing data for large-scale statistical analyses. Current database systems may store data at an element granularity. For instance, a data structure such as a matrix may be stored in an array, with each data element in the matrix corresponding to an element in the array. Dense arrays having many elements (e.g., arrays representing large matrices) can occupy a large amount of storage space, and in some cases may be larger than available memory.
Furthermore, database query engines may use an iterative execution model that executes functions on the stored data on an element-by-element basis. As such, iterating through each element of a data structure to satisfy a complicated query request may be relatively inefficient. For large data sets, this inefficiency may be exacerbated, degrading performance of the database system.
The database subsystem 105 may also be in communication with a graphics processing unit (GPU) 140. The GPU 140 may be coupled to a GPU memory 150, which may store GPU libraries 160. The GPU 140 may be capable of executing particular computations traditionally performed by a central processing unit (CPU), such as the processor 110. This ability may be referred to as general-purpose computing on graphics processing units (GPGPU). Such capabilities may be in addition to the ability of the GPU 140 to perform computations for computer graphics, which provide images for presentation on a display device (not shown).
The GPU libraries 160 may provide an interface for the database subsystem 105 to access the GPU 140 to execute the particular computations traditionally performed by a CPU (e.g., the processor 110). Indeed, the GPU libraries 160 may provide access to instruction sets for the GPU 140 as well as to the GPU memory 150. For example, through the GPU libraries 160, a developer may be able to use a standard programming language (such as C) to code instructions for execution on the GPU 140 to take advantage of the parallel processing architecture of the GPU 140.
In some implementations, the GPU 140 may have multiple processing cores, with each core capable of processing multiple threads simultaneously. The GPU 140 may thus have relatively high parallel processing capability, which may benefit operations on large data sets such as those produced by large-scale statistical analyses. Certain processing cores within the GPU 140 may have relatively high floating-point computational capabilities, which may be appropriate for large-scale statistical analysis, while other processing cores may have relatively low floating-point capabilities and may be used only for processing graphics data. For example, algebraic operations performed on matrices (e.g., matrix multiplication, transposition, addition, etc.) may be well suited to the parallel processing architecture and floating-point computational power provided by the GPU 140.
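As a minimal illustrative sketch (the kernel name and launch geometry are assumptions, not taken from the described implementation), such an algebraic operation might be expressed as a CUDA kernel for element-wise matrix addition, in which each GPU thread computes one output element:

```c
/* Illustrative CUDA kernel: element-wise addition of two matrices stored as
 * flat arrays of n floats. Each thread handles one element, so many
 * elements may be processed concurrently across the GPU's cores. */
__global__ void matrix_add(const float *a, const float *b, float *c, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread id */
    if (idx < n)                                       /* bounds guard     */
        c[idx] = a[idx] + b[idx];
}

/* A host program might launch it as, e.g.:
 *   matrix_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);              */
```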
In some implementations, the user-defined data 135 may include instructions for dividing a data structure into multiple sections and storing these sections as data elements in a table or array. Such a table is described in more detail with respect to FIG. 3.
As shown in FIG. 2, the database system 200 may include a database engine 210 in communication with a GPU 240, which may be associated with GPU libraries 250.
In some instances, the database engine 210 may be implemented using PostgreSQL, an open-source object-relational database management system (ORDBMS). PostgreSQL may provide a framework for developers to extend the ORDBMS through the use of various user-defined definitions. For example, User-Defined Types (UDTs) may enable developers to create unique data structures within PostgreSQL. Similarly, User-Defined Functions (UDFs) may enable the creation of functions that operate on the UDTs. User-Defined Aggregates (UDAs) may be a type of UDF that performs a calculation on a set of values and returns a single value. Thus, rather than creating an entirely new programming language to manage the large volumes of data in large-scale statistical analyses, an existing database framework such as PostgreSQL can simply be extended to provide the desired functionality through the use of UDTs, UDFs, and UDAs.
For example, a UDT data structure may be created for storing a matrix as a collection of sub-matrices rather than as a collection of the individual data elements of the matrix. Various UDFs and UDAs may then be created to operate on this UDT data structure. For example, a developer can create a UDF that performs matrix multiplication on the UDT data structure, i.e., at the sub-matrix granularity instead of at a data element granularity. This level of abstraction may reduce input/output (I/O) operations in the database system 200 when compared to functions that operate on an element-by-element basis.
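As a sketch of the idea (all names and the chunk size below are assumptions for illustration, not the specification's actual UDT), the following C type definitions store a matrix as a collection of fixed-size sub-matrices rather than as individual elements:

```c
#include <stddef.h>

enum { CHUNK_DIM = 1024 };           /* elements per sub-matrix side (assumed) */

typedef struct {
    int i, j;                        /* sub-matrix row and column indices      */
    float *values;                   /* CHUNK_DIM * CHUNK_DIM elements,        */
                                     /*   stored in column-major order         */
} MatrixChunk;

typedef struct {
    int chunk_rows, chunk_cols;      /* matrix dimensions, counted in chunks   */
    MatrixChunk *chunks;             /* chunk_rows * chunk_cols entries        */
} ChunkedMatrix;
```

A UDF operating on such a structure can move whole sub-matrices through the system in a single operation, rather than issuing one I/O operation per element.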
In some implementations, the GPU libraries 250 may conform to the Compute Unified Device Architecture (CUDA), the Open Computing Language (OpenCL), or a combination thereof. OpenCL may provide a standard for writing programs that can be executed across heterogeneous platforms including CPUs, GPUs, and other types of processors. Thus, a program written under OpenCL may generate instructions that can be executed by both the processor 110 and the GPU 140. CUDA may be a parallel computing architecture developed by NVIDIA Corp. specifically for NVIDIA GPUs. Using CUDA, developers may use the ‘C’ programming language to call functions in the CUDA library to execute instructions on an NVIDIA GPU. Thus, in some examples, the GPU 140 may be an NVIDIA GPU that is associated with CUDA libraries.
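As a small example of calling a CUDA library from C (the calls below are the standard CUDA runtime API, though their use here is illustrative rather than part of the described system), a program might query the GPU's parallel resources before dispatching work:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Query device 0 and report how many streaming multiprocessors it offers. */
int main(void)
{
    struct cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return 1;                    /* no CUDA-capable GPU available */
    printf("%s: %d multiprocessors\n", prop.name, prop.multiProcessorCount);
    return 0;
}
```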
After dividing Matrix A 310 into these four sections, Matrix A can then be represented by Matrix A′ 360, which may include each section 320-350, or sub-matrix, as a data element. Matrix A′ 360 can then be stored in an array, such as Table A 370, which can be recognized by a computer or other processing device. In some instances, Table A 370 may be defined using a UDT in PostgreSQL to specifically store Matrix A 310 as a collection of its sections 320-350, rather than as a collection of its individual elements.
Furthermore, in some implementations, Matrix A 310 may be stored in a memory (e.g., memory 120 and/or GPU memory 150 in FIG. 1) in column-major form. For example, consider a two-row, three-column matrix whose first row is (1, 2, 3) and whose second row is (4, 5, 6).
In column-major form, this matrix may be stored in a one-dimensional array as {1, 4, 2, 5, 3, 6}. Moreover, storing data in column-major form may facilitate certain GPU calculation techniques. However, other storage orders are also possible, such as row-major, Z-order, and the like.
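To make the layout concrete, the following sketch indexes this example matrix in column-major order, where element (r, c) of an M-row matrix lives at flat index c*M + r:

```c
#include <stdio.h>

int main(void)
{
    float a[6] = {1, 4, 2, 5, 3, 6}; /* rows (1,2,3) and (4,5,6), column-major */
    int M = 2;                       /* number of rows                         */
    /* Element (row 0, column 2) is at index 2*M + 0 = 4, i.e. the value 3. */
    printf("%g\n", a[2 * M + 0]);
    return 0;
}
```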
As previously mentioned, certain UDFs and UDAs may also be created to operate on a UDT data structure such as Table A 370. In some implementations, Table A 370 may represent Matrix A 310 as two rows and two columns of sub-matrices. Thus, index I 372 of Table A 370 may represent the rows of Matrix A 310, while index J 374 may represent the columns of Matrix A 310. The Value 376 may be the sub-matrix 320-350 corresponding to each combination of index I 372 and index J 374. For example, sub-matrix P21 340 is the Value 376 corresponding to index I=2 and index J=1.
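A minimal sketch of the corresponding lookup (assuming, for illustration only, that the sub-matrices of Table A 370 are kept in a flat array ordered row-by-row) might compute the array position from the 1-based indices I and J:

```c
/* Map the 1-based (I, J) indices of Table A 370 to a 0-based array position.
 * With chunk_cols = 2, sub-matrix P21 (I=2, J=1) maps to position 2. */
int chunk_index(int i, int j, int chunk_cols)
{
    return (i - 1) * chunk_cols + (j - 1);
}
```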
For a UDT data structure, section-oriented aggregation operators may be created to function similarly to certain SQL functions such as SUM, COUNT, MIN, and MAX, which traditionally operate at the data element granularity. For instance, a new function such as CHUNK_SUM( ) may replace SUM( ), while MATRIX_MULTIPLY( ) may replace the standard operator * to operate on a UDT data structure on a section-by-section basis. The names of these new functions are merely examples, and other names are also contemplated.
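The per-section work behind a hypothetical CHUNK_SUM( ) might look like the following sketch: each call produces a partial sum over one sub-matrix, and the aggregate combines the partial sums, rather than the engine scanning individual elements.

```c
#include <stddef.h>

/* Sum all elements of one sub-matrix (a flat array of n_elems floats).
 * An aggregate such as CHUNK_SUM( ) could accumulate these per-section
 * partial sums into a single result. */
double chunk_sum(const float *chunk, size_t n_elems)
{
    double s = 0.0;
    for (size_t k = 0; k < n_elems; k++)
        s += chunk[k];
    return s;
}
```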
In order to increase efficiency of execution, the UDFs/UDAs may invoke GPU libraries 250 to access the GPU 240 in block 430. In particular, the UDFs/UDAs may invoke certain GPU-accelerated primitives, which in turn access the GPU libraries 250. For example, a UDF such as MATRIX_MULTIPLY( ) may be recognized by the database engine 210 as performing matrix multiplication between two matrices. MATRIX_MULTIPLY( ) may then call various GPU-accelerated primitives to invoke the GPU libraries 250 for performing matrix multiplication between sub-matrices of the two matrices. Since the GPU 240 may be capable of a relatively high degree of parallel processing, the GPU 240 may be efficient at executing functions on the relatively large amounts of data involved in large-scale statistical analyses, including matrix multiplication and other mathematical tasks.
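As a sketch of such a primitive (the wrapper name is an assumption; cublasSgemm is the standard cuBLAS routine), multiplying two device-resident n x n sub-matrices might be delegated to cuBLAS, which conveniently expects the column-major layout discussed earlier:

```c
#include <cublas_v2.h>

/* Compute C = A * B for n x n sub-matrices already resident in GPU memory
 * (d_a, d_b, d_c are device pointers), using column-major storage. */
int submatrix_multiply(cublasHandle_t handle, int n,
                       const float *d_a, const float *d_b, float *d_c)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasStatus_t st = cublasSgemm(handle,
                                    CUBLAS_OP_N, CUBLAS_OP_N, /* no transpose */
                                    n, n, n,
                                    &alpha, d_a, n,
                                    d_b, n,
                                    &beta, d_c, n);
    return (st == CUBLAS_STATUS_SUCCESS) ? 0 : -1;
}
```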
Then, in block 440, the GPU 240 may execute the GPU libraries 250 invoked by the particular UDFs/UDAs. For example, data may be copied from a main memory of the database engine 210 (e.g., memory 120) into GPU memory (e.g., GPU memory 150). A processor (e.g., processor 110) in the database engine 210 may then instruct the GPU 240 to process the data by executing these GPU libraries 250. Subsequently, the GPU 240 may return the results of the execution from the GPU memory 150 to the main memory 120 of the database engine 210. Finally, in block 450, the database engine 210 may return the results to a user in response to the query received in block 410.
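A sketch of this copy-execute-copy flow (only the CUDA runtime calls are real API; the function name and the processing step are placeholders) might look as follows:

```c
#include <cuda_runtime.h>

/* Stage n_elems floats in GPU memory, let the GPU process them, and copy
 * the results back to main memory. */
int process_on_gpu(const float *host_in, float *host_out, size_t n_elems)
{
    size_t bytes = n_elems * sizeof(float);
    float *d_buf = NULL;
    if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess)
        return -1;
    cudaMemcpy(d_buf, host_in, bytes, cudaMemcpyHostToDevice);  /* main -> GPU */
    /* ... invoke a GPU library routine or kernel on d_buf here ... */
    cudaMemcpy(host_out, d_buf, bytes, cudaMemcpyDeviceToHost); /* GPU -> main */
    cudaFree(d_buf);
    return 0;
}
```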
In block 520, the method 500 may generate instructions to execute a function on the data structure on a section-by-section basis. This may be in contrast to executing the function on an element-by-element basis. In some examples, where the data structure may be a matrix, the function may be an algebraic operation, such as matrix multiplication, transposition, etc. Thus, instead of iterating through each element of the matrix, the function may iterate through the matrix on a section-by-section basis, thereby increasing input/output efficiency and performance.
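A self-contained sketch of section-by-section execution (sizes and names are illustrative assumptions) is a blocked matrix multiplication that walks B x B sub-matrices instead of individual elements:

```c
/* Multiply two N x N column-major matrices section-by-section, where each
 * section is a B x B sub-matrix (N divisible by B; c must be zeroed first).
 * The three outer loops iterate over sections; only the innermost loops
 * touch individual elements, and each section pair could instead be handed
 * to a GPU primitive. */
void blocked_multiply(int N, int B, const float *a, const float *b, float *c)
{
    for (int i = 0; i < N; i += B)
        for (int j = 0; j < N; j += B)
            for (int k = 0; k < N; k += B)
                for (int jj = j; jj < j + B; jj++)
                    for (int ii = i; ii < i + B; ii++)
                        for (int kk = k; kk < k + B; kk++)
                            c[jj * N + ii] += a[kk * N + ii] * b[jj * N + kk];
}
```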
In block 530, the instructions from the function may be executed on a graphics processing unit (GPU). In some implementations, the GPU may be a GPGPU capable of executing instructions normally executed by a CPU.
Instructions of modules described above (including modules for performing tasks of FIG. 4 or FIG. 5) may be loaded for execution on a processor or processors (e.g., the processor 110 in FIG. 1).
Data and instructions are stored in respective storage devices, which are implemented as one or more computer-readable or machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.