This application claims the benefit of Korean Patent Application No. 10-2014-0054932, filed on May 8, 2014, and Korean Patent Application No. 10-2014-0147620, filed on Oct. 28, 2014, in the Korean Intellectual Property Office, the disclosures of which are incorporated herein in their entireties by reference.
1. Field
One or more exemplary embodiments relate to a database management system (DBMS), and more particularly, to a hybrid storage apparatus capable of storing data based on columns while maintaining a row-based data structure.
2. Description of the Related Art
In general, user queries sent to a database management system (DBMS) request access to the data values of only a few columns of a row rather than to the data values of all columns of the row. However, existing N-ary storage models (NSMs) store data in row units and are thus poorly suited to processing such queries.
To solve this problem, storage products have recently been introduced that selectively use a storage model dedicated to on-line analytical processing (OLAP), which employs column-based storage, and a storage model dedicated to on-line transaction processing (OLTP), which employs row-based storage. Such products must implement both column-based storage and row-based storage. Furthermore, either column-based storage or row-based storage must be selected for each table. Thus, the efficiency of these products is low when both column-based queries and row-based queries are issued against one table.
One or more exemplary embodiments include a hybrid storage model obtained by adding a column-based storage model to a general-purpose database management system (DBMS), so that a user may use the hybrid storage model in both a column-based on-line analytical processing (OLAP) environment and a row-based on-line transaction processing (OLTP) environment.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
According to one or more exemplary embodiments, a hybrid storage apparatus includes a table generator for generating a table; a column group generator for generating a column group by collecting at least one column among one or more columns included in the table; and a segment allocation unit for allocating a base segment to the table and a group segment to the column group which includes at least one column of the table, wherein the base segment includes group-segment link information regarding the group segment.
When a plurality of the column groups are present, the base segment may include group-segment link information regarding group segments that are respectively allocated to the plurality of column groups.
A group page, into which a value of the at least one column belonging to the column group is to be inserted, may be allocated to the column group by using the group segment.
A base page may be allocated to the table by using the base segment, and information regarding records of the table may be stored in the base page.
According to one or more exemplary embodiments, a method of storing data in a hybrid storage apparatus based on columns while maintaining a row-based data structure includes generating a table by using a table generator; generating a column group by collecting at least one column among one or more columns forming the table by using a column group generator; and allocating a base segment to the table and a group segment to the column group which includes at least one column of the table by using a segment allocation unit, wherein the base segment includes group-segment link information regarding the group segment.
These and/or other aspects will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings in which:
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the exemplary embodiments are merely described below, by referring to the figures, to explain aspects of the present description. As used herein, expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
The hybrid storage apparatus 100 includes a table generator 110, a column group generator 120, and a segment allocation unit 130.
The table generator 110 generates a table. In the table, data is stored based on columns and rows.
The column group generator 120 generates a column group by collecting at least one column from among the one or more columns forming the table generated by the table generator 110. In this case, the column group generator 120 may support an interface via which a user may select at least one column from among the one or more columns.
The segment allocation unit 130 allocates a base segment to the table generated by the table generator 110, and allocates a group segment to a column group related to the table. That is, the group segment is allocated to the column group which includes the columns of the table. In this case, the base segment includes group-segment link information regarding the allocated group segment.
According to an exemplary embodiment, the following table generation syntax may be input to the table generator 110 via a user interface.
When the table is generated using the above syntax, the column group generator 120 generates a G1 column group with a C1 column and a C2 column, and a G2 column group with a C4 column. A column C3 is generated as a general column.
Data of columns belonging to a column group is stored in a group page, and data of columns that do not belong to the column group is stored in a base page.
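As a non-limiting illustration, the division of a row between group pages and the base page may be sketched in Python as follows. The sketch assumes the grouping described above (a G1 column group with the C1 and C2 columns, a G2 column group with the C4 column, and C3 as a general column). The name `split_row` and the dictionary-based layout are hypothetical and are not part of the described apparatus.

```python
from typing import Any, Dict, List, Tuple

# Grouping assumed from the example above:
# G1 = (C1, C2), G2 = (C4); C3 is a general column outside any column group.
COLUMN_GROUPS: Dict[str, List[str]] = {"G1": ["C1", "C2"], "G2": ["C4"]}


def split_row(row: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Tuple[Any, ...]]]:
    """Split one row into base-page values and per-column-group values."""
    grouped = {col for cols in COLUMN_GROUPS.values() for col in cols}
    # Columns outside every column group stay with the base page.
    base_part = {col: val for col, val in row.items() if col not in grouped}
    # Columns of each column group are stored together in that group's page.
    group_parts = {
        group: tuple(row[col] for col in cols)
        for group, cols in COLUMN_GROUPS.items()
    }
    return base_part, group_parts
```

In this sketch, a row with values for C1 to C4 yields one tuple per column group and a remainder holding only the C3 value.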
In detail, the G1 column group includes a plurality of columns. For example, the G1 column group may include the C1 column and the C2 column. Thus, a group page 320 of
In the case of the G2 column group, since only the C4 column constitutes the G2 column group, values of the C4 column are continuously listed and stored in a group page 330 of
The segment allocation unit 130 allocates a base segment and a group segment to a table generated by the table generator 110. Referring to
Here, the segment should be understood as a table generated by a user. When data is to be inserted into, deleted from, or updated in a table, a segment descriptor 200 is located, a page indicated in the extent descriptors 201 pointed to by the segment descriptor 200 is located, and the data stored in the page is accessed.
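As a non-limiting illustration, the lookup path from a segment descriptor through its extent descriptors to a page may be sketched in Python as follows. The class names, the list-of-page-numbers representation of an extent, and the `read_page` helper are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExtentDescriptor:
    # Page numbers belonging to this extent (hypothetical representation).
    page_numbers: List[int]


@dataclass
class SegmentDescriptor:
    name: str
    extents: List[ExtentDescriptor] = field(default_factory=list)

    def owns_page(self, page_no: int) -> bool:
        """Return True if any extent of this segment indicates the page."""
        return any(page_no in extent.page_numbers for extent in self.extents)


def read_page(segment: SegmentDescriptor, page_store: Dict[int, Any], page_no: int) -> Any:
    """Locate the page via the segment's extent descriptors, then access its data."""
    if not segment.owns_page(page_no):
        raise KeyError(f"page {page_no} is not indicated by segment {segment.name}")
    return page_store[page_no]
```

The same path applies whether the descriptor belongs to a base segment or to a group segment; only the descriptor that is consulted differs.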
According to an exemplary embodiment, pages indicated in extent descriptors pointed to by the base segment 210 and the group segments 220 and 230 are detected and data stored in the pages is accessed.
For example, referring to
Group pages 320 and 330 indicated in extent descriptors 221 and 231 pointed to by the G1 and G2 column groups are allocated to the G1 and G2 column groups, respectively.
Referring to
In this case, the information of the records of the table T1 further includes an RID identifying a page number of a group page in which the record of the column group is stored, and offset information of the record of the column group. Referring to
A column value of at least one column (e.g., the C1, C2, and C4 columns) belonging to a column group (e.g., the G1 and G2 column groups) may be inserted into the group pages 320 and 330 by using the group segments 220 and 230.
Column values of the C1 column and the C2 column are inserted into the group page 320 pointed to by a group segment (T1, G1) 220. A column value of the C4 column is inserted into the group page 330 pointed to by a group segment (T1, G2) 230.
A process of performing an ‘insert’ operation, an ‘update’ operation, a ‘delete’ operation, and a ‘select’ operation based on a segment structure as illustrated in
Insert Operation
When a query “Insert into T1 Values(1, ‘AAA’, ‘hello’, ‘BB’)” is input, the ‘insert’ operation is performed as follows.
Pages into which column values are to be inserted are allocated using the group segments 220 and 230, one page for each of the G1 and G2 column groups. A (1, ‘AAA’) record is written to the page allocated to the G1 column group, and an RID representing the location of the (1, ‘AAA’) record is generated and retained. A (‘BB’) record is written to the page allocated to the G2 column group, and an RID representing the location of the (‘BB’) record is generated and retained. Then, a space into which a record may be inserted based on the base segment 210 of
In detail, in the base page 310 of
The values of the C1 column and the C2 column of the G1 column group are recorded in the group page 320 pointed to by the group segment (T1, G1) 220 of
Similarly, “BB” is recorded in the group page 330 pointed to by the group segment (T1, G2) 230 of
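As a non-limiting illustration, the ‘insert’ operation described above may be sketched in Python as follows. The sketch assumes that an RID is a (page number, offset) pair counted from zero, that a page is a small in-memory list, and that the base record holds the RIDs of the group records together with the values of general columns. The names `Segment`, `HybridTable`, and `fetch_row` are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

RID = Tuple[int, int]  # (page number, offset within the page), zero-based here


class Segment:
    """Toy segment: a list of pages, each page a list of records."""

    def __init__(self, page_capacity: int = 4) -> None:
        self.page_capacity = page_capacity
        self.pages: List[List[Any]] = [[]]

    def append(self, record: Any) -> RID:
        if len(self.pages[-1]) >= self.page_capacity:
            self.pages.append([])  # allocate a new page when the last one is full
        page = self.pages[-1]
        page.append(record)
        return (len(self.pages) - 1, len(page) - 1)

    def read(self, rid: RID) -> Any:
        page_no, offset = rid
        return self.pages[page_no][offset]


class HybridTable:
    def __init__(self, columns: Sequence[str], groups: Dict[str, List[str]]) -> None:
        self.columns = list(columns)
        self.groups = groups
        self.base = Segment()
        # Stands in for the group-segment link information held by the base segment.
        self.group_segments = {name: Segment() for name in groups}

    def insert(self, values: Sequence[Any]) -> RID:
        row = dict(zip(self.columns, values))
        grouped = {c for cols in self.groups.values() for c in cols}
        # Step 1: write each column group's values to its group page and keep the RID.
        group_rids = {
            name: self.group_segments[name].append(tuple(row[c] for c in cols))
            for name, cols in self.groups.items()
        }
        # Step 2: write the base record (group RIDs plus general column values).
        general = {c: row[c] for c in self.columns if c not in grouped}
        return self.base.append({"rids": group_rids, "values": general})

    def fetch_row(self, base_rid: RID) -> Tuple[Any, ...]:
        """Rebuild a full row from the base record and the linked group records."""
        record = self.base.read(base_rid)
        row = dict(record["values"])
        for name, rid in record["rids"].items():
            row.update(zip(self.groups[name], self.group_segments[name].read(rid)))
        return tuple(row[c] for c in self.columns)
```

In this sketch, inserting (1, ‘AAA’, ‘hello’, ‘BB’) places (1, ‘AAA’) in the G1 group segment and (‘BB’) in the G2 group segment, while the base record keeps both RIDs and the ‘hello’ value.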
According to another exemplary embodiment, a process of performing the ‘update’ operation by using a segment structure as illustrated in
(1) An Example of a Process of Performing the ‘Update’ Operation by Using a Single Column Group
In this case, the C1 column and the C2 column together form the G1 column group and are thus stored in one group page. Thus, updating may be performed by accessing only the group segment (T1, G1) 220 of
Referring to
(2) An Example of a Process of Performing the ‘Update’ Operation by Using a Plurality of Column Groups
In this case, a page in which the C4 column is stored is accessed using the RID(G2) 313 illustrated in
According to another exemplary embodiment, a process of performing the ‘delete’ operation by using a segment structure as illustrated in
According to another exemplary embodiment, a process of performing the ‘select’ operation by using a segment structure as illustrated in
(1) An Example of a Process of Performing the ‘Select’ Operation by Using a Single Column Group
In this case, a C1 column and a C2 column of a record satisfying a condition may be accessed directly by accessing only the group segment (T1, G1) 220 of
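As a non-limiting illustration, a ‘select’ operation that touches only one column group may be sketched in Python as follows. The sketch assumes that the group pages of the G1 column group are available as lists of (C1, C2) tuples. The function name `select_from_group` and the example predicate are hypothetical and do not reproduce any particular query.

```python
from typing import Callable, Iterable, List, Tuple

G1Record = Tuple[int, str]  # (C1 value, C2 value)


def select_from_group(
    group_pages: Iterable[List[G1Record]],
    predicate: Callable[[G1Record], bool],
) -> List[G1Record]:
    """Scan only the group pages of one column group and keep matching records.

    Neither the base pages nor the pages of other column groups are read.
    """
    return [record for page in group_pages for record in page if predicate(record)]
```

Because every column needed by the condition and by the result belongs to the same column group, the scan completes without consulting the base segment.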
(2) An Example of a Process of Performing the ‘Select’ Operation by Using a Plurality of Column Groups
As described above, a query requesting to access all records that are mainly used in an OLTP environment is returned by forming a row by accessing the base page 310 of
According to an exemplary embodiment, a hybrid storage apparatus may be embodied such that only a most significant RID of the base page 310 of
If it is assumed that the G2 column group of the table T1 is indexed, a B-tree index may be configured as illustrated in
Referring to
For example, in the page 530 pointed to by the group segment (T1, G2), an index “BB (3,1)” indicates a first record of a third page of a base segment.
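As a non-limiting illustration, an index entry such as “BB (3,1)” may be modeled in Python as a key paired with the RID of a base record. The sketch below uses a sorted list searched with `bisect` as a simple stand-in for a B-tree; it is not a B-tree implementation, and the class name `GroupIndex` is hypothetical.

```python
import bisect
from typing import List, Tuple

RID = Tuple[int, int]  # (page number, record position) in the base segment


class GroupIndex:
    """Sorted (key, base RID) entries; a stand-in for the leaf level of a B-tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, RID]] = []

    def add(self, key: str, base_rid: RID) -> None:
        # Keep the entries ordered by key, then by RID.
        bisect.insort(self._entries, (key, base_rid))

    def lookup(self, key: str) -> List[RID]:
        """Return the base RIDs of all records whose indexed value equals key."""
        # (key,) sorts before every (key, rid) entry, so this finds the first match.
        i = bisect.bisect_left(self._entries, (key,))
        rids: List[RID] = []
        while i < len(self._entries) and self._entries[i][0] == key:
            rids.append(self._entries[i][1])
            i += 1
        return rids
```

Each RID returned by `lookup` identifies a base record, from which the remaining columns of the row may be reached.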
In this case, since all records of a table may be retrieved using a specific column, the index structure performs well even for the following OLTP query.
Since storing is performed in a group page in units of column groups, data may be compressed using dictionary or difference-based compression.
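As a non-limiting illustration, dictionary compression of the values in a group page may be sketched in Python as follows. The function names and the choice of assigning codes in order of first appearance are assumptions made for this sketch.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Replace each value with a small integer code and return (dictionary, codes)."""
    dictionary: List[Hashable] = []
    code_of: Dict[Hashable, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            # Codes are assigned in order of first appearance.
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: Sequence[Hashable], codes: Sequence[int]) -> List[Hashable]:
    """Recover the original values from the dictionary and the codes."""
    return [dictionary[code] for code in codes]
```

The benefit grows with repetition: values of the same column stored contiguously tend to repeat, so each distinct value is stored once and every occurrence costs only a code.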
An RID of a group column record is stored in a base page in the form of <page number, offset>. However, since several group column records are stored in the same group page, the same page number is likely to be repeated across the RIDs of those records. In this case, the RIDs may be compressed using dictionary or difference-based compression.
An offset may be processed similarly. For example, a base offset may be set as a reference value and the difference between the base offset and a target value may be stored, thereby reducing a storage space.
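As a non-limiting illustration, the compression of RIDs may be sketched in Python as follows. The sketch stores each run of identical page numbers once, which is a simplification used here in place of a full dictionary, and stores each offset as its difference from the first offset of the run, which serves as the base offset. The function names and the tuple layout are hypothetical.

```python
from typing import List, Tuple

RID = Tuple[int, int]                      # (page number, offset)
Run = Tuple[int, int, List[int]]           # (page number, base offset, differences)


def compress_rids(rids: List[RID]) -> List[Run]:
    """Store each run of equal page numbers once, with offsets as differences."""
    runs: List[Run] = []
    for page_no, offset in rids:
        if runs and runs[-1][0] == page_no:
            _, base_offset, diffs = runs[-1]
            diffs.append(offset - base_offset)  # difference from the base offset
        else:
            # A new page number starts a new run; its first offset is the base.
            runs.append((page_no, offset, [0]))
    return runs


def decompress_rids(runs: List[Run]) -> List[RID]:
    """Rebuild the original RIDs from the compressed runs."""
    return [(page_no, base + diff) for page_no, base, diffs in runs for diff in diffs]
```

The differences are typically much smaller than the absolute offsets, so they may be stored in fewer bits than full <page number, offset> pairs.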
As described above, according to one or more of the above exemplary embodiments, data may be stored in a hybrid storage apparatus based on columns while maintaining a row-based data structure.
Also, the architecture of an existing DBMS employing an N-ary storage model (NSM) may be used, and the advantages of a column-based DBMS may be achieved. According to an exemplary embodiment, a hybrid storage apparatus has a structure in which columns are gathered, so the advantages of a Partition Attributes Across (PAX) model may also be achieved. That is, cache misses may decrease.
According to an exemplary embodiment, column-based access may be performed on the hybrid storage without the join operation that is needed in column-based storage. Also, since a function for selecting the column groups desired by a user is provided, the storage structure may be controlled by the user. Thus, a storage structure that is optimal for the user's desired OLTP and OLAP workloads may be provided.
Also, according to an exemplary embodiment, in a hybrid storage apparatus, an RID is used to easily access data in units of records.
A hybrid storage apparatus and a method of storing data in the hybrid storage apparatus based on columns while maintaining a row-based data structure may be embodied as program instructions that can be executed by various computing means and recorded on a computer-readable recording medium. The computer-readable recording medium may store program instructions, data files, data structures, etc. solely or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the inventive concept or may be well-known to those of ordinary skill in the field of computer software.
Examples of the computer-readable recording medium include a magnetic medium (such as a hard disk, a floppy disk, and a magnetic tape), an optical medium (such as a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical medium (such as a floptical disk), and a hardware device specially configured to store and execute program instructions (such as a ROM, a random access memory (RAM), and a flash memory).
The program instructions include not only machine language codes prepared by a compiler but also high-level codes executable by a computer by using an interpreter. The hardware device may be configured to operate as at least one module to perform operations according to the inventive concept, or vice versa.
It should be understood that the exemplary embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each embodiment should typically be considered as available for other similar features or aspects in other embodiments.
While one or more exemplary embodiments have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2014-0054932 | May 2014 | KR | national |