This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2016-198915, filed on Oct. 7, 2016, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to an encoding program and the like.
ETL (Extract/Transform/Load) processing is performed by referring to data stored in a plurality of tables. When a plurality of tables are referred to, the processing is performed by using a dedicated tool or the like. For example, a table is written to a general-purpose file format such as CSV (Comma-Separated Values) and then compressed in the zip format.
According to a conventional technique, processing target column information, which specifies the columns to be processed, is accepted and the columns to be processed are extracted from a table before the data in the extracted columns is compressed. To obtain a certain compression ratio, the zip compression is performed across the delimiters that indicate column separations.
The processing target column information 11 illustrated in
According to the conventional technique, if the extracted data 12 is compressed by zip, the extracted data 12 is compressed without considering the column separations. Compressed data 13 is thereby generated.
The compression is thus performed across the columns. For example, “A00009” and “9” are compressed together.
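The behavior described above can be sketched as follows. This is a minimal illustration and not the conventional tool itself: Python's zlib module (DEFLATE, the algorithm commonly used inside zip archives) stands in for zip compression, and the comma and newline layout of the extracted data is an assumption. The product codes and numbers of sales are the values that appear in this description.

```python
import zlib

# Product code / number of sales pairs taken from the description.
rows = [
    ["A00009", "9"],
    ["A00015", "5"],
    ["A00003", "14"],
    ["A00003", "9"],
    ["A00015", "4"],
]

# Assumed layout of the extracted data:
# a comma between columns and a newline between records.
extracted = "\n".join(",".join(row) for row in rows).encode("utf-8")

# The compressor treats the input as one byte stream,
# so repeated patterns may be matched across delimiters.
compressed = zlib.compress(extracted)

# Column boundaries become visible again only after
# the whole stream has been decompressed.
restored = zlib.decompress(compressed).decode("utf-8")
restored_rows = [line.split(",") for line in restored.split("\n")]
```

In this sketch the compressed bytes carry no marker that separates “A00009” from “9”; the separation is recovered only from the decompressed text.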
[Patent Literature 1] Japanese Laid-open Patent Publication No. 2014-191593
[Patent Literature 2] Japanese Laid-open Patent Publication No. 09-204349
[Patent Literature 3] Japanese Laid-open Patent Publication No. 07-220051
[Patent Literature 4] Japanese Laid-open Patent Publication No. 2012-256144
As described in
According to an aspect of an embodiment, a non-transitory computer readable storage medium has stored therein an encoding program that causes a computer to execute a process including: obtaining processing target column information for identifying a plurality of processing target columns to be processed among a plurality of columns included in a table in which the plurality of columns are separated by separation information; encoding the plurality of processing target columns of the table in units of columns by using the processing target column information; and generating an encoded table in which the plurality of encoded processing target columns are connected.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
According to the foregoing conventional technique, there is a problem that column separations are not identifiable unless the encoded information is once converted into an intermediate file.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. The present invention is not limited by this embodiment.
If the encoding apparatus scans the columns of the table 10 and hits a column to be processed, the encoding apparatus extracts the hit column in units of columns, and encodes the extracted column. The encoding apparatus repeatedly executes the foregoing processing to encode columns to be processed in units of columns, and connects the pieces of encoded information to generate an encoded table 123.
The processing target column information 11 illustrated in
When the encoding apparatus scans the columns in the record of the second row of the table 10, “A00015” and “5” are hit as columns to be processed. The encoding apparatus then encodes “A00015” into “(A00015)” and “5” into “(5)”. When the encoding apparatus scans the columns in the record of the third row of the table 10, “A00003” and “14” are hit as columns to be processed. The encoding apparatus then encodes “A00003” into “(A00003)” and “14” into “(14)”.
When the encoding apparatus scans the columns in the record of the fourth row of the table 10, “A00003” and “9” are hit as columns to be processed. The encoding apparatus then encodes “A00003” into “(A00003)” and “9” into “(9)”. When the encoding apparatus scans the columns in the record of the fifth row of the table 10, “A00015” and “4” are hit as columns to be processed. The encoding apparatus then encodes “A00015” into “(A00015)” and “4” into “(4)”.
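The record-by-record walkthrough above can be sketched as follows. This is a minimal illustration under stated assumptions: the processing target columns are taken to be the product code and the number of sales, the non-target columns carry placeholder values, and the hypothetical function encode_value only reproduces the “(A00015)” notation used in this description rather than an actual code assignment.

```python
ITEMS = ["slip ID", "serial number", "time of sale",
         "product code", "shop ID", "number of sales"]

# Assumption: these two items are the processing target columns.
TARGETS = {"product code", "number of sales"}


def make_record(product_code, number_of_sales):
    """Build a six-column record; non-target values are placeholders."""
    return ["-", "-", "-", product_code, "-", number_of_sales]


def encode_value(value):
    """Hypothetical encoder that mirrors the "(value)" notation of the text."""
    return f"({value})"


def encode_record(record):
    """Scan one record and encode only the columns that are hit."""
    return [encode_value(value)
            for item, value in zip(ITEMS, record)
            if item in TARGETS]


table = [
    make_record("A00009", "9"),
    make_record("A00015", "5"),
    make_record("A00003", "14"),
    make_record("A00003", "9"),
    make_record("A00015", "4"),
]

# Each record yields one encoded piece per processing target column.
encoded_table = [encode_record(record) for record in table]
```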
The encoding apparatus connects the pieces of encoded information to generate the encoded table 123 illustrated in
As described above, the encoding apparatus generates the encoded table 123 by identifying the columns to be processed from the table 10, in which a plurality of columns are separated by delimiters, and encoding the columns to be processed in units of columns. Even in the encoded state, column separations can thus be identified in units of pieces of encoded data. For example, in the example illustrated in
Next, an example of a configuration of a system according to the present embodiment will be described.
The collection source system 50 is a system that collects information to be processed, such as the table 10 described in
The distribution destination system 60 is a system that receives an encoded table 123 output from the encoding apparatus 100 and performs various types of processing.
The encoding apparatus 100 is an apparatus that identifies columns to be processed from the table 10 in which a plurality of columns are separated by delimiters, and encodes the columns to be processed in units of columns to generate the encoded table 123. For example, the table 10 is included in the processing data transmitted from the collection source system 50. The encoding apparatus 100 distributes the encoded table 123 to the distribution destination system 60.
The communication unit 110 is a processing unit that performs data communication with the collection source system 50 and the distribution destination system 60 via the network. The control unit 130 to be described later exchanges data with the collection source system 50 and the distribution destination system 60 via the communication unit 110. The communication unit 110 corresponds to a communication apparatus.
The storage unit 120 includes a processing procedure definition 121, processing data 122, and the encoded table 123. The storage unit 120 corresponds to a semiconductor memory element such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory. Alternatively, the storage unit 120 corresponds to a storage device such as a hard disk or an optical disk.
The processing procedure definition 121 is information that defines information about the table to be processed and information about the columns to be processed (processing target column information 11). The processing procedure definition 121 is information including a data structure definition file and a data processing definition file. The processing procedure definition 121 is generated in advance by an administrator and the like.
The data structure definition file is a file in which table information about data to be processed is stored. The table information is specified by item names. For example, if the item names indicated by the table information are “slip ID, serial number, time of sale, product code, shop ID, and number of sales”, the table 10 described in
The data processing definition file is information corresponding to the processing target column information 11 described in
Definition information about input data for data extraction is set in an area 70b. The item names (columns) of the table to be processed are defined by ID names, and used in the processing described in an area 70c.
Definition information about output data (encoded table 123) is set in the area 70c. The item names of the columns to be processed are specified in the area 70c. In the example illustrated in
The processing data 122 is information generated by the collection source system 50.
The encoded table 123 is information generated by a generation unit 133 to be described below.
The control unit 130 includes a collection unit 131, an acquisition unit 132, a generation unit 133, and a distribution unit 134. The control unit 130 corresponds to an integrated device such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). Alternatively, the control unit 130 corresponds to an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).
The collection unit 131 is a processing unit that collects the processing data 122 from the collection source system 50 illustrated in
The acquisition unit 132 is a processing unit that obtains the processing procedure definition 121 stored in the storage unit 120. The acquisition unit 132 outputs the processing procedure definition 121 to the generation unit 133.
The generation unit 133 is a processing unit that extracts the table to be processed from the processing data 122 on the basis of the processing procedure definition 121, and generates the encoded table 123 from the table.
An example of processing by which the generation unit 133 extracts the table to be processed from the processing data 122 will be described. The generation unit 133 refers to the data structure definition file included in the processing procedure definition 121, and identifies items included in the table to be processed. The generation unit 133 compares the identified items with the items set in the table of the processing data 122, and extracts a table including all the identified items from the processing data 122.
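The table selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the processing data is modeled as a dictionary of named tables, each with a list of item names, and the names select_table, “sales”, and “shops” are hypothetical.

```python
def select_table(processing_data, required_items):
    """Return the name of the first table that contains all required items.

    processing_data is assumed to map a table name to a dictionary
    with an "items" list; this layout is hypothetical.
    """
    required = set(required_items)
    for name, table in processing_data.items():
        if required <= set(table["items"]):
            return name
    return None  # no table includes every identified item


processing_data = {
    "shops": {"items": ["shop ID", "shop name"]},
    "sales": {"items": ["slip ID", "serial number", "time of sale",
                        "product code", "shop ID", "number of sales"]},
}

# Items identified from the data structure definition file.
identified = ["slip ID", "serial number", "time of sale",
              "product code", "shop ID", "number of sales"]

selected = select_table(processing_data, identified)
```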
An example of processing by which the generation unit 133 generates the encoded table 123 on the basis of the extracted table will be described. The generation unit 133 refers to the data processing definition file included in the processing procedure definition 121, and encodes the columns to be processed, included in the table 10, in units of columns. The generation unit 133 connects the encoded columns to generate the encoded table 123.
Specific processing of the generation unit 133 corresponds to the processing described in
For example, the items “slip ID, serial number, time of sale, product code, shop ID, and number of sales” and values corresponding to the respective items are set in the table 10. The values of the columns corresponding to the respective items are separated by delimiters. The generation unit 133 scans the columns of the table 10 on the basis of the data processing definition file.
If a column to be processed is hit during the scan, the generation unit 133 extracts the hit column in units of columns, and encodes the extracted column. For example, the generation unit 133 encodes the column on the basis of a conversion rule that associates the information in each column with a corresponding code. The generation unit 133 repeatedly executes the foregoing processing to encode the columns to be processed in units of columns. The generation unit 133 connects the pieces of encoded information to generate the encoded table 123.
The generation unit 133 arranges and connects the encoded columns according to a positional relationship of the columns included in the table 10, whereby the encoded table 123 is generated. For example, in the unencoded table 10, the product code “A00009” and the number of sales “9” are in the same record of the first row. When generating the encoded table 123, the generation unit 133 therefore arranges and connects the encoded columns “(A00009)” and “(9)” in the record of the first row.
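One possible form of such a conversion rule can be sketched as follows. The description does not specify the code format, so this is an assumption made only for illustration: the hypothetical class ConversionRule assigns each distinct column value a fixed-width two-byte code in order of first appearance. Because every code has the same width, column separations remain identifiable in the connected result.

```python
class ConversionRule:
    """Hypothetical conversion rule: column value -> fixed-width code."""

    CODE_WIDTH = 2  # bytes per code (assumption)

    def __init__(self):
        self._codes = {}

    def encode(self, value):
        """Return the code for value, assigning a new one if needed."""
        if value not in self._codes:
            # to_bytes raises OverflowError once more than 65536
            # distinct values have been registered.
            self._codes[value] = len(self._codes).to_bytes(
                self.CODE_WIDTH, "big")
        return self._codes[value]


rule = ConversionRule()

# Encode the two processing target columns of one record and connect them.
row = b"".join(rule.encode(value) for value in ["A00009", "9"])

# The column boundaries are recovered from the code width alone.
width = ConversionRule.CODE_WIDTH
columns = [row[i:i + width] for i in range(0, len(row), width)]
```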
The distribution unit 134 is a processing unit that transmits the encoded table 123 generated by the generation unit 133 to the distribution destination system 60.
Next, an example of the processing procedure of the encoding apparatus 100 according to the present embodiment will be described.
The generation unit 133 of the encoding apparatus 100 reads a table corresponding to the processing procedure definition 121 from the processing data 122 (step S102). The generation unit 133 reads a column from the table (step S103).
The generation unit 133 determines whether the read column is one to be processed (step S104). If the read column is not one to be processed (step S104, No), the generation unit 133 proceeds to step S107.
On the other hand, if the read column is one to be processed (step S104, Yes), the generation unit 133 encodes the read column in units of columns (step S105). The generation unit 133 writes the encoded information (step S106).
The generation unit 133 determines whether there is another column (step S107). If there is another column (step S107, Yes), the generation unit 133 proceeds to step S103.
On the other hand, if there is no other column (step S107, No), the generation unit 133 connects the pieces of information encoded in units of columns to generate the encoded table 123 (step S108).
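The steps S102 to S108 can be sketched as follows. This is a minimal illustration under stated assumptions: the table is given as CSV text whose first line holds the item names, the default encoder only reproduces the “(value)” notation of this description, and the outer loop over records is added for the sketch. The function name encode_table and the placeholder values in the sample data are hypothetical.

```python
import csv
import io


def encode_table(csv_text, targets, encode=lambda value: f"({value})"):
    """Encode the processing target columns of a CSV table column by column."""
    reader = csv.reader(io.StringIO(csv_text))    # S102: read the table
    items = next(reader)                          # header row (assumption)
    encoded_table = []
    for record in reader:
        pieces = []
        for item, value in zip(items, record):    # S103: read a column
            if item not in targets:               # S104: No
                continue                          # go on to S107
            pieces.append(encode(value))          # S105 and S106
        # S107: No more columns in this record.
        encoded_table.append("".join(pieces))     # S108: connect the pieces
    return encoded_table


CSV_TEXT = (
    "slip ID,serial number,time of sale,product code,shop ID,number of sales\n"
    "-,-,-,A00009,-,9\n"
    "-,-,-,A00015,-,5\n"
)

result = encode_table(CSV_TEXT, {"product code", "number of sales"})
```

Extraction and encoding happen in the same pass over the columns, so no intermediate file of extracted columns is written in this sketch.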
Next, an effect of the encoding apparatus 100 according to the present embodiment will be described. The encoding apparatus 100 identifies the columns to be processed from the table 10 in which a plurality of columns are separated by delimiters, and performs encoding in units of columns to generate the encoded table 123. Even in the encoded state, column separations can thus be identified in units of the encoded codes. For example, in the example illustrated in
The encoding apparatus 100 scans the plurality of columns included in the table 10, and when a column to be processed is identified, encodes the identified column in units of columns. As compared to the conventional technique in which all the columns to be encoded are extracted before encoding, the encoding can be performed efficiently since the extraction and encoding can be performed concurrently.
The encoding apparatus 100 arranges the encoded columns according to the positional relationship of the columns included in the table 10, and connects the encoded columns to generate the encoded table 123. If the items of the table 10 and the columns to be processed are known, which column corresponds to which item can thus be determined without decoding the encoded table 123.
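The correspondence between encoded columns and items can be sketched as follows. This is a minimal illustration under stated assumptions: the encoded table is modeled as a list of records, each holding one encoded piece per processing target column, and the function name encoded_column_index is hypothetical.

```python
ITEMS = ["slip ID", "serial number", "time of sale",
         "product code", "shop ID", "number of sales"]

# Assumption: these two items are the processing target columns.
TARGETS = {"product code", "number of sales"}


def encoded_column_index(items, targets, wanted):
    """Position of an item within each encoded record.

    The encoded columns keep the relative order that the processing
    target columns have in the original table.
    """
    ordered = [item for item in items if item in targets]
    return ordered.index(wanted)


encoded_table = [
    ["(A00009)", "(9)"],
    ["(A00015)", "(5)"],
    ["(A00003)", "(14)"],
]

# Pick the encoded "number of sales" column without decoding anything.
index = encoded_column_index(ITEMS, TARGETS, "number of sales")
sales_codes = [record[index] for record in encoded_table]
```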
Next, an example of a computer that executes an encoding program for implementing functions similar to those of the encoding apparatus 100 described in the foregoing embodiment will be described.
As illustrated in
The hard disk device 207 reads and loads an acquisition program 207a and a generation program 207b into the RAM 206. The acquisition program 207a functions as an acquisition process 206a. The generation program 207b functions as a generation process 206b. For example, the acquisition process 206a corresponds to the acquisition unit 132. The generation process 206b corresponds to the generation unit 133.
The acquisition program 207a and the generation program 207b do not necessarily need to be stored in the hard disk device 207 from the beginning. For example, the programs may be stored in a “portable physical medium” such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card inserted into the computer 200. The computer 200 may then read the acquisition program 207a and the generation program 207b from the medium and execute them.
Encoded information from which column separations can be determined can be generated without generating an intermediate file.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2016-198915 | Oct 2016 | JP | national |