The present application is related to the following concurrently filed U.S. patent applications that are commonly owned and that have the same inventors: (1) U.S. patent application Ser. No. 11/454,224, entitled “Apparatus and Method for Processing of COBOL Nested Data Record Schemas,” filed on Jun. 16, 2006, now abandoned; and (2) U.S. patent application Ser. No. 11/454,254, entitled “Apparatus and Method for Processing Data Corresponding to Multiple COBOL Data Record Schemas,” filed on Jun. 16, 2006, issued as U.S. Pat. No. 7,640,261.
The present invention relates generally to data processing. More particularly, this invention relates to COBOL data integration and processing based on a standardized data record schema derived from a COBOL data record schema.
The Common Business Oriented Language (COBOL) has been widely used in business computing since the 1960s. The advantages of COBOL include its maintainability and its portability across hardware platforms and operating systems. However, no available data processing system can adequately and flexibly process the COBOL data files generated by COBOL applications from the many different COBOL application vendors. Two major difficulties have hindered the development of such a data processing system. First, the format of a COBOL data file is defined in part by the COBOL application that generated it. Conventional data processing systems typically must be modified to handle the data files generated by each new COBOL application, based on knowledge of the COBOL data file format, which may require an understanding of the source code of the COBOL application. Second, even with an understanding of the COBOL application source code, reading the COBOL data file may also require an understanding of the physical environment that generated it, including characteristics of the generating system such as endianness, character encodings, and localizations. This further complicates the modification of conventional data processing systems to read COBOL data files.
The processing of COBOL data files presents other challenges. COBOL data files sometimes contain interspersed data of different types, such as employee data and customer data, where each type of data is defined by a distinct COBOL data record schema within the COBOL application source code. Conventional data processing systems typically cannot process such data files, since they assume that all data within a data file is of the same type. COBOL data files also sometimes contain nested data. Conventional data processing systems often cannot process nested data, nor do they place data read from such files in a flat data structure that modern database management systems can handle.
To address these shortcomings, it would be desirable to provide a COBOL data processing system that handles COBOL data files created by applications from multiple vendors. It would also be desirable for the COBOL data processing system to handle, based on user input, COBOL data files generated by unfamiliar COBOL applications from unfamiliar vendors, without requiring modification of the COBOL data processing system itself. Such a solution may enable users unfamiliar with COBOL to process COBOL data files from multiple vendors. It would also be desirable for this solution to extract subsets of data from COBOL data files containing interspersed data of different types, based on definitions of these data types in multiple COBOL data record schemas. Finally, it would be desirable for this solution to read nested data in COBOL data files, for example in accordance with a corresponding nested COBOL data record schema, and to store this nested data in a flattened form that database management systems can handle.
This invention includes a computer-readable medium to direct a computer to function in a specified manner. In one embodiment, a computer-readable medium comprises instructions to receive a description of a COBOL copybook represented in one of a plurality of disparate formats, where the description of the COBOL copybook includes information about the disparate format of the COBOL copybook; to parse the COBOL copybook based on the description of the COBOL copybook; and to create a standardized data record schema based on the COBOL copybook. The computer-readable medium may further comprise instructions to receive a parameterized location of a data record corresponding to the COBOL copybook and information about the data record format, and to process the data record based on the standardized data record schema.
In another embodiment, the computer-readable medium comprises instructions to receive a description of a data record schema represented in one of a plurality of disparate formats, where the description of the data record schema includes information about the disparate format of the data record schema; to parse the data record schema based on the description of the data record schema; and to create a standardized data record schema based on the data record schema.
For a better understanding of the nature and objects of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
Although in this example the COBOL data record schema 122 defines the fields of the data record 108, additional information about the format of the COBOL data record schema 122 may be needed to read the COBOL data record schema 122 itself. The user may provide this format information to the data integration and processing system 100. It may include characteristics that vary across computer architectures and/or applications, including but not limited to endianness, character encodings such as American Standard Code for Information Interchange (ASCII) and Extended Binary Coded Decimal Interchange Code (EBCDIC), and localizations. These characteristics may be specific to a computer system, to an application, or both. With the information in the COBOL data record schema description 120, the data integration and processing system 100 can process COBOL data record schemas 122 in disparate formats without requiring users to know COBOL.
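The following minimal Python sketch illustrates why such format information matters. It assumes the user supplies the character encoding as a Python codec name (for example `cp037`, one common EBCDIC code page) and the byte order of binary fields; `FormatInfo`, `decode_text`, and `decode_int32` are hypothetical names, not part of the described system.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatInfo:
    """Hypothetical user-supplied description of how bytes were produced."""
    encoding: str     # Python codec name, e.g. "cp037" (EBCDIC) or "ascii"
    big_endian: bool  # byte order used for binary integer fields


def decode_text(raw: bytes, fmt: FormatInfo) -> str:
    """Decode a character field using the declared code page."""
    return raw.decode(fmt.encoding)


def decode_int32(raw: bytes, fmt: FormatInfo) -> int:
    """Decode a signed 4-byte binary field using the declared byte order."""
    order = ">" if fmt.big_endian else "<"
    return struct.unpack(order + "i", raw)[0]


mainframe = FormatInfo(encoding="cp037", big_endian=True)
pc = FormatInfo(encoding="ascii", big_endian=False)

# The same logical values arrive as different bytes from the two systems.
assert decode_text(b"\xc1\xc2\xc3", mainframe) == decode_text(b"ABC", pc) == "ABC"
assert decode_int32(b"\x00\x00\x01\x00", mainframe) == decode_int32(b"\x00\x01\x00\x00", pc) == 256
```

Without the declared encoding and byte order, the same bytes would be misread, which is the difficulty the schema description is meant to resolve.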
The data integration and processing system 100 applies the information in the COBOL data record schema descriptions 120 to the COBOL data record schemas 122 to transform (block 102) the COBOL data record schemas 122 into standardized data record schemas 104. The standardized data record schema 104 is a representation of the COBOL data record schema 122 in a common form that the data integration and processing system 100 can process and then apply when processing data records 108. The standardized data record schema 104 may be created by converting the variations in format and localization across the COBOL data record schemas 122 into a common syntax and/or encoding that is substantially independent of those variations. The standardized data record schema 104 may be written in a human-readable form, for example in a scripting language such as Advanced Transformation Language (ATL). The standardized data record schema 104 may also contain format information, which varies across the systems running COBOL applications, for the COBOL data files containing one or more data records 108. This format information may include, but is not limited to, endianness, character encodings, localizations, record format (such as fixed length or variable length), whether variable-length records contain record length information, and the record length of fixed-length records. This format information may be system-specific or application-specific, and may be supplied as user input.
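As a rough illustration, a standardized data record schema that carries both field definitions and format information could be modeled as below. This is a sketch under stated assumptions: the class and attribute names are hypothetical, and JSON is used only as a stand-in for a human-readable form; it does not reproduce ATL syntax.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldDef:
    """Hypothetical field definition in a common, vendor-neutral syntax."""
    name: str
    kind: str       # common type name, e.g. "string", "integer", "decimal"
    length: int     # length in bytes within the record
    scale: int = 0  # digits after the implied decimal point


@dataclass
class StandardizedSchema:
    """Hypothetical standardized schema: fields plus file format information."""
    name: str
    fields: List[FieldDef]
    encoding: str = "cp037"
    big_endian: bool = True
    record_format: str = "fixed"     # "fixed" or "variable"
    has_length_prefix: bool = False  # whether variable-length records carry a length
    record_length: int = field(init=False)

    def __post_init__(self) -> None:
        # Record length implied by the field layout (the fixed record length
        # when record_format is "fixed").
        self.record_length = sum(f.length for f in self.fields)

    def to_text(self) -> str:
        """Serialize in a human-readable form (JSON here, as a stand-in)."""
        return json.dumps(asdict(self), indent=2)


schema = StandardizedSchema("ORDER", [FieldDef("OrderID", "string", 3),
                                      FieldDef("Qty", "integer", 5)])
print(schema.to_text())
```

Keeping format information next to the field list means a single artifact is enough to drive later record processing.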
The standardized data record schemas 104 are then used by the data integration and processing system 100 to process (block 106) data records 108. Because the standardized data record schemas 104 may contain format information, the data integration and processing system 100 may process, based on user input, data records 108 generated by unfamiliar COBOL applications from unfamiliar vendors, without requiring modification of the data integration and processing system 100 itself. Format information for COBOL data files containing one or more data records 108 may be provided as a separate input to block 106 if that information is not already included in the standardized data record schemas 104. As part of processing (block 106) data records 108, a particular standardized data record schema 104N may be associated with a data record 108N, so that processing (block 106) applies that standardized data record schema 104N to the data record 108N. The result of block 106 may include, but is not limited to, one or more database tables 110A-110N and one or more exported files 112A-112N in a format such as Extensible Markup Language (XML), each containing at least some portion of the contents of the data record 108. The database tables 110 derived from data records 108 may also be processed jointly using a data manipulation language, for example through joins and queries in the Structured Query Language (SQL).
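The sketch below shows one way a schema could be applied to a fixed-length record and the result exported as XML. It assumes a hypothetical layout of `(field name, length)` pairs and character-only fields; `apply_schema` and `to_xml` are illustrative names, and only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical standardized layout: (field name, length in bytes), in record order.
Layout = List[Tuple[str, int]]


def apply_schema(record: bytes, layout: Layout, encoding: str = "cp037") -> Dict[str, str]:
    """Slice one fixed-length record into named character fields."""
    expected = sum(length for _, length in layout)
    if len(record) != expected:
        raise ValueError(f"record is {len(record)} bytes, schema expects {expected}")
    values, offset = {}, 0
    for name, length in layout:
        values[name] = record[offset:offset + length].decode(encoding)
        offset += length
    return values


def to_xml(tag: str, values: Dict[str, str]) -> str:
    """Export one processed record as an XML element."""
    root = ET.Element(tag)
    for name, value in values.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


layout = [("OrderID", 3), ("Qty", 3)]
row = apply_schema("A01005".encode("cp037"), layout)
print(to_xml("Order", row))
```

The length check rejects a record that does not match its associated schema instead of silently misaligning fields.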
The standardized data record schema 104 is also a flat data structure that database management systems can handle. Database management systems typically do not allow a database table field to be nested within another field. A COBOL data record schema 122A (see
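A minimal sketch of flattening a group hierarchy into one column per elementary item follows. The tuple representation of groups and the underscore-joined column names are assumptions made for illustration, not the naming scheme of the described system.

```python
from typing import List, Tuple, Union

# Hypothetical nested layout: an elementary item is (name, length);
# a group item is (name, [child items]).
Item = Tuple[str, Union[int, list]]


def flatten(items: List[Item], prefix: str = "") -> List[Tuple[str, int]]:
    """Flatten group items into one column per elementary item."""
    columns = []
    for name, spec in items:
        full_name = f"{prefix}_{name}" if prefix else name
        if isinstance(spec, list):  # group item: recurse into its children
            columns.extend(flatten(spec, full_name))
        else:                       # elementary item: becomes a column
            columns.append((full_name, spec))
    return columns


nested = [("OrderID", 3),
          ("Customer", [("Name", 20), ("Address", [("City", 15), ("Zip", 5)])])]
print(flatten(nested))
```

Prefixing each column with its enclosing group names keeps columns unique when different groups reuse the same field name.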
The data integration and processing system 100 also may handle COBOL copybooks containing more than one COBOL data record schema 122, and may be capable of extracting subsets of data from COBOL data files with interspersed data records 108 of different types. Each of multiple COBOL data record schemas 122 contained in a COBOL copybook is transformed into a corresponding standardized data record schema 104. As part of processing (block 106), a particular standardized data record schema 104N may be designated as a selected standardized data record schema for use in processing a COBOL data file containing data records 108 of multiple distinct types, where each type corresponds to an individual standardized data record schema 104. The result of block 106 may then include, but is not limited to, a database table 110 containing at least some portion of the contents of the data records 108 of the type corresponding to the selected standardized data record schema 104N, and an exported file 112 in a format such as Extensible Markup Language (XML) and containing at least some portion of the contents of the data records 108 of the type corresponding to the selected standardized data record schema 104N.
The data integration and processing system 100 may reside on the same computer as one or more clients 201 and one or more data sources 202, or may reside on a separate computer. The data integration and processing system 100 includes standard components, such as a network connection 204, a CPU 206, and an input/output module 208, which communicate over a bus 212. A memory 210 is also connected to the bus 212. The memory 210 stores a set of executable programs that implement the functions of the invention. The clients 201 and the data sources 202 may include the same standard components.
In an embodiment of the invention, the memory 210 stores executable instructions establishing a client interface module 214, the COBOL data integrator and processor 220, a database management module 234, a data format converter 236, a data store 238, and a data source interface module 240. The client interface module 214 has modules including a graphical user interface 216 and a data flow creator 218. The COBOL data integrator and processor 220 has modules including a data record schema description receiver 222, a data record schema parser 224, a standardized data record schema creator 226, a data record description receiver 228, a data record schema selector 230, and a data record processor 232.
The data record schema parser 224 then parses the COBOL data record schema 122 (block 302). To obtain the COBOL data record schema 122, the data record schema parser 224 may use portions of the COBOL data record schema description 120 from the data record schema description receiver 222, such as an identification of the COBOL data record schema 122, to access data sources 202 via the data source interface module 240. The COBOL data record schema 122 may be contained in a COBOL copybook file. The parsing (block 302) may include reading a COBOL statement, such as a line of COBOL code, from the COBOL data record schema 122, and processing the COBOL statement upon determining that it is a COBOL data definition. A COBOL data definition may be a COBOL statement, outside of any comment, that contains a level number from 1 to 49. The data record schema parser 224 may also use format information provided with the COBOL data record schema description 120 to transcode each COBOL statement into a readable form. This transcoding may, for example, convert from EBCDIC to ASCII character encoding.
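The sketch below transcodes a copybook and keeps only data definitions with level numbers 1 to 49 that are not comments. It assumes fixed-format source (columns 1-6 sequence area, column 7 indicator area where `*` or `/` marks a comment) and one definition per line; multi-line statements are not handled. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Level number followed by a data name, e.g. "05 OrderID PIC X(3)."
_DATA_DEF = re.compile(r"\s*(\d{1,2})\s+([A-Za-z0-9][A-Za-z0-9-]*)")


def parse_data_definition(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, name) if a fixed-format line is a data definition.

    Assumes columns 1-6 are the sequence area and column 7 is the
    indicator area, where '*' or '/' marks a comment line.
    """
    if len(line) > 6 and line[6] in "*/":
        return None
    match = _DATA_DEF.match(line[7:72])
    if match is None:
        return None
    level = int(match.group(1))
    return (level, match.group(2)) if 1 <= level <= 49 else None


def parse_copybook(raw: bytes, encoding: str) -> List[Tuple[int, str]]:
    """Transcode a copybook (e.g. from EBCDIC) and collect its data definitions."""
    text = raw.decode(encoding)
    return [d for d in map(parse_data_definition, text.splitlines()) if d]
```

Level 66, 77, and 88 entries are skipped here only because the text above limits data definitions to levels 1 to 49.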
The parsing (block 302) may include a check that the COBOL data record schema 122 is a nested data record schema. One way of identifying a nested data record schema is through the use of an “OCCURS” clause in a COBOL statement 406 (see
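A nested-schema check based on the “OCCURS” clause might be sketched as follows. The regular expression covers only the common `OCCURS n [TIMES]` and `OCCURS m TO n ... DEPENDING ON` forms, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# "OCCURS n [TIMES]" or "OCCURS m TO n [TIMES] DEPENDING ON ..."
_OCCURS = re.compile(r"\bOCCURS\s+(\d+)(?:\s+TO\s+(\d+))?", re.IGNORECASE)


def occurs_count(statement: str) -> Optional[int]:
    """Return the maximum repetition count of an OCCURS clause, or None."""
    match = _OCCURS.search(statement)
    if match is None:
        return None
    return int(match.group(2) or match.group(1))


def is_nested_schema(statements: Iterable[str]) -> bool:
    """Treat a schema as nested if any of its statements has an OCCURS clause."""
    return any(occurs_count(s) is not None for s in statements)
```

Returning the upper bound for variable repetitions gives the largest number of columns an expanded schema would need.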
The parsing (block 302) may also include a check for multiple COBOL data record schemas 122 in a COBOL copybook file. These different COBOL data record schemas 122 in the COBOL copybook file are represented by multiple level “01” data entries (see COBOL statements 1201 of
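Splitting a copybook at each level “01” entry could look like the sketch below. It assumes statements have already been extracted from the copybook (sequence and indicator areas removed) and that each level-01 name is distinct; `split_record_schemas` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional

# A level "01" (or "1") entry followed by its record name.
_LEVEL_01 = re.compile(r"\s*0?1\s+([A-Za-z0-9][A-Za-z0-9-]*)")


def split_record_schemas(statements: List[str]) -> Dict[str, List[str]]:
    """Group copybook statements into one schema per level-01 entry."""
    schemas: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for statement in statements:
        match = _LEVEL_01.match(statement)
        if match:
            current = match.group(1)
            schemas[current] = []
        if current is not None:  # statements before the first 01 are ignored
            schemas[current].append(statement)
    return schemas
```

Each resulting group can then be converted into its own standardized data record schema.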
After parsing (block 302), the standardized data record schema creator 226 uses the information obtained by the data record schema parser 224 to create the standardized data record schema 104 (block 304). The standardized data record schema 104 may be stored in the data store 238. The creating (block 304) may include generating a hierarchical structure of field definitions for the standardized data record schema 104 corresponding to the COBOL data record schema 122, and attaching information extracted from a COBOL data definition contained in the COBOL data record schema 122 to a corresponding field definition contained in the standardized data record schema 104. Generating a hierarchical structure of field definitions for the standardized data record schema 104 may include converting a COBOL field definition type from the COBOL data record schema 122, such as “pic x(3)” in the data definition “05 OrderID pic x(3)” (see COBOL statement 404 of
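One possible conversion of a COBOL field definition type such as “pic x(3)” into a common type is sketched below. The `(type, length or precision, scale)` triple and the three type names are assumptions; only alphanumeric and unedited numeric PICTURE strings are handled, and anything else raises an error.

```python
import re
from typing import Tuple

_PIC = re.compile(r"\bPIC(?:TURE)?\s+(?:IS\s+)?(\S+)", re.IGNORECASE)


def picture_of(statement: str) -> str:
    """Extract the PICTURE string, e.g. "x(3)" from "05 OrderID pic x(3)"."""
    match = _PIC.search(statement)
    if match is None:
        raise ValueError(f"no PIC clause in {statement!r}")
    return match.group(1).rstrip(".")


def expand_picture(pic: str) -> str:
    """Expand repeat counts: "X(3)" -> "XXX", "S9(3)V99" -> "S999V99"."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic.upper())


def convert_picture(pic: str) -> Tuple[str, int, int]:
    """Map a COBOL PICTURE to a (type, length/precision, scale) triple."""
    symbols = expand_picture(pic)
    if symbols and set(symbols) <= {"X", "A"}:
        return ("string", len(symbols), 0)
    digits = symbols.lstrip("S")  # a leading S only marks the field as signed
    if digits and set(digits) <= {"9", "V"} and digits.count("V") <= 1:
        integer_part, _, fraction = digits.partition("V")  # V = implied decimal point
        precision = len(integer_part) + len(fraction)
        return ("decimal", precision, len(fraction)) if fraction else ("integer", precision, 0)
    raise ValueError(f"unsupported PICTURE {pic!r}")


print(convert_picture(picture_of("05 OrderID pic x(3)")))
```

Edited pictures (for example `Z,ZZ9.99`) would need additional rules and are deliberately rejected in this sketch.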
The creating (block 304) may also include an option to select, for a nested COBOL data record schema 122, whether the standardized data record schema 104 is expanded or collapsed. An expanded data record schema contains a single row for each data record 108, and one column per repetition of nested items in the data record 108. For example, “ItemID” in COBOL statement 408 (see
A collapsed data record schema contains one column per nested item, with the number of rows dependent on both the number of data records 108 and the number of repetitions of nested items in the data record 108. For example, there is one column in the display 908 (see
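The expanded and collapsed forms described above can be contrasted with a small sketch. It assumes a record has already been decoded into a dictionary whose repeating group is a list; the `ItemID_1` column naming, and the repetition of parent fields in every collapsed row, are illustrative assumptions.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def expand(record: Record, nested_key: str) -> Record:
    """Expanded form: a single row per record, one column per repetition."""
    row = {name: value for name, value in record.items() if name != nested_key}
    for index, item in enumerate(record[nested_key], start=1):
        for name, value in item.items():
            row[f"{name}_{index}"] = value
    return row


def collapse(record: Record, nested_key: str) -> List[Record]:
    """Collapsed form: one row per repetition, one column per nested item."""
    parent = {name: value for name, value in record.items() if name != nested_key}
    return [{**parent, **item} for item in record[nested_key]]


order = {"OrderID": "A01", "Items": [{"ItemID": "I1"}, {"ItemID": "I2"}, {"ItemID": "I3"}]}
print(expand(order, "Items"))
print(collapse(order, "Items"))
```

The expanded form keeps one row per record at the cost of wider rows; the collapsed form keeps the column count fixed, so the row count grows with the number of repetitions.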
The creating (block 304) may also include an option to specify a data record key for the standardized data record schema 104 when there are multiple data record schemas 122 in a COBOL copybook file. For example, the standardized field definition 1402 labeled “KEY” (see
In one embodiment, the data record description receiver 228 then receives a description of the data record 108 (block 306). The description of the data record 108 may be associated with the standardized data record schema 104 after its creation (block 304), such as during processing of the data record 108 (block 310). In another embodiment, the description of the data record 108 may be associated with the COBOL data record schema 122 before the standardized data record schema 104 is created (block 304), such as before the COBOL data record schema 122 is parsed (block 302). The description of a data record 108 may contain a parameterized location of the data record 108 corresponding to the COBOL data record schema 122, and may contain format information for the COBOL data file containing the data record 108.
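A parameterized location can be resolved at run time with ordinary template substitution. The sketch below uses the standard-library `string.Template`; the `DataRecordDescription` class, its attributes, and the `$DATA_DIR`/`$RUN_DATE` parameters are hypothetical.

```python
from dataclasses import dataclass
from string import Template
from typing import Mapping


@dataclass(frozen=True)
class DataRecordDescription:
    """Hypothetical description of where a data file lives and how it is laid out."""
    location: str           # parameterized path, e.g. "$DATA_DIR/orders_$RUN_DATE.dat"
    encoding: str = "cp037"
    record_format: str = "fixed"

    def resolve(self, params: Mapping[str, str]) -> str:
        """Substitute run-time parameters; raises KeyError if one is missing."""
        return Template(self.location).substitute(params)


desc = DataRecordDescription("$DATA_DIR/orders_$RUN_DATE.dat")
print(desc.resolve({"DATA_DIR": "/data", "RUN_DATE": "20060616"}))
```

Parameterizing the location lets the same description be reused for each new file a COBOL application produces.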
In one embodiment, the data record processor 232 then receives a request to process the data record 108 (block 308). The client 201 may input this request via the graphical user interface 216 by selecting a first image 902 representing an association of the standardized data record schema 104 and the data record 108 (see
The data record processor 232 then processes the data record 108 based on the standardized data record schema 104 (block 310). The data record processor 232 may read data through a sliding buffer from a data file containing one or more data records 108, and may take into account format information for the data record 108 for purposes such as data transcoding. The data record processor 232 may process fields in the data record 108 taking into account COBOL attributes, such as “COMP”, that may have vendor-specific meanings. The data record processor 232 may also process variable-length data records 108, or variable-length fields within a data record 108, for example based on a record length field.
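The sketch below combines a sliding buffer for variable-length records with decoding of one vendor-specific usage. It assumes an IBM-style record descriptor word (4 bytes, the first two holding the big-endian record length including the descriptor) as the record length field, and uses IBM packed decimal (COMP-3) as the example attribute; both are assumptions, since other vendors define these differently.

```python
import io
from decimal import Decimal
from typing import BinaryIO, Iterator


def decode_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM-style packed-decimal (COMP-3) field.

    Each byte holds two decimal digits; the final nibble is the sign
    (0xD negative; 0xC or 0xF positive).
    """
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()
    if any(n > 9 for n in nibbles) or sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid packed decimal {raw.hex()}")
    value = Decimal(int("".join(map(str, nibbles)))).scaleb(-scale)
    return -value if sign == 0xD else value


def iter_rdw_records(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield variable-length record bodies through a sliding buffer.

    Assumes each record starts with a 4-byte record descriptor word whose
    first two bytes are the big-endian record length, including the RDW itself.
    """
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        buffer.extend(chunk)
        # Emit every complete record currently held in the buffer.
        while len(buffer) >= 4:
            length = int.from_bytes(buffer[:2], "big")
            if length < 4:
                raise ValueError(f"invalid record length {length}")
            if len(buffer) < length:
                break  # wait for more data
            yield bytes(buffer[4:length])
            del buffer[:length]  # slide the window past the consumed record
        if not chunk:
            if buffer:
                raise ValueError("file ends in the middle of a record")
            return


if __name__ == "__main__":
    data = b"\x00\x07\x00\x00\x12\x34\x5c" + b"\x00\x06\x00\x00\x01\x0d"
    for body in iter_rdw_records(io.BytesIO(data), chunk_size=3):
        print(decode_comp3(body, scale=2))
```

Reading in fixed-size chunks keeps memory use bounded regardless of file size, while records that straddle chunk boundaries are still reassembled correctly.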
The result of the processing (block 310) may be displayed to the user via the graphical user interface 216, and may be exported to an output file such as an XML file; the data format converter 236 may operate on the result to generate the output file. The database management module 234 may create a database table to store the result, and may execute a data manipulation language statement against that database table. The data manipulation language statement may be in a query language such as SQL, and may be provided by the client 201 via the graphical user interface 216. The output file and the database table resulting from the processing may be stored in the data store 238, and may be transmitted to clients 201 via the client interface module 214.
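As an illustration of storing processed results and running a data manipulation language statement against them, the sketch below uses the standard-library `sqlite3` module with an in-memory database. The type mapping, table names, and sample rows are assumptions, and identifiers are assumed to contain no double quotes.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

# Assumed mapping from standardized field types to SQLite column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "decimal": "NUMERIC"}


def load_table(conn: sqlite3.Connection, table: str,
               columns: List[Tuple[str, str]], rows: Iterable[Sequence]) -> None:
    """Create a table from (name, type) pairs and insert processed rows."""
    column_sql = ", ".join(f'"{name}" {SQL_TYPES[kind]}' for name, kind in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


conn = sqlite3.connect(":memory:")
load_table(conn, "orders", [("OrderID", "string"), ("CustID", "string")],
           [("A01", "C1"), ("A02", "C2")])
load_table(conn, "customers", [("CustID", "string"), ("City", "string")],
           [("C1", "Austin"), ("C2", "Boston")])

# A join across two tables derived from different data records.
query = ('SELECT o."OrderID", c."City" FROM "orders" o '
         'JOIN "customers" c ON o."CustID" = c."CustID" ORDER BY o."OrderID"')
for order_id, city in conn.execute(query):
    print(order_id, city)
```

Values are bound through `?` placeholders, so record contents never have to be escaped into the SQL text.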
In one embodiment of processing (block 310), the data record schema selector 230 may designate a standardized data record schema 104 as a selected standardized data record schema 104N for use in processing a COBOL data file containing data records 108 of multiple distinct types, where each type corresponds to an individual standardized data record schema 104. The processing may apply the data record key that may be specified as part of creating the standardized data record schema 104 (block 304). The result of the processing may then include at least some portion of the contents of the data records 108 of the type corresponding to the selected standardized data record schema 104N.
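Key-based selection from a file with interspersed record types might be sketched as follows. The sketch assumes each record type is identified by a fixed byte value at a known offset, given in the file's own encoding; `RecordKey` and `select_records` are hypothetical names, and the sample records are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class RecordKey:
    """Hypothetical key: bytes at [offset, offset + len(value)) identify the record type."""
    offset: int
    value: bytes  # expressed in the data file's encoding

    def matches(self, record: bytes) -> bool:
        return record[self.offset:self.offset + len(self.value)] == self.value


def select_records(records: Iterable[bytes], keys: Dict[str, RecordKey],
                   selected: str) -> Iterator[bytes]:
    """Yield only the records whose key identifies the selected schema."""
    key = keys[selected]
    return (record for record in records if key.matches(record))


keys = {"EMPLOYEE": RecordKey(0, b"E"), "CUSTOMER": RecordKey(0, b"C")}
mixed = [b"E00001SMITH", b"C0042ACME", b"E00002JONES"]
print(list(select_records(mixed, keys, "EMPLOYEE")))
```

Records of other types are skipped rather than decoded, so a schema is never applied to data it does not describe.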
Certain embodiments of the invention relate to a computer storage product with a computer-readable medium including data structures and computer code for performing a set of computer-implemented operations. The medium and computer code can be those specially designed and constructed for the purposes of the invention, or they can be of the kind well known and available to those having ordinary skill in the computer software arts. Examples of computer-readable media include: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as Compact Disc-Read Only Memories (“CD-ROMs”) and holographic devices; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute computer code, such as Application-Specific Integrated Circuits (“ASICs”), Programmable Logic Devices (“PLDs”), Read Only Memory (“ROM”) devices, and Random Access Memory (“RAM”) devices. Examples of computer code include machine code, such as produced by a compiler, and files including higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention can be implemented using Java, C++, or another object-oriented programming language and development tools. Additional examples of computer code include encrypted code and compressed code. Another embodiment of the invention can be implemented in hardwired circuitry in place of, or in combination with, computer code.
From the foregoing, it can be seen that an apparatus and method for processing COBOL data record schemas having disparate formats are described. The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. It will be appreciated, however, that embodiments of the invention can be in other specific forms without departing from the spirit or essential characteristics thereof. The described embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The presently disclosed embodiments are, therefore, considered in all respects to be illustrative and not restrictive. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.
References cited, U.S. patent documents:
Number | Name | Date | Kind |
---|---|---|---|
4567574 | Saade et al. | Jan 1986 | A |
5230049 | Chang et al. | Jul 1993 | A |
5428792 | Conner et al. | Jun 1995 | A |
5432930 | Song | Jul 1995 | A |
5640550 | Coker | Jun 1997 | A |
5742827 | Ohkubo et al. | Apr 1998 | A |
5778232 | Caldwell et al. | Jul 1998 | A |
5826076 | Bradley et al. | Oct 1998 | A |
5838965 | Kavanagh et al. | Nov 1998 | A |
5878422 | Roth et al. | Mar 1999 | A |
6209124 | Vermeire et al. | Mar 2001 | B1 |
6237140 | Carter et al. | May 2001 | B1 |
6356285 | Burkwald et al. | Mar 2002 | B1 |
6453464 | Sullivan | Sep 2002 | B1 |
6523172 | Martinez-Guerra et al. | Feb 2003 | B1 |
6687873 | Ballantyne et al. | Feb 2004 | B1 |
6704747 | Fong | Mar 2004 | B1 |
6775680 | Ehrman et al. | Aug 2004 | B2 |
6782540 | Chow et al. | Aug 2004 | B1 |
6820135 | Dingman et al. | Nov 2004 | B1 |
6836777 | Holle | Dec 2004 | B2 |
6901403 | Bata et al. | May 2005 | B1 |
6904598 | Abileah et al. | Jun 2005 | B2 |
6920461 | Hejlsberg et al. | Jul 2005 | B2 |
6959300 | Caldwell et al. | Oct 2005 | B1 |
6961721 | Chaudhuri et al. | Nov 2005 | B2 |
6980995 | Charlet et al. | Dec 2005 | B2 |
7016906 | Janzig et al. | Mar 2006 | B1 |
7020661 | Cruanes et al. | Mar 2006 | B1 |
7111284 | Takagi et al. | Sep 2006 | B2 |
7194479 | Packham | Mar 2007 | B1 |
7472137 | Edelstein et al. | Dec 2008 | B2 |
7584422 | Ben-Yehuda et al. | Sep 2009 | B2 |
7640261 | Belyy et al. | Dec 2009 | B2 |
7681118 | Dasari et al. | Mar 2010 | B1 |
7707561 | Vera | Apr 2010 | B2 |
7730011 | Deninger et al. | Jun 2010 | B1 |
7730471 | Cauvin et al. | Jun 2010 | B2 |
7761406 | Harken | Jul 2010 | B2 |
7970729 | Cozzi | Jun 2011 | B2 |
8121976 | Kalia et al. | Feb 2012 | B2 |
8255794 | Dasari et al. | Aug 2012 | B2 |
8548938 | Amaru et al. | Oct 2013 | B2 |
20010018684 | Mild et al. | Aug 2001 | A1 |
20010025372 | Vermeire et al. | Sep 2001 | A1 |
20010047365 | Yonaitis | Nov 2001 | A1 |
20010047372 | Gorelik et al. | Nov 2001 | A1 |
20020038335 | Dong et al. | Mar 2002 | A1 |
20020038336 | Abileah et al. | Mar 2002 | A1 |
20020042849 | Ho et al. | Apr 2002 | A1 |
20020046294 | Brodsky et al. | Apr 2002 | A1 |
20020056012 | Abileah et al. | May 2002 | A1 |
20030005410 | Harless | Jan 2003 | A1 |
20030018660 | Martin et al. | Jan 2003 | A1 |
20030033317 | Ziglin | Feb 2003 | A1 |
20030131109 | Rosensteel et al. | Jul 2003 | A1 |
20030163585 | Elderon et al. | Aug 2003 | A1 |
20040006739 | Mulligan | Jan 2004 | A1 |
20040044678 | Kalia et al. | Mar 2004 | A1 |
20040073565 | Kaufman et al. | Apr 2004 | A1 |
20040111464 | Ho et al. | Jun 2004 | A1 |
20040221292 | Chiang et al. | Nov 2004 | A1 |
20050038629 | Amaru et al. | Feb 2005 | A1 |
20050097118 | Slutz | May 2005 | A1 |
20050097537 | Laura | May 2005 | A1 |
20050097538 | Laura | May 2005 | A1 |
20050097539 | Laura | May 2005 | A1 |
20050097564 | Laura | May 2005 | A1 |
20050125730 | Goddard et al. | Jun 2005 | A1 |
20050192994 | Caldwell et al. | Sep 2005 | A1 |
20050228808 | Mamou et al. | Oct 2005 | A1 |
20050234889 | Fox et al. | Oct 2005 | A1 |
20050235274 | Mamou et al. | Oct 2005 | A1 |
20060031820 | Li | Feb 2006 | A1 |
20060041862 | Moussallam et al. | Feb 2006 | A1 |
20060064666 | Amaru et al. | Mar 2006 | A1 |
20070055678 | Fung et al. | Mar 2007 | A1 |
20070156737 | Barnes | Jul 2007 | A1 |
20070294267 | Belyy et al. | Dec 2007 | A1 |
20070294268 | Belyy et al. | Dec 2007 | A1 |
20090222467 | Kalia et al. | Sep 2009 | A1 |
20100185937 | Dasari et al. | Jul 2010 | A1 |
Foreign patent documents:
Number | Date | Country |
---|---|---|
753819 | Jan 1997 | EP |
Other publications:
Entry |
---|
Intel Endianness White Paper, published by Intel, Nov. 15, 2004, pp. 1-22. |
Database Endian Conversion, published by InterSystems Corp., Sep. 15, 1999, pp. 1-3. |
Cobol Copybook Converter User's Guide, Release 5.0.3, published by SEEBEYOND, 2004, pp. 1-28. |
Henrard et al., Strategies for Data Reengineering, published by IEEE Computer Society, Proceedings of the Ninth Working Conference on Reverse Engineering (WCRE'02), 2002, pp. 1-10. |
Merten et al., A Data Description Language Approach To File Translation, published in: SIGFIDET '74 Proceedings of the 1974 ACM SIGFIDET, 1974, pp. 191-205. |
Related publication:
Number | Date | Country |
---|---|---|
20070294677 A1 | Dec 2007 | US |