Claims
- 1. A data structure for access by a computer program, said data structure embodied in a computer readable medium, comprising: a table for use in a database, said table comprising a plurality of rows and columns, said table storing information imported into said table from a source external to the database; a data lineage data type associated with each of said plurality of rows, said data lineage data type storing data indicative of the external source of said row.
- 2. The data structure as recited in claim 1 wherein said data type is selected to have a size sufficient to store a universally unique identifier.
- 3. The data structure as recited in claim 2 wherein said data type comprises about 16 bytes of storage.
- 4. The data structure as recited in claim 1 wherein said data type comprises an integer number.
- 5. The data structure as recited in claim 1 wherein said database comprises a relational database.
- 6. The data structure as recited in claim 1 wherein said data type comprises a pointer to a storage location of lineage information.
- 7. A method for tagging data in a relational database system, comprising: providing a table of data organized in rows and columns; importing data into said table from an external data source; and providing a lineage transform that attaches an identifier to substantially every row in the table.
- 8. The method as recited in claim 7 wherein said identifier is an identifier intended to uniquely identify a set of rows being moved into the table from a common source.
- 9. The method as recited in claim 8 wherein said identifier comprises a four-byte identifier.
- 10. The method as recited in claim 9 wherein said four-byte identifier is derived as a compression function of a system generated unique identifier having a length greater than four bytes.
- 11. The method as recited in claim 10 wherein the compression function comprises a cyclical redundancy check algorithm.
- 12. A computer-readable medium bearing computer-readable instructions for carrying out the steps recited in claim 7.
- 13. A computer-readable medium bearing a data structure accessible by a database management program for providing data lineage to data in a database, comprising: a table comprising rows and columns, said table storing information imported into said table from a source external to the database; an identifier bound to each row by said database management program for identifying rows moved into the table from a common external data source.
- 14. The data structure as recited in claim 13 wherein the identifier comprises a four-byte integer value.
- 15. The data structure as recited in claim 13, wherein the identifier comprises a sixteen-byte GUID.
- 16. The data structure as recited in claim 13 wherein the identifier comprises a four-byte integer value representing a compressed GUID.
- 17. The data structure as recited in claim 16 wherein the compressed GUID is based on a cyclical redundancy check value derived from a sixteen-byte GUID.
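As an illustration only, and not the claimed implementation, the tagging described in claims 7 through 11 and 13 through 17 can be sketched in Python. The sketch assumes that a batch of imported rows is represented as a list of dictionaries, that the system-generated unique identifier is a sixteen-byte GUID, and that the four-byte identifier is obtained with the CRC-32 function from Python's standard `zlib` module. The names `short_lineage_id`, `lineage_transform`, and the column name `lineage_id` are hypothetical and do not appear in the claims.

```python
import uuid
import zlib


def short_lineage_id(guid: uuid.UUID) -> int:
    """Compress a sixteen-byte GUID into a four-byte unsigned integer.

    CRC-32 is used here as one possible cyclical redundancy check;
    the claims do not name a specific polynomial.
    """
    return zlib.crc32(guid.bytes) & 0xFFFFFFFF


def lineage_transform(rows, guid):
    """Return copies of the rows, each tagged with the same lineage identifier.

    Every row moved into the table in one import shares the identifier,
    so rows from a common external source can be found later.
    """
    short_id = short_lineage_id(guid)
    return [dict(row, lineage_id=short_id) for row in rows]


# Hypothetical import batch from one external source.
batch_guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
source_rows = [
    {"name": "Ada", "city": "London"},
    {"name": "Grace", "city": "New York"},
]
table = lineage_transform(source_rows, batch_guid)
```

Because a four-byte value is much shorter than the GUID it is derived from, two different GUIDs can map to the same compressed identifier. A system that follows this sketch would presumably keep the full GUID in a separate lineage record and use the compressed value only as a compact per-row tag.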
CROSS REFERENCE TO RELATED APPLICATIONS
This application is related by subject matter to the inventions disclosed in commonly assigned pending U.S. patent application Ser. No. 09/212,218, filed on Dec. 16, 1998, entitled “DATA LINEAGE” and pending U.S. patent application Ser. No. 09/213,069, filed on Dec. 16, 1998, entitled “DATA PUMP FOR DATABASE.”
Non-Patent Literature Citations (2)
Shek, et al. (IEEE publication, Mar. 1999) discloses exploiting data lineage for parallel optimization in external DBMSs in Data Engineering, 1999 proc. p. 256.*
Marathe, Ap. et al. (IEEE publication, 2001) discloses tracing lineage of array data in Scientific and Statistical Database Management, 2001, pp. 69-78.