Claims
- 1. A method of structuring data in a data-mining-ready format, wherein said data has been previously organized in a bit-Sequential (bSQ) format that comprises a plurality of binary files identified by a bit position, said method comprising the steps of:
dividing each of said plurality of binary files into first quadrants;
recording the count of 1-bits for each first quadrant on a first level;
dividing each of said first quadrants into new quadrants;
recording the count of 1-bits for each of said new quadrants on a new level;
repeating the two steps immediately above until all of said new quadrants comprise a pure-1 quadrant or a pure-0 quadrant to form a basic tree structure;
taking a plurality of pairs of samples in said data; and
measuring similarity among said plurality of pairs of samples in said data, wherein similarity among said plurality of pairs of samples in said data is measured using a highest order bit position of inequality.
- 2. A system for structuring data in a data-mining-ready format, wherein said data has been previously organized in a bit-Sequential (bSQ) format that comprises a plurality of binary files identified by a bit position, said system comprising:
a computer system and a set of computer readable instructions, wherein said set of instructions include directing said computer system to:
divide each of said plurality of binary files into first quadrants;
record the count of 1-bits for each first quadrant on a first level;
divide each of said first quadrants into new quadrants;
record the count of 1-bits for each of said new quadrants on a new level;
repeat recursively until all of said new quadrants comprise a pure-1 or pure-0 quadrant to form a basic tree structure;
take a plurality of pairs of samples in said data; and
measure similarity among said plurality of pairs of samples in said data, wherein similarity among said plurality of pairs of samples in said data is measured using a highest order bit position of inequality.
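The tree construction and similarity measure recited in claims 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes one bit plane held as a square list of lists whose side is a power of two, and a quadrant order of upper-left, upper-right, lower-left, lower-right. The names `bit_plane`, `build_count_tree`, and `hob_similarity` are hypothetical, and similarity is taken here to be the number of leading bits on which two values agree, which is one reading of "highest order bit position of inequality."

```python
def bit_plane(values, bit):
    """Extract one binary file (bit plane) from a 2-D grid of integers.

    Assumption: bit 0 is the least significant bit.
    """
    return [[(v >> bit) & 1 for v in row] for row in values]


def build_count_tree(grid, row=0, col=0, size=None):
    """Recursively record 1-bit counts per quadrant.

    Assumption: grid is square with a power-of-two side.
    A node is a dict with the quadrant's 1-bit count and its four
    children; pure-0 and pure-1 quadrants have no children.
    """
    if size is None:
        size = len(grid)
    count = sum(
        grid[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    node = {"count": count, "children": []}
    # Stop at a pure-0 or pure-1 quadrant (a single cell is always pure).
    if count == 0 or count == size * size:
        return node
    half = size // 2
    # Assumed quadrant order: upper-left, upper-right, lower-left, lower-right.
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        node["children"].append(
            build_count_tree(grid, row + dr, col + dc, half)
        )
    return node


def hob_similarity(a, b, n_bits=8):
    """Number of high-order bits on which a and b agree.

    Assumption: a and b are non-negative and fit in n_bits bits.
    The highest set bit of a XOR b is the highest-order position
    at which the two values differ.
    """
    return n_bits - (a ^ b).bit_length()


# Example bit plane: three pure quadrants and one mixed quadrant.
plane = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]
tree = build_count_tree(plane)
```

In this example the root records the total 1-bit count of the plane, and only the lower-right quadrant is subdivided further because it is the only mixed one. Two 8-bit values such as 176 (10110000) and 160 (10100000) agree on their top three bits, so `hob_similarity(176, 160)` is 3 under the assumption above.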
CLAIM TO PRIORITY
[0001] The present application is a Continuation-in-Part application of U.S. patent application Ser. No. 09/957,637, filed Sep. 20, 2001, and entitled “System and Method for Organizing, Compressing and Structuring Data for Data Mining Readiness,” which claims priority to U.S. Provisional Patent Application No. 60/234,050, filed Sep. 20, 2000, and entitled “System and Method for Imagery Organization, Compression, and Data Mining” and to U.S. Provisional Patent Application No. 60/237,778, filed Oct. 4, 2000, and entitled “System and Method for Imagery Organization, Compression, and Data Mining.” The present application additionally claims priority to U.S. Provisional Patent Application No. 60/357,250, filed Feb. 14, 2002, and entitled “System and Method for K-Nearest Neighbor Classification and K-Means Clustering Using Peano Count Trees for Data Mining” and to U.S. Provisional Patent Application No. 60/365,731, filed Mar. 19, 2002, entitled “Biological System and Data Mining for Phylogenomic Expression Profiling.” All of the identified United States utility and provisional patent applications are hereby incorporated by reference.
Provisional Applications (4)
| Number | Date | Country |
| --- | --- | --- |
| 60234050 | Sep 2000 | US |
| 60237778 | Oct 2000 | US |
| 60357250 | Feb 2002 | US |
| 60365731 | Mar 2002 | US |
Continuation in Parts (1)
| Relation | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09957637 | Sep 2001 | US |
| Child | 10367644 | Feb 2003 | US |