Claims
- 1. A method of record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising:
prioritizing said first fields according to a user preference of a user; using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
- 2. The method of claim 1, wherein said pre-defined k-anonymity value is selected by said user.
- 3. The method of claim 1, further comprising modifying said first data source prior to said comparing.
- 4. The method of claim 1, wherein said prioritizing further comprises measuring record uniqueness in said first data source.
- 5. The method of claim 1, further comprising measuring identification risk using said second data source and modifying said prioritizing accordingly.
- 6. The method of claim 5, further comprising displaying the change in said risk as said pre-defined k-anonymity value is varied by said user.
- 7. The method of claim 1, wherein said extracting is performed contemporaneously with said comparing.
- 8. The method of claim 1, wherein said extracting further comprises
copying said first records; changing selected first corresponding values to form a plurality of modified records; and storing said modified records in said third data source.
- 9. The method of claim 8, wherein said changing further comprises deleting one or more of said selected first values in one or more of said first fields and in one or more of said first records.
- 10. The method of claim 8, wherein said changing further comprises encrypting one or more of said selected first values in one or more of said first fields and in one or more of said first records.
- 11. The method of claim 1, wherein one or more of said prioritizing, comparing, and extracting are carried out over a computer network.
- 12. The method of claim 1, further comprising delivering all or selected portions of said third data source in electronic form.
- 13. The method of claim 1, wherein said pre-defined k-anonymity value is determined by measuring a re-identification risk using a reference database and modifying said pre-defined k-anonymity value accordingly.
- 14. The method of claim 13, further comprising automatically checking said re-identification risk when more data are added to the first data source, and decreasing the pre-defined k-anonymity value if the re-identification risk decreases after addition of the data.
- 15. An apparatus for record de-identification, comprising:
a data capture system, wherein the data is placed in a first data source on capture, and wherein said first data source comprises a plurality of first records having one or more first fields, said first fields having at least one corresponding first value; a reference data source comprising a plurality of second records having one or more second fields, said second fields having at least one corresponding second value; comparison means for comparing said first fields and said corresponding first values of each of said first records to said second fields and corresponding second values of all said second records; a control interface to a user, operably coupled to said data capture system, said first data source, and said comparison means whereby:
said user pre-defines a resulting k-anonymity value for an output data source; and said user prioritizes said first fields according to said user's preference for preservation; and extraction means, operably coupled to said control interface and said output data source, for extracting the highest priority first fields from said first data source to said output data source based on said comparing; wherein said extracting results in a k-anonymity value for said output data source that approximates said pre-defined k-anonymity value.
- 16. The apparatus of claim 15, further comprising a biochip device coupled to said data capture system and providing the data captured thereby.
- 17. An apparatus for record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising:
means for prioritizing said first fields according to a user preference; using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, means for comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and based on said comparing, means for extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
- 18. A computer system for record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising computer instructions for:
prioritizing said first fields according to a user preference; using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
- 19. A computer-readable medium storing a computer program executable by a plurality of server computers for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, the computer program comprising computer instructions for:
prioritizing said first fields according to a user preference; using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
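Claim 1 turns on a k-anonymity value for the extracted data, and claim 4 ties prioritization to measuring record uniqueness in the first data source. The claims do not say how either quantity is computed, so the following Python is only a minimal sketch using the conventional definitions: k is taken as the size of the smallest group of records sharing the same values on the chosen fields, and uniqueness as the fraction of records whose value combination appears only once. The function names and the example fields (`zip`, `age`, `sex`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def class_sizes(records: List[Record], fields: Sequence[str]) -> Counter:
    """Count how many records share each combination of values on `fields`."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def k_anonymity(records: List[Record], fields: Sequence[str]) -> int:
    """Smallest equivalence-class size over `fields` (0 for an empty table)."""
    sizes = class_sizes(records, fields)
    return min(sizes.values()) if sizes else 0


def uniqueness(records: List[Record], fields: Sequence[str]) -> float:
    """Fraction of records whose value combination on `fields` occurs only once."""
    if not records:
        return 0.0
    sizes = class_sizes(records, fields)
    unique = sum(1 for r in records if sizes[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


def rank_fields_by_uniqueness(records: List[Record], fields: Sequence[str]) -> List[str]:
    """Order fields from least to most identifying, each field taken on its own.
    Hypothetical input to the user's prioritization, not a replacement for it."""
    return sorted(fields, key=lambda f: uniqueness(records, [f]))


records = [
    {"zip": "02139", "age": "30-39", "sex": "F"},
    {"zip": "02139", "age": "30-39", "sex": "F"},
    {"zip": "02139", "age": "40-49", "sex": "M"},
    {"zip": "02142", "age": "30-39", "sex": "M"},
]
print(k_anonymity(records, ["zip", "age", "sex"]))                # 1
print(uniqueness(records, ["zip", "age", "sex"]))                 # 0.5
print(rank_fields_by_uniqueness(records, ["zip", "age", "sex"]))  # ['sex', 'zip', 'age']
```

Ranking fields one at a time is a deliberately simple heuristic; it ignores combinations of fields that are jointly identifying even when each is common on its own.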
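Claims 5 and 13 measure identification risk against a second or reference data source. One common way to estimate that risk, assumed here rather than taken from the claims, is to count how many reference records agree with a given record on the released fields and take the reciprocal as that record's chance of being singled out; averaging over records gives a dataset-level figure. The sketch below is hypothetical throughout, including its names and its choice to treat a record with no reference match as not linkable.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]


def reference_matches(record: Record, reference: List[Record], fields: Sequence[str]) -> int:
    """Number of reference records that agree with `record` on every field in `fields`."""
    return sum(all(ref.get(f) == record.get(f) for f in fields) for ref in reference)


def reidentification_risk(records: List[Record], reference: List[Record],
                          fields: Sequence[str]) -> float:
    """Mean of 1 / (matching reference records) over `records`.
    Assumption: a record with no match in the reference contributes zero risk."""
    if not records:
        return 0.0
    total = 0.0
    for r in records:
        m = reference_matches(r, reference, fields)
        total += 1.0 / m if m else 0.0
    return total / len(records)


reference = [
    {"zip": "02139", "age": "30-39", "sex": "F"},
    {"zip": "02139", "age": "30-39", "sex": "M"},
    {"zip": "02139", "age": "40-49", "sex": "F"},
    {"zip": "02142", "age": "30-39", "sex": "F"},
]
released = [{"zip": "02139", "age": "30-39"}, {"zip": "02142", "age": "30-39"}]
print(reidentification_risk(released, reference, ["zip", "age"]))  # 0.75
```

Under claim 14 the same measurement would be repeated as records are added to the first data source; the claims leave open by how much the pre-defined k-anonymity value should then be lowered, so that rule is not sketched here.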
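The extraction step of claim 1, together with the copy, change, and store sequence of claim 8 and the deletion of claim 9, could be realized in several ways. The following is one hypothetical greedy sketch, not the claimed method itself: for each first record it keeps fields in the user's priority order as long as at least `k_target` reference records still match on every kept field, and suppresses (sets to `None`) any field that would push the match count below that threshold. Reading the output's k-anonymity value as the smallest such match count is an assumption; under it, the result is at least `k_target` whenever the reference source itself holds that many records.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[str]]


def match_count(record: Record, reference: List[Record], fields: Sequence[str]) -> int:
    """Number of reference records agreeing with `record` on all of `fields`."""
    return sum(all(ref.get(f) == record[f] for f in fields) for ref in reference)


def extract(records: List[Record], reference: List[Record],
            priority: Sequence[str], k_target: int) -> List[Record]:
    """Copy each record, keeping fields in priority order (highest first) while at
    least k_target reference records still match; suppress the rest as None.
    Assumes original values are never None."""
    output: List[Record] = []
    for rec in records:
        kept: List[str] = []
        modified: Record = {}
        for field in priority:
            if match_count(rec, reference, kept + [field]) >= k_target:
                kept.append(field)
                modified[field] = rec[field]
            else:
                modified[field] = None  # deleted value, in the spirit of claim 9
        output.append(modified)
    return output


def output_k(output: List[Record], reference: List[Record], priority: Sequence[str]) -> int:
    """Smallest number of reference matches over each record's retained fields."""
    counts = [match_count(r, reference, [f for f in priority if r[f] is not None])
              for r in output]
    return min(counts) if counts else 0


reference = [
    {"zip": "02139", "age": "30-39", "sex": "F"},
    {"zip": "02139", "age": "30-39", "sex": "M"},
    {"zip": "02139", "age": "40-49", "sex": "F"},
    {"zip": "02142", "age": "30-39", "sex": "F"},
]
records = [
    {"zip": "02139", "age": "30-39", "sex": "F"},
    {"zip": "02142", "age": "40-49", "sex": "M"},
]
result = extract(records, reference, priority=["age", "zip", "sex"], k_target=2)
print(result[0])                                        # {'age': '30-39', 'zip': '02139', 'sex': None}
print(output_k(result, reference, ["age", "zip", "sex"]))  # 2
```

Per-record greedy suppression keeps as many high-priority values as it can for each record, but it does not search for the globally smallest set of suppressions.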
CROSS-REFERENCE(S) TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Applications Nos. 60/315751, 60/315753, 60/315754, and 60/315755, all filed on 30 Aug. 2001, and No. 60/335787, filed on 5 Dec. 2001, hereby incorporated herein by reference in their entireties.
Provisional Applications (5)

| Number | Date | Country |
|---|---|---|
| 60315751 | Aug 2001 | US |
| 60315753 | Aug 2001 | US |
| 60315754 | Aug 2001 | US |
| 60315755 | Aug 2001 | US |
| 60335787 | Dec 2001 | US |