At least some embodiments disclosed herein relate to data storage and retrieval in general and more particularly but not limited to protection of identity information in data storage and retrieval.
Personally identifiable information (PII) is data that could potentially identify a specific individual. Information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data may be considered PII. PII can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. From PII, the identity of a corresponding person can be reasonably ascertained.
Examples of PII include full name, home address, email address, national identification number, passport number, driver's license number, telephone number, credit card numbers, digital identity, IP address, login name, screen name, nickname, date of birth, birthplace, genetic information, facial image, fingerprints, or handwriting.
There is a need to protect PII for privacy, anonymity, and/or compliance with rules, laws and regulations.
U.S. Pat. No. 7,933,841 discloses a system to track member consumer credit card transactions without receiving personal information for non-members by using a one-way hash function. In such a system, a one-way hash function is applied to personal information (e.g., a credit card number) to obtain fingerprints that represent the personal information. The personal information in transaction data of credit card users is replaced by the fingerprints, where some of the users are members and some of the users are non-members. A computer having the personal information of the members can use the personal information to generate the corresponding fingerprints to identify the transactions of the members without access to the personal information of the non-members. The one-way hash function makes it nearly impossible to reverse the fingerprints to the corresponding personal information that the computer does not already have.
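As a non-limiting illustration, the fingerprint matching described above can be sketched as follows. The choice of SHA-256, the function name fingerprint, the record layout, and the sample card numbers are assumptions made for this sketch and are not taken from the referenced patent.

```python
import hashlib


def fingerprint(card_number: str) -> str:
    """Return a one-way hash of a card number (SHA-256 is an assumption)."""
    return hashlib.sha256(card_number.encode("utf-8")).hexdigest()


# Transaction data as received: card numbers are already replaced
# by fingerprints, for members and non-members alike.
transactions = [
    {"fp": fingerprint("4111111111111111"), "amount": 25.00},
    {"fp": fingerprint("5500000000000004"), "amount": 12.50},
    {"fp": fingerprint("4111111111111111"), "amount": 7.25},
]

# The computer holds personal information only for its members.
member_cards = ["4111111111111111"]
member_fps = {fingerprint(card) for card in member_cards}

# Member transactions are found by comparing fingerprints; the card
# numbers of non-members are never seen by this computer.
member_transactions = [t for t in transactions if t["fp"] in member_fps]
```

In this sketch, the non-member transaction remains in the data set only as a fingerprint that the computer cannot match to any card number it holds.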
The embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding. However, in certain instances, well known or conventional details are not described in order to avoid obscuring the description. References to one or an embodiment in the present disclosure are not necessarily references to the same embodiment; and, such references mean at least one.
The system in
In
Examples of identification information (e.g., 121, or 123) include personally identifiable information (PII) and other sensitive information.
In
For example, after obtaining the identification information A (121) that identifies a person/entity, the data source X (107) submits the identification information A (121) to the data bank (101). In response the data bank (101) assigns a token A (111) to represent the identification information A (121), stores data associating the token A (111) and the identification information A (121), and provides the token A (111) to the data source X (107) as a response to receiving the identification information A (121). Thus, the data source X (107) stores data items (e.g., 131) in association with the token A (111) to indicate the association between the data items (e.g., 131) and the identification information A (121).
In one embodiment, each piece of identification information (e.g., 121, or 123) received from a separate request from a data source (e.g., 107, . . . , or 109) is assigned a separate token (111, or 113). The same identification information submitted by different data sources (e.g., 107, . . . , 109) can be assigned different tokens. Further, the same identification information submitted by the data sources (e.g., 107, . . . , or 109) in different requests for tokens can be assigned different tokens. Thus, the same identification information can be represented in the same data source (107, . . . , or 109) and/or different data sources (107, . . . , 109) by different tokens (e.g., 111, . . . , 113, . . . , 115).
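As a non-limiting illustration, the token assignment described above can be sketched as follows. The class name DataBank, the method issue_token, the in-memory dictionary, and the sample identification information are hypothetical and are used only for this sketch; an actual data bank can store the associations in any secured storage.

```python
import secrets


class DataBank:
    """Hypothetical in-memory stand-in for the data bank (101)."""

    def __init__(self):
        # Association between each token and the identification
        # information it represents; kept only inside the data bank.
        self._token_to_info = {}

    def issue_token(self, identification_info: str) -> str:
        # Every request receives a fresh random token, so the same
        # identification information maps to different tokens.
        token = secrets.token_hex(16)
        self._token_to_info[token] = identification_info
        return token


bank = DataBank()

# Two separate requests carrying the same identification information.
token_a = bank.issue_token("alice@example.com")
token_b = bank.issue_token("alice@example.com")

# The data source stores its data items under the token only.
source_x_records = {token_a: ["data item"]}
```

Because each request yields a separate token, records held under token_a and token_b cannot be linked to each other without consulting the data bank.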
In
In one embodiment, the data bank (101) is a highly secured facility that prevents unauthorized access. Thus, the data security of the entire system in protecting the identification information (121, . . . , 123, . . . , 125) is improved.
In
For example, the data exchange (103) transmits a token matching request (141) to the data bank (101). In response, the data bank (101) identifies, based on the identification information (121, . . . , 123, . . . , 125) stored in the data bank (101), a set of tokens (e.g., 111, . . . , 113) that are assigned to represent the same person/entity and assigns a token (119) to represent the set of identified tokens (e.g., 111, . . . , 113) of the same person/entity. The data exchange (103) then replaces, in the data records retrieved from the data sources (107, . . . , 109), the identified tokens (e.g., 111, . . . , 113) of the same person/entity with the token (119) provided in the matching response (143). In such a way, the data exchange (103) generates, for the data user (105), a data bundle (145) that links the data items (131, . . . , 133) with the same token (119) representing the different tokens (111, . . . , 113) used in the data sources (107, . . . , 109) to represent the person/entity. Thus, the data items of the person/entity across the data sources (107, . . . , 109) are aggregated according to the identities of the persons/entities, without revealing the identification information (121, . . . , 123, . . . , 125) outside the data bank (101).
For enhanced identity protection, different tokens (e.g., 119) can be used to represent the same set of tokens (111, . . . , 113) of a person/entity in data bundles (e.g., 145) provided to different data users (e.g., 105) and/or to the same data user (105) for different data using projections.
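As a non-limiting illustration, the token matching and the replacement of tokens in a data bundle can be sketched as follows. The names DataBank, match_tokens, and build_bundle, the record layout, and the sample values are hypothetical. The sketch also assumes that two tokens represent the same person/entity when their stored identification information is identical; an actual data bank can apply a different matching rule.

```python
import secrets
from collections import defaultdict


class DataBank:
    """Hypothetical in-memory stand-in for the data bank (101)."""

    def __init__(self):
        self._token_to_info = {}

    def issue_token(self, info: str) -> str:
        token = secrets.token_hex(16)
        self._token_to_info[token] = info
        return token

    def match_tokens(self, tokens):
        """Map each submitted token to a group token.

        Tokens whose stored identification information is equal share
        one group token. Group tokens are generated anew on every call,
        so different bundles use different group tokens.
        """
        group_for_info = {}
        response = {}
        for token in tokens:
            info = self._token_to_info[token]
            if info not in group_for_info:
                group_for_info[info] = secrets.token_hex(16)
            response[token] = group_for_info[info]
        return response


def build_bundle(bank, records_by_source):
    """Data exchange side: replace source tokens with group tokens."""
    tokens = [tok for records in records_by_source for tok, _ in records]
    mapping = bank.match_tokens(tokens)  # matching request and response
    bundle = defaultdict(list)
    for records in records_by_source:
        for tok, item in records:
            bundle[mapping[tok]].append(item)
    return dict(bundle)


bank = DataBank()
t1 = bank.issue_token("person A")  # token used in one data source
t2 = bank.issue_token("person A")  # token used in another data source
t3 = bank.issue_token("person B")

source_x = [(t1, "item from X")]
source_y = [(t2, "item from Y"), (t3, "other item from Y")]

bundle = build_bundle(bank, [source_x, source_y])
second_bundle = build_bundle(bank, [source_x, source_y])
```

In this sketch, the data exchange handles only tokens and data items; the identification information never leaves the DataBank object, and the two bundles built from the same records carry unrelated group tokens.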
In
For example, the same entity can be represented by different tokens (e.g., 111, 113) in different data sources (e.g., 107, 109). Further, the same entity associated with different data items in a same data source can be represented by different tokens. Thus, privacy of the entities involved in the data items stored in the data sources (e.g., 107, 109) is improved.
In one embodiment, a data source (e.g., 107 or 109) does not store the identification information (e.g., 121 or 123) that is represented by the respective tokens (e.g., 111 or 113). Thus, the damage of a data breach in the data source (e.g., 107 or 109) is limited.
In
The tokens (e.g., 111, . . . , 113, . . . , 115) are generated in a way that cannot be reversed to reveal the identification information (e.g., 121, . . . , 123, . . . , 125) represented by the respective tokens (e.g., 111, . . . , 113, . . . , 115). For example, the tokens (e.g., 111, . . . , 113, . . . , 115) can be selected from random numbers generated by the data bank (101). Alternatively or in combination, the tokens (e.g., 111, . . . , 113, . . . , 115) can be selected further based on the identification information (e.g., 121, . . . , 123, . . . , 125) and/or the requests for tokens. For example, the token (111) can be computed from a one-way hash of a combination of the identification information (121), a random number, an identification of the data source (107) that submits the identification information (121) to obtain the token (111), the date and/or time of the request for the token (111), and/or the date and/or time of the generation of the token (111), etc.
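As a non-limiting illustration, computing a token from a one-way hash of the combination described above can be sketched as follows. The function name generate_token, the use of SHA-256, the 16-byte random number, the length-prefixed encoding of the inputs, and the sample values are assumptions made for this sketch.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def generate_token(identification_info: str, source_id: str) -> str:
    """Compute a token from a one-way hash of several inputs."""
    random_number = secrets.token_bytes(16)
    requested_at = datetime.now(timezone.utc).isoformat()

    hasher = hashlib.sha256()
    parts = (
        identification_info.encode("utf-8"),
        random_number,
        source_id.encode("utf-8"),
        requested_at.encode("utf-8"),
    )
    for part in parts:
        # Length-prefix each part so that different combinations of
        # inputs cannot produce the same byte stream.
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.hexdigest()


first = generate_token("alice@example.com", "source-107")
second = generate_token("alice@example.com", "source-107")
```

Because a fresh random number is mixed into every hash, repeated requests for the same identification information from the same data source yield different tokens in this sketch.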
In
For example, the data exchange (103) illustrated in
For example, a data source (e.g., 107 or 109) illustrated in
For example, the data bank (101) illustrated in
In
In one embodiment, the inter-connect (171) interconnects the microprocessor(s) (173) and the memory (176) together and also interconnects them to input/output (I/O) device(s) (175) via I/O controller(s) (177). I/O devices (175) may include a display device and/or peripheral devices, such as mice, keyboards, modems, network interfaces, printers, scanners, video cameras and other devices known in the art. In one embodiment, when the data processing system is a server system, some of the I/O devices (175), such as printers, scanners, mice, and/or keyboards, are optional.
In one embodiment, the inter-connect (171) includes one or more buses connected to one another through various bridges, controllers and/or adapters. In one embodiment the I/O controllers (177) include a USB (Universal Serial Bus) adapter for controlling USB peripherals, and/or an IEEE-1394 bus adapter for controlling IEEE-1394 peripherals.
In one embodiment, the memory (176) includes one or more of: ROM (Read Only Memory), volatile RAM (Random Access Memory), and non-volatile memory, such as hard drive, flash memory, etc.
Volatile RAM is typically implemented as dynamic RAM (DRAM) which requires power continually in order to refresh or maintain the data in the memory. Non-volatile memory is typically a magnetic hard drive, a magnetic optical drive, an optical drive (e.g., a DVD RAM), or other type of memory system which maintains data even after power is removed from the system. The non-volatile memory may also be a random access memory.
The non-volatile memory can be a local device coupled directly to the rest of the components in the data processing system. A non-volatile memory that is remote from the system, such as a network storage device coupled to the data processing system through a network interface such as a modem or Ethernet interface, can also be used.
In this description, some functions and operations are described as being performed by or caused by software code to simplify description. However, such expressions are also used to specify that the functions result from execution of the code/instructions by a processor, such as a microprocessor.
Alternatively, or in combination, the functions and operations as described here can be implemented using special purpose circuitry, with or without software instructions, such as using Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA). Embodiments can be implemented using hardwired circuitry without software instructions, or in combination with software instructions. Thus, the techniques are limited neither to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the data processing system.
While one embodiment can be implemented in fully functioning computers and computer systems, various embodiments are capable of being distributed as a computing product in a variety of forms and are capable of being applied regardless of the particular type of machine or computer-readable media used to actually effect the distribution.
At least some aspects disclosed can be embodied, at least in part, in software. That is, the techniques may be carried out in a computer system or other data processing system in response to its processor, such as a microprocessor, executing sequences of instructions contained in a memory, such as ROM, volatile RAM, non-volatile memory, cache or a remote storage device.
Routines executed to implement the embodiments may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.” The computer programs typically include one or more instructions set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements involving the various aspects.
A machine readable medium can be used to store software and data which when executed by a data processing system causes the system to perform various methods. The executable software and data may be stored in various places including for example ROM, volatile RAM, non-volatile memory and/or cache. Portions of this software and/or data may be stored in any one of these storage devices. Further, the data and instructions can be obtained from centralized servers or peer to peer networks. Different portions of the data and instructions can be obtained from different centralized servers and/or peer to peer networks at different times and in different communication sessions or in a same communication session. The data and instructions can be obtained in entirety prior to the execution of the applications. Alternatively, portions of the data and instructions can be obtained dynamically, just in time, when needed for execution. Thus, it is not required that the data and instructions be on a machine readable medium in entirety at a particular instance of time.
Examples of computer-readable media include but are not limited to recordable and non-recordable type media such as volatile and non-volatile memory devices, read only memory (ROM), random access memory (RAM), flash memory devices, floppy and other removable disks, magnetic disk storage media, optical storage media (e.g., Compact Disk Read-Only Memory (CD ROMS), Digital Versatile Disks (DVDs), etc.), among others. The computer-readable media may store the instructions.
The instructions may also be embodied in digital and analog communication links for electrical, optical, acoustical or other forms of propagated signals, such as carrier waves, infrared signals, digital signals, etc. However, propagated signals, such as carrier waves, infrared signals, digital signals, etc., are not tangible machine readable media and are not configured to store instructions.
In general, a machine readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.).
In various embodiments, hardwired circuitry may be used in combination with software instructions to implement the techniques. Thus, the techniques are neither limited to any specific combination of hardware circuitry and software nor to any particular source for the instructions executed by the data processing system.
The present disclosure is illustrative of inventive features to enable a person skilled in the art to make and use the techniques. Various features, as described herein, should be used in compliance with all current and future rules, laws and regulations related to privacy, security, permission, consent, authorization, and others.
The use of headings herein is merely provided for ease of reference, and shall not be interpreted in any way to limit this disclosure or the following claims.
Reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, and are not necessarily all referring to separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by one embodiment and not by others. Similarly, various requirements are described which may be requirements for one embodiment but not other embodiments. Unless excluded by explicit description and/or apparent incompatibility, any combination of various features described in this description is also included here. For example, the features described above in connection with “in one embodiment” or “in some embodiments” can be all optionally included in one implementation, except where the dependency of certain features on other features, as apparent from the description, may limit the options of excluding selected features from the implementation, and incompatibility of certain features with other features, as apparent from the description, may limit the options of including selected features together in the implementation.
The disclosures of the above discussed patent documents are hereby incorporated herein by reference.
In the foregoing specification, the disclosure has been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.