The present embodiments relate to security associated with an application service provider in a shared resource environment. More specifically, the embodiments relate to privacy assessment and privacy preservation as related to the application service provider and associated data.
A data steward is an entity responsible for management and proficiency of stored data to ensure fitness of data elements. An example of a data steward is a hospital collecting information from multiple patients and medical professionals, where the collected data needs to be protected according to privacy and legislative requirements. More specifically, a data steward is responsible for data processing, data policies, data guidelines, and administration of information in compliance with policy and regulatory obligations. A data steward is also known as the data controller under certain legislation, such as the General Data Protection Regulation (GDPR). Thus, the data steward is an entity responsible for ensuring that security and privacy policies comply with regulatory and governance initiatives to manage privacy and confidentiality of data.
The role and responsibilities of the data steward may include serving as a data custodian, which includes addressing classification of data and associated risk tolerance. The data steward is responsible for provisioning access to data, including reviewing and authorizing data access requests individually or defining a set of rules to determine eligibility for access. For example, eligibility may be based on a business function, roles, etc. The data steward is an aspect within the information technology platform to ensure privacy and appropriate access of associated data. However, the data steward is one entity in a system of a plurality of entities. It is understood that with the development and growth of information technology and associated infrastructure, such as shared resources, there are locations where security of data and/or associated services may be compromised in a manner that is beyond the control of the data steward.
While privacy legislation such as the Health Insurance Portability and Accountability Act (HIPAA) and GDPR imposes obligations on data stewards to protect the privacy of data owners, data stewards also make use of services to maintain data and to provide anonymized queries over private datasets. Because private data can be queried in an anonymized fashion, for example to perform research studies, data stewards need to ensure that they detect any potential data leakage and, should a leak arise, that they act promptly to stop it.
The embodiments include a system, computer program product, and method for facilitating auditing of private data through the collection and testing of inferences, computation of a privacy score based on auxiliary information and the specific characteristics of the private data and of the service used for anonymization, and notification processing and delivery.
In one aspect, a system is provided with a computer platform and one or more associated tools for managing privacy, and more specifically for assessment of privacy preservation. A processing unit is operatively coupled to memory and is in communication with a tool in the form of an Auditing and Privacy Verification Evaluator, hereinafter referred to as the Evaluator. The Evaluator functions to receive a preferred level of privacy for a computing resource. In addition, the Evaluator performs a confidence level assessment of candidate inferences, and from this assessment forms a set of inferred entities and selectively assigns individual candidate inferences to an inferred entity set. The Evaluator performs a privacy preservation assessment for the formed set. This assessment returns a privacy score that can be used as a leakage indicator. The Evaluator populates a data container with inferred entities that violate the preferred privacy level.
In another aspect, a computer program device is provided to perform a privacy preservation assessment. The device has program code embodied therewith. The program code is executable by a processing unit to receive a preferred level of privacy for a computing resource. The program code performs a confidence level assessment of candidate inferences, and from this assessment forms a set of inferred entities and selectively assigns individual candidate inferences to an inferred entity set. Program code is also provided to perform a privacy preservation assessment for the formed set. This assessment returns a privacy score directed at a leakage indicator. A data container is provided operatively coupled to the device and the program code. The program code populates the data container with inferred entities that violate the preferred privacy level.
In yet another aspect, a method is provided for supporting and performing a privacy preservation assessment. A preferred level of privacy for a computing resource is received, and a confidence level assessment of candidate inferences is performed. From this assessment, a set of inferred entities is formed and individual candidate inferences are selectively assigned to an inferred entity set. Thereafter, a privacy preservation assessment is performed for the formed set. This assessment returns a privacy score directed at a leakage indicator. A data container is populated with inferred entities that violate the preferred privacy level.
These and other features and advantages will become apparent from the following detailed description of the presently preferred embodiment(s), taken in conjunction with the accompanying drawings.
The drawings referenced herein form a part of the specification. Features shown in the drawings are meant as illustrative of only some embodiments, and not of all embodiments, unless otherwise explicitly indicated.
It will be readily understood that the components of the present embodiments, as generally described and illustrated in the Figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, method, and computer program product of the present embodiments, as presented in the Figures, is not intended to limit the scope of the embodiments, as claimed, but is merely representative of selected embodiments.
Reference throughout this specification to “a select embodiment,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases “a select embodiment,” “in one embodiment,” or “in an embodiment” in various places throughout this specification are not necessarily referring to the same embodiment.
The illustrated embodiments will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the embodiments as claimed herein.
A service that is run on a shared computing resource may be available to a plurality of devices across one or more network connections. Such services are referred to as application service providers (ASPs), which provide a computer-based service to customers over a network. As shown, a data steward is a client of a service supported by the ASP. The supported service is referred to herein as a cloud service. The evaluation of the cloud service is supported by a separate service, referred to herein as an Auditing and Privacy Verification Evaluator, hereinafter referred to as the Evaluator. The Evaluator is responsible for checking privacy preservation of the cloud service at any location used by the steward. The Evaluator is responsible for running or executing privacy tests directed at the cloud service. Such tests may be executed on a periodic basis, when a data set is substantially modified, when a privacy method or setting is subject to change, or when a request is received from the data steward. In one embodiment, the Evaluator is also cognizant of the resources and associated strain on the cloud service, and avoids disruption of workloads supported by the cloud service, and/or cloud service supported hardware.
Data privacy, also referred to herein as privacy, is an aspect of information technology that addresses the ability of an organization or individual to disseminate data. A system that protects privacy has as its goals restricting who can access the information, how, what is accessed, and for what purpose. Measuring privacy is directed at a non-operational and non-functional requirement. Infrastructure is a term employed in information technology that refers to a collection of hardware, software, networks, data centers, and related equipment that support information technology services. As such infrastructures are subject to growth and inter-connection, there is a concern for maintaining data privacy. While hardware and software vulnerabilities may trivially compromise the confidentiality and privacy of data, understanding and detecting the inference risks of private data is also paramount. The latter requires additional effort, as inference of information is not necessarily well understood and may depend on how much an adversary knows about the protected data, on auxiliary information that is widely or easily obtainable, and on the anonymization techniques used to protect the private data.
Referring to
As further shown, the Evaluator (140) is operatively coupled to the shared resource provider (130). The Evaluator (140) functions to evaluate privacy associated with the service supported and enabled by the provider (130). More specifically, the Evaluator (140) is responsible for assessing whether the shared resource provider (130) is properly preserving the privacy of the data at any point in time. The shared resource provider (130) should be performing adequate anonymization controls to ensure that the client(s) (110) is only obtaining anonymized information as per the requirements of the data steward (120). The Evaluator (140) is shown herein configured with a pool of tests (142) designed to infer, or de-anonymize, data sets protected by common privacy preservation techniques. Examples of such data privacy preservation techniques include, but are not limited to, k-anonymity, l-diversity, and differentially private mechanisms. Each of these example data privacy preservation tools employs techniques to guarantee the privacy of the data subject to assessment by the Evaluator (140). As shown and described below, the Evaluator (140) is responsible for running tests to evaluate the privacy preservation techniques run by the shared resource provider (130). These tests can be conducted on a periodic basis, responsive to modification of an associated data set, responsive to a modification of an associated privacy setting, or responsive to a request received from the data steward. In one embodiment, execution of one or more privacy tests by the Evaluator (140) takes place in a manner that avoids disruption to workloads. The goal of the privacy test execution is to imitate the behavior of an adversary that tries to obtain private data, and to proactively address any identified privacy weakness(es). Accordingly, as shown, the Evaluator (140) functions as an auditor of the service provider (130) to assess privacy services.
Referring to
The Evaluator (250) is configured to communicate with remote shared resources over a communication network. For example, Evaluator (250) may communicate with one or more of the computing devices (280)-(288), and associated data storage. As further shown, the Evaluator (250) is in communication with a shared remote computing device (290), operatively coupled to a data center (290a), also referred to herein as a shared data storage location. The Evaluator (250) is operatively coupled to a knowledge base (260) of one or more tests (264). In select embodiments, the knowledge base (260), also referred to herein as the corpus, may include structured, semi-structured, and/or unstructured content contained in one or more large knowledge databases or corpora. The various computing devices (280), (282), (284), (286), and (288) shown in communication with the network (205) may include access points for data stewards to provide data and for clients to query the data. In one embodiment, the data steward(s) is referred to as the content creator(s) and the client(s) is referred to as the content user(s). The network (205) may include local network connections and remote connections in various embodiments, such that the Evaluator (250) may operate in environments of any size, including local and global, e.g. the Internet.
The Evaluator (250) may serve as a back-end system that can assess privacy preservation from a variety of knowledge extracted from or represented in documents, network accessible sources, and/or structured data sources. It is understood that data anonymization is the use of one or more techniques designed to make it impossible, or at least difficult, to identify a particular individual from stored data related to them. The purpose of data anonymization is to protect the privacy of the individual and to make it legal for entities such as governments and businesses to share their data, which in one embodiment includes getting permission according to rules and/or regulations, such as GDPR and HIPAA. De-anonymization is a reverse engineering process used to detect the sensitive source data. As shown and described, the Evaluator (250) infers private information to assess de-anonymization of a data set. More specifically, the Evaluator (250) functions as an assessor of the quality of data anonymization.
With respect to privacy and associated private data, it is understood that confidentiality of the data is expected and as such is an inherent if not an express characteristic. As shown in
The Evaluator (250) functions to assess the quality and level of privacy of the entities that comprise the initial inferred set that has been identified and compiled. More specifically, the Evaluator performs a privacy preservation assessment for the formed set of inferred entities. The privacy assessment conducted herein is directed at the preferred level of privacy as specified by the steward, and at the load on the associated service provider. In one embodiment, the steward has an expected level of privacy for the data being supported and managed by the service provider, and the privacy assessment is a third party review to determine if the expectation of the steward is being met by the service provider. The assessment returns a leakage indicator in the form of a privacy score related to privacy information associated with the service. As indicated, the Evaluator (250) is in receipt of the preferred level of privacy of service associated with the data. The privacy assessment determines if the preferred level of privacy for the services as related to the inferred set has been attained. The Evaluator (250) populates a data container with those members of the formed set of inferred entities (266), also referred to as an inferred entity set, that violate the confidence level, dictating that the entity set should be reported.
It is understood that the performed assessment is based on inferences, and that further assessment directed at raw data may produce different results, which may affect the population of the data container. Referring to
It is understood that the privacy preservation assessment conducted by the Evaluator (250) is an initial privacy assessment based on an initial set of inferred data. After the initial container population, the Evaluator (250) is employed to conduct a further assessment directed by a comparison of the inferred entities to the raw data, and from this further assessment, the Evaluator (250) creates an adjusted set of inferences. In one embodiment, the further assessment is conducted local to the data steward. The formed adjusted set of inferences may include removal of one or more entries from the inferred entity set (266). More specifically, the initial population of the inferred entity set (266) is based on predictions, and the adjustment of the population of the data container is based on a raw data assessment and includes selective removal of one or more failed predictions. In one embodiment, the Evaluator (250) selectively removes one or more entries from the formed set of inferred entities (266) at any time that the associated data may be considered public data. This selective removal may take place prior to or after receipt of the privacy assessment by the Evaluator (250). The Evaluator (250) may dynamically re-compute the privacy preservation assessment for the adjusted set of inferred entities. In addition, following the re-computation of the privacy preservation assessment, the Evaluator (250) iteratively evaluates the candidate inferences based on the updated or changed privacy score to create a modified set of candidate inferences. In one embodiment, the modified set includes a change of one or more of the candidate inference entries shown in
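By way of illustration only, the adjustment pass described above may be sketched as follows. This is a minimal sketch, assuming the inferences are held as a mapping from entries to predicted values and the raw data as a mapping from entries to true values; the names adjust_inferred_set and is_public are hypothetical and not part of the embodiments.

    def adjust_inferred_set(inferred, raw_data, is_public):
        """Drop failed predictions and public-data entries, then
        re-compute the privacy score for the adjusted set."""
        adjusted = set()
        for entry, predicted_value in inferred.items():
            # Remove failed predictions: the inference disagrees with the raw data.
            if raw_data.get(entry) != predicted_value:
                continue
            # Remove entries whose underlying data is considered public.
            if is_public(entry):
                continue
            adjusted.add(entry)
        # Re-compute the privacy preservation assessment for the adjusted
        # set, using the score formula shown at step (430) below.
        score = 1 - len(adjusted) / len(raw_data)
        return adjusted, score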
The privacy assessment is shown herein conducted by the Evaluator (250). The Evaluator (250) may conduct the assessment responsive to a data set modification, a privacy method modification, or an explicit request from a cloud service steward. Accordingly, the communications and functionality of the Evaluator (250) allow the data steward to upload sensitive data sets, and to perform privacy preserving queries on the data sets.
Referring to
The Evaluator (140) receives the expected level of privacy (404). In one embodiment, the data steward identifies a preferred level of privacy for the associated cloud or remote service. Input is obtained from the data owner with respect to selection of tests (406). The input may include identification of sensitive data, e.g. identifiers, potential quasi-identifiers, and types of data stored, e.g. images, location information. It is understood that different data types may require different tests. In one embodiment, if this information is not initially provided, a test or query will probe the system to identify the data types. In addition, potential background information is solicited, such as social networks, databases, etc. The thoroughness of the evaluation may also be solicited at step (406) as input, with the thoroughness including, but not limited to, the amount of time to spend in the probing process, the load on a corresponding service provider, and the size of the data to be crawled. Accordingly, the input solicited at step (406) dictates the inferences and associated security evaluation.
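For illustration only, the input solicited at steps (404) and (406) might be collected in a structure of the following shape; the field names and values are hypothetical and merely reflect the categories of input described above.

    evaluation_request = {
        "expected_privacy_level": 0.9,             # preferred level received at (404)
        "sensitive_fields": ["disease"],           # identifiers / confidential columns
        "quasi_identifiers": ["zip_code", "age"],  # potential quasi-identifiers
        "data_types": ["tabular", "location"],     # different types require different tests
        "background_sources": ["social_networks", "public_databases"],
        "thoroughness": {
            "max_probe_seconds": 3600,  # time to spend in the probing process
            "max_service_load": 0.25,   # tolerated load on the service provider
            "max_crawl_bytes": 10**9,   # size of the data to be crawled
        },
    }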
Following step (406), the Evaluator conducts an initial privacy assessment of the quality and level of privacy of the entities that comprise an initially compiled set of inferred entities (408). An output set of candidate inferences is produced, with each candidate having an associated confidence level (410). A privacy score is computed for the associated service being evaluated for privacy, as shown and described below. The tool performing the assessment has a threshold, τ, that determines when a candidate inference is considered an inferred entity, e.g. when its confidence level is high enough. A set of inferred entities, Set, is initialized (412). An initial privacy test is conducted and produces an associated test result for each inferred entity (414). The variable XTotal is assigned to the quantity of candidate entities (416), and an associated counting variable, X, is initialized (418). Each initial privacy test result is assessed with respect to the threshold, τ (420). More specifically, at step (420), for each candidate entity, it is determined if the threshold, τ, is less than or equal to the respective confidence level. A positive response to the determination at step (420) is followed by entry of the candidate entity, e.g. candidate entityX, as a member in the inferred entity set, Set, (424). Otherwise, the candidate entity, e.g. candidate entityX, is not entered into the Set (422). Following either step (422) or step (424), the candidate entity counting variable is incremented (426), and it is determined if all of the candidate entities have been processed (428) with respect to the privacy test result and associated threshold at step (420). A negative response to the determination at step (428) is followed by a return to step (420) for continued assessment. Following the evaluation of the candidate entities, as demonstrated by a positive response to the determination at step (428), a privacy score is computed for the entity set, Set, (430). In one embodiment, the privacy score is as follows:
1 − |Set| / Total entries,
where |Set| is the number of unique inferred entries found by the evaluation, and Total entries is the number of entries in the dataset, e.g. the number of rows. Following step (430), the computed privacy score is compared with the steward's expectation of privacy (432). As demonstrated, it is determined if the computed privacy score meets the expected privacy level (434). If the comparison meets the expected privacy level, as demonstrated by a positive response to the determination conducted at step (434), a communication is transmitted that the privacy expectation has been met (436). In one embodiment, the communication at step (436) is a report, and may include logs or other types of data supporting the lack of violations of privacy expectations. Conversely, if at step (434) it is determined that the comparison does not meet the expected level of privacy, the data container is populated and a warning is communicated that the privacy expectation has been violated or not met (438). Accordingly, as shown herein, an initial inference entry set is created and evaluated to assess the expectation of a service provider, together with formation of a container of the inferred entry set meeting the initial assessment protocol(s).
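A minimal sketch of steps (412) through (438) follows, assuming the candidate inferences arrive as (entity, confidence level) pairs; the function name and signature are illustrative and not part of the embodiments.

    def evaluate_privacy(candidate_inferences, total_entries, tau, expected_level):
        """Filter candidate inferences by the threshold tau, then compute
        the privacy score 1 - |Set| / Total entries."""
        inferred_set = set()                             # step (412)
        for entity, confidence in candidate_inferences:  # steps (416)-(428)
            if tau <= confidence:                        # step (420)
                inferred_set.add(entity)                 # step (424)
        score = 1 - len(inferred_set) / total_entries    # step (430)
        if score >= expected_level:                      # steps (432)-(434)
            return score, []                # expectation met, report (436)
        return score, sorted(inferred_set)  # populate container, warn (438)

For example, evaluate_privacy([("a", 1.0), ("b", 0.8)], total_entries=9, tau=0.9, expected_level=0.95) returns a score of 1 − 1/9 ≈ 0.89 and flags entity "a" for the data container.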
The evaluation shown and described in
It is understood that any adjustment of the inference set in
The assessments shown and described in
Referring to
As shown and described in
The data steward is responsible for uploading data to a shared service provider and marking or otherwise identifying the data as private or confidential. Due to the intrinsic and sensitive characteristics of the data, one or more clients are selectively allowed to perform privacy preserving queries on the uploaded data. The evaluation shown and described in
The testing and evaluation is responsive to characteristics of the subject data and the selected tests. It is understood that different data may have different levels of privacy or privacy expectations. Similarly, it is understood that such privacy characteristics may not be static or uniform. For example, in one embodiment, different clients may have different levels of access with respect to the data and the privacy settings. The privacy evaluation may return a score that may or may not be commensurate with different privacy settings for different clients, resulting in re-assessment, raw data assessment, compilation and communication of an associated data container, etc.
The security evaluation shown and described in
Host (702) may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Host (702) may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in
Memory (706) can include computer system readable media in the form of volatile memory, such as random access memory (RAM) (730) and/or cache memory (732). By way of example only, storage system (734) can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus (708) by one or more data media interfaces.
Program/utility (740), having a set (at least one) of program modules (742), may be stored in memory (706), by way of example and not limitation, as may an operating system, one or more application programs, other program modules, and program data. Each of the operating systems, one or more application programs, other program modules, and program data, or some combination thereof, may include an implementation of a networking environment. Program modules (742) generally carry out the functions and/or methodologies of the embodiments directed to security classification and evaluation, and output directed at container compilation. For example, the set of program modules (742) may be configured as the Auditing and Privacy Verification Evaluator as described in
Host (702) may also communicate with one or more external devices (714), such as a keyboard, a pointing device, a sensory input device, a sensory output device, etc.; a visual display (724); one or more devices that enable a user to interact with host (702); and/or any devices (e.g., network card, modem, etc.) that enable host (702) to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interface(s) (722). Still yet, host (702) can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter (720). As depicted, network adapter (720) communicates with the other components of host (702) via bus (708). In one embodiment, a plurality of nodes of a distributed file system (not shown) is in communication with the host (702) via the I/O interface (722) or via the network adapter (720). It should be understood that although not shown, other hardware and/or software components could be used in conjunction with host (702). Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
In this document, the terms “computer program medium,” “computer usable medium,” and “computer readable medium” are used to generally refer to media such as main memory (706), including RAM (730), cache (732), and storage system (734), such as a removable storage drive and a hard disk installed in a hard disk drive.
Computer programs (also called computer control logic) are stored in memory (706). Computer programs may also be received via a communication interface, such as network adapter (720). Such computer programs, when run, enable the computer system to perform the features of the present embodiments as discussed herein. In particular, the computer programs, when run, enable the processing unit (704) to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.
In one embodiment, host (702) is a node of a cloud computing environment. As is known in the art, cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models. Examples of such characteristics are as follows:
On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher layer of abstraction (e.g., country, state, or datacenter).
Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some layer of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported providing transparency for both the provider and consumer of the utilized service.
Service Models are as follows:
Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Deployment Models are as follows:
Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes.
Referring now to
Referring now to
Virtualization layer (920) provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients.
In one example, management layer (930) may provide the following functions: resource provisioning, metering and pricing, user portal, service level management, and SLA planning and fulfillment. Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and pricing provides cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal provides access to the cloud computing environment for consumers and system administrators. Service level management provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
Workloads layer (940) provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include, but are not limited to: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and security processing.
It will be appreciated that there is disclosed herein a system, method, apparatus, and computer program product for evaluating and processing data and associated data security protocols and ascertaining an inferred set of entities that may violate a preferred level of privacy. While particular embodiments of the present embodiments have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from these embodiments and their broader aspects. Therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of the embodiments. Furthermore, it is to be understood that the embodiments are solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. For non-limiting examples, as an aid to understanding, the following appended claims contain usage of the introductory phrases “at least one” and “one or more” to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to embodiments containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”; the same holds true for the use in the claims of definite articles.
The present embodiments may be a system, a method, and/or a computer program product. In addition, selected aspects of the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and/or hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, aspects of the present embodiments may take the form of a computer program product embodied in a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present embodiments. Thus embodied, the disclosed system, method, and/or computer program product is operative to improve the functionality and operation of notification processing and delivery.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a dynamic or static random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a magnetic storage device, a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present embodiments may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server or cluster of servers. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present embodiments.
Aspects of the present embodiments are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
The Evaluator shown and described herein may run multiple tests that try to de-anonymize the data based on homogeneity attacks, which exploit the potentially small domain of confidential fields; background attacks, which combine other sources of information with the private dataset; location injection attacks, where the Evaluator may add entries to a geospatial data service when location data is considered to be private; a combination of the above attack methods; or more. Additionally, tests could include data queries, statistical analysis, machine learning analytics, and SQL and NoSQL queries, among others. In one embodiment, the tests run by the Evaluator may be tailored to the particular anonymization techniques followed by the Shared Resource Provider. Examples of anonymization techniques include, but are not limited to, k-anonymity, l-diversity, and t-closeness. K-anonymity splits a table into groups, such that each group has at least k records with the same quasi-identifiers (non-sensitive data). Similar to k-anonymity, l-diversity also utilizes a table, but l-diversity ensures l distinct values in the sensitive column for each group. T-closeness is also similar to k-anonymity, but ensures the distribution of the sensitive column within each group is t-close to that of the whole table.
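As a non-authoritative sketch under the definitions just stated, verifying the k-anonymity and l-diversity of a table may be expressed as follows; the rows are assumed to be dictionaries keyed by column name, and the helper names are illustrative only.

    from collections import defaultdict

    def group_by_quasi_identifiers(rows, quasi_ids):
        # Partition the table into groups sharing the same quasi-identifier values.
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[q] for q in quasi_ids)].append(row)
        return list(groups.values())

    def is_k_anonymous(rows, quasi_ids, k):
        # Each group must hold at least k records with the same quasi-identifiers.
        return all(len(g) >= k
                   for g in group_by_quasi_identifiers(rows, quasi_ids))

    def is_l_diverse(rows, quasi_ids, sensitive, l):
        # Each group must hold at least l distinct values in the sensitive column.
        return all(len({r[sensitive] for r in g}) >= l
                   for g in group_by_quasi_identifiers(rows, quasi_ids))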
In another embodiment, the tests run by the Evaluator may be tailored to the specific data stored by the Shared Resource Provider. For example, for geospatial information, background information on landmarks and population percentages is included to help the tests de-anonymize data sets. Furthermore, some of the tests run may try to query the Shared Resource Provider with fabricated information to try to produce privacy leakages.
The following use case is an example of assessing a privacy score where a Shared Resource Provider is utilizing k-anonymity to protect privacy, as represented in Table 1.
Using the data from Table 1 provided by the Shared Resource Provider, and using as auxiliary information the well-known fact that a set of individuals' data are in Table 1, as well as their zip code or the fact that they are in their twenties, the test can easily infer that they have heart disease. This follows because only people in their twenties are present in the first group of the table. If this is the only test run, then the test infers entries 1-3 with a confidence level of 1 for each, and the tester can then compute the privacy score as: 1−3/9=1−0.33≈0.66.
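The homogeneity inference in this use case can be sketched generically, reusing group_by_quasi_identifiers from the sketch above; any table rows supplied to it are hypothetical stand-ins for Table 1, which is not reproduced here.

    def homogeneity_test(rows, quasi_ids, sensitive):
        # Infer the sensitive value, with confidence level 1, for every member
        # of a group whose sensitive column holds only a single value.
        inferences = []
        for group in group_by_quasi_identifiers(rows, quasi_ids):
            values = {r[sensitive] for r in group}
            if len(values) == 1:  # homogeneous group: the attack succeeds
                value = values.pop()
                inferences += [(r["id"], value, 1.0) for r in group]
        return inferences

Run against a Table-1-like dataset in which the first group (entries 1-3, all in their twenties) uniformly has heart disease, the test yields three inferences at confidence level 1, and the privacy score is 1−3/9≈0.66, as above.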
The following use case is an example of assessing a privacy score utilizing a test tailored for l-diversity, where the data provided by the Shared Resource Provider is represented in Table 2.
Using the data from Table 2, and using as auxiliary information the fact that someone is part of the data, together with their zip code and their age, the test can infer their disease if they are present in the first group of the table, and can infer it with a certain probability if they are present in the third group. Further using as auxiliary information the fact that such a person is unlikely to have heart disease, the test can infer that they have cancer with high probability. If this is the only test run, then the test infers entries 7-9 with a confidence level of 0.9 for each, and, assuming the tester has a threshold of 0.9, the privacy score is computed as: 1−3/9=1−0.33≈0.66.
The following is a use case where multiple tests are combined to find a privacy score. In this example, the output from a first test, Test1, infers entries 1-3 with a confidence level for each entry being 1, and the output from a second test, Test2, infers entries 1-3 with a confidence level for each entry being 1, infers entries 7-8 with a confidence level for each entry being 0.9, and infers entry 9 with a confidence level of the entry being 0.8. Assuming the tester has a threshold of 0.9, the privacy score computed by the Evaluator is as follows: 1−|{1,2,3,7,8}|/9=1−0.55≈0.44.
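The combination in this use case can be reproduced with a short sketch; the function name is illustrative, and each test output is assumed to be a mapping from entry to confidence level.

    def combined_privacy_score(test_outputs, tau, total_entries):
        # An entry counts once if any test infers it with confidence >= tau.
        inferred = {entry
                    for output in test_outputs
                    for entry, confidence in output.items()
                    if confidence >= tau}
        return 1 - len(inferred) / total_entries

    test1 = {1: 1.0, 2: 1.0, 3: 1.0}
    test2 = {1: 1.0, 2: 1.0, 3: 1.0, 7: 0.9, 8: 0.9, 9: 0.8}
    # Entry 9 falls below the 0.9 threshold, so the combined set is
    # {1, 2, 3, 7, 8} and the score is 1 - 5/9, approximately 0.44.
    print(combined_privacy_score([test1, test2], tau=0.9, total_entries=9))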
It will be appreciated that, although specific embodiments have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the embodiments. In particular, the privacy assessment certifies privacy provided by a shared resource service provider. The data steward may own the Evaluator, or in one embodiment, the Evaluator is a third-party service. The data steward may fully trust the Evaluator with all private data, hence it may provide ground truth data to the Evaluator to perform the verification of inferences presented in