The present invention relates generally to information sharing, and particularly, to a system and method for enabling the sharing of information in uncertain environments.
In both the commercial and defense sectors, a compelling need is emerging for rapid, yet secure, dissemination of information to the concerned actors. For example, in a commercial setting, the ability of multiple partners to come together, share sensitive business information and coordinate activities to rapidly respond to business opportunities is becoming a key driver for success. Similarly, in a military setting, traditional wars between armies of nation-states are being replaced by highly dynamic missions where teams of soldiers, strategists, logisticians, and support personnel fight against elusive enemies that easily blend into the civilian population. Securely disseminating mission critical tactical intelligence to the pertinent people in a timely manner will be a critical factor in a mission's success.
Within a single organization, it is possible to allow sharing of information while managing the risk of information disclosure by appropriately labeling (or classifying) information with its secrecy characteristics and performing an in-depth security assessment of its systems and users to create the controls necessary to protect information commensurate with its label. Such a security/risk assessment will typically involve a number of stakeholders and be carried out in a number of stages, including: system characterization, threat and vulnerability identification, control analysis, likelihood determination and impact analysis. Subsequently, policies can be put in place that will permit information to be shared within different parts of the organization, provided that the recipient has the necessary controls in place to protect the information. However, such an approach may not be viable for information sharing across organizations, as one organization will typically not permit another to perform a security assessment of its internal systems, controls and people. In dynamic settings, where systems and processes evolve rapidly and there are transient needs for sharing tactical, time-sensitive information across organizational boundaries, a new approach to securing information flows is required.
Recently, new approaches based on risk estimation and economic mechanisms have been proposed for enabling the sharing of information in uncertain environments [P.-C. Cheng, P. Rohatgi, C. Keser, P. A. Karger, G. M. Wagner and A. S. Reninger in a reference entitled "Fuzzy Multi-Level Security: An Experiment on Quantified Risk-Adaptive Access Control," in Proceedings of the 2007 IEEE Symposium on Security and Privacy (SP 2007), 2007, pp. 222-230; Jason Program Office in a reference entitled "HORIZONTAL INTEGRATION: Broader Access Models for Realizing Information Dominance," MITRE Corporation, Special Report JSR-04-13, 2004; and M. Srivatsa, D. Agrawal and S. Balfe in a reference entitled "Trust Management for Secure Information Flows," in Proceedings of the 15th ACM Conference on Computer and Communication Security (CCS), 2008]. These approaches are based on the idea that the sender constantly updates the estimate of the risk of information disclosure when providing information to a receiver based on the secrecy of the information to be divulged and the sender's estimate of the trustworthiness of the recipient. The sender then "charges" the recipient for this estimated risk. The recipient, in turn, can decide which type of information is most useful to him and "pay" (using its line of risk credit) only to access those pieces of information. However, past work is largely empirical in estimating the risk of information disclosure and, in addition, it fails to holistically model the uncertainty in detecting information leakage.
As an alternative to economic mechanisms, incentive mechanisms, which have received considerable attention in recent years, can also be employed to encourage behavioral conformity in ad-hoc groups. To date, the goal of such works has been to either reward "good" behavior or punish "bad" behavior. In one conventional technique, for example, entities exchange tokens as a means of charging for/rewarding service usage/provision. Entities which behave correctly and forward packets are rewarded with additional tokens which, in turn, may be spent on forwarding their own packets. However, these approaches also fail to model the uncertainty in detecting good/bad behavior when making appropriate reward/punishment decisions.
There is a growing demand for solutions that allow rapid yet secure sharing of information.
It would be highly desirable to provide a system and method that enables the generation of a decision theoretic model for securing such information flows by reducing the risk of data leakage.
The present invention addresses the above-mentioned shortcomings of the prior art approaches by providing a system and method that is designed to make optimal information sharing decisions based on only partial or imperfect monitoring data.
The system and method is designed to make optimal information sharing decisions based on only partial or imperfect monitoring data, while ensuring that the efficacy of the decisions degrades gracefully with that of the monitoring mechanism.
The system and method addresses such planning problems in two steps: First, it provides a first-of-a-kind formulation of the complex information sharing problems discussed above by combining Partially Observable Markov Decision Processes (POMDPs) with digital watermarking, a monitoring mechanism for data leakage detection. Second, it derives the optimal information sharing strategies for the sender and the optimal information leakage strategies for a rational-malicious recipient as a function of the efficacy of the underlying monitoring mechanism. In addition, the disclosure also provides a mechanism for analyzing the thresholds on the efficacy of a monitoring system in order to encourage information sharing under imperfect monitoring conditions for various reward models.
In one aspect, there is provided a system, method and computer program product for optimally sharing information from a sender to one or more recipients. The method comprises: receiving, at a processor device, parameter information including: identifiers of one or more recipients to receive shared information sent by a sender entity, a number of information sharing decision time intervals n where 0≤n<N, and a reward value for successful sharing and a penalty value for detected leakage; building, using the processor device, a model of the dynamic trustworthiness of each of the one or more recipients as a Partially Observable Markov Decision Process (POMDP), the POMDP model including an initial sender belief state of the trustworthiness of each of the recipients; deriving, based on the model, an optimal information sharing policy for sharing with the one or more recipients that maximizes an expected reward for the sender; sharing the information with the one or more recipients; and updating said belief state of trustworthiness of each recipient in the POMDP model in each decision time interval n by:
In a further aspect, there is provided a computer system for optimally sharing information from a sender to one or more recipients comprising:
A computer program product is provided for performing operations. The computer program product includes a storage medium readable by a processing circuit and storing instructions run by the processing circuit for running a method. The method is the same as listed above.
The objects, features and advantages of the present invention will become apparent to one skilled in the art, in view of the following detailed description taken in combination with the attached drawings, in which:
There is provided a decision theoretic approach for securing such information flows by reducing the risk of data leakage. The approach makes optimal information sharing decisions based on only partial or imperfect monitoring data, while ensuring that the efficacy of the decisions degrades gracefully with that of the monitoring mechanism.
The approach is applicable to information sharing domains that involve one information source (sender) and K information sinks (recipients) under the following generalized settings: (i) Information sharing occurs over a fixed period of N decision epochs and is mutually beneficial for the sender and each of the recipients; (ii) In each decision epoch a sender can share only one information object (e.g., packet) with a chosen recipient (i.e., it is understood that the packets to be shared can be arranged in a serial order; by considering multiple copies of a packet, situations are modeled where a packet is to be shared with multiple recipients); (iii) Leaking a shared packet results in a positive reward for the recipient and a penalty to the sender; (iv) Sharing a packet is instantaneous and the recipient leaks (or not) a packet immediately upon receiving it; (v) The sender uses a monitoring mechanism (e.g., a digital watermark detector) to detect an (un)intended packet leakage by the recipients; and finally (vi) Subsequent sender actions (whether to share a packet and with whom) are determined using the imperfect observations made in (v).
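By way of a non-limiting illustration, the generalized settings above may be captured as a small set of domain parameters, as in the following sketch (the class and field names are assumptions made for illustration only; the numeric values mirror the examples discussed later herein).

```python
from dataclasses import dataclass

@dataclass
class SharingDomain:
    """Illustrative container for the information-sharing domain parameters."""
    num_epochs: int          # N: fixed number of decision epochs
    num_recipients: int      # K: number of information sinks (recipients)
    reward_share: float      # sender reward when a shared packet is not leaked
    penalty_leak: float      # sender penalty when a shared packet is leaked
    p_detect_leak: float     # P(observe leak | packet actually leaked)
    p_false_alarm: float     # P(observe leak | packet not leaked)

# Example values mirroring the experiments described later in the text.
domain = SharingDomain(num_epochs=10, num_recipients=2,
                       reward_share=2.0, penalty_leak=-1.0,
                       p_detect_leak=0.70, p_false_alarm=0.10)
```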
It is understood that, without the existence of a monitoring mechanism or if an arbitrarily imperfect monitoring mechanism is used, then the system can have two trivial solutions: (a) share everything if the reward for information sharing is more than the penalty of information leakage; and (b) share nothing otherwise. In one embodiment, settings may be examined to encourage information sharing even when the penalty for information leakage is higher than that of information sharing by using a monitoring mechanism with realistic imperfections. In arriving at solutions to such planning problems, there is provided: a first of a kind formulation of the complex information sharing problems discussed above by combining Partially Observable Markov Decision Processes (POMDPs) with digital watermarking, one type of monitoring mechanism for data leakage detection. Second, the optimal information sharing strategies for the sender and the optimal information leakage strategies for a rational-malicious recipient as a function of the efficacy of the underlying monitoring mechanism are derived. Finally, the thresholds on the efficacy of a monitoring system are analyzed in order to encourage information sharing under imperfect monitoring conditions for various reward models.
In one embodiment, as shown in a system 10 depicted in
Typically, an associated DRM (digital rights management) mechanism, such as a watermark, is applied by the sender to the original image file, which is sent as a transformed image to the recipient 19. Several commercial products are available at https://www.digimarc.com/, including products that check for watermarks in digital content. One common method used in a leakage detector (especially for digital images) is to compute the correlation between the image and a secret key. A leakage detector including a correlator can be connected as part of an Internet Service Provider (ISP) infrastructure or, for example, at ingress/egress routers in enterprise networks.
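By way of a non-limiting illustration, the following sketch shows such a correlation test: a candidate image is compared against a secret watermark key and a leak is flagged when the normalized correlation exceeds a threshold. The function name, threshold value and test data are assumptions made for illustration only and do not describe any particular commercial detector.

```python
import numpy as np

def detect_watermark(image: np.ndarray, secret_key: np.ndarray,
                     threshold: float = 0.1) -> bool:
    """Return True if the normalized correlation between the image and the
    secret key exceeds the detection threshold (a simplified, hypothetical detector)."""
    img = image.astype(float).ravel()
    key = secret_key.astype(float).ravel()
    img -= img.mean()
    key -= key.mean()
    denom = np.linalg.norm(img) * np.linalg.norm(key)
    if denom == 0:
        return False
    correlation = float(np.dot(img, key)) / denom
    return correlation > threshold

# Usage: a watermarked image correlates strongly with its key.
rng = np.random.default_rng(0)
key = rng.standard_normal((64, 64))
original = rng.standard_normal((64, 64))
watermarked = original + 0.5 * key
print(detect_watermark(watermarked, key))   # likely True
print(detect_watermark(original, key))      # likely False
```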
In an example implementation, a digital watermarking based monitoring mechanism to detect information leakage is provided.
In one embodiment, the recipient 19 is assumed to be good or bad, or good with a certain probability. However, it may be desirable for a sender to decide whether or not to continue sharing packets with recipient(s), depending upon the trustworthiness of the recipient(s). Thus, in one embodiment, the sender maintains a distribution of the recipient's trustworthiness and is able to update its estimate of the recipient's trustworthiness over time (rather than treating it as a fixed number).
That is, as shown in
As the leakage detection mechanism is not perfect (e.g., the component that detects leakage may be noisy), there is inherent uncertainty in detecting information leakage. Consequently, current models fail to model the uncertainty in detecting good/bad behavior when making appropriate reward/punishment decisions. This uncertainty in detecting information leakage is modeled in the POMDP model described herein.
In one embodiment of the invention, POMDPs are employed to help the sender 14 (information source) characterize strategies of information sharing (what to share with whom?) and understand the optimal corruption strategies for a malign recipient 19.
Referring to
This is to be contrasted with a second scenario 57, where in the first decision epoch 52, sender 14 has taken action (share(1)) for sharing a packet with first recipient 191, and the observation indicates "no leak". However, with respect to the second decision epoch 54 of scenario 57, there is indicated the sender sharing a packet with first recipient 191 (share(1)); however, the resulting observation indicates "leak". As a result, first recipient 191 has earned a decreased confidence in trustworthiness in this scenario, and consequently, at the third decision epoch 56 of scenario 57, there is indicated the sender 14 sharing a packet with second recipient 192 (share(2)). That is, due to decreased confidence about the trustworthiness of first recipient 191, the sender 14 has decided to send the packet to another recipient, e.g., the second recipient 192, in the third decision epoch 56. Thus, with respect to scenario 55, in one embodiment, a policy is implemented to maintain the distribution of the trustworthiness of the recipient (albeit the distribution is dynamic) given that the leakage detection mechanism is imperfect (i.e., the longer the observations over time, the less likely the recipient's trustworthiness is to be decreased); while in scenario 57, a policy is implemented such that the trustworthiness of the recipient is reduced due to leakage observed over a smaller number of decision epochs.
That is, for each scenario, an expected total payoff for the sender is computed depending upon which scenario it follows. For example, in Table I, shown in
Partially Observable Markov Decision Processes (POMDPs), such as described in the reference to E. J. Sondik entitled "The optimal control of partially observable Markov processes," Ph.D. Thesis, Stanford University, 1971, the whole contents and disclosure of which is incorporated by reference as if fully set forth herein, are defined as follows: S is a finite set of discrete states of a process and A is a finite set of agent actions. The process starts in state s0 ∈ S and runs for N decision epochs. In each decision epoch 0 ≤ n < N, the agent controlling the process chooses an action a ∈ A to be performed next. The agent then receives the immediate reward R(s, a) while the process transitions with probability P(s′|s, a) to state s′ ∈ S and decision epoch n+1. Otherwise, if n=N, the process terminates. The goal of the agent is to find a policy π that, for each epoch 0 ≤ n < N, maximizes the sum of expected immediate rewards earned in epochs n, n+1, . . . , N when following policy π. What complicates the agent's search for π is that the process is only partially observable to the agent. That is, the agent receives noisy information about the current state s ∈ S of the process and can therefore only maintain a probability distribution b(s) over states s ∈ S (referred to as the agent belief state). Specifically, when the agent executes an action a ∈ A and the process transitions to state s′, the agent receives with probability O(z|a, s′) an observation z from a finite set of observations Z. The agent then uses z to update its current belief state b, as described in the incorporated E. J. Sondik reference.
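By way of a non-limiting illustration, the belief state update described above may be sketched as follows, assuming tabular transition and observation functions stored as dictionaries (the encoding and names are illustrative assumptions, not a prescribed implementation).

```python
def update_belief(b, a, z, states, P, O):
    """Bayesian belief update: b'(s') is proportional to
    O(z | a, s') * sum_s P(s' | s, a) * b(s)."""
    unnormalized = {
        s2: O.get((z, a, s2), 0.0) * sum(P.get((s2, s1, a), 0.0) * b[s1]
                                         for s1 in states)
        for s2 in states}
    total = sum(unnormalized.values()) or 1.0   # guard against a zero-probability observation
    return {s: p / total for s, p in unnormalized.items()}

# Usage on a two-state example: observing a "leak" after a "share" action.
states = ["s0", "s100"]
P = {(s, s, a): 1.0 for s in states for a in ["share", "noShare"]}  # recipient never changes
O = {("leak", "share", "s0"): 0.10, ("leak", "share", "s100"): 0.70}
print(update_belief({"s0": 0.5, "s100": 0.5}, "share", "leak", states, P, O))
# -> {'s0': 0.125, 's100': 0.875}: belief shifts toward the less trustworthy state
```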
A policy π of the agent therefore indicates which action π(n, b) ∈ A the agent should execute in decision epoch n in belief state b, for all 0 ≤ n < N and all belief states b reachable from an initial belief state b0 after n agent actions. To date, a number of efficient algorithms have been proposed to find a policy π that yields the maximum expected reward for the agent. In one embodiment, a POMDP solver is used based on a point-based incremental pruning technique such as, for example, found in the reference to J. Pineau, G. Gordon, and S. Thrun, entitled "PBVI: An anytime algorithm for POMDPs," in IJCAI, 2003, pp. 335-344 ("Pineau et al."). Alternate algorithms that may be employed to find a policy π* that yields a maximum expected reward for an agent (sender) are provided in references to M. Hauskrecht entitled "Value-function approximations for POMDPs," JAIR, vol. 13, pp. 33-94, 2000; P. Poupart and C. Boutilier entitled "VDCBPI: An approximate scalable algorithm for large scale POMDPs," in NIPS, vol. 17, 2004, pp. 1081-1088; Z. Feng and S. Zilberstein, "Region-based incremental pruning for POMDPs," in UAI, 2004, pp. 146-15; T. Smith and R. Simmons entitled "Point-based POMDP algorithms: Improved analysis and implementation," in UAI, 2005; M. T. J. Spaan and N. Vlassis entitled "Perseus: Randomized point-based value iteration for POMDPs," JAIR, vol. 24, pp. 195-220, 2005; and P. Varakantham, R. Maheswaran, G. T., and M. Tambe entitled "Towards efficient computation of error bounded solutions in POMDPs: Expected value approximation and dynamic disjunctive beliefs," in IJCAI, 2007, the whole content and disclosure of each of which is incorporated by reference as if fully set forth herein.
In one embodiment, to prevent data leakage in domains of increasing complexity, POMDPs are employed to characterize optimal information sharing strategies for the sender and optimal watermark corruption strategies for a malign recipient. There is first provided a domain with a single, deterministic recipient (who either leaks all of the packets it receives or none of them). Further, the assumption that the recipient is deterministic is relaxed by considering a fuzzy recipient who leaks f% of the packets it receives. In one aspect, the POMDP models are generalized to domains where the sender shares information with multiple fuzzy recipients (each leaking a different percentage of the packets it receives).
One Deterministic Recipient
For the case of a data leakage prevention domain involving a single information recipient (i.e., K=1) who acts in a deterministic way (leaks either 0% or 100% of all the packets it receives), such a domain is modeled using POMDPs as follows: The set of states is S={s0, s100} where s0 denotes a state where the recipient leaks 0% of the packets it receives whereas s100 denotes a state where the recipient leaks 100% of the packets it receives. The set of sender actions is A={anoShare, aShare} where action anoShare results in the sender not sharing a packet with the recipient and aShare in sharing exactly one packet with the recipient, in some decision epoch. In one embodiment, it is assumed that the recipients never change the percentage of packets they leak out, and thus, the transition function is given by P(s0|anoShare, s0)=P(s0|aShare, s0)=P(s100|anoShare, s100)=P(s100|aShare, s100)=1. The set of sender observations is Z={zLeak, znoLeak, z0} where, according to zLeak, the last-shared packet has been leaked and, according to znoLeak, the last-shared packet has not been leaked. The sender receives an empty observation z0 when it does not share a packet with the recipient. (Note that because z0 carries no information about the status of shared packets, it also does not affect the current sender estimate of the trustworthiness of the recipient. Also, because of false positive/false negative observations, there may be O(zLeak|aShare, s0)>0 and O(zLeak|aShare, s100)<1.) Finally, the rewards are R(s0, anoShare)=R(s100, anoShare)=0 (not sharing a packet provides the sender with no reward/penalty) and R(s100, aShare)<0<R(s0, aShare) (sharing a packet is beneficial to the sender only if the packet is not leaked). To illustrate a domain with a deterministic recipient, assume for example N=10 decision epochs, rewards R(s0, aShare)=2, R(s100, aShare)=−1, observation function O(zLeak|aShare, s0)=10%, O(zNoLeak|aShare, s0)=1−O(zLeak|aShare, s0)=90%, O(zNoLeak|aShare, s100)=30%, O(zLeak|aShare, s100)=1−O(zNoLeak|aShare, s100)=70%, O(z0|aNoShare, s0)=O(z0|aNoShare, s100)=100%, and initial sender belief about the trustworthiness of the recipient b0(s0)=b0(s100)=50%. In such a setting, the optimal policy of the sender yields the expected reward value "V" of 2.81. In Table I, shown in
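By way of a non-limiting illustration, the deterministic-recipient model with the example numbers above may be encoded as follows (the dictionary-based encoding is an illustrative assumption and is not intended to prescribe the input format of any particular POMDP solver).

```python
# States, actions, observations for the single deterministic recipient.
S = ["s0", "s100"]                   # recipient leaks 0% / 100% of received packets
A = ["noShare", "share"]
Z = ["leak", "noLeak", "empty"]

N = 10                               # decision epochs

# Transition: the recipient never changes its behavior.
def P(s_next, a, s):
    return 1.0 if s_next == s else 0.0

# Reward: sharing pays off only if the packet is not leaked.
R = {("s0", "share"): 2.0, ("s100", "share"): -1.0,
     ("s0", "noShare"): 0.0, ("s100", "noShare"): 0.0}

# Observation: the watermark-based leakage detector is imperfect.
O = {("leak",  "share", "s0"):   0.10, ("noLeak", "share", "s0"):   0.90,
     ("leak",  "share", "s100"): 0.70, ("noLeak", "share", "s100"): 0.30,
     ("empty", "noShare", "s0"): 1.00, ("empty",  "noShare", "s100"): 1.00}

b0 = {"s0": 0.5, "s100": 0.5}        # initial sender belief about trustworthiness
```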
The policy in Table 1,
One Fuzzy Recipient
In a more complex data leakage prevention domain where the recipient is fuzzy, i.e., leaks f% of the packets it receives, thus appearing (to the sender) to be benevolent in some decision epochs and malevolent in other decision epochs, such a domain is modeled using POMDPs that address the fact that the recipient's fuzziness "f" is never known to the sender and can only be estimated by the sender using the observations it receives. In such a domain, the sender maintains a probability distribution over all possible recipient fuzziness levels, i.e., a probability distribution over the probabilities with which the recipient can leak the packets. Because the number of all possible recipient fuzziness levels is infinite (f ∈ [0, 1]), one cannot use POMDPs to model a fuzzy recipient exactly (due to an infinite POMDP state-space and the corresponding infinite transition/observation/reward functions). The problem of having to consider an infinite number of possible recipient fuzziness levels is circumvented by approximating the actual (unknown) recipient fuzziness level f within some error ε with only a finite set M of chosen fuzziness levels. Precisely, M is chosen to contain ⌈1+1/(2ε)⌉ uniformly distributed fuzziness levels so that for any f ∈ [0, 1] there always exists some m ∈ M where |f−m|<ε. The set of POMDP states is then S={sm}m∈M where sm is a state wherein the recipient leaks m% of the packets it receives. The set of sender actions and observations, A={anoShare, aShare} and Z={zLeak, znoLeak, z0} respectively, are the same as for a deterministic recipient. Similarly (assuming that the recipient never changes the percentage of packets it leaks), the transition function is defined as P(sm|aShare, sm)=P(sm|aNoShare, sm)=1 for all sm ∈ S. In defining the sender observation and reward functions, one needs to use the extreme values of these functions from the deterministic recipient case (when the recipient leaks 0% and 100% of the packets it receives). Specifically, if the process is in state sm ∈ S and the sender executes action aShare, there is an m% chance that the packet will be leaked and a (100−m)% chance that the packet will not be leaked; thus, R(sm, aShare)=(m/100)*R(s100, aShare)+((100−m)/100)*R(s0, aShare). Similarly (recall that the sender detects a leak, if the leak really occurred, with probability O(zLeak|aShare, s100) and, if the leak did not occur, with probability O(zLeak|aShare, s0)), if the process is in state sm ∈ S and the sender executes action aShare, it will observe a leak with probability O(zLeak|aShare, sm)=(m/100)*O(zLeak|aShare, s100)+((100−m)/100)*O(zLeak|aShare, s0).
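By way of a non-limiting illustration, the construction of the fuzziness levels and the interpolated reward and observation functions may be sketched as follows (the helper names are assumptions made for illustration; the arithmetic follows the formulas above).

```python
import math

def fuzziness_levels(eps: float):
    """Uniformly spaced levels so any f in [0,1] is within eps of some level."""
    count = math.ceil(1 + 1.0 / (2 * eps) - 1e-9)   # small tolerance guards float round-off
    return [i / (count - 1) for i in range(count)]

def R_fuzzy(m: float, R_s0_share: float = 2.0, R_s100_share: float = -1.0):
    """R(s_m, aShare) = m*R(s100, aShare) + (1-m)*R(s0, aShare), with m in [0,1]."""
    return m * R_s100_share + (1.0 - m) * R_s0_share

def O_leak_fuzzy(m: float, p_detect: float = 0.70, p_false: float = 0.10):
    """O(zLeak | aShare, s_m) interpolates the two deterministic extremes."""
    return m * p_detect + (1.0 - m) * p_false

M = fuzziness_levels(eps=1/6)                  # -> [0.0, 1/3, 2/3, 1.0], i.e. {0%, 33%, 66%, 100%}
print([round(R_fuzzy(m), 2) for m in M])       # [2.0, 1.0, 0.0, -1.0]
print([round(O_leak_fuzzy(m), 2) for m in M])  # [0.1, 0.3, 0.5, 0.7]
```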
To illustrate a domain with a fuzzy recipient, in the following example it is assumed that the recipient fuzziness f is approximated with a set of fuzziness levels M={0%, 33%, 66%, 100%}. Also, let N=10, R(s0, aShare)=2, R(s100, aShare)=−1, O(zLeak|aShare, s0)=10%, O(zLeak|aShare, s100)=70%—exactly as in the deterministic recipient case. Similarly, the initial belief state of the sender is uniform, i.e., b0(sm)=0.25 for all m ∈ M. In such a setting, the optimal policy of the sender yields the expected reward of 2.23. In Table II,
As can be seen in
Multiple Recipients
In more complex data leakage prevention domains, the sender shares packets with multiple recipients, each recipient potentially leaking a different percentage of the packets it receives. That is, there are now considered situations where a sender can choose which recipient (if any) should receive a packet in each decision epoch. In modeling such domains involving K>1 recipients, there is first chosen the accuracy with which the actual (unknown) fuzziness values of each of the K recipients are to be approximated. Specifically, it is assumed that a set Mk of fuzziness levels that approximates the (unknown) fuzziness of recipient k is chosen for each recipient k ∈ K. (As shown below, the sets Mk need not be equal, as the sender might desire higher accuracy in approximating the fuzziness of more important recipients.)
A POMDP for a domain with multiple recipients is then defined as follows: Let m=(m1, . . . , mK) be a vector such that mk ∈ Mk is the chance that recipient k leaks a packet it receives, for k ∈ K. The set of states is then S={sm}m∈M1× . . . ×MK. Because in each decision epoch the sender can share a packet with at most one recipient, the set of actions is A={anoShare, aShare(1), . . . , aShare(K)} where aShare(k) is an action that the sender executes to share a packet with recipient k. When the process is in state sm and the sender executes action aShare(k), the process transitions to the same state sm (recipients' fuzziness values never change) with probability 1. The sender then gets reward R(sm, aShare(k))≡R(smk, aShare) where the latter term is the sender reward in the single recipient case, as defined earlier. (It is noted that the sender could vary the importance of sharing the packets with different recipients by assuming that different recipients offer different rewards for received packets.) Finally, the set of observations Z={zLeak, znoLeak, z0} is the same as in the one recipient case, because the last performed action uniquely identifies the recipient who affects the sender's last observation. As such, the observation function only depends on the fuzziness of the recipient that the packet was sent to, and thus, O(zLeak|aShare(k), sm)≡O(zLeak|aShare, smk) where the latter term is the sender observation function in the single recipient case, as defined earlier.
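By way of a non-limiting illustration, the multi-recipient state space and the reduction of the reward and observation functions to their single-recipient counterparts may be sketched as follows (names and values are illustrative only, mirroring the two-recipient example discussed below).

```python
from itertools import product

# Per-recipient fuzziness approximations (recipient 1 coarse, recipient 2 fine).
M = [[0.0, 1.0], [0.0, 1/3, 2/3, 1.0]]

# Joint states s_m, one fuzziness value per recipient (cross product M1 x ... x MK).
states = list(product(*M))

def R_multi(m_vec, k, R_s0=2.0, R_s100=-1.0):
    """R(s_m, aShare(k)) depends only on the fuzziness of recipient k."""
    m = m_vec[k]
    return m * R_s100 + (1.0 - m) * R_s0

def O_leak_multi(m_vec, k, p_detect=0.70, p_false=0.10):
    """O(zLeak | aShare(k), s_m) likewise depends only on recipient k."""
    m = m_vec[k]
    return m * p_detect + (1.0 - m) * p_false

b0 = {s: 1.0 / len(states) for s in states}           # uniform initial belief (0.125 each)
print(len(states), round(R_multi((0.0, 2/3), k=1), 6))  # 8 joint states; reward 0.0 for recipient 2
```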
To illustrate a domain with multiple recipients on an example, assume K=2 recipients whose fuzziness values are approximated with different accuracy, i.e., M1={0%, 100%} and M2={0%, 33%, 66%, 100%}. Also, let N=10, R(s0, aShare)=2, R(s100, aShare)=−1, O(zLeak|aShare, s0)=10%, O(zLeak|aShare, s100)=70%—exactly as in a single recipient case. Similarly, let the initial belief state of the sender be uniform, i.e., b0(sm)=0.5*0.25=0.125 for all m ∈ M1×M2. In such a setting, the optimal policy of the sender yields the expected reward of 3.27. In Table III,
As can be seen in Table III (scenario 1), the sender always starts its optimal policy by sharing a packet with recipient 1, because recipient 1 appears to the sender to be more predictable (its fuzziness is approximated with fewer fuzziness levels) and consequently, it is easier for the sender to identify the trustworthiness of recipient 1 than to identify the trustworthiness of recipient 2. Next, in Table III (scenario 2), if the sender observes no leaks while sharing the packets with recipient 1 in the 1st and 2nd decision epochs, it builds enough confidence about the trustworthiness of recipient 1 so that, even if a leak is observed after sharing a packet with recipient 1 in the 3rd decision epoch, the sender attributes this leak to its imperfect observations and decides to resume sharing packets with recipient 1 in the 4th decision epoch. However (scenarios 3, 4), if the sender observes no leak while sharing a packet with recipient 1 in the 1st decision epoch, but observes a leak while sharing a packet with recipient 1 in the 2nd decision epoch, the sender's confidence about the trustworthiness of recipient 1 is too low and the sender decides to switch to sharing the packets with recipient 2. In particular (scenarios 4, 7 in Table III), if recipient 2 is also observed to be leaking the packets, the sender decides to stop sharing the packets with the recipients. Note that the sender's ability to choose a recipient to share a packet with results in an increased expected reward of its optimal policy (equal to 3.27 as opposed to 2.81 and 2.23 when K=1).
Recipient Strategy
The methods for computing the sender policy assume that the number of decision epochs and the sender observation function (the accuracy of the mechanism that examines a watermark to determine if a packet is leaked or not) are fixed and known to both parties. Yet, there may be situations where the recipient can try to remove the watermarks from the packets, in an attempt to disguise the packets it leaks. In these situations, recipient's tampering with the watermark has a direct impact on the sender observation function. While this may seem to complicate the sender decision making, it is shown in the following that this is not the case: If both the sender and the recipient are rational and if they both know the domain parameters, the recipient strategy (how much it tampers with watermarks to obfuscate sender observations) is predictable, allowing the sender to compute its optimal policy when facing such a recipient. Note that it is of clear interest to the recipient to tamper with the watermarks. If the recipient leaves the watermarks intact, each time it leaks a packet, the leak will be detected with 100% accuracy by the sender (who may consequently stop sharing the packets with the recipient). On the other hand, if the recipient completely prevents the sender from detecting a leak, the sender may have little incentive to even begin sharing the packets with the recipient. Exactly how much to corrupt the watermarks therefore constitutes a decision problem in itself that every rational recipient has to face.
A decision problem on an example domain 200 with a deterministic recipient described herein above is now illustrated in view of
The POMDP solver, such as described in the herein incorporated Pineau et al. reference, is implemented to conduct sensitivity and scalability analysis of the method applied to data leakage prevention. That is, the sensitivity of POMDP policies (found using the POMDP solver) to changes in the parameters of a data leakage prevention domain can be determined.
Regarding a scalability analysis: the POMDP solver (such as that of the herein incorporated Pineau et al. reference) provides solutions for all possible starting belief states "b". In the data leakage prevention domain, the value of "b" is already known (i.e., a uniform distribution over the trustworthiness levels of the recipient, as the sender does not know anything about the recipients). Although Pineau et al.'s POMDP solver can be used to find the solutions to the data leakage prevention problems, these solutions are found much faster than for a generic POMDP domain because (as stated earlier) a solution needs only to be found for a single (starting) belief state.
Referring to Table V,
In an alternate embodiment, a number of recipients may be grouped, i.e., a group of recipients can always be modeled as a single one at the expense of losing some accuracy in the decisions (e.g., either the sender shares with all of them or with none). In this way, applications where the number of recipients is high could be approached with an initial classification stage wherein recipients with similar characteristics are grouped into a single "class". As for the number of fuzziness levels, there exists a trade-off between how much computational overhead can be afforded and how accurately it is desired to approximate the recipient.
One consequence of varying the number of decision epochs (in scenarios where this is possible) is that it affects not only performance (more epochs=higher expected reward, as seen in
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a system, apparatus, or device running an instruction.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device running an instruction. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more operable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The POMDP framework is a generalized model for planning under uncertainty [See, e.g., E. J. Sondik entitled The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University, 1971]. A POMDP can be represented using the following n-tuple: {S, A, O, b0, T, Ω, R, γ}, where S is a (finite) set of discrete states, A is a set of discrete actions, and O is a set of discrete observations providing incomplete and/or noisy state information. The POMDP model is parameterized by: b0(s), the initial belief distribution; T(s, a, s′):=Pr(st+1=s′|at=a, st=s), the distribution describing the probability of transitioning from state s to state s′ when taking action a; Ω(o, s, a):=Pr(ot+1=o|at=a, st+1=s), the distribution describing the probability of observing o from state s after taking action a; R(s, a), the reward signal received when executing action a in state s; and γ, the discount factor.
A key assumption of POMDPs is that the state is only partially observable. Therefore we rely on the concept of a belief state, denoted b, to represent a probability distribution over states. The belief is a sufficient statistic for a given history:
bt := Pr(st | b0, a0, o1, . . . , ot−1, at−1, ot)    (1)
and is updated at each time-step to incorporate the latest action, observation pair:

bt+1(s′) = η Ω(o, s′, a) Σs∈S T(s, a, s′) bt(s)    (2)

where η is the normalizing constant. The goal of POMDP planning is to find a sequence of actions {a0, . . . , at} maximizing the expected sum of rewards E[Σt γt R(st, at)]. Given that the state is not necessarily fully observable, the goal is to maximize the expected reward for each belief. The value function can be formulated as:

V*(b) = maxa∈A [ Σs∈S R(s, a)b(s) + γ Σo∈O Pr(o|a, b) V*(ba,o) ]    (3)
When optimized exactly, this value function is always piecewise linear and convex in the belief. After n consecutive iterations, the solution consists of a set of α-vectors: Vn={α0, α1, . . . , αm}. Each α-vector represents an |S|-dimensional hyper-plane, and defines the value function over a bounded region of the belief:

Vn(b) = maxα∈Vn Σs∈S α(s)b(s)    (4)
In addition, each α-vector is associated with an action, defining the best immediate policy assuming optimal behavior for the following (n−1) steps (as defined respectively by the sets {Vn−1, . . . , V0}).
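By way of a non-limiting illustration, evaluating a belief against a set of α-vectors, and reading off the action associated with the maximizing vector, may be sketched as follows (plain per-state dictionaries are an illustrative assumption).

```python
def value_and_action(belief, alpha_vectors):
    """Each entry is (alpha, action); the value is max over alpha of alpha . b."""
    best_value, best_action = float("-inf"), None
    for alpha, action in alpha_vectors:
        v = sum(alpha[s] * belief[s] for s in belief)
        if v > best_value:
            best_value, best_action = v, action
    return best_value, best_action

# Two illustrative alpha-vectors over states {s0, s100}.
alphas = [({"s0": 2.0, "s100": -1.0}, "share"),
          ({"s0": 0.0, "s100": 0.0}, "noShare")]
print(value_and_action({"s0": 0.5, "s100": 0.5}, alphas))   # (0.5, 'share')
```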
The n-th horizon value function can be built from the previous solution Vn−1 using the backup operator H. We use the notation V=HV′ to denote an exact value backup:

V(b) = maxa∈A [ Σs∈S b(s)R(s, a) + γ Σo∈O maxα′∈V′ Σs∈S b(s) Σs′∈S T(s, a, s′) Ω(o, s′, a) α′(s′) ]    (5)
A number of algorithms have been proposed to implement this backup by directly manipulating α-vectors, using a combination of set projection and pruning operations [See, e.g., Sondik, 1971; A. Cassandra, M. Littman, and N. Zhang entitled "Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes" in UAI, 1997; and N. L. Zhang and W. Zhang entitled "Speeding up the convergence of value iteration in partially observable Markov decision processes" Journal of Artificial Intelligence Research, 14:29-51, 2001]. We now describe the most straightforward version of exact POMDP value iteration.
To implement the exact update V=HV′, we first generate the intermediate sets Γa,* and Γa,o, ∀a ∈ A, ∀o ∈ O (Step 1):

Γa,* ← αa,*(s) = R(s, a)

Γa,o ← αia,o(s) = γ Σs′∈S T(s, a, s′) Ω(o, s′, a) αi′(s′), ∀αi′ ∈ V′
Next we create Γa (∀a ∈ A), the cross-sum over observations, which includes one αa,o from each Γa,o (Step 2):
Γa = Γa,* ⊕ Γa,o1 ⊕ Γa,o2 ⊕ . . .    (6)
Finally we take the union of Γa sets (Step 3):
V = ∪a∈A Γa    (7)
In practice, many of the vectors in the final set V may be completely dominated by another vector (αi·b < αj·b, ∀b), or by a combination of other vectors. Those vectors can be pruned away without affecting the solution. Finding dominated vectors can be expensive (checking whether a single vector is dominated requires solving a linear program), but is usually worthwhile to avoid an explosion of the solution size.
To better understand the complexity of the exact update, let |V′| be the number of α-vectors in the previous solution set. Step 1 creates |A||O||V′| projections and Step 2 generates |A||V′|^|O| cross-sums. So, in the worst case, the new solution V has size |A||V′|^|O| (time |S|²|A||V′|^|O|). Given that this exponential growth occurs for every iteration, the importance of pruning away unnecessary vectors is clear. It also highlights the impetus for approximate solutions.
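By way of a non-limiting illustration, the three-step exact backup described above may be sketched as follows; pruning of dominated vectors is omitted for brevity, so this is an illustrative sketch rather than a practical solver for long horizons (tabular functions are assumed, keyed as in the earlier sketches).

```python
from itertools import product

def exact_backup(V_prev, S, A, O, T, Omega, R, gamma):
    """One exact value backup V = HV' over alpha-vectors (no pruning).
    Step 1: projections Gamma^{a,*} and Gamma^{a,o};
    Step 2: cross-sum over observations;
    Step 3: union over actions."""
    V_new = []
    for a in A:
        # Step 1: immediate-reward vector and one projection per observation.
        gamma_a_star = {s: R.get((s, a), 0.0) for s in S}
        gamma_a_o = {
            o: [{s: gamma * sum(T.get((s2, a, s), 0.0) * Omega.get((o, a, s2), 0.0)
                                * alpha[s2] for s2 in S)
                 for s in S}
                for alpha in V_prev]
            for o in O}
        # Step 2: cross-sum -- pick one projected vector per observation.
        for choice in product(*(gamma_a_o[o] for o in O)):
            alpha_new = {s: gamma_a_star[s] + sum(vec[s] for vec in choice)
                         for s in S}
            # Step 3: union over actions.
            V_new.append((alpha_new, a))
    return V_new

# Usage: one backup from the terminal solution V0 = {zero vector}.
S = ["s0", "s100"]; A = ["share", "noShare"]; O = ["leak", "noLeak", "empty"]
T = {(s, a, s): 1.0 for s in S for a in A}                # recipient never changes behavior
Omega = {("leak", "share", "s0"): 0.10, ("noLeak", "share", "s0"): 0.90,
         ("leak", "share", "s100"): 0.70, ("noLeak", "share", "s100"): 0.30,
         ("empty", "noShare", "s0"): 1.0, ("empty", "noShare", "s100"): 1.0}
R = {("s0", "share"): 2.0, ("s100", "share"): -1.0}
V1 = exact_backup([{s: 0.0 for s in S}], S, A, O, T, Omega, R, gamma=1.0)
print(len(V1))   # |A| * |V'|^|O| = 2 * 1^3 = 2 vectors before pruning
```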
The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. W911NF-06-3-0001 awarded by the United States Army.