1. Technical Field
The invention relates to a system and method for providing efficient policy rule updates in policy based data management. More particularly, the invention relates to a system and method for restraining the size of the set of data objects to be examined after a policy rule update.
2. Background of the Invention
Many data objects such as business records, weather data, security information, and the like are now stored on digital media. Users of storage systems may have millions or even billions of data objects to manage. Manually managing such large numbers of data objects is not practical. Policy based data management automates tasks to a great extent and is essential for a system containing large numbers of data objects.
In a typical system with large numbers of data objects, policy rules are used to facilitate the management tasks. A typical policy rule includes scope, priority, condition, and action. Scope defines the domain that a rule will cover. Rules with different scopes will handle orthogonal actions and do not interfere with each other. A rule with a smaller priority number carries higher priority and overwrites lower priority rules. The action is taken on a data object if the condition is matched. To determine if the condition attribute of a data object matches the condition requirement of a rule, a calculation is performed to compare the scope and condition of the policy rule with the corresponding attributes of the data object. Large data management systems commonly include an attribute server and an attribute indexer. Data objects have attributes such as confidentiality level, age, and the like. These attributes are maintained by the attribute server. The attribute indexer maintains the indices for the data object attributes and facilitates any query process on the attributes.
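The four parts of a policy rule described above can be sketched as a small data structure. This is a minimal illustration only: the names `PolicyRule`, `rule_matches`, `rule_id`, and `age_days`, and the representation of a condition as a predicate over an attribute dictionary, are assumptions made for this example and do not come from the described system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical representation of a policy rule."""
    rule_id: int
    scope: str       # domain the rule covers
    priority: int    # smaller number = higher priority
    condition: Callable[[Dict[str, Any]], bool]  # predicate over object attributes
    action: str      # action taken when the condition matches


def rule_matches(rule: PolicyRule, attributes: Dict[str, Any]) -> bool:
    """Return True when the object's attributes satisfy the rule's condition."""
    return rule.condition(attributes)


# Hypothetical rule: delete objects older than 365 days.
delete_old = PolicyRule(
    rule_id=1,
    scope="retention",
    priority=1,
    condition=lambda attrs: attrs.get("age_days", 0) > 365,
    action="delete",
)

old_object = {"age_days": 400, "confidentiality": "low"}
new_object = {"age_days": 10, "confidentiality": "high"}
```

Under these assumptions, `rule_matches(delete_old, old_object)` is true and `rule_matches(delete_old, new_object)` is false.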
Policy rules are applied to data objects to perform management functions. Table 1 gives three illustrative examples of policy rules. The system will use rule 1 to search for data objects having the condition of creation time older than one year. When those data objects are found, the action taken is deletion of the found data objects. In order to find the data objects in this example, a computation is made comparing the condition of rule 1 with the attributes of each data object to determine if the creation time is older than one year. Typically, a computation must be made for each of the very large number of data objects. Thus, computing even one policy rule against every data object requires considerable system resources and can have a very large impact on system performance and system throughput.
One feature of data objects is that attributes (such as content category, file size, ownership, retention, etc.) of the objects change over time. The policy rules also may change over time. The policy rules may be deleted, added, altered, modified, or otherwise updated depending on either system or user requirements. Typically, policy rules are applied to all the objects in the system from the highest priority to the lowest priority. In Table 1, for example, rule 2 with a scope of “confidentiality” and a priority of 1 is applied to all the data objects. Then rule 3 also with a scope of “confidentiality” but with a priority of 2 is applied to all the data objects. However, a rule of lower priority is not allowed to alter the action of a rule with higher priority. In the example from Table 1, rule 3 is not allowed to overwrite the actions taken by rule 2. In general, if any of the policy rules are updated (deleted, added, altered, or modified) then a cycle of computations is launched comparing the rules to the appropriate attributes of each one of the data objects.
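The conventional full evaluation described above, in which every rule is applied to every data object from highest to lowest priority, might look like the following sketch. The rule conditions and actions shown are hypothetical stand-ins, not the actual contents of Table 1, and the names `Rule` and `apply_all_rules` are assumptions for this example.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    rule_id: int
    scope: str
    priority: int  # smaller number = higher priority
    condition: Callable[[Dict[str, Any]], bool]
    action: str


def apply_all_rules(
    rules: List[Rule], objects: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, str]]:
    """Apply every rule to every object, highest priority first.

    Returns {object_id: {scope: action}}. Within a scope, a lower-priority
    rule never replaces an action already set by a higher-priority rule.
    """
    decisions: Dict[str, Dict[str, str]] = {oid: {} for oid in objects}
    for rule in sorted(rules, key=lambda r: r.priority):
        for oid, attrs in objects.items():
            if rule.scope in decisions[oid]:
                continue  # a higher-priority rule in this scope already acted
            if rule.condition(attrs):
                decisions[oid][rule.scope] = rule.action
    return decisions


# Hypothetical rules sharing the "confidentiality" scope.
rules = [
    Rule(3, "confidentiality", 2,
         lambda a: a["level"] in ("secret", "internal"), "restrict_access"),
    Rule(2, "confidentiality", 1,
         lambda a: a["level"] == "secret", "encrypt"),
]
objects = {
    "a": {"level": "secret"},
    "b": {"level": "internal"},
    "c": {"level": "public"},
}
decisions = apply_all_rules(rules, objects)
```

Object "a" matches both rules but keeps the action of the higher-priority rule 2. The nested loops visit every rule and object pair, which is the cost the remainder of the description aims to reduce.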
The overhead of computing each rule against each data object in a typical data management system is a very expensive use of system resources. Such computations have a deleterious impact on system throughput and system performance. What is needed is a method and system wherein the number of data objects to be included in the policy rule calculations can be constrained to a smaller set thereby resulting in greater system efficiency.
In one embodiment, the invention provides for the creation of an effective policy rule. In addition, an embodiment of the invention provides a method of restricting the number of data objects to be examined when a policy rule is updated. In one embodiment, the condition of a policy rule is calculated against the attributes of a data object to determine whether the data object matches the specified policy rule condition. If the condition is met, the action is taken on the data object and the policy rule is stored along with the attributes of the data object. The stored policy rule, called herein an effective policy rule, is then used to restrict the number of data objects to be examined when a policy rule update is made. In one embodiment, when a new policy rule is introduced, the set of data objects having an effective priority lower than the priority of the new policy rule is identified. The new policy rule is then calculated against each data object in that set. In another embodiment, when a policy rule is deleted, the set of data objects having the policy rule to be deleted as their effective rule is found. The remaining policy rules with a priority lower than the priority of the deleted rule are then calculated against each of the data objects in that set. In another embodiment, a policy rule is updated by a two-step method of first deleting the original policy rule and then adding the updated policy rule. In all of these embodiments, only a restricted set of data objects is involved in the application of policy rules, resulting in improved system performance and throughput.
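The add, delete, and update cases summarized above can be sketched in memory as follows. This is an illustration under stated assumptions, not the claimed implementation: all rules share a single scope, priorities are unique, a smaller priority number means higher priority, object attributes do not change between evaluations, and objects with no effective rule are also examined when a rule is added. All names (`Rule`, `DataObject`, `add_rule`, `delete_rule`, `update_rule`) are hypothetical.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    rule_id: int
    priority: int  # smaller number = higher priority
    condition: Callable[[Dict[str, Any]], bool]
    action: str


class DataObject:
    def __init__(self, attrs: Dict[str, Any]) -> None:
        self.attrs = attrs
        self.effective_rule: Optional[Rule] = None  # rule that last acted


def initialize(objects: List[DataObject], rules: List[Rule]) -> None:
    """Full pass: record the highest-priority matching rule on each object."""
    ordered = sorted(rules, key=lambda r: r.priority)
    for obj in objects:
        obj.effective_rule = next(
            (r for r in ordered if r.condition(obj.attrs)), None)


def add_rule(rules: List[Rule], objects: List[DataObject], new: Rule) -> int:
    """Examine only objects whose effective rule has lower priority (or none)."""
    rules.append(new)
    candidates = [o for o in objects
                  if o.effective_rule is None
                  or o.effective_rule.priority > new.priority]
    for obj in candidates:
        if new.condition(obj.attrs):
            obj.effective_rule = new
    return len(candidates)  # number of objects examined


def delete_rule(rules: List[Rule], objects: List[DataObject], rule_id: int) -> int:
    """Examine only objects whose effective rule is the deleted rule."""
    deleted = next(r for r in rules if r.rule_id == rule_id)
    rules.remove(deleted)
    lower = sorted((r for r in rules if r.priority > deleted.priority),
                   key=lambda r: r.priority)
    candidates = [o for o in objects
                  if o.effective_rule is not None
                  and o.effective_rule.rule_id == rule_id]
    for obj in candidates:
        obj.effective_rule = next(
            (r for r in lower if r.condition(obj.attrs)), None)
    return len(candidates)


def update_rule(rules: List[Rule], objects: List[DataObject], updated: Rule) -> int:
    """Two-step update: delete the original rule, then add the updated one."""
    examined = delete_rule(rules, objects, updated.rule_id)
    return examined + add_rule(rules, objects, updated)


rules = [Rule(1, 1, lambda a: a["age_days"] > 365, "delete"),
         Rule(2, 3, lambda a: a["age_days"] > 30, "archive")]
objs = [DataObject({"age_days": d}) for d in (400, 100, 5)]
initialize(objs, rules)
examined_add = add_rule(
    rules, objs, Rule(3, 2, lambda a: a["age_days"] > 90, "compress"))
examined_del = delete_rule(rules, objs, 1)
```

In this small example, adding rule 3 examines two of the three objects and deleting rule 1 examines one, rather than all three in each case.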
Referring now to the drawings which are intended to be illustrative of typical embodiments of the invention and are not considered to limit the scope of the invention nor to exclude other equally effective embodiments:
The present invention provides a method and system for restraining the number of data objects which must be inspected when either a data object is altered or a policy rule is deleted, added, altered, modified, or otherwise updated. By constraining the set of data objects to be inspected, the number of computations is limited and the system is more efficient.
In certain embodiments of the invention, a policy rule is stored along with the attributes of a data object when the condition of the policy rule matches the attributes of the data object and an action is taken. In certain embodiments of the invention, only the identifier of the policy rule is stored. A policy rule thus stored with the data object is herein called an effective policy rule. If only the identifier of the policy rule is stored, the stored identifier is also called an effective policy rule. The priority of the policy rule thus stored is herein called an effective priority. Minimal space is required to store effective policy rules. In one embodiment, the effective rule and the effective priority are stored as additional fields along with the other attributes of the data objects in a database table. The policy rule information stored with each data object is conveniently indexed and queried through known methods and techniques such as structured query language (SQL) or the like. Querying with a language such as SQL consumes far fewer system resources than calculating policy rules against data objects. The results of a query on the effective policy rule information are used to significantly constrain the number of data objects to be calculated against a policy rule.
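A minimal sketch of storing the effective rule and effective priority as additional fields, and querying them with SQL, is shown below using Python's standard `sqlite3` module. The table name `data_objects`, the column names, and the sample rows are assumptions made for this example. The sketch also assumes that a smaller priority number means higher priority and that objects with no effective rule should be examined when a rule is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# effective_rule and effective_priority are NULL when no rule has acted.
conn.execute(
    """
    CREATE TABLE data_objects (
        object_id          INTEGER PRIMARY KEY,
        age_days           INTEGER,
        effective_rule     INTEGER,
        effective_priority INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX idx_effective "
    "ON data_objects (effective_rule, effective_priority)"
)
conn.executemany(
    "INSERT INTO data_objects VALUES (?, ?, ?, ?)",
    [
        (1, 400, 1, 1),
        (2, 100, 2, 3),
        (3, 5, None, None),
        (4, 500, 1, 1),
    ],
)


def candidates_for_new_rule(conn, new_priority):
    """Objects whose effective rule has lower priority than the new rule."""
    rows = conn.execute(
        "SELECT object_id FROM data_objects "
        "WHERE effective_priority IS NULL OR effective_priority > ? "
        "ORDER BY object_id",
        (new_priority,),
    )
    return [row[0] for row in rows]


def candidates_for_deleted_rule(conn, rule_id):
    """Objects whose effective rule is the rule being deleted."""
    rows = conn.execute(
        "SELECT object_id FROM data_objects "
        "WHERE effective_rule = ? ORDER BY object_id",
        (rule_id,),
    )
    return [row[0] for row in rows]


to_check_on_add = candidates_for_new_rule(conn, 2)
to_check_on_delete = candidates_for_deleted_rule(conn, 1)
```

Only the object identifiers returned by these queries would then need to have rule conditions calculated against their attributes.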
When first using the invention in a policy based data management system, it is preferable to initialize the system. One embodiment 200 of an initialization is illustrated in
When a policy rule is deleted, the actions from lower priority policy rules will be allowed. In
In
In each of the examples discussed above for policy rule deletion, addition, and update, the set of data objects to be calculated against is less than the total number of data objects. Thus in these examples, the embodiments of the invention result in greater efficiency.
The described embodiments of the invention may be implemented as a method, computer program product, apparatus, or system using standard programming and related engineering techniques to produce software, firmware, hardware, or any combination of these. Each of the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. The embodiments may be implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.
The embodiments of the present invention may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium may be any apparatus that may contain, store, communicate, propagate, or transport the program for use by or in connection with the execution system, apparatus, or device.
The described operations may be implemented as code maintained in a computer-usable or computer-readable medium, where a processor may read and execute the code from the computer readable medium. The medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a magnetic disk drive, a removable disk, an optical disk, volatile and non-volatile memory devices, and the like.
The code implementing the described operations may further be implemented in hardware logic such as an integrated chip, a programmable array, or the like. Additionally, the code implementing the described operations may be implemented in transmission signals, where transmission signals may propagate through space or through a transmission medium such as an optical fiber, copper wire, and the like. The transmission signals in which the code or logic is encoded may further comprise a wireless signal (local or long distance), satellite transmission, and the like. The transmission signals in which the code or logic is encoded are capable of being transmitted by a transmitting station and received by a receiving station, where the code or logic encoded in the transmission signal may be decoded and stored in hardware or a computer-readable medium at the receiving and transmitting stations or devices. Logic, as used here, may include software, hardware, firmware, or any combination thereof. Those skilled in the art will recognize that many modifications may be made to these configurations without departing from the scope of the embodiments, and that the computer program product may comprise any suitable information-bearing medium known in the art.
The embodiments described in detail above are illustrative examples and illustrate specific operations occurring in a particular order. In alternative embodiments, certain of the logic operations may be performed in an alternate order, modified, or removed and still remain within the scope of the invention. Further, certain operations described herein may occur sequentially, or certain operations may be processed in parallel. Certain operations may also be implemented as a single process or as distributed processes.