This application relates generally to synchronization of information and more specifically to synchronization of information in a plurality of information structures or hierarchies.
Often a company stores important information in various data sources. For example, a human resources department may store information about employees in a human resources data source. The human resources data source may be arranged or organized according to a human resources specific information structure or hierarchy. A finance department may also store information about employees, clients, suppliers, etc., in a finance department data source. The finance department data source may be arranged or organized according to a finance department information structure or hierarchy. It is likely that some common information exists in both data sources. Thus, synchronizing the information becomes desirable.
A synchronizing process typically implements rules and/or specifications to adequately harmonize information in various data sources. Further, such a process may rely on an engine capable of executing software and a storage capable of storing the information, as appropriate. In general, the synchronizing process may replicate information from various data sources in a central storage, wherein the replicated information has some degree of integrity. To achieve this task, information from the various data sources are either pushed or pulled into the central storage. In addition, information may be pulled or pushed out of such a central storage to the various data sources. Maintaining efficiency and information integrity in such an environment can become a daunting task. Various exemplary methods, devices and/or systems described below are directed at that task.
Briefly stated, modifications to references are propagated between entities in correlated namespaces. A first object in one external namespace refers to a second object in the one external namespace. The first object and the second object have associated central representations in a central namespace. A change to that reference is propagated to a third object in a third namespace by evaluating the associations between the central representations in the central namespace to determine if the third object is associated with one of the central representations, and if so, propagating the change to the reference.
In another aspect, a user interface is described that allows a user of a system to define relationships that govern the flow of data from one namespace to another. The user interface may be graphical, and include fields that allow the user to identify source and target entities and the direction of the flow of data. Configuration information describing the defined relationships may be stored in conjunction with each namespace the configuration information may take the form of a markup language file, and more specifically an eXtensible Markup Language file.
In yet another aspect, a technique is described for propagating a reference change from a first object in a first namespace to a related second object in another namespace by correlating the first object to a central representation, identifying another central representation, if any, corresponding to the referent of the reference, and identifying any other objects associated with the other central representation. Any such other objects that depend on data stored in association with the other central representation then receive that data. The data may be reformatted or otherwise reconstituted.
In still another aspect, a technique is described for propagating a name change of a referent in a reference field of a first object in a first namespace to a related second object in a second namespace by correlating the referent to a central representation of the referent, identifying any other objects associated with that central representation, and propagating the name change to those other objects. The correlation of the referent to its central representation is performed using an immutable property of the referent, such as a globally unique identifier.
The following description sets forth a specific embodiment of a system for propagating information between entities in correlated namespaces. This specific embodiment incorporates elements recited in the appended claims. The embodiment is described with specificity in order to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed invention might also be embodied in other ways, to include different elements or combinations of elements similar to the ones described in this document, in conjunction with other present or future technologies.
The following discussion refers to an information environment that includes a metadirectory. While a metadirectory is used here for explanatory purposes, the various mechanisms and methods described here may also be applied generally to other environments where synchronization of information is desired. In general, information should be identifiable in an information environment, for example, through use of an identifier, and preferably an immutable or traceable identifier. In some instances, information is structured or organized in a hierarchy.
Exemplary Metadirectory System
There may also be an object in another data source that pertains to the same body of information, but includes slightly different characteristics or information. For example, DSB 160 may be an information technology server that includes information about the logon accounts of employees. Accordingly, there may be a corresponding object within DSB 160 for each or many of the objects in DSA 150. However, the particular body of information for the objects within DSB 160 would be slightly different than those within DSA 150. Collectively, the information associated with a particular body of information are sometimes referred to as “identity data” or the like.
The metadirectory 102 is an infrastructure element that provides an aggregation and clearinghouse of the information stored within each of the several data sources associated with the metadirectory 102. The metadirectory 102 includes storage 130 in which reside “entities” that represent the individual bodies of information stored in each associated data source. Disparate information from different data sources that pertains to the same body of information (e.g., an individual, asset, or the like) is aggregated into a single entity within the metadirectory 102. In this way, a user can take advantage of the metadirectory 102 to view at a single location information that is stored piecemeal in several different data sources. Such an exemplary metadirectory may consolidate information contained in multiple data sources in a centralized manner, manage relationships between the data sources, and allow for information to flow between them as appropriate.
The storage 130 is for storing the aggregated and consolidated information from each of the associated data sources. The storage 130 may be a database or any other mechanism for persisting data in a substantially permanent manner. As will be described more fully in conjunction with
The metadirectory 102 also includes rules 110 and services 120 that are used to consolidate, synchronize, and otherwise maintain the integrity of the information presented through the metadirectory 102. The rules 110 and services 120 form or define one or more protocols, APIs, schemata, services, hierarchies, etc. In this particular embodiment, the rules 110 include at least a “join function” that defines relationships between entities in the metadirectory 102 and objects in the associated data sources. In one embodiment, the join function may produce a join table 111 that identifies links between entities in the storage and objects in each associated data source. One illustrative join table 111 is described in greater detail below in conjunction with
A user interface 140 is provided that allows a user (e.g., a system administrator) to view and manipulate information within the metadirectory 102, the configuration of the metadirectory 102, or both. In this particular embodiment, the user interface 140 is graphical in nature and is configured to, among other things, enable the user to declaratively define relationships between entities or their attributes in the several data sources. One illustrative embodiment of the user interface is described below in conjunction with
As mentioned above, the storage 130 may include a core 211 and a buffer 221. The core 211 represents data that is considered to accurately reflect (from the perspective of a user of the metadirectory 102) the information in the various data sources. In contrast, the buffer 221 includes data of a more transient nature. Recent changes to data at the data sources are reflected in the buffer 221 until that data can be committed to the core 211.
As illustrated, a data source (e.g., DSA 150) presents to the metadirectory 102 change data that represents changes to an external entity (e.g., external entity 250) stored within the data source. The change data may indicate that information about an object within the data source has changed in some fashion, or perhaps the change data indicates that its corresponding object has been deleted or added. A buffer entity (e.g., buffer entity 260) is first created using the change data to create an entity that represents the external entity (e.g., external entity 250) at the data source (e.g., DSA 150). Essentially, the buffer entity 260 mirrors its corresponding external entity 250. Other data sources (not shown) may also be presenting their own change data to the buffer 221 as well. The change data may sometimes be referred to as “delta information” or “deltas.”
As used in this document, the term “namespace” means any set of entities. The entities in a namespace may be unordered. Accordingly the term namespace may be used to refer to any set of entities, such as the core 211 or the buffer 221 (i.e., a core namespace or a buffer namespace). Or the term namespace may be used to refer collectively to the metadirectory 102 as a namespace. Similarly, any of the data sources may sometimes be referred to as namespaces.
A process termed synchronization occurs to reconcile the buffer entities in the buffer 221 with their corresponding core entities in the core 211. For instance, in this example buffer entity 260 is associated with core entity 270. Through the synchronization process, modifications represented in the buffer entity 260, as well as perhaps other buffer entities, become reflected in the core entity 270. By creating this synchronized relationship between the buffer 221 and the core 211, a pair of namespaces (e.g., a buffer namespace and a core namespace) may be referred to as “correlated namespaces.”
The synchronization process may also result in the creation or modification of other buffer entities (e.g., buffer entity 265) that represent changes to be made to objects in other data sources. The changes harmonize the information stored in the metadirectory 102 with the information stored in each of the data sources. In other words, a change to an object in data source DSA 150 may first be reflected in the metadirectory 102, but then that change may be pushed out of the metadirectory 102 to other data sources, such as to data source DSB 160 through buffer entity 265.
Each entity also includes an “identity” (e.g., identity 312), which is preferably a string value 322 that is globally unique. The identity of an entity does not change, i.e. it is an immutable property of the entity 310. In one example, the identity may be a Globally Unique IDentifier (“GUID”).
The use of both a name and a unique identifier may at first appear redundant, but each has a special purpose. For example, a human-readable name is intuitive and may be used to reflect a real-world property or concept, thus making the name very useful to users. However, this usability typically means that the name should also be changeable. In contrast, a globally unique identifier conveys little in terms of readability or intuitive message. It does however, effectively distinguish the entity from every other entity in existence.
The entity 310 includes an arbitrary number of reference attributes (e.g., reference attribute 313) that contain name/identity pairs 323 of other entities within the same namespace referred to by the referring entity. The reference attribute 313 can be used to create a loose association between entities when that information is helpful. For example, several entities may relate to individuals, and the reference attribute of one entity could be used to point to another entity that represents the manager of the individual represented by the first entity. The reference attribute 313 may have a single reference pair, or it may include multiple reference pairs, such as a distribution list. The reference attributes allow the modeling of arbitrary, directed relationships between entities.
The entity 310 may also include an arbitrary number of user data attributes (e.g., data—1 314 and data—2 315) that contain user data (e.g., user info 324 and 325, respectively). The user data attributes are helpful for storing random information that may be useful to the user or to the systems administering the entity or data sources. Of course, still other attributes may be included that are not described here without departing from the spirit of this disclosure.
The relationship between the entities is basically that entity A 410 and entity B 411 essentially represent the same body of information, and central entity 430 is an aggregated representation of those two entities. As discussed above, that could mean that both entity A 410 and entity B 411 relate to the same individual, corporate asset, e-mailing list, or any other body of information. In that case, central entity 430 could then be a common view into the information contained within each of those entities. The relationships between entities in the metadirectory 102 may be described in a persistent fashion (in this example) in a “join table” 501 or other comparable structure. One embodiment of the join table 501 is described below in conjunction with
It is not imperative that each of the entities refer to the same body of information, but it simplifies this discussion. It will become apparent how the mechanisms and techniques described here have equal applicability in the alternative case where each entity does not refer to the same body of information.
Note that the several entities need not include identical information. For example, entity A 410 includes an e-mail address 414 while entity B 420 does not. Similarly, entity A 410 does not include salary information while entity B 420 does (salary 425). However, the central representation (i.e., central entity 430) includes all the data from each of its associated entities (e.g., entity A 410 and entity B 420). Note that central entity 430 includes both an e-mail address 434 and a salary 435.
In addition, for no particular reason, the two entities may have the same information possibly formatted differently. For example, entity A 410 may identify a manager 416 of an individual with which the entity is associated, and entity B 420 may also. However, the manager 416 of entity A 410 may be identified by name, while the manager 426 of entity B 420 may be identified by e-mail address.
It will be appreciated that different attributes of the entities may be “mastered” in different locations. In other words, if multiple entities (e.g., entity A 410 and entity B 420) represent a similar attribute (e.g., both entities have a “manager”), then the value of that attribute may be governed by a master entity. For example, if entity A 410 is defined as the master of the “manager” attribute, then the value of the manager 436 in the metadirectory 102 should always be taken from entity A 410. Similarly, when harmonizing the information in each of the data sources, it is envisioned that entity B 420 will get its value for the manager 426 from the metadirectory 102, thus ensuring that only one data source governs the value of particular attributes.
The described system includes a particular mechanism that is especially well suited to establishing these governing master/slave relationships between entities in the form of a graphical user interface. One example of that graphical user interface is described below in conjunction with
As described above, each entity may include reference attributes that refer to other entities. In this way, one entity may essentially incorporate the information of another entity in a simplified way. For example, as illustrated in
In this embodiment, the join table 501 includes two sets of information: a first set 510 that includes at least one record for each entity in the metadirectory 102, and a second set 520 that identifies each entity with which the central entities are associated. Each record in the first set 510 includes at least the identity of a corresponding central entity. Each record in the first set 510 may also include the name of the corresponding central entity. Each record in the second set 520 also includes at least the identity of its corresponding external entity, and may also include its name.
Mappings are provided that link each record in the first set 510 with the records in the second set 520. So as illustrated in
It should also be noted that although the use of identities (as immutable 19 properties of entities) has been described here, a less robust system could be achieved through the use of names as the describing characteristic.
Referring first to
The user interface 600 includes a flow direction portion 610 to indicate whether the rule being created governs the import or export of data into or out of (respectively) the metadirectory. The import option is selected in
A data source attribute portion 612 and a metaverse attribute portion 614 present the user with options to select which particular attributes are being affected. In other words, the master/slave relationships for an entity are defined (in this example) on a per-attribute basis. Thus, the selections illustrated in
It should be noted that the user interface 600 is being used to create a rule that governs the flow of data for groups of entities that satisfy a particular entity or object “type.” This implementation detail avoids the need to create a separate rule to control every entity in the metadirectory 102. Thus, each entity may be assigned a particular “type” or “class” and its corresponding attributes would then be governed by the rule created for that particular type or class of entity.
Referring now to
The process 1000 begins at starting step 1001, where a metadirectory receives notice of a change to an attribute of an external entity. For the purpose of this discussion, it will be assumed that the change affects a reference attribute, and that the entity issuing the change is the master of the attribute (or belongs to the data source which is designated as the master of the attribute). The entity issuing the change will be termed the “master entity,” and the entity being referred to by the change to the reference attribute is termed the “referent.”
At block 1011, the process 1000 applies the change to the central entity that corresponds to the master entity. Using the join table 501 (
At block 1021, the central entity corresponding to the referent is identified (hereafter referred to as the central referent). Again, referring to the join table 501 and using the identity of the referent, the central referent is discovered. Recall that the value of a reference attribute is an identity/name pair. Thus, the central referent is easily identified by determining from the join table 501 which central entity correlates to the identity of the referent.
At block 1031, any external entities that depend on the affected attribute are discovered. In this embodiment, this may be achieved by referring to any configuration files (see
At block 1041, the particular character of data for the attribute is discovered be evaluating the affected external entity. For example, if the affected attribute of the master entity identifies (refers to) an employee's manager, it is possible that a different external entity may have an attribute that identifies that employee's manager but perhaps by e-mail alias rather than by reference. In this case, the particular data of interest is the e-mail alias of the manager and not a reference to the manager. Accordingly, at block 1041, the process determines the particular format of the data for the current external entity. The format, as used here, of the data may also mean a value from a different attribute of the central referent, such as a user data attribute, or the like.
At block 1051, the particular data needed for the current external entity is retrieved from the central referent, and, at block 1061 that data is propagated to the current external entity. Once this step is performed, the process loops 1060 until all affected external entities have been addressed.
It should be noted that the change to the reference attribute may have been caused by a change on the referent entity which modified the name of that entity. In other words, the reference attribute may continue to point to the same referent, but by a different name. The system described here includes techniques to address propagating that change as well.
At block 1120, the process 1100 identifies and retrieves the central entity that correlates to the master entity. This step may be achieved with reference to the join table 501 (
At block 1130, the process 1100 determines that the change reflects only a name change of the referent. This may be achieved by comparing the change data from the master entity with the stored data in the central entity. Because the current system identifies entities by both identity (immutable) and name (mutable), the process 1000 can merely determine if the identities are the same between the data stored in the central entity and the change data. If so, then the change is only a name change.
At block 1140, the process 1140 identifies other external entities that depend on the referent. For the purpose of this discussion, the term “depends on” means that some data or attribute value of the external entity is derived from data or attribute values of the referent. This identification can be performed by simply querying the join table 501 for the identity of the referent (which has not changed) even though the name has changed. This ability is one of the benefits of persisting the immutable identity information to correlate entities in the metadirectory with external entities.
At block 1150, an appropriate translation occurs to alter the dependant data of the identified external entities to reflect the name change of the referent. It should be noted that the particular fashion in which the external entity depends on the name of the referent controls the form of the transformation. In other words, it is impossible to determine in advance exactly how other external entity will depend on a referent, and accordingly, the particular transformation applied will be based on the particular manner of dependency. The process 1100 may loop (1145) until all affected entities have been processed.
Exemplary computer 1200 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computer 1200 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 1200. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 1230 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 1231 and random access memory (RAM) 1232. A basic input/output system 1233 (BIOS), containing the basic routines that help to transfer information between elements within computer 1200, such as during start-up, is typically stored in ROM 1231. RAM 1232 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1220. By way of example, and not limitation,
The exemplary computer 1200 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The exemplary computer 1200 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 1280. The remote computer 1280 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computer 1200, although only a memory storage device 1281 has been illustrated in
When used in a LAN networking environment, the exemplary computer 1200 is connected to the LAN 1271 through a network interface or adapter 1270. When used in a WAN networking environment, the exemplary computer 1200 typically includes a modem 1272 or other means for establishing communications over the WAN 1273, such as the Internet. The modem 1272, which may be internal or external, may be connected to the system bus 1221 via the user input interface 1260, or other appropriate mechanism. In a networked environment, program modules depicted relative to the exemplary computer 1200, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
The subject matter described above can be implemented in hardware, in software, or in both hardware and software. In certain implementations, the exemplary flexible rules, identity information management processes, engines, and related methods may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The subject matter can also be practiced in distributed communications environments where tasks are performed over wireless communication by remote processing devices that are linked through a communications network. In a wireless network, program modules may be located in both local and remote communications device storage media including memory storage devices.
Although details of specific implementations and embodiments are described above, such details are intended to satisfy statutory disclosure obligations rather than to limit the scope of the following claims. Thus, the invention as defined by the claims is not limited to the specific features described above. Rather, the invention is claimed in any of its forms or modifications that fall within the proper scope of the appended claims, appropriately interpreted in accordance with the doctrine of equivalents.