Method and system for data replication

Information

  • Patent Grant
  • 6615223
  • Patent Number
    6,615,223
  • Date Filed
    Tuesday, February 29, 2000
    24 years ago
  • Date Issued
    Tuesday, September 2, 2003
    21 years ago
Abstract
A method and mechanism for data replication is disclosed. One embodiment of the invention relates to an efficient and effective replication system using LDAP replication components. Another embodiment of the invention pertains to a schema and format independent method for data replication. Procedures for adding, deleting, and modifying replicated data, and for replicating conflict resolution are also disclosed. A further embodiment of the invention is directed to improved methods and mechanisms for adding and removing nodes from a replication system.
Description




BACKGROUND OF THE INVENTION




1. Field of the Invention




The invention relates to the replication of data in a database system.




2. Background




Data replication is the process of maintaining multiple copies of a database object in a distributed database system. Performance improvements can be achieved when data replication is employed, since multiple access locations exist for the access and modification of the replicated data. For example, if multiple copies of a data object are maintained, an application can access the logically “closest” copy of the data object to improve access times and minimize network traffic. In addition, data replication provides greater fault tolerance in the event of a server failure, since the multiple copies of the data object effectively become online backup copies if a failure occurs.




In general, there are two types of propagation methodologies for data replication, referred to as “synchronous” and “asynchronous” replication. Synchronous replication is the propagation of changes to all replicas of a data object within the same transaction as the original change to a copy of that data object. For example, if a change is made to a table at a first replication site by a Transaction A, that change must be replicated to the corresponding tables at all other replication sites before the completion and commitment of Transaction A. Thus, synchronous replication can be considered real-time data replication. In contrast, asynchronous replication can be considered “store-and-forward” data replication, in which changes made to a copy of a data object can be propagated to other replicas of that data object at a later time. The change to the replicas of the modified data object does not have to be performed within the same transaction as the original calling transaction.




Synchronous replication typically results in more overhead than asynchronous replication. More time is required to perform synchronous replication since a transaction cannot complete until all replication sites have finished performing the requested changes to the replicated data object. Moreover, a replication system that uses real-time propagation of replication data is highly dependent upon system and network availability, and mechanisms must be in place to ensure this availability. Thus, asynchronous replication is generally favored for noncritical data replication activities. Synchronous replication is normally employed only when an application requires that replicated sites remain continuously synchronized.




One approach to data replication involves the exact duplication of database schemas and data objects across all participating nodes in the replication environment. If this approach is used in a relational database system, each participating site in the replication environment has the same schema organization for the replicated database tables and database objects that it maintains. If a change is made to one replica of a database table, that same change is propagated to all corresponding database tables to maintain the consistency of the replicated data. Since the same schema organization is used for the replicated data across all replication sites, the instructions used to implement the changes at all sites can be identical.




Generally, two types of change instructions have been employed in data replication systems. One approach involves the propagation of changed data values to each replication site. Under this approach, the new values for particular data objects are propagated to the remote replication sites. The corresponding data objects at the remote sites are thereafter replaced with the new values. A second approach is to use procedural replication. Under this approach, a database query language statement, e.g., a database statement in the Structured Query Language (“SQL”), is propagated instead of actual data values. The database statement is executed at the remote sites to replicate the changes to the data at the remote replication sites. Since all replication sites typically have the same schema organization and data objects, the same database statement can be used at both the original and remote sites to replicate any changes to the data.
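For illustration only, the following SQL-based pseudocode contrasts the two styles against a hypothetical replicated table named Employee; the table and column names are assumptions of this sketch and do not appear in the embodiments described below:

-- Value propagation: the originating site ships the new column value,
-- and each remote site simply overwrites its copy of the affected row.
UPDATE Employee SET Tel_No = '555-1111' WHERE Emp_Id = 100;

-- Procedural replication: the originating site ships the statement itself,
-- and each remote site re-executes it against its own (identical) schema.
UPDATE Employee SET State = 'CA' WHERE Dept_Name = 'R&D';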




A significant drawback to these replication approaches is that they cannot be employed in a heterogeneous environment in which the remote replication sites have different, and possibly unknown, schema organizations for the replicated data. For example, consider if information located in a single database table at a first replication site is stored within two separate tables at a second replication site. The approach of only propagating changed values for a data object to a remote replication site presents great difficulties, since the data object to be changed at the first replication site may not exist in the same form at the second replication site (e.g., because the data object exists as two separate data items at the second replication site). Using procedural replication results in similar problems. Since each replication site may have a different schema organization for its data, a different database statement may have to be specifically written to make the required changes at the remote sites. Moreover, if the schema organization of the remote site is unknown, it is impossible to properly formulate a database statement to replicate the intended changes at the remote site.




Another drawback to these approaches in which database schema and objects are exactly duplicated across the replication environment is that they require greater use of synchronous replication. If a schema change is made to a database table at one site, then that change must be synchronously propagated to all other sites. This is because the basic structure of the table itself is being changed. Any further changes to that database table without first synchronously changing the underlying schema for that table could result in conflicts to the data. Moreover, synchronous replication of the schema changes could require that the replication environment be quiesced during the schema change, affecting the availability of the system.




One type of database application for which data replication is particularly useful is the replication of data for directory information systems. Directory information systems provide a framework for the storage and retrieval of information that is used to identify and locate the details of individuals and organizations, such as telephone numbers, postal addresses, and email addresses.




One common directory system is a directory based on the Lightweight Directory Access Protocol (“LDAP”). LDAP is an object-oriented directory protocol that was developed at the University of Michigan, originally as a front end to access directory systems organized under the X.500 standard for open electronic directories (which was originally promulgated by the Comite Consultatif International de Telephone et Telegraphe “CCITT” in 1988). Standalone LDAP server implementations are now commonly available to store and maintain directory information. Further details of the LDAP directory protocol can be located at the LDAP-devoted website maintained by the University of Michigan at http://www.umich.edu/˜dirsvcs/ldap/doc/, including the following documents (which are hereby incorporated by reference in their entirety): RFC-1777 Lightweight Directory Access Protocol; RFC-1558 A String Representation of LDAP Search Filters; RFC-1778 The String Representation of Standard Attribute Syntaxes; RFC-1779 A String Representation of Distinguished Names; RFC-1798 Connectionless LDAP; RFC-1823 The LDAP Application Program Interface; and RFC-1959 An LDAP URL Format.




LDAP directory systems are normally organized in a hierarchical structure having entries organized in the form of a tree, which is referred to as a directory information tree (“DIT”). The DIT is often organized to reflect political, geographic, or organizational boundaries. A unique name or ID (which is commonly called a “distinguished name”) identifies each LDAP entry in the DIT. An LDAP entry is a collection of one or more entry attributes. Each entry attribute has a “type” and one or more “values.” Each entry belongs to a particular object class. Entries that are members of the same object class share a common composition of possible entry attribute types.




There are significant drawbacks to existing systems for performing replication of LDAP entries, objects, and attributes. Many conventional replication systems used for LDAP replication do not have robust procedures for adding or deleting replication nodes. For example, the addition or deletion of replication nodes in a conventional LDAP system often results in system downtime to implement configuration changes. Moreover, many existing systems for LDAP replication do not have robust procedures for adding, deleting, or modifying replicated data or handling replication conflicts.




Therefore, there is a need for an improved method and system for replicating data in a database system. There is further the need for a robust and efficient replication system for performing LDAP replication.




SUMMARY OF THE INVENTION




The present invention is directed to methods and mechanisms for data replication. According to an aspect of the invention, an efficient and effective replication system is disclosed using LDAP replication components. Another aspect of the invention pertains to a schema and format independent method and mechanism for data replication. Yet another aspect of the invention relates to procedures for adding, deleting, and modifying replicated data and for replication conflict resolution. Another aspect of the invention relates to improved methods and mechanisms for adding and removing nodes from a replication system.











Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.




BRIEF DESCRIPTION OF THE DRAWINGS




The accompanying drawings are included to provide a further understanding of the invention and, together with the detailed Description of Embodiment(s), serve to explain the principles of the invention.





FIG. 1 depicts a system architecture for data replication according to an embodiment of the invention.

FIGS. 2A, 2B, and 2C depict one approach for storing LDAP data in database tables.

FIG. 3 depicts a system architecture for replicating LDAP directory data according to an embodiment of the invention.

FIG. 4 shows an alternate approach for storing LDAP data in database tables.

FIG. 5 illustrates an example of a directory information tree.

FIG. 6 is a diagram of a computer hardware system with which the present invention can be implemented.

FIG. 7 is an additional diagram of a computer hardware system with which the present invention can be implemented.

FIG. 8 depicts a revised version of the table shown in FIG. 2C.

FIG. 9 is a flow diagram showing a process for adding a new LDAP site to a replication environment according to an embodiment of the invention.

FIG. 10 is a flow diagram showing a process for removing an LDAP site from a replication environment according to an embodiment of the invention.

FIG. 11 illustrates a revised version of the directory information tree shown in FIG. 5.











DETAILED DESCRIPTION




The present invention is directed to a method and mechanism for replication in a database system that does not depend upon the same schema or data organizations being maintained at each replication site. The present invention is particularly well suited for LDAP data replication. According to one aspect of the invention, any data changes at a first replication site are replicated to other replication sites using schema and system independent change records. The change records are created in a standard format that is usable by all other replication sites in the system. Once the change record has been propagated to each remote replication site, the change record is then utilized to implement database instructions that are appropriate for the specific schema and system parameters of the remote site.





FIG. 1 depicts a system architecture for performing data replication according to an embodiment of the invention. Note that FIG. 1 illustrates the invention with reference to two replication sites; however, the inventive principles described herein are equally applicable to systems having more than two replication sites.




A first replication site 2 includes a database 4 having database data 6 and a data dictionary 8. Data dictionary 8 contains metadata that describes the schema and data organizations of database 4. First replication site 2 includes a server 10 that is responsible for accessing and modifying the database data 6 in database 4. Any client 14 that seeks to modify the database data 6 sends a request 12 to server 10 to add, change, or delete data. In response to request 12, change instructions 16 are issued to modify the database data 6.




A second replication site 52 similarly includes a database 54 having database data 56 and a data dictionary 58. Second replication site 52 further includes a server 60 that is responsible for accessing and modifying the database data 56 in database 54. If changes are to be made to database data 56, server 60 issues change instructions 66 to implement the requested changes.




For the purposes of illustration, assume that the system of FIG. 1 is used in a “peer-to-peer” or “multi-master” replication environment. In many peer-to-peer or multi-master replication environments, data changes made at a replication site are propagated to other replication sites, without the need for an overall “master” replication site. Thus, if a change request 12 at first replication site 2 is implemented to database data 6, that same change is replicated to the database data 56 at second replication site 52. Likewise, if a change request is made to second replication site 52 that is implemented to database data 56, that same change is replicated to the database data 6 at first replication site 2.




When a change request 12 is received at first replication site 2, server 10 issues change instruction 16 to implement the change request 12. The change instruction 16 takes into account the exact schema organization of the data object to be changed. Thus, the change instruction is schema-specific, and in a heterogeneous environment cannot simply be sent to all remote replication sites to replicate the data change, since the schema and/or system configuration of the remote replication sites may be entirely different than the schema and system configuration of local replication site 2.




According to the invention, server 10 translates either change instruction 16 or change request 12 into a schema and system independent change record 20. The change record 20 is in a generic format that is consistent and recognizable across all replication sites in the system. In the normal contemplated usage of the invention, change record 20 comprises change information that is focused upon the specific data to be added, deleted, or modified by the change request 12, and does not contain information regarding the schema organization of the data at the originating replication site.
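As a rough sketch only (the embodiment does not prescribe a particular storage layout for change record 20), such a record might be held as generic rows keyed by operation and target entry, with the changed data carried as attribute name/value pairs rather than as schema-specific columns; every identifier below is hypothetical:

-- Hypothetical schema-independent change record: no reference to the
-- originating site's table or column layout.
INSERT INTO Change_Record_Table
  (Change_Id, Op_Type, Target_Id, Attr_Name, Attr_Value)
VALUES
  (1001, 'MODIFY', 100, 'Tel. No.', '555-9999');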




The change record 20 is added to a change record table 24 at first replication site 2. According to an embodiment, it is the contents of the change record table 24 that are actually replicated to other replication sites. Thus, the contents of change record table 24 are replicated to the change record table 74 of second replication site 52. The change record 70, which is the replicated version of change record 20, is retrieved by server 60 to be applied to database data 56. Server 60 analyzes change record 70 to determine what data items are being changed. Based upon information located in the data dictionary 58, server 60 translates change record 70 into change instructions 66 that are specific to the schema and system configuration of database 54. The change instruction 66 is applied to replicate the change at replication site 52.




Since the change records are created in a format that is independent of schema or system configuration for the replication sites, true peer-to-peer replication is achieved in a heterogeneous environment, regardless of the schema, data, or system configurations of the database systems taking part in the replication environment.




ILLUSTRATIVE EXAMPLE




The present illustrative example is directed to an LDAP information system, which is used to provide a framework for the storage and retrieval of information that is used to identify and locate the details of individuals and organizations, such as telephone numbers, postal addresses, and email addresses. Recall from above that LDAP directory systems are normally organized in a hierarchical structure having entries organized in the form of a tree, which is referred to as a directory information tree (“DIT”). The DIT is often organized to reflect political, geographic, or organizational boundaries. A unique name or ID (which is commonly called a “distinguished name”) identifies each LDAP entry in the DIT. An LDAP entry is a collection of one or more entry attributes. Each entry attribute has a “type” and one or more “values.” Each entry belongs to a particular object class. Entries that are members of the same object class share a common composition of possible entry attribute types.




Referring to FIG. 5, shown is an example of a hierarchical tree of directory entries. Entry 96 is the topmost level of DIT 20 and is of object class “organization” having an attribute type “Org. Name” with an attribute value of “Oracle”. Entry 96 is the “parent” entry for three “child” entries (97, 98, and 99) directly beneath it in DIT 20. Entries 97, 98, and 99 are objects of object class “Department,” each having attributes “Dept. Name” and “State.” Entry 97 has an attribute type “Dept. Name” having a value of “Administration” and an attribute type “State” with the value “CA”. Entry 98 has an attribute “Dept. Name” with the value “Sales” and an attribute type “State” with an attribute value “NY”. Entry 99 has an attribute type “Dept. Name” with an attribute value “R&D” and an attribute type “State” with a value of “CA”.




Entry 103 is a child entry of entry 97. Entry 103 represents an object of class “Person” having the following attribute type-value pairs: (1) attribute type “Last Name” with a value of “Founder”; (2) attribute type “First Name” with a value of “Larry”; (3) attribute type “Tel. No.” with a value of “555-4444”; and (4) attribute type “State” with a value of “CA”.




Entry 102 is a child entry of entry 98. Entry 102 represents an object of class “Person” having the following attribute type-value pairs: (1) attribute type “Last Name” with a value of “Jones”; (2) attribute type “First Name” with a value of “Joe”; (3) attribute type “Tel. No.” with a value of “555-3333”; (4) attribute type “Manager” having the value of “Jim Smith”; and (5) attribute type “State” having the value “CA”. Note that entries 102 and 103 are both members of object class Person, but entry 102 has more listed object attributes than entry 103. In many object-oriented based systems, objects that are members of the same object class may share a common set of possible object attributes, but some members of the class may not necessarily have values for some of the possible attributes. In this example, entry 103 does not have a value for attribute type “Manager” while entry 102 does have a value for this attribute.




Entries 100 and 101 are child entries of entry 99. Entries 100 and 101 are both members of object class “Person.” Entry 100 is defined by the following attribute type-value pairs: (1) attribute type “Last Name” with a value of “Doe”; (2) attribute type “First Name” with a value of “John”; (3) attribute type “Tel. No.” with a value of “555-1111”; (4) attribute type “Manager” having the value of “Larry Founder”; and (5) attribute type “State” having the value “CA”. Entry 101 is defined by the following attribute type-value pairs: (1) attribute type “Last Name” with a value of “Smith”; (2) attribute type “First Name” with a value of “Jim”; (3) attribute type “Tel. No.” with a value of “555-2222”; (4) attribute type “Manager” having the value of “John Doe”; and (5) attribute type “State” having the value “NY”.





FIGS. 2A, 2B, and 2C depict one approach to storing the LDAP directory entries from DIT 20 of FIG. 5 into a relational database management system (“RDBMS”) or other database system using tables. In this approach, a separate table is provided for each object class in the system. FIG. 2A shows an object class table 202 for the Organization class, which includes entry 96 from DIT 20 as a member of that class. FIG. 2B is an example of an object class table 204 for the object class Department, which includes entries 97, 98, and 99. FIG. 2C is an example of an object class table 206 for the object class Person, which includes entries 100, 101, 102, and 103 from DIT 20.




Each row of the object class table represents a single object of that corresponding object class. Thus, the Person class table 206 of FIG. 2C includes four rows, one row for each of the person class entries of DIT 20 (i.e., entries 100, 101, 102, and 103). Discrete columns within the object class table represent attributes of an object within the object class. A separate column is provided for each possible attribute of an object class. The Person class table 206 of FIG. 2C includes five columns for object attributes “Last Name,” “First Name,” “Tel. No.,” “Manager,” and “State.” Similar rows and columns in FIGS. 2A and 2B describe the objects and attributes for the Department and Organization objects of DIT 20.
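A minimal DDL sketch of one way such an object class table could be declared is shown below; the column names follow FIG. 2C, but the underscored identifiers, datatypes, and lengths are assumptions of this sketch:

CREATE TABLE Person_Class_Table (
  Entry_No    NUMBER NOT NULL,   -- entry identifier from the DIT
  Last_Name   VARCHAR2(64),
  First_Name  VARCHAR2(64),
  Tel_No      VARCHAR2(32),
  Manager     VARCHAR2(128),
  State       VARCHAR2(16)
);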




An alternate approach to representing the DIT 20 of FIG. 5 in relational tables involves the implementation of a single table that comprises information describing objects and object attributes on the system. This table is hereby referred to as the “attribute_store” table. The attribute_store table comprises four columns having the following characteristics:


















Column Name   Datatype            Constraint   Description
EID           Number              Not null     ID for an entry
AttrName      Character-numeric                Attribute ID for a particular attribute
AttrVal       Character-numeric                Attribute values
AttrKind      Character string    Not null     Kind of Attribute (Operational, User, etc.)
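For concreteness, a minimal DDL sketch of such a table is shown below, assuming Oracle-style datatypes; the datatype sizes are assumptions of this sketch:

CREATE TABLE Attribute_Store_Table (
  EID       NUMBER NOT NULL,          -- ID for an entry
  AttrName  VARCHAR2(256),            -- attribute ID for a particular attribute
  AttrVal   VARCHAR2(4000),           -- attribute value(s)
  AttrKind  VARCHAR2(32) NOT NULL     -- kind of attribute (Operational, User, etc.)
);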















FIG. 4 depicts an example of an attribute_store table 400 for entries in the DIT 20 of FIG. 5. All entries in DIT 20 are represented in attribute_store table 400, regardless of the particular object class that an entry belongs to. An entry is represented by one or more rows in table 400. A set of rows having the same EID describes the attributes for the same entry in DIT 20. Each row shown in attribute_store table 400 corresponds to a separate attribute for an entry.




Consider entry 100 from DIT 20, which is represented in attribute_store table 400 by rows 416, 418, 420, 422, 423, and 446. The combination of the contents of these rows describes the attributes of entry 100. Each row in attribute_store table 400 comprises a column that identifies that row's corresponding EID. These particular rows (416, 418, 420, 422, 423, and 446) are identified as being associated with entry 100 since all of these rows comprise the same value of 100 in their EID column. Each of these rows describes a different attribute for entry 100. For each row, the “AttrName” column identifies which object attribute is being described, and the “AttrVal” column identifies the value(s) for that attribute. For entry 100, row 416 describes attribute “First Name” having a value of “John”, row 418 identifies the value “Doe” for attribute “Last Name”, row 420 identifies the value “555-1111” for attribute “Tel No.”, row 422 identifies the value “Larry Founder” for attribute “Manager,” and row 423 identifies the value “CA” for attribute “State.” Each of the other entries from DIT 20 is similarly represented by sets of one or more rows in the attribute_store table 400.
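As an illustration, a query of the following form could be used to gather all of the attributes of entry 100 under this schema; the table name follows the DDL sketch above and is otherwise an assumption:

SELECT AttrName, AttrVal
FROM   Attribute_Store_Table
WHERE  EID = 100
ORDER BY AttrName;   -- returns one row per attribute of entry 100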




In an embodiment, the rows in attribute_store table 400 contain an “AttrKind” column. This column identifies additional system categories for the object attributes. For example, one category of attribute kinds that can be identified according to the invention refers to access and modification privileges for a particular object attribute. Two examples of attribute kinds relating to access and modification privileges are “User” and “Operational” attributes. User attributes are attributes that can be modified by the user, entity, or organization associated with a particular entry. Operational attributes are attributes that are maintained by the system, and thus cannot be altered or modified except by the system. For example, row 420 identifies attribute type “Tel. No.” for entry 100 as being of AttrKind user, and thus the user or entity associated with entry 100 is permitted to modify this attribute value. Row 446 provides an example of an attribute type that is of attribute kind “operational” (i.e., “Modification Timestamp”). Many directory systems maintain a timestamp of the last modification time/date for each directory entry. Row 446 describes attribute “modification timestamp” for entry 100 having a value of “01/01/97.” Since this attribute type is “operational,” the entity or person corresponding to entry 100 is not normally permitted to modify this attribute value. In an alternate embodiment of the invention, the attribute_store table is configured without having a column for the AttrKind value.




Further details regarding the representation of directory information in an attribute_table are described in U.S. application Ser. No. 09/206,778 and U.S. Application Ser. No. 09/207,160, filed on Dec. 7, 1998, both of which are hereby incorporated by reference in their entirety.





FIG. 3 depicts a system architecture for replication of LDAP directory data according to an embodiment of the invention. Shown in FIG. 3 is a first LDAP site 302 and a second LDAP site 304. LDAP data operation requests 303 at LDAP site 302 are processed by LDAP server 306. Modifications, additions, and deletions to the LDAP directory data 308 at LDAP site 302 are replicated to the directory data 312 at a second LDAP site 304. LDAP site 304 similarly comprises an LDAP server 310 that implements LDAP data operations to LDAP directory data 312.




Consider if the schema and data organizations for the replicated LDAP directory data are different between LDAP sites 302 and 304. Thus, for the purposes of explanation, assume that LDAP site 302 comprises LDAP directory data 308 having the “object class table” schema described with reference to FIGS. 2A-2C. Further assume that LDAP site 304 comprises LDAP directory data 312 having the “attribute_store table” schema described with reference to FIG. 4.




To perform data replication, a standard change record format is utilized to define LDAP data manipulation operations, in which the change record format is recognized and adhered to by each replication site. Change records that describe the data changes made at the originating site are propagated to each replication site. Regardless of the exact schema or data organization in place at each remote replication site, the LDAP server at each site comprises an LDAP engine that can interpret the standard format of the change records to replicate the changes to the local LDAP directory data. In this manner, peer-to-peer data replication can be performed in a heterogeneous environment in which local replication sites are not required to have knowledge of the exact schemas being employed by remote replication sites.




Consider if a client at replication site 302 wishes to add a new LDAP directory entry to the DIT 20 of FIG. 5. The new entry has the following properties: entry no.=“104”, last name=“Last”, first name=“Bob”, tel. no.=“555-5555”, state=“CA”, and manager=“Jim Smith”. FIG. 11 depicts DIT 20 after new entry 104 is added to the directory tree.




The following SQL-based pseudocode represents a database statement that can be used to implement this change at replication site 302 (where the LDAP directory data 308 is stored as shown in FIGS. 2A-C):




INSERT INTO Person_Class_Table (/*column names*/ Entry No., Last name, First Name, Tel. No., State, Manager) VALUES (/*column values*/ 104, ‘Last’, ‘Bob’, ‘555-5555’, ‘CA’, ‘Jim Smith’)




By executing this database statement, the new directory entry would be added to the Person Class Table 206 within the LDAP directory data 308 of replication site 302. FIG. 8 depicts a revised Person Class Table 806 in which row 809 represents newly added directory entry 104.




This change to the LDAP directory data cannot be replicated at replication site 304 by merely re-executing the same database statement. This is because the schema organization of LDAP site 304, as shown in FIG. 4, is significantly different than the schema organization of LDAP site 302 shown in FIGS. 2A-C. Since the above database statement is specific to the schema of LDAP site 302, it would not properly reproduce the desired changes to the directory data 312 at LDAP site 304.




In the present invention, when LDAP server 306 applies the requested LDAP data operation to the LDAP directory data 308, a change log entry is made to the change log 314 at LDAP site 302. The change log entry contains the requested LDAP data operation in a canonical format that is consistent across all participating replication sites. The change log entry in the change log 314 contains sufficient information to replicate the requested change to the LDAP directory data at any remote site, including remote LDAP site 304. According to an embodiment, the change log entries are generated into conventional LDAP command protocols that have been standardized for LDAP directory data.




The embodiment of FIG. 3 also includes the use of a shadow log to propagate changes from one replication site to another. Change log entries from change log 314 are copied to a replication log 316 to be propagated to other replication sites. Replication log 316 is a shadow of change log 314, and its use prevents the need to bring down all LDAP databases when schema changes are propagated to the replication sites, such as the addition or deletion of LDAP databases from the replication environment. In essence, shadow logs are utilized to insulate the format of local replication logs from the actual mechanism used to propagate changes to other replication sites. In this manner, the internal schema formats of the replication sites are encapsulated by the shadow logs, such that schema changes can be made without downtime to the replication nodes.




A process runs at the LDAP directory site 302 to copy information from the change log 314 to the replication log 316. Either asynchronous or synchronous replication can be implemented using the invention. For asynchronous replication, the copying of entries from the change log 314 to the replication log 316 occurs either periodically, or upon certain specified trigger conditions. The change information is propagated and applied to remote LDAP sites in a queued “store-and-forward” process. For synchronous replication, the system constantly monitors the change log for the arrival of new entries. If a new entry is generated at the change log 314, the new entry is immediately copied to the replication log 316 for propagation to remote LDAP sites.
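Assuming, purely for illustration, that both logs are kept as database tables with a copied-status flag (an assumption of this sketch, not something mandated by the embodiment), the copy step might be sketched as:

-- Copy newly generated change log entries into the shadow (replication) log.
INSERT INTO Replication_Log (Change_Id, Change_Entry)
SELECT Change_Id, Change_Entry
FROM   Change_Log
WHERE  Copied_Flag = 'N';

-- Mark the copied entries; in practice both statements would run in one transaction.
UPDATE Change_Log SET Copied_Flag = 'Y' WHERE Copied_Flag = 'N';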




The change log information copied to the replication log 316 at the local LDAP directory site 302 is propagated to the replication log 320 at remote LDAP site 304. In the preferred embodiment, the mechanism used to replicate this information is the Advanced Symmetric Replication mechanism from the Oracle 8i database management system, available from Oracle Corporation of Redwood Shores, Calif.




At the remote LDAP site 304, the change log entry in replication log 320 is directly sent to LDAP server 310 for processing. Alternatively, the change log entry in replication log 320 can be copied to change log 324 before being sent to LDAP server 310. A daemon process 322 initiates the application of the change log entry to the LDAP directory data 312 at LDAP site 304. If asynchronous replication is employed, the daemon process 322 wakes up periodically based upon defined intervals or upon specified trigger conditions to initiate the changes. If synchronous replication is employed, daemon process 322 actively monitors for any incoming change log information that has been propagated by a remote LDAP site. With synchronous replication, once the changes have been implemented, an acknowledgement is sent back to the propagating LDAP site.




To implement the changes at LDAP site 304, the daemon process 322 prompts LDAP server 310 to implement the changes. As noted above, the change log entry is in a schema-independent canonical format. LDAP server 310 analyzes the change information, determines which local data items are to be changed, and formulates a database statement that is capable of implementing the replicated LDAP data operation to data under the local schema and data organization. Thus, if the LDAP directory data is stored as shown in FIG. 4, the following SQL-based pseudocode represents the database statements to be generated to replicate the above change to the LDAP directory data 312 at LDAP site 304:




INSERT INTO Attribute_Store_Table (/*column names*/ EID, AttrName, AttrVal, AttrKind) VALUES (/*column values*/ 104, ‘First Name’, ‘Bob’, ‘User’);




INSERT INTO Attribute_Store_Table (/*column names*/ EID, AttrName, AttrVal, AttrKind) VALUES (/*column values*/ 104, ‘Last Name’, ‘Last’, ‘User’);




INSERT INTO Attribute_Store_Table (/*column names*/ EID, AttrName, AttrVal, AttrKind) VALUES (/*column values*/ 104, ‘Tel. No.’, ‘555-5555’, ‘User’);




INSERT INTO Attribute_Store_Table (/*column names*/ EID, AttrName, AttrVal, AttrKind) VALUES (/*column values*/ 104, ‘Manager’, ‘Jim Smith’, ‘User’);




INSERT INTO Attribute_Store_Table (/*column names*/ EID, AttrName, AttrVal, AttrKind) VALUES (/*column values*/ 104, ‘State’, ‘CA’, ‘User’);




The LDAP server 310 may reference a data dictionary or other metadata to determine the appropriate schema objects to be accessed to implement the data changes. Thus, the database statement to be formulated by LDAP server 310 is normally tied to the exact schema and data organization of the local LDAP site 304.




A garbage collector 326 is used to purge the change log 324 at LDAP site 304. The garbage collector 326 is a daemon process that periodically wakes up based upon predefined intervals. Similarly, a garbage collector 327 is used to purge the change log 314 at LDAP site 302.
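A hedged sketch of such a purge step, assuming each change log row records an applied status and a modification time (both column names are hypothetical), might look like:

-- Purge change log entries that have already been applied
-- and are older than the configured retention interval.
DELETE FROM Change_Log
WHERE  Status = 'APPLIED'
AND    Modify_Time < SYSDATE - 1;   -- for example, a one-day retention period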





FIG. 9 depicts the process flow of an embodiment of the invention to add a new LDAP site to an existing replication environment. The following describes the process actions of this process flow:




1. Stop the processes that propagate changes from change logs to replication log tables at all sites (process action 902).

2. Redirect all LDAP functions from a master definition/configuration database (process action 904). In an embodiment of the invention, a master definition/configuration database is maintained to control configuration information regarding replication nodes, such as node identifiers, location, etc. Any of the replication nodes can be designated as the master definition/configuration site.

3. Suspend and quiesce the replication environment (process action 906). This ensures that all data presently at the replication logs are propagated to all sites by the replication mechanism.

4. Build a snapshot of the master definition/configuration database (process action 908). In an embodiment, building the snapshot comprises the performance of an online backup. A database log switch can be performed before the online backup. The master definition/configuration database can be triple-mirrored for quicker online backup.

5. Bring the master definition/configuration database back online (process action 910).

6. Resume all LDAP functions on the master definition/configuration site (process action 912).

7. Add the new LDAP site to the replication environment, by adding the replication log table for the new site to the replicated environment and regenerating the replication support (process action 914). At this point replication resumes between the LDAP sites.

8. Bring down the new LDAP directory site (process action 916).

9. Resume the jobs that copy information from change logs to replication logs (process action 918). Now all LDAP sites are fully available, except for the new LDAP database that is being added.

10. Bring up the new LDAP database (process action 920). This is performed by first bringing up the new database without the replication processes. The new database is then brought down and recreated using the backup of the master definition/configuration database. Database administration changes are made for the new database (e.g., network names, database names, file names that may need to be changed, etc.). The replication catalog tables are dropped into the new database and recreated.

11. At the new LDAP site, start the replication processes as well as the processes that copy change information from the change log to the replication log (process action 922).

12. Start the LDAP server and replication mechanism at the new LDAP site (process action 924).




The following describes an alternate process to add a new node to a replication system:




1. Stop the replication server on all replication nodes.




2. Configure the new node into the same replication group as the existing replication nodes. “Replication agreements” can be established to maintain entries which describe the member nodes within a replication group that shares and replicates data changes. Replication agreements are referenced for configuration parameters when the replication server operates. In an embodiment, replication configuration parameters and replication agreements are stored as entries in an LDAP directory information tree.




3. Identify a sponsor node and switch the sponsor node to read-only mode. The sponsor node is an existing replication node that supplies data to the new replication node. According to an embodiment, when the sponsor node is in read-only mode, updates cannot be made to the sponsor node, but are allowed to any of the other nodes.




4. Back up the sponsor node. If this action requires a lengthy time period, process action 5 may be configured to run concurrently with process action 4.




5. Perform setup of the add node procedure. This executes a number of operations, including:




quiesce the replication process at any master definition sites;




configure the master definition sites and the new node as well as other sites that participate in the LDAP replication;




configure replication push jobs to all sites including the new node;




check to make sure that all steps have completed successfully.




6. Switch the sponsor node to updatable (read-write) mode.




7. Start the replication server on all nodes except the new node. At this time, verify that no replication processes are running on the new node.




8. Load data into the new node.




9. Start the LDAP server on the new node.




10. Configure the LDAP replication agreement on the new node. In an embodiment, these parameters include the following:




Retry count: this parameter identifies the number of processing retry attempts for a change entry before being dropped;




Purge schedule: this parameter indicates the frequency at which entries that have already been applied or have been dropped are purged by a garbage collector;




Threads: this parameter identifies the number of worker threads provided for each supplier for change log processing;




Replication agreement: identifies the replication agreement for which a server is responsible;




Replication protocol: specifies the protocol used in the replication agreement; for Oracle-based replication nodes, this parameter is set to ASR.




11. Start the replication server on the new node.





FIG. 10 depicts the process flow of an embodiment of a process to remove an existing LDAP directory site from a replication environment. The following describes the process actions for this process flow:




1. Stop the processes that propagate change information between the change log and the replication log at each LDAP directory site (process action 1002).

2. Quiesce the replication environment (process action 1004).

3. Drop the LDAP server from replication (process action 1006).

4. Resume replication activities at all other LDAP sites (process action 1008).

5. Start the processes that were stopped in process action 1002 (process action 1010).




In an embodiment, the attribute_store table of FIG. 4 is modified to include an additional column for replication information. Thus, the attribute_store table in a replication environment contains columns having the following characteristics:


















Column Name   Datatype            Constraint   Description
EID           Number              Not null     ID for an entry
AttrName      Character-numeric                Attribute ID for a particular attribute
AttrVal       Character-numeric                Attribute values
AttrKind      Character string    Not null     Kind of Attribute (Operational, User, etc.)
AttrVer       Character string                 Attribute version and timestamp














The AttrVer column describes the version of an attribute for an LDAP directory entry. Each time an attribute is modified, the version number of that attribute is incremented and the timestamp is adjusted to the most recent modification time.
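As an illustration only, and assuming the version and timestamp are packed into the AttrVer string as the column description suggests (the exact encoding is an assumption of this sketch), a modification might be applied as:

UPDATE Attribute_Store_Table
SET    AttrVal  = '555-9999',
       AttrVer  = '2;19990101120000Z'   -- incremented version plus new modification timestamp
WHERE  EID      = 100
AND    AttrName = 'Tel. No.';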




Change Log Processing and Conflict Resolution




The following processes are utilized in an embodiment of the invention to address inbound change log processing and conflict resolution on a consumer directory. According to this embodiment, at least the following five kinds of inbound changes are addressed: (1) adding information; (2) deleting information; (3) modifying information; (4) moving a leaf entry in a directory tree (resulting in a name change); and (5) moving a subtree to a different location in the directory tree.




Multi-master replication enables updates to multiple replication sites. Thus, a mechanism is needed to address the possibility of conflicting updates. Conflicts should be detected, for example, when the replication server attempts to apply changes from a remote directory to another directory that holds conflicting data.




Entry-level conflicts are caused when the replication server attempts to apply a change to a consumer directory that results in a conflict, such as:




adding an entry that already exists;




deleting an entry that does not exist; or




modifying an entry that does not exist.




Attribute-level conflicts are caused when two directories are updating the same attribute with different values, possibly at different times. One approach to address attribute-level conflicts is to examine timestamps of the changes involved in the conflict.




Generally, the present embodiment attempts to resolve conflicts by applying the following process:




1. Attempt to detect conflict when a change is applied or upon detection of error;




2. Attempt to re-apply the change a configurable number of times or for a configurable amount of time after a waiting period;




3. If the retry limit is reached without successfully applying the change, then the change request is escalated to a different-priority queue for processing.




According to this embodiment, three change log processing queues are employed. When a change first arrives at the consumer directory, it is placed in a “new queue”. An attempt is then made to apply the change. If the change cannot be applied from the new queue, it is moved to a “retry queue”. If it cannot be applied after a specified number of attempts in the retry queue, the change is placed in a “Human Intervention queue” and re-attempted at a much lower rate. If the change is successfully applied from any of these three queues, it is placed in the purge queue for garbage collection.
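Assuming, for illustration only, that the queues are modeled as a status column on the change log rows (the table and column names are hypothetical), the transitions described above might be sketched as:

-- A new change arrives at the consumer directory.
INSERT INTO Change_Log (Change_Id, Change_Entry, Queue, Retry_Count)
VALUES (1001, '...', 'NEW', 0);

-- First application attempt failed: move to the retry queue with the
-- configured maximum retry count.
UPDATE Change_Log SET Queue = 'RETRY', Retry_Count = 10 WHERE Change_Id = 1001;

-- Each failed retry decrements the count; when it reaches zero the change
-- is escalated to the Human Intervention queue.
UPDATE Change_Log SET Retry_Count = Retry_Count - 1 WHERE Change_Id = 1001;
UPDATE Change_Log SET Queue = 'HUMAN_INTERVENTION'
WHERE  Change_Id = 1001 AND Retry_Count = 0;

-- A successfully applied change goes to the purge queue for garbage collection.
UPDATE Change_Log SET Queue = 'PURGE' WHERE Change_Id = 1001;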




The following processes are employed to implement the change/conflict check procedures:




The following process matrix is employed to apply an “add” change request:





















Change type: Add

Step 1 (entry name conflict check): Search for the parent entry in the directory tree that matches the object identifier (GUID) in the change entry. If the parent entry exists, continue with step 2.

Step 2 (apply change): Compose the correct identifier (distinguished name or DN) for the entry being added under its parent entry, identified by GUID, in the consumer directory. Apply the change in the consumer directory.

New queue: (a) Perform steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, put the change into the retry queue and set the retry count to the configured maximum.

Retry queue: (a) Repeat steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, decrement the retry count of the change entry. (d) If the change fails on the last retry because of a duplicated target entry, apply conflict resolution as follows: the older creation timestamp wins; if there is a tie, the smaller GUID wins. (e) If one of steps 1 and 2 fails on the last retry for any reason other than a duplicate DN, put the change into the Human Intervention queue.

Human Intervention queue: A change entry typically gets into this queue if the parent entry fails to be located in the consumer directory during the period of normal retry. Same steps as in retry queue processing, with the exception of step (c). If there are failures, the entry is retained in this queue until human intervention.
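The duplicate-DN resolution rule used in the retry queue above (the older creation timestamp wins; on a tie, the smaller GUID wins) can be sketched as a comparison between the incoming change entry and the existing target entry; the table and column names are assumptions of this sketch:

-- Returns 'KEEP_TARGET' when the existing entry wins, 'APPLY_CHANGE' otherwise.
SELECT CASE
         WHEN t.Create_Time < c.Create_Time THEN 'KEEP_TARGET'
         WHEN t.Create_Time > c.Create_Time THEN 'APPLY_CHANGE'
         WHEN t.GUID <= c.GUID              THEN 'KEEP_TARGET'   -- tie: smaller GUID wins
         ELSE 'APPLY_CHANGE'
       END AS resolution
FROM   Target_Entry t, Change_Entry c
WHERE  t.DN = c.DN;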














The following process matrix is employed to apply a “delete” change request:





















Change type: Delete

Step 1 (entry name conflict check): Search for the entry in the directory tree that matches the object identifier (GUID) in the change entry.

Step 2 (apply change): Delete the entry found in step 1.

New queue: (a) Perform steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, put the change into the retry queue and set the retry count to the configured maximum.

Retry queue: (a) Repeat steps 1 and 2. (b) If both succeed, put the change to the purge queue. (c) If either of the two steps fails, decrement the retry count in the change entry. (d) If either of the two steps fails on the last retry, move the change to the Human Intervention queue.

Human Intervention queue: Same steps as in retry queue processing, with the exception of step (c). If there are failures, the entry is retained in this queue until human intervention.














The following process matrix is employed to apply a “modify” change request:






















Change type: Modify

Step 1 (entry name conflict check): Search for the correct unique identifier (distinguished name or DN) in the target directory that matches the object identifier (GUID) in the change entry.

Step 2 (attribute conflict check and apply change): (a) Filter the modification in the change entry by comparing each attribute in the change entry against the one in the target entry (1. the newer modify time wins; 2. the greater version wins; 3. the smaller hostname using the string comparison rule wins). (b) Apply the filtered modification.

New queue: (a) Perform steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, put the change into the retry queue and set the retry count to the configured maximum.

Retry queue: (a) Repeat steps 1 and 2. (b) If both succeed, put the change to the purge queue. (c) If either of the two steps fails, decrement the retry count in the change entry. (d) If either of the two steps fails on the last retry, move the change to the Human Intervention queue.

Human Intervention queue: Same steps as in retry queue processing, with the exception of step (c). If there are failures, the entry is retained in this queue until human intervention.














The following process matrix is employed to apply a “modifyRDN” change request to move a leaf entry in the directory information tree (which results in a name change by modifying the relative distinguished name, or RDN):





















Change type: Modify RDN

Step 1 (entry name conflict check): Search for the current unique identifier (distinguished name or DN) that matches the object identifier (GUID) in the change entry.

Step 2 (apply change): Perform the modify RDN operation using the current DN acquired from step 1.

New queue: (a) Perform steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, put the change into the retry queue and set the retry count to the configured maximum.

Retry queue: (a) Repeat steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, decrement the retry count of the change entry. (d) If the change fails on the last retry because of a duplicated target entry, apply conflict resolution as follows: the older creation timestamp wins; if there is a tie, the smaller GUID wins. (e) If one of steps 1 and 2 fails on the last retry for any reason other than a duplicate DN, put the change into the Human Intervention queue.

Human Intervention queue: Same steps as in retry queue processing, with the exception of step (c). If there are failures, the entry is retained in this queue until human intervention.














The following process matrix is employed to apply a “modify DN” change request to move a subtree into a different location in the directory information tree (by modifying the distinguished name, or DN):





















Change type: Modify DN

Step 1 (entry name conflict check): Search for the current unique identifier (distinguished name or DN) that matches the object identifier (GUID) in the change entry. Search for the new parent DN that matches the parent GUID in the change entry.

Step 2 (apply change): Perform the modify DN operation using the current DN and the new parent DN acquired from step 1.

New queue: (a) Perform steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, put the change into the retry queue and set the retry count to the configured maximum.

Retry queue: (a) Repeat steps 1 and 2. (b) If both steps succeed, put the change to the purge queue. (c) If one of the two steps fails, decrement the retry count of the change entry. (d) If the change fails on the last retry because of a duplicated target entry, apply conflict resolution as follows: the older creation timestamp wins; if there is a tie, the smaller GUID wins. (e) If one of steps 1 and 2 fails on the last retry for any reason other than a duplicate DN, put the change into the Human Intervention queue.

Human Intervention queue: Same steps as in retry queue processing, with the exception of step (c). If there are failures, the entry is retained in this queue until human intervention.














Example 1




Add “dc=com2” on both Node 1 and Node 2 in a three node replication system.




The detailed process state information for example 1 is as follows:




At Time t




Node 1: Add dc=com2 with GUID: 00001

Node 2: Add dc=com2 with GUID: 00002

Node 3: NA




A conflict exists at time t since there are duplicated DNs on the consumer directory for multiple nodes. To resolve this conflict, compare the creation time between the change and the consumer entries, favoring the one with the older creation time. If the creation times tie, the smaller GUID wins. The end result should be a situation in which both nodes end up with “dc=com2” having GUID: 00001.




At Time t+1




Node


1


:




The addition change “add dc=com


2


” supplied by node


2


arrived to “new queue”.




1. Change processing in “new queue”:




Step


1


: Skipped parent GUID check since the target DN in the change entry was a first level entry.




Step


2


: Applied the “dc=com


2


” add change to node


1


and got duplicated DN error.




Set retry count of the change to the configured maximum and moved it to “retry queue”.




2. Change processing in “retry queue”:




Repeated step


1


and


2


and failed on configured number of retries.




Compared the creation time between the change entry with the target entry. They tied at “time t”.




Compared the GUID in the change entry with the target entry and found the GUID value in the change entry was greater than the one in target entry. Hence, moved the change to purge queue.




Node


2


:




NA




Node


3


:




NA




At Time t+2




Node


1


:




NA




Node


2


:




The addition change “add dc=com


2


” supplied by node


2


arrived to “new queue”.




1. Change processing in “new queue”:




Step


1


: Skipped parent guid check since the target DN in the change entry was a first level entry.




Step


2


: Applied the add “dc=com


2


” change to node


1


and got duplicated DN error.




Set retry count of the change to the configured maximum and moved it to “retry queue”.




2. Change processing in “retry queue”:




Repeated step


1


and


2


and failed on configured number of retries.




Compare the creation time of the change entry with the target entry.




They tied at “time t”.




Compared the GUID in the change entry with the target entry and found the GUID value in the change entry was smaller than the one in the target entry. Hence, deleted the target entry and applied the change.




Node


3


:




NA




At Time t+3




Node


1


:




NA




Node


2


:




NA




Node


3


:




Change supplied by node


1


and node


2


all arrived to “new queue”. One of the two changes applied first. Then, the change applied later received a duplicated DN error. The change supplied by node


1


with the smaller GUID eventually superseded the other change and added to node


3


.




At time t+4




Node


1


:




dc=com


2






With GUID: 00001




Node


2


:




dc=com


2






With GUID: 00001




Node


3


:




dc=com


2






With GUID: 00001
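To make the tie-break in example 1 concrete, the short sketch below applies the same rule to the example's values; the Entry record and the resolve_duplicate_dn function are names introduced here purely for illustration and are not part of the embodiment.

    # Illustrative sketch of the tie-break used in example 1; not the embodiment's code.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Entry:
        dn: str
        guid: str
        create_time: int

    def resolve_duplicate_dn(existing, incoming):
        """Return the entry that survives a duplicate-DN conflict."""
        # The older creation time stamp wins; if the times tie, the smaller GUID wins.
        return min(existing, incoming, key=lambda e: (e.create_time, e.guid))

    local = Entry("dc=com2", guid="00001", create_time=0)       # node 1's own add at time t
    replicated = Entry("dc=com2", guid="00002", create_time=0)  # the add replicated from node 2

    assert resolve_duplicate_dn(local, replicated).guid == "00001"
    # Every node therefore converges on the "dc=com2" entry with GUID 00001, as at time t+4.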




Example 2

Add “dc=com2”, delete it, and add it back on both node 1 and node 2 in a three node replication system. Note that the creation time/GUID combination applied in the following example is just one out of many possibilities, and is not intended to be limiting as to the scope of formats.

The detailed process state information for example 2 is as follows:

At Time t

Node 1: Add “dc=com2” with GUID=00003

Node 2: Add “dc=com2” with GUID=00006

Node 3: NA

A conflict exists because there are duplicated DNs for the add request. However, an object with the same GUID does not exist for the delete.

The conflict resolution for the add on node 1: After failing on the configured number of retries, the add change with GUID:00006 created at time t superseded the existing entry with GUID:00005 created at time t+2. The add change with GUID:00004 created at time t+2 was dropped.

The conflict resolution for the add on node 2: After failing on the configured number of retries, the add change with GUID:00003 created at time t superseded the existing entry with GUID:00004 created at time t+2. The add change with GUID:00005 created at time t+2 was dropped.

The conflict resolution for the delete: The delete change failed a number of times until the “add” change with the same GUID was applied to the target node. End result: “dc=com2” was removed from both directories.

At Time t+1

Node 1: Delete “dc=com2” with GUID=00003

Node 2: Delete “dc=com2” with GUID=00006

Node 3: NA

At Time t+2

Node 1: Add “dc=com2” with GUID=00005

Node 2: Add “dc=com2” with GUID=00004

Node 3: NA

At Time t+3

Node 1: The three changes supplied by node 2 arrived at the “new queue”. All three changes failed and were moved into the retry queue. The add change with GUID:00006 superseded the target entry with GUID:00005 after the maximum configured number of retries. The add change with GUID:00004 was dropped because it was created at a later time than the add change with GUID:00006. The delete change with GUID:00006 eventually succeeded.

Node 2: The three changes supplied by node 1 arrived at the “new queue”. All three changes failed and were moved into the retry queue. The add change with GUID:00003 created at time t superseded the target entry with GUID:00004 created at time t+2 after the configured number of retries. The add change with GUID:00005 was dropped because it was created at a later time than the add change with GUID:00003. The delete change with GUID:00003 eventually succeeded.

Node 3: Six changes arrived at the “new queue”. The race condition is similar to what happened on node 1 and node 2.

At Time t+4

Node 1: “dc=com2” no longer exists.

Node 2: “dc=com2” no longer exists.

Node 3: “dc=com2” no longer exists.
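The delete behavior in example 2 can be pictured with the small sketch below. It assumes, purely for illustration, a consumer directory keyed by GUID and a hypothetical EntryNotFoundError; neither of these is defined by the embodiment.

    # Illustrative sketch of the GUID-keyed delete in example 2; not the embodiment's code.
    class EntryNotFoundError(Exception):
        pass

    def apply_delete(directory, guid):
        """Delete by GUID; the delete fails (and is retried) until the add with the same GUID has been applied."""
        if guid not in directory:
            raise EntryNotFoundError(guid)
        del directory[guid]

    directory = {}                         # consumer state, keyed by GUID for brevity

    try:
        apply_delete(directory, "00006")   # the delete arrives before the matching add
    except EntryNotFoundError:
        pass                               # the change stays in the retry queue

    directory["00006"] = "dc=com2"         # the add with the same GUID is finally applied
    apply_delete(directory, "00006")       # now the delete succeeds
    assert "00006" not in directory        # "dc=com2" no longer exists, as at time t+4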




The following queue parameters are employed in an embodiment of the invention:























New queue: the retry count in the change entry is 0; the change number in the change entry is greater than the last change number applied in the change log.

Retry queue: the retry count in the change entry is greater than 0; the change number in the change entry is less than or equal to the last change number applied in the change log.

Human Intervention queue: the retry count in the change entry is −1; the change number in the change entry is less than or equal to the last change number applied in the change log.

Purge queue: the retry count in the change entry is −2; the change number in the change entry is less than or equal to the last change number applied in the change log.
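As an illustration of how these parameters identify the queue holding a change entry, the sketch below encodes the same rules; classify_queue and its argument names are assumptions introduced here for the example.

    # Illustrative sketch of the queue parameters above; not the embodiment's code.
    def classify_queue(retry_count, change_number, last_applied_change_number):
        """Map a change entry's retry count and change number to the queue that holds it."""
        if retry_count == 0 and change_number > last_applied_change_number:
            return "new"
        if retry_count > 0:
            return "retry"
        if retry_count == -1:
            return "human intervention"
        if retry_count == -2:
            return "purge"
        raise ValueError("change entry state does not match any queue")

    assert classify_queue(0, 101, last_applied_change_number=100) == "new"
    assert classify_queue(5, 95, last_applied_change_number=100) == "retry"
    assert classify_queue(-1, 95, last_applied_change_number=100) == "human intervention"
    assert classify_queue(-2, 95, last_applied_change_number=100) == "purge"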














According to an embodiment, the following additional considerations are applied to replication processing:




a. A delete issued from the replication server triggers a subtree deletion. This stems from the policy that an entry delete has precedence over any subsequent addition of children under that entry.




b. The replication server skips the parent GUID checking when replicating a first level entry to a consumer directory since there is no real parent entry for a first level entry.




c. In one change log processing cycle, there can be multiple “modify” changes modifying the same attribute of the same entry. Because of this, multiple worker threads can be applying changes that modify the same attribute of the same entry in a race. The replication server provides synchronization logic between worker threads to ensure attribute convergence in such a race condition (an illustrative sketch follows this list).




d. To ensure schema and group modification convergence, “modify add” or “modify delete” operations should not be allowed to overlap with “modify replace”, and vice versa. Any “modify add” or “modify delete” for schema or group entries should only be performed after any previous “modify replace” (and vice versa) of the same entry has been replicated to all the consumer directories.
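Regarding consideration (c), one simple way to serialize worker threads that race on the same entry is a per-entry lock, as in the sketch below. The locking granularity and every name here are assumptions made for illustration only and do not describe the embodiment's actual synchronization logic.

    # Illustrative sketch of per-entry serialization between worker threads; not the embodiment's code.
    import threading

    _locks_guard = threading.Lock()        # protects creation of the per-entry locks
    _entry_locks = {}                      # one lock per target entry GUID

    def _lock_for(guid):
        with _locks_guard:
            return _entry_locks.setdefault(guid, threading.Lock())

    def apply_modify(consumer, change):
        """Apply a modify change; modifies touching the same entry are applied one at a time."""
        with _lock_for(change.guid):
            consumer.apply(change)         # only one worker thread mutates this entry at a time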




SYSTEM ARCHITECTURE OVERVIEW




Referring to FIG. 6, in an embodiment, a computer system 620 includes a host computer 622 connected to a plurality of individual user stations 624-1, 624-2, 624-3, and 624-4. In an embodiment, the user stations 624-1, 624-2, 624-3, and 624-4 each comprise suitable data terminals, for example, but not limited to, personal computers, portable laptop computers, or personal data assistants (“PDAs”), which can store and independently run one or more applications, i.e., programs. For purposes of illustration, some of the user stations, 624-3 and 624-4, are connected to the host computer 622 via a local area network (“LAN”) 626. Other user stations, 624-1 and 624-2, are remotely connected to the host computer 622 via a public switched telephone network (“PSTN”) 628 and/or a wireless network 630.




In an embodiment, the host computer 622 operates in conjunction with a data storage system 631, wherein the data storage system 631 contains a database 632 that is readily accessible by the host computer 622.

In alternative embodiments, the database 632 may be resident on the host computer, stored, e.g., in the host computer's ROM, PROM, EPROM, or any other memory chip, and/or its hard disk. In yet alternative embodiments, the database 632 may be read by the host computer 622 from one or more floppy disks, flexible disks, magnetic tapes, any other magnetic medium, CD-ROMs, any other optical medium, punchcards, papertape, or any other physical medium with patterns of holes, or any other medium from which a computer can read.

In an alternative embodiment, the host computer 622 can access two or more databases 632, stored in a variety of mediums, as previously discussed.




Referring to FIG. 7, in an embodiment, the user stations 624-1, 624-2, 624-3, and 624-4 and the host computer 622, each referred to generally as a processing unit, embody a general architecture 705. A processing unit includes a bus 706 or other communication mechanism for communicating instructions, messages and data, collectively, information, and one or more processors 707 coupled with the bus 706 for processing information. A processing unit also includes a main memory 708, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 706 for storing dynamic data and instructions to be executed by the processor(s) 707. The main memory 708 also may be used for storing temporary data, i.e., variables, or other intermediate information during execution of instructions by the processor(s) 707.




A processing unit may further include a read only memory (ROM) 709 or other static storage device coupled to the bus 706 for storing static data and instructions for the processor(s) 707. A storage device 710, such as a magnetic disk or optical disk, may also be provided and coupled to the bus 706 for storing data and instructions for the processor(s) 707.

A processing unit may be coupled via the bus 706 to a display device 711, such as, but not limited to, a cathode ray tube (CRT), for displaying information to a user. An input device 712, including alphanumeric and other keys, is coupled to the bus 706 for communicating information and command selections to the processor(s) 707. Another type of user input device may include a cursor control 713, such as, but not limited to, a mouse, a trackball, a fingerpad, or cursor direction keys, for communicating direction information and command selections to the processor(s) 707 and for controlling cursor movement on the display 711.




According to one embodiment of the invention, the individual processing units perform specific operations by their respective processor(s) 707 executing one or more sequences of one or more instructions contained in the main memory 708. Such instructions may be read into the main memory 708 from another computer-usable medium, such as the ROM 709 or the storage device 710. Execution of the sequences of instructions contained in the main memory 708 causes the processor(s) 707 to perform the processes described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and/or software.




The term “computer-usable medium,” as used herein, refers to any medium that provides information or is usable by the processor(s) 707. Such a medium may take many forms, including, but not limited to, non-volatile, volatile and transmission media. Non-volatile media, i.e., media that can retain information in the absence of power, includes the ROM 709. Volatile media, i.e., media that cannot retain information in the absence of power, includes the main memory 708. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise the bus 706. Transmission media can also take the form of carrier waves; i.e., electromagnetic waves that can be modulated, as in frequency, amplitude or phase, to transmit information signals. Additionally, transmission media can take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.

Common forms of computer-usable media include, for example: a floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, RAM, ROM, PROM (i.e., programmable read only memory), EPROM (i.e., erasable programmable read only memory), including FLASH-EPROM, any other memory chip or cartridge, carrier waves, or any other medium from which a processor 707 can retrieve information.




Various forms of computer-usable media may be involved in providing one or more sequences of one or more instructions to the processor(s) 707 for execution. For example, the instructions may initially be provided on a magnetic disk of a remote computer (not shown). The remote computer may load the instructions into its dynamic memory and then transmit them over a telephone line, using a modem. A modem local to the processing unit may receive the instructions on a telephone line and use an infrared transmitter to convert the instruction signals transmitted over the telephone line to corresponding infrared signals. An infrared detector (not shown) coupled to the bus 706 may receive the infrared signals and place the instructions therein on the bus 706. The bus 706 may carry the instructions to the main memory 708, from which the processor(s) 707 thereafter retrieves and executes the instructions. The instructions received by the main memory 708 may optionally be stored on the storage device 710, either before or after their execution by the processor(s) 707.




Each processing unit may also include a communication interface 714 coupled to the bus 706. The communication interface 714 provides two-way communication between the respective user stations 624-1, 624-2, 624-3, and 624-4 and the host computer 622. The communication interface 714 of a respective processing unit transmits and receives electrical, electromagnetic or optical signals that include data streams representing various types of information, including instructions, messages, and data.




A communication link 715 links a respective user station 624 and a host computer 622. The communication link 715 may be a LAN 626, in which case the communication interface 714 may be a LAN card. Alternatively, the communication link 715 may be a PSTN 628, in which case the communication interface 714 may be an integrated services digital network (ISDN) card or a modem. Also, as a further alternative, the communication link 715 may be a wireless network 630.




A processing unit may transmit and receive messages, data, and instructions, including program, i.e., application, code, through its respective communication link 715 and communication interface 714. Received program code may be executed by the respective processor(s) 707 as it is received, and/or stored in the storage device 710, or other associated non-volatile media, for later execution. In this manner, a processing unit may receive messages, data and/or program code in the form of a carrier wave.




In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, the reader is to understand that the specific ordering and combination of process actions shown in the process flow diagrams described herein is merely illustrative, and the invention can be performed using different or additional process actions, or a different combination or ordering of process actions. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.



Claims
  • 1. A process for database replication comprising:receiving a change request to modify data at a first replication site; implementing said change request at said first replication site; generating a change record corresponding to said change request; said change record having a schema independent format; sending said change record to a second replication site; and implementing said change record to a copy of said data at said second replication site.
  • 2. The process of claim 1 in which said change request is directed to LDAP directory data.
  • 3. The process of claim 1 in which implementing said change record at said replication site comprises:generating a database instruction that is specific to schema and data organizations at said second replication site.
  • 4. The process of claim 3 further comprising:accessing metadata at said second replication site to generate said database instruction.
  • 5. The process of claim 1 further comprising copying said change record to a first replication log at said first replication site, in which sending said change record to said second replication site comprises replicating said change record from said first replication log to a second replication log at said second replication site.
  • 6. The process of claim 1 in which implementing said change record at said second replication site is performed using synchronous replication.
  • 7. The process of claim 6 further comprising:adding said change record to a change log at said first replication site; and monitoring said change log for entry of new change records.
  • 8. The process of claim 6 further comprising:adding said change record to a second change log at said second replication site; and monitoring said second change log for entry of new change records.
  • 9. The process of claim 1 in which implementing said change record at said second replication site is performed using asynchronous replication.
  • 10. The process of claim 9 in which said change record is sent to said second replication site in a periodic manner.
  • 11. The process of claim 9 in which said change record is sent to said second replication site upon a trigger event.
  • 12. The process of claim 1 in which schema and data organizations of said first replication site is different than that of said second replication site.
  • 13. A process for replication of LDAP directory data in a distributed LDAP environment, comprising:receiving an LDAP operation request at a first LDAP server, said first LDAP server located at a first LDAP site; implementing said LDAP operation request to LDAP directory data at said first LDAP site; generating a change log entry to a change log, said change log entry independent of schema and data organizations at said first LDAP site, said change log entry corresponding to said LDAP operation request; replicating said change log entry to a second change log at a second LDAP site; and utilizing said change log entry at said second change log to implement said LDAP operation request at said second LDAP site.
  • 14. The process of claim 13 in which said schema and data organizations of said first replication site is different than that of said second replication site.
  • 15. The process of claim 13 in which implementing said LDAP operation request to said second LDAP site comprises:generating a database instruction that is specific to schema and data organizations at said second LDAP site.
  • 16. The process of claim 15 further comprising:accessing metadata at said second LDAP site to generate said database instruction.
  • 17. The process of claim 13 in which replicating said change log entry comprises:copying said change log entry from said change log to a first replication log at said first LDAP site; and replicating said change log entry from said first replication log at said first LDAP site to a second replication log at said second LDAP site.
  • 18. The process of claim 13 in which said LDAP directory data is synchronously replicated.
  • 19. The process of claim 14 further comprising:monitoring said change log for addition of said change log entry.
  • 20. The process of claim 13 in which said LDAP directory data is asynchronously replicated.
  • 21. The process of claim 20 in which said change log entry is replicated to said second LDAP site in a periodic manner.
  • 22. The process of claim 20 in which said change log entry is sent to said second LDAP site upon a trigger event.
  • 23. A computer program product that includes a medium usable by a processor, the medium having stored thereon a sequence of instructions which, when executed by said processor, causes said processor to execute a process for database replication, said process comprising:receiving a change request to modify data at a first replication site; implementing said change request at said first replication site; generating a change record corresponding to said change request; said change record having a schema independent format; sending said change record to a second replication site; and implementing said change record to a copy of said data at said second replication site.
  • 24. A computer program product that includes a medium usable by a processor, the medium having stored thereon a sequence of instructions which, when executed by said processor, causes said processor to execute a process for replication of LDAP directory data in a distributed LDAP environment, said process comprising:receiving an LDAP operation request at a first LDAP server, said first LDAP server located at a first LDAP site; implementing said LDAP operation request to LDAP directory data at said first LDAP site; generating a change log entry to a change log, said change log entry independent of schema and data organizations at said first LDAP site, said change log entry corresponding to said LDAP operation request; replicating said change log entry to a second change log at a second LDAP site; and utilizing said change log entry at said second change log to implement said LDAP operation request at said second LDAP site.
  • 25. A process for database replication comprising:receiving a change request to modify data at a first replication site; implementing the change request at the first replication site; translating the change request into a schema independent change record; sending the schema independent change record to a second replication site; and implementing the schema independent change record to a copy of the data at the second replication site.
  • 26. The process of claim 25 wherein implementing the schema independent change record comprises issuing a change instruction that is specific to schema and data organization at the second replication site.
  • 27. The process of claim 26 wherein implementing the schema independent change record further comprises accessing metadata at the second replication site to generate the change instruction.
  • 28. The process of claim 25 further comprising copying the change record to a first replication log at the first replication site, wherein sending the schema independent change record to a second replication comprises replicating the schema independent change record from the first replication log to a second replication log at the second replication site.
  • 29. The computer program product of claim 23, wherein said process further comprises:copying said change record to a first replication log at said first replication site, in which sending said change record to said second replication site comprises replicating said change record from said first replication log to a second replication log at said second replication site.
  • 30. The computer program product of claim 23, wherein implementing said change record at said second replication site is performed using synchronous replication.
  • 31. The computer program product of claim 30, wherein said process further comprises:adding said change record to a change log at said first replication site; and monitoring said change log for entry of new change records.
  • 32. The computer program product of claim 30, wherein said process further comprises:adding said change record to a second change log at said second replication site; and monitoring said second change log for entry of new change records.
  • 33. The computer program product of claim 24, wherein said schema and data organizations of said first replication site is different than that of said second replication site.
  • 34. The computer program product of claim 24, wherein implementing said LDAP operation request to said second LDAP site comprises:generating a database instruction that is specific to schema and data organizations at said second LDAP site.
  • 35. The computer program product of claim 34, wherein said process further comprises:accessing metadata at said second LDAP site to generate said database instruction.
  • 36. The computer program product of claim 24, wherein replicating said change log entry comprises:copying said change log entry from said change log to a first replication log at said first LDAP site; and replicating said change log entry from said first replication log at said first LDAP site to a second replication log at said second LDAP site.
  • 37. The computer program product of claim 24, wherein said LDAP directory data is synchronously replicated.
  • 38. The computer program product of claim 37, wherein said process further comprises:monitoring said change log for addition of said change log entry.