1) Field of the Invention
The present invention relates generally to the field of databases. Specifically, the present invention relates to various methods of avoiding dangling references.
2) Background
Individuals commonly use numerous computing devices and information appliances to store and communicate information. For example, a person may use portable computing devices such as a smart cell phone or personal digital assistant (PDA) while in transit, and a laptop or desktop computer system while at work and/or at home. Furthermore, with use of such devices and systems, a user may store data to enable viewing of the data by others via an internet or intranet site on a server system. It is common for a data set to be concurrently maintained on a plurality of these devices. For example, a user may maintain a calendar or address book on both a PDA and a desktop or laptop computer system.
The entries in the data set are typically referred to as “records” or “data objects.” When a change is made to a record in the data set residing on one device (hereafter also referred to as a “node”), it is desirable to have the data set on the other node be updated as well, so that the data set is synchronized on both nodes. Accordingly, processes have been developed to facilitate synchronization of the data sets at both nodes. However, as computer systems are networked, multiple communication pathways between PDAs and computer systems can exist, and synchronization among multiple devices becomes more complicated yet must still be supported.
This added complexity results not only from the number of computers that users typically carry and use, but also from the complexity of the programs installed and the number of programs being executed at any given time. As a consequence, competing applications continually contend for the finite resources of the computer systems. Beyond this competition for resources, an ever-increasing number of applications and programs require timely access to the same data.
Another consequence of the mixed environment, in which different types of hardware require access to the same data, is the set of hardware constraints and performance requirements involved. For example, a home or office computer running a calendar or address book today may have, at a minimum, 1 GB of RAM and a 1.8 GHz dual-core processor. On the other hand, a PDA device may have only a 400 MHz processor and 28 MB of RAM. These differences in capabilities necessitate a different approach to data management on a PDA than on a desktop computer.
A major drawback that arises in database design results from the deletion of elements that are referenced or the renaming of elements that are referenced by other data elements. As an example involving a synchronization scheme with multiple devices, a modification or deletion of one object on a particular device may cause a conflict with a modification to the same or similar object on another device. During a synchronization process, conflicts between the objects must be resolved in a manner so that the intended object is properly propagated. During a synchronization process attempting to resolve different versions of data, objects to be discarded are deleted (so that only the proper copy of a particular object remains). However, other data objects on the same or different device may still reference data objects that are deleted. Thus, a situation is created where conflicts (e.g., two differing versions of the same object) are created and data integrity quickly breaks down. Typically, deleting one data element that is referenced by another data element creates what is called a dangling reference. Furthermore, it is possible that a dangling reference may be propagated throughout the system through synchronization processes not attuned to this problem.
What is therefore needed is a method for databases to store information efficiently so that the data is accessible to users as efficiently as possible on every platform from which the database may be accessed. Additionally, to deal with the complex environment and cross-references between items, a method to fix the problem of dangling references is needed.
In accordance with various embodiments of the present invention, a method and system for maintaining data integrity across a plurality of devices using reference identifications is provided. Reference IDs are maintained when a conflict arises and the primary version (the data object which holds the reference ID) is to be superseded by a new version.
In accordance with various embodiments of the present invention, a method and system for maintaining data integrity across a plurality of devices using a global synchronization clock per synchronization node is provided.
In accordance with various embodiments of the present invention, a method and system for maintaining data integrity across a plurality of devices using snapshots during synchronization is provided.
In accordance with various embodiments of the present invention, a method and system for maintaining data integrity across a plurality of devices using forwarding deletes is provided.
In accordance with various embodiments of the present invention, a method and system for maintaining data integrity across multiple devices using update ordering is provided. The above exemplary embodiments are not meant to restrict the scope of this application. Though not enumerated in the above embodiments, the programs attempting to access data need not be running locally, but rather may be running on networked devices connected by some means, either wired or wireless.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
In order to describe the manner in which the above recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Various embodiments of the invention are described in detail below. While specific implementations involving electronic mobile devices (e.g., portable computers) are described, it should be understood that the description here is merely illustrative and not intended to limit the scope of the various aspects of the invention. A person skilled in the relevant art will recognize that components and configurations other than those described here may readily be used or substituted without departing from the spirit and scope of the invention.
A data object, such as a file or record, typically has two properties of interest, an identity (or identifier) and content. The identity property allows two data objects to be compared to determine whether they represent the same data. Data objects having the same identity may then be compared for content. Thus, for example the same data object O may be stored in a portable computing device (P) and on a desktop computer (D). If data object O is modified on the portable computing device (P) but not on the desktop computer (D), then a situation exists where data objects having the same identity have different content on the respective devices. When a data object is modified in a node, a presumption of priority is made where the modified data object takes precedence over the previous version of the object and that accordingly, descendant data takes precedence over ancestor data.
By including ‘pedigree’ information that preserves edit priority, conflicts in which data objects have the same identity but different content can be resolved. The pedigree may be viewed as a change history of a data item and may include a node identifier indicating the device at which the data item is stored, and a counter such as a sync counter (which increments after each synchronization event). For example, a record created at a desktop computer with node identifier “D” and current sync clock counter at 21 is said to have pedigree D21. During a synchronization event, another computing device such as the portable computing device may receive a copy of the data item. If the data item is then subsequently modified on the portable computing device, the pedigree of the data item stored on the portable computing device is updated to include a new node identifier and counter. For example, the pedigree D21 may become D21P43, where P identifies the portable computing device and 43 is the value of the portable's sync counter at the time of the modification. Another way that the pedigree may be modified is, for example, if the data object is modified after a synchronization event. The sync counter is incremented to 22 for the desktop, and the pedigree is D22. Then, if the object were to be modified again before the next synchronization, the pedigree would not change because the clock has not ticked. Of course, other schemes of identification can be used in accordance with the present invention.
An additional important property of objects that exist in a data store or data stores is that inter-object referencing may exist. For example, an object named ‘Honda’ may also share the ‘Cars’ category with other objects within multiple data stores. As such, this object may be related to other objects.
Moreover, there may not be a restriction on references, as any object may reference another, which creates interconnections within data stores and throughout multiple data stores. These data stores may not be on separate devices as there may be multiple data stores on an individual device. Furthermore, multiple data stores may also contain the same information gained through synchronization.
As noted above, a plurality of devices in a networked system may be synchronized so that each node of the system obtains the most up-to-date version of a set of data.
A “pedigree” is generally a record of change(s) of a database object, and may be used to indicate if an object has changed since a synchronization session. Commonly owned U.S. patent application Ser. No. 10/159,461, issued as U.S. Pat. No. 7,337,193, filed May 31, 2002, discloses a novel pedigree system that is well suited to embodiments of the present invention, and is hereby incorporated herein by reference in its entirety. It is to be appreciated that embodiments of the present invention are well suited to pedigrees and methods of creating and updating pedigrees other than those described herein and in the aforementioned patent application.
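By way of illustration only, the pedigree bookkeeping in the D21 and D21P43 example above might be sketched in Python as follows. The `Pedigree` class, its method names, and in particular the `descends_from` comparison rule are assumptions made for this sketch; the incorporated patent, not this example, defines the actual pedigree scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pedigree:
    """Change history as an ordered tuple of (node_id, sync_counter) entries."""
    entries: tuple = ()

    def modified_at(self, node: str, counter: int) -> "Pedigree":
        """Return the pedigree after a modification on `node` at `counter`."""
        if self.entries and self.entries[-1][0] == node:
            # Same node as the latest entry: only the counter can advance.
            # A second edit before the clock ticks leaves the pedigree unchanged.
            last = (node, max(self.entries[-1][1], counter))
            return Pedigree(self.entries[:-1] + (last,))
        return Pedigree(self.entries + ((node, counter),))

    def descends_from(self, other: "Pedigree") -> bool:
        """Assumed rule: self covers every entry of `other` with an equal or later counter."""
        mine = dict(self.entries)
        return all(mine.get(node, -1) >= counter for node, counter in other.entries)

    def __str__(self) -> str:
        return "".join(f"{node}{counter}" for node, counter in self.entries)


created = Pedigree().modified_at("D", 21)   # record created on the desktop
edited = created.modified_at("P", 43)       # later modified on the portable
print(created, edited)                      # D21 D21P43
```

Under the assumed rule, `edited.descends_from(created)` holds while the reverse does not, which matches the presumption that descendant data takes precedence over ancestor data.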
Referring again to
The laptop computer 110, the desktop computer 120, the cellular phone 140a-c, PDA 150a-b and the Tablet PC 160 all can include one or more processors for running an operating system and executing program applications or coprocessors designed to offload specific tasks from the CPU. The devices 110, 120, 140a-c, 150a-b and 160 can include local or attached memory, including, for example, volatile (RAM) and non-volatile (ROM) memory for storing data. In preferred embodiments, the computing devices 110, 120, 140a-c, 150a-b and 160 of the system 100 all operate on a Linux-based platform such as the Access Linux Platform (ALP), but the present invention is also applicable in systems in which other operating systems are adapted for mobile/portable devices and information appliances such as Symbian, Windows Mobile, Blackberry OS, and the like.
Each of the computing devices 110, 120, 140a-c, 150a-b and 160 may execute personal information management (PIM) applications including calendar, address book and email applications. Information input by a user of such applications is initially stored locally in data stores located on each of the devices 110, 120, 140a-c, 150a-b and 160. As discussed further below, the data stores on the devices may be duplicative to the extent that the same data objects are stored in each node (device). For example, appointment information on a personal calendar may be stored on each of the devices for easy access regardless of which device the user is currently operating.
The computing device 140 may be equipped with or coupled to a transceiver 212 enabling a wireless communication link to a wireless base station (not shown). In some embodiments, the computing device 140 may communicate with other devices in the networked system via a base station and gateway (not shown). The transceiver 212 is coupled to host circuitry 214 which may comprise or include a digital signal processor (DSP) for processing received/transmitted data. The host circuitry may also include a universal asynchronous receiver-transmitter (UART) module that provides serial communication capability via a serial port 216 and IR (infrared) port 218. Alternatively, the processor 202 may perform some or all of the functions performed by the host circuitry 214.
The computing device 140 may also be equipped with a coprocessor 228. A coprocessor would be used to handle advanced application specific tasks that would otherwise take valuable system resources. The coprocessor 228 may also be a DSP used, for example, for audio processing in voice recognition or text to speech systems or a Field Programmable Gate Array (FPGA) that could be programmed to perform application specific tasks.
The computing device 140 might also be equipped with input/output devices including a display device 222 such as a screen or a haptic display, an alphanumeric input/output device 224, and an optional on-screen cursor control 226 for communicating user input and command selection to the processor 204. The input/output device 224 may also be a GPS device or any device that may require some input/output of data connecting to the system following known protocols. In various embodiments, the computing device 140 may include other elements not shown in
In accordance with various embodiments of the present invention, a method for maintaining data integrity across a plurality of devices is provided. For example, a data store or collection of data stores may utilize reference IDs. A reference ID is an identification that is maintained for the life of an object and identifies the “proper” object to use. Upon synchronization or any other similar process, conflicts may arise between objects that should be the same but, for a variety of reasons, now contain differing content. Because one or more of the objects to be resolved is deleted during a resolution process, it is necessary to maintain the reference ID and “assign” it to the proper object after resolution.
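As a minimal sketch of this idea, and not a statement of how any embodiment is implemented, a store might keep a table from each reference ID to whichever version is currently the “proper” object, and reassign that entry before deleting the superseded version. The class name `ReferenceStore`, its methods, and the use of pedigree-like strings as version keys are all hypothetical.

```python
class ReferenceStore:
    """Keeps a stable reference ID pointing at the current "proper" version."""

    def __init__(self):
        self._versions = {}  # version key -> content
        self._current = {}   # reference ID -> version key of the proper object

    def add(self, ref_id, version_key, content):
        self._versions[version_key] = content
        # The first version seen for a reference ID becomes its holder.
        self._current.setdefault(ref_id, version_key)

    def resolve_conflict(self, ref_id, winner_key):
        """Hand the reference ID to the winning version, then delete the loser."""
        if winner_key not in self._versions:
            raise KeyError(winner_key)
        loser_key = self._current[ref_id]
        if loser_key != winner_key:
            self._current[ref_id] = winner_key  # reassign before deleting
            del self._versions[loser_key]

    def lookup(self, ref_id):
        return self._versions[self._current[ref_id]]

    def has_version(self, version_key):
        return version_key in self._versions


store = ReferenceStore()
store.add("ref-1", "D21", "Lunch at noon")
store.add("ref-1", "D21P43", "Lunch at 1 pm")
store.resolve_conflict("ref-1", "D21P43")
print(store.lookup("ref-1"))  # Lunch at 1 pm
```

Because other objects hold only the reference ID, none of them needs to change when the superseded version is removed.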
In accordance with various embodiments of the present invention, each object is provided with a locally unique ID and a globally unique ID. A Local ID is a small integer value that is meant to be used only on a local system, where the number of entries is generally kept smaller than on a global scale; in contrast, a Global ID is a large integer value. The smaller size of a Local ID compared with a Global ID provides increased ease of use and space efficiency as well as shortened processing and search times. The Global ID is the same across all nodes, whereas the Local ID may vary between data nodes. All objects define a referenced version that constitutes a valid reference for as long as the object exists.
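The relationship between the two kinds of ID can be sketched as a per-node lookup table. The `IdMap` class below is hypothetical, and the choice of 128-bit UUIDs for Global IDs is an assumption of this sketch; the description requires only that a Global ID be a large integer shared by all nodes.

```python
import uuid


class IdMap:
    """Per-node table pairing shared Global IDs with small node-local integers."""

    def __init__(self):
        self._local_by_global = {}
        self._global_by_local = {}
        self._next_local = 1

    def local_id(self, global_id: uuid.UUID) -> int:
        """Return this node's Local ID for `global_id`, allocating one if needed."""
        local = self._local_by_global.get(global_id)
        if local is None:
            local = self._next_local
            self._next_local += 1
            self._local_by_global[global_id] = local
            self._global_by_local[local] = global_id
        return local

    def global_id(self, local: int) -> uuid.UUID:
        return self._global_by_local[local]


pda, desktop = IdMap(), IdMap()
contact, meeting = uuid.uuid4(), uuid.uuid4()

pda.local_id(contact)      # first object seen on the PDA
desktop.local_id(meeting)  # the desktop saw a different object first
desktop.local_id(contact)  # same Global ID, different Local ID on this node
```

Each node can then index and search on the compact Local ID, translating to the Global ID only when exchanging objects with another node.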
Referring to
During a synchronization operation, individual data stores may use individual sync clocks. However, this can cause problems if the clocks do not tick in a synchronized fashion, which may skew synchronization results. For example, one possible implementation that exists in some systems is for each data store to have its own sync clock. However, if one clock does not tick at the same time as the clocks in the rest of the devices, dangling references are created. For example, consider a situation where a new object O1 in data store D1 references a new object O2 in data store D2, and both objects O1 and O2 are added nearly simultaneously. Also consider that object O1 is added before D1's clock ticks, and object O2 is added just after D2's clock ticks. During a synchronization, object O1 will be added whereas object O2 will not be added. This situation creates a dangling reference from object O1 to object O2 when O1 is synchronized with another node.
In accordance with various embodiments of the present invention, synchronized global clocks are used to maintain data integrity.
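The O1 and O2 scenario above can be reproduced in a few lines. In this sketch, which rests on simplifying assumptions, a change is stamped with the current clock value and a session transmits changes whose stamp lies in a half-open range; the `SyncClock` and `DataStore` classes and the `session_objects` helper are hypothetical names.

```python
class SyncClock:
    """A sync counter that advances once per synchronization event."""

    def __init__(self):
        self.value = 0

    def tick(self):
        self.value += 1


class DataStore:
    def __init__(self, clock):
        self.clock = clock
        self.stamps = {}  # object ID -> clock value when the object changed

    def put(self, object_id):
        self.stamps[object_id] = self.clock.value

    def changes_in(self, lower, upper):
        """Objects stamped within [lower, upper)."""
        return {oid for oid, stamp in self.stamps.items() if lower <= stamp < upper}


def session_objects(stores, lower, upper):
    found = set()
    for store in stores:
        found |= store.changes_in(lower, upper)
    return found


# One clock per data store: D2's clock ticks just before O2 is added.
c1, c2 = SyncClock(), SyncClock()
d1, d2 = DataStore(c1), DataStore(c2)
d1.put("O1")   # stamped 0
c2.tick()
d2.put("O2")   # stamped 1, so it misses the session covering [0, 1)
per_store = session_objects([d1, d2], 0, 1)

# One clock per node, shared by both data stores.
shared = SyncClock()
g1, g2 = DataStore(shared), DataStore(shared)
g1.put("O1")
g2.put("O2")
shared.tick()
with_shared = session_objects([g1, g2], 0, 1)

print(per_store, with_shared)
```

With per-store clocks the session carries O1 without O2, leaving O1's reference dangling; with the shared clock both objects carry the same stamp and travel together.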
In accordance with various embodiments of the present invention, a method for using snapshots during synchronization is provided. A snapshot is a pair of pedigrees that define the upper and lower bounds for the priority of objects to be included in a synchronization session.
Unfortunately, it is possible for a modification that occurs after the snapshot is taken to result in the transmission of a dangling reference. For example, consider a situation where a new object O1 in data store D1 references a new object O2 in data store D2. Furthermore, consider that both new objects are initially included in the session snapshot. By definition, if a concurrent process modifies object O2 before it is read from data store D2, then object O2's priority will fall outside the snapshot. Therefore, when object O1 is read from data store D1, it will contain a dangling reference to object O2.
In accordance with various embodiments of the present invention, this problem is avoided by retaining all object versions included in any “open” snapshot. For example, when a change is made to an object included in an open snapshot, a more recent version of the object is created while the existing version is retained. When the snapshot is closed, any “old” object versions that are no longer required by another open snapshot are purged. Transmission of dangling references is thereby avoided.
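A minimal sketch of version retention under open snapshots follows. For brevity it assumes that priorities are plain integers written in increasing order, whereas the description defines a snapshot as a pair of pedigrees; the `VersionedStore` class and its methods are hypothetical.

```python
class VersionedStore:
    """Retains superseded versions for as long as an open snapshot includes them."""

    def __init__(self):
        self._versions = {}  # object ID -> list of (priority, content), oldest first
        self._open = {}      # snapshot ID -> (lower bound, upper bound)
        self._next_id = 0

    def open_snapshot(self, lower, upper):
        snapshot_id = self._next_id
        self._next_id += 1
        self._open[snapshot_id] = (lower, upper)
        return snapshot_id

    def write(self, object_id, priority, content):
        # Create a newer version; older ones survive only if a snapshot needs them.
        self._versions.setdefault(object_id, []).append((priority, content))
        self._purge(object_id)

    def read(self, snapshot_id, object_id):
        """Return the newest version whose priority falls inside the snapshot."""
        lower, upper = self._open[snapshot_id]
        visible = [c for p, c in self._versions.get(object_id, []) if lower <= p <= upper]
        return visible[-1] if visible else None

    def close_snapshot(self, snapshot_id):
        del self._open[snapshot_id]
        for object_id in self._versions:
            self._purge(object_id)

    def version_count(self, object_id):
        return len(self._versions.get(object_id, []))

    def _purge(self, object_id):
        versions = self._versions[object_id]
        needed = [
            v for v in versions[:-1]
            if any(lo <= v[0] <= hi for lo, hi in self._open.values())
        ]
        self._versions[object_id] = needed + [versions[-1]]


store = VersionedStore()
store.write("O2", 5, "original")
session = store.open_snapshot(0, 10)
store.write("O2", 12, "edited concurrently")  # falls outside the snapshot
print(store.read(session, "O2"))              # original
store.close_snapshot(session)                 # the old version is purged here
```

The concurrent edit no longer pushes O2 out of the session: the snapshot still reads the retained version, so an object referencing O2 does not arrive with a dangling reference.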
In an exemplary embodiment,
Another situation that threatens data integrity is when a data store enforces data constraints. In implementing data constraints, it is possible that the data store may choose to delete new objects that violate those constraints. This deletion will cause any references to the deleted object to “dangle.” For example, consider a situation where there are two categories in two different data stores named “Business” and “Business2”. Then a user renames “Business2” to “Business” in one of the data stores, and synchronizes the two data stores. If we assume in this example that category names are unique, then the other data store must delete one of the categories named “Business.” Unfortunately, this means references to the deleted category will “dangle.” The data store may issue a delete that forwards references from the deleted version of “Business.” This forwarding allows the members of the deleted category to seamlessly become members of the non-deleted category and thus avoids the creation of dangling references.
To avoid this problem or similar problems, in accordance with various embodiments of the present invention, a method and system for maintaining data integrity across a plurality of devices through the use of forwarding deletes is provided. A forwarding delete is an operation in which, through the resolution of a conflict, a conflicted version of an object is deleted and all references to it are forwarded to an extant version of the object. Forwarding deletes may also be used where two similar but different objects exist, one object is to replace the other, and references to both objects are henceforth to reference just one. In the forwarding process, the system maintains knowledge of deleted objects in relation to an extant version of the object and its local ID, thus enabling the system, upon receiving a request for a non-extant object, to automatically send the extant version.
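The “Business” category example can be sketched with a small forwarding table. The `ForwardingStore` class, its method names, and the local ID values 7 and 9 are hypothetical and serve only to illustrate the technique.

```python
class ForwardingStore:
    """Deletes a conflicted object while forwarding its references to a survivor."""

    def __init__(self):
        self._objects = {}  # local ID -> content
        self._forward = {}  # deleted local ID -> surviving local ID

    def add(self, local_id, content):
        self._forward.pop(local_id, None)  # a reused ID is no longer forwarded
        self._objects[local_id] = content

    def forwarding_delete(self, deleted_id, survivor_id):
        if survivor_id not in self._objects:
            raise KeyError(survivor_id)
        del self._objects[deleted_id]
        self._forward[deleted_id] = survivor_id

    def resolve(self, local_id):
        """Follow forwarding entries until an extant object is reached."""
        while local_id in self._forward:
            local_id = self._forward[local_id]
        return local_id

    def get(self, local_id):
        return self._objects[self.resolve(local_id)]


store = ForwardingStore()
store.add(7, "Business")
store.add(9, "Business")         # formerly "Business2", renamed on another node
member_category = 9              # a record that belongs to the renamed category

store.forwarding_delete(9, 7)    # enforce unique category names
print(store.get(member_category))  # Business
```

The record's stored reference is never rewritten; a request for the deleted ID is simply answered with the extant category.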
In accordance with various embodiments of the present invention, maintaining data integrity across multiple devices may utilize update ordering. The order in which object references are added or deleted amongst related data objects is called update ordering. In some embodiments, the nature of the relationship determines the order in which objects are added or deleted. In other embodiments, enabling data stores to indicate that a particular data store is referenced facilitates update ordering.
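One way to derive such an ordering, offered here as an illustrative assumption rather than as the method of any embodiment, is a topological sort of the reference graph, so that referenced objects are added first and deleted last. The sketch uses the Python standard library's `graphlib` module; the model names “Camry” and “Civic” are hypothetical.

```python
from graphlib import TopologicalSorter


def addition_order(references):
    """Order objects so that each one is added after everything it references.

    `references` maps an object to the set of objects it references.
    """
    return list(TopologicalSorter(references).static_order())


def deletion_order(references):
    """Reverse of the addition order: referencing objects are deleted first."""
    return list(reversed(addition_order(references)))


references = {
    "Cars": set(),
    "Toyota": {"Cars"},
    "Honda": {"Cars"},
    "Camry": {"Toyota"},
    "Civic": {"Honda"},
}

adds = addition_order(references)
deletes = deletion_order(references)
print(adds[0], deletes[-1])  # Cars Cars
```

Applying additions in this order means no object is ever stored before the object it references, and applying deletions in the reverse order means no object is removed while another still references it.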
Object 730 is an object named cars and serves as a category. Objects 740 and 750 are subcategories of cars, in this case two brands, Toyota and Honda. Objects 760 through 790 are models of cars. As can be seen, objects 760 through 790 reference the brand of car to which they belong, and objects 740 and 750 reference object 730, showing that they are cars. Arrow 710 shows that the addition process is, in this case, a top-down process in which objects that are referenced are added first. This prevents dangling references from existing because an object with a reference to another object is not added before the referenced object. In the situation shown
Further, although the invention has been described herein with reference to particular structure, materials and/or embodiments, the invention is not intended to be limited to the particulars disclosed herein. In particular, while the invention has been described with reference to portable devices such as personal digital assistants, mobile phones, smart phones, camera phones, pocket personal computers and the like, the invention applies equally to other devices able to execute software instructions and containing data stores, devices having embedded systems (referred to as ‘information appliances’) including, for example, small televisions, media players, set top boxes, automotive navigation devices, GPS devices and portable gaming devices (e.g., Sony PlayStation®), personal computers, servers or any computational device that can execute software. In addition, the invention extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims. Those skilled in the art, having the benefit of the teachings of this specification, may effect numerous modifications thereto, and changes may be made without departing from the scope and spirit of the invention.
The present application is a continuation-in-part of co-pending and commonly-assigned U.S. patent application Ser. No. 10/972,965 entitled “Determining Priority Between Data Items in Shared Environments,” U.S. patent application Ser. No. 11/966,950 entitled “Determining Priority Between Data Items,” and U.S. patent application Ser. No. 12/186,535 entitled “Generating Coherent Global Identifiers for Efficient Data Identification,” all of which are incorporated herein by reference. The present application also incorporates by reference the disclosures in U.S. patent application Ser. No. 10/159,077 entitled “Generating Coherent Global Identifiers for Efficient Data Identification” issued as U.S. Pat. No. 6,934,710; U.S. patent application Ser. No. 11/210,023 entitled “Generating Coherent Global Identifiers for Efficient Data Identification” issued as U.S. Pat. No. 7,418,466; and U.S. patent application Ser. No. 10/159,461 entitled “Determining Priority Between Data Items in Shared Environments” issued as U.S. Pat. No. 7,337,193.
Relation | Number | Date | Country
---|---|---|---
Parent | 10972965 | Oct 2004 | US
Child | 12577116 | | US
Parent | 11966950 | Dec 2007 | US
Child | 10972965 | | US
Parent | 12186535 | Aug 2008 | US
Child | 11966950 | | US