High-speed communications networks are becoming increasingly available at reasonable costs to both enterprise and home users. These networks may enable different users to collaboratively edit shared documents, despite being distant from one another in some cases. Over time, these different users may provide disparate revisions to these shared documents, with these revisions being merged from time to time. In previous approaches, document collaboration systems may employ a single-master model, in which one master version of the shared document serves as the basis for merging subsequent revisions made to that shared document.
Tools and techniques are described for merging versions of documents using multiple masters. These tools may provide methods that include syncing a first peer system with one or more other peer systems, with the peer systems receiving respective instances of a document for collaborative editing. The peer systems may maintain respective version histories of the document, with these version histories capturing and storing revisions occurring locally at the various peer systems. The peer systems may exchange version histories, and merge these version histories. The above-described subject matter may also be implemented as a method, computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
The following detailed description is directed to technologies for merging versions of documents using multiple masters. While the subject matter described herein is presented in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments or examples. Referring now to the drawings, in which like numerals represent like elements throughout the several figures, aspects of tools and techniques for merging versions of documents using multiple masters will be described.
Turning to the peer systems 102 in more detail, the peer systems may include one or more processors 104, which may have a particular type or architecture, chosen as appropriate for particular implementations. The processors 104 may couple to one or more bus systems 106 chosen for compatibility with the processors 104.
The peer systems 102 may also include one or more instances of computer-readable storage media 108, which couple to the bus systems 106. The bus systems may enable the processors 104 to read code and/or data to/from the computer-readable storage media 108. The media 108 may represent storage elements implemented using any suitable technology, including but not limited to semiconductors, magnetic materials, optics, or the like. The media 108 may include memory components, whether classified as RAM, ROM, flash, or other types, and may also represent hard disk drives.
The storage media 108 may include one or more modules of instructions that, when loaded into the processor 104 and executed, cause the peer systems 102 to perform various techniques for merging versions of documents using multiple masters. As detailed throughout this description, these peer systems 102 may provide these services using the components, process flows, and data structures described and illustrated herein.
As an example of these modules of instructions, the storage media 108 may include software elements that provide a multi-master merge service, denoted generally at 110. In general, the peer systems 102 may facilitate interactions with any number of respective users, with examples of users indicated respectively at 112a and 112n (collectively, users 112).
Turning to the server systems 116 in more detail, the server systems may include one or more processors 118, which may have a particular type or architecture, chosen as appropriate for particular implementations. The processors 118 in the server systems 116 may or may not have the same type and architecture as the processors 104 in the peer systems.
The processors 118 may couple to one or more bus systems 120 chosen for compatibility with the processors 118. The bus systems 120 in the server systems 116 may or may not be of the same type and architecture as the bus systems 106 included in the peer systems 102.
The server systems 116 may also include one or more instances of computer-readable storage media 122, which couple to the bus systems 120. The bus systems may enable the processors 118 to read code and/or data to/from the computer-readable storage media 122. The media 122 may represent storage elements implemented using any suitable technology, including but not limited to semiconductors, magnetic materials, optics, or the like. The media 122 may include memory components, whether classified as RAM, ROM, flash, or other types, and may also represent hard disk drives.
The storage media 122 may include one or more modules of instructions that, when loaded into the processor 118 and executed, cause the server systems 116 to perform various techniques for merging versions of documents using multiple masters. For example, the storage media 122 may include server-side merge services 124, which are operative to provide multi-master merge services in cooperation with the peer-side merge services 110.
The storage media 122 may include server-side central storage elements 126, which may contain any number of documents or files 128. These files may be shareable across any number of peer systems 102. In the example shown, the server-side merge services 124 may retrieve the shareable files 128 from the storage 126, and provide them to the peer-side merge services 110.
In the example shown, the server systems 116 and the peer systems 102 may communicate over one or more intermediate communications networks 132. In addition, different ones of the peer systems 102 may communicate with one another over the networks 132. These networks 132 may be personal, local area, regional, or global in scope, and may utilize any appropriate communications protocols suitable in different implementations. In addition, the networks 132 may include any number of sub-networks, and may include wired or wireless communications components.
At the peer systems 102, the peer-side merge services 110 may receive the shared files 130, and store them in storage elements 134 maintained locally by different ones of the peer systems 102. As described further in the examples provided below, a given peer system (e.g., 102a) may receive the shared files 130, and may provide them in turn to another peer system (e.g., 102n), as denoted at 136. However, in other scenarios, the peer system 102n may receive the shared files 130 directly from the server systems 116.
Having described the overall systems or operating environments 100 in
Before proceeding to
Turning to
In turn, blocks 206a and 206n (collectively, blocks 206) generally represent generating and capturing versions of the shared document or file, with these captured versions incorporating various revisions made locally at the peer systems 102a and 102n. Blocks 206a and 206n may also include storing representations of these versions in the local storage elements 134a and 134n, with
The revisions and versioning represented in blocks 204 and 206 may occur on any number of peer systems 102 over time, with these operations proceeding on different peer systems 102 generally in parallel. However, these operations need not occur concurrently or simultaneously relative to one another, because peers may go online or offline at arbitrary times.
At any convenient time, two or more peer systems 102 may establish relationships with one another, with these relationships enabling the peer systems to sync versions with one another. Blocks 210a and 210n (collectively, blocks 210) as shown in
In general, sync operations refer to two or more peer systems exchanging version information, as previously captured and represented on the individual peer systems. Once the sync operation is complete between two or more given peer systems, at least some (but not necessarily all) of the peers may contain a complete copy of the version history as combined across all peer systems involved in the sync operation. In some scenarios, complete or incomplete version history may flow in one or both directions between two or more of the peers.
Once the peer systems 102a and 102n have synced with one another and exchanged their version information, these peer systems 102 may then proceed with respective operations to merge this version information, as denoted respectively at blocks 214a and 214n (collectively, blocks 214). In general, the peer systems 102a and 102n may perform these merge operations individually and independently from one another, to create merged versions 216a and 216n (collectively, merged versions 216).
Having described the components and data flows 200 by which various peer systems may receive, store, and merge revisions to shared files in
Turning to
Individual versions 304 and/or revisions 308 may be associated with respective identifiers, with
In example implementations, the identifiers 306 and 310 are globally unique identifiers (GUIDs). It is also noted that these identifiers are unique to a given version, rather than a specific machine. For example, a given version may be created independently on two different machines by a merge process (described below) merging the same past version history information on the two machines. This given version would have the same unique identifier. This affects how those unique identifiers are created. These identifiers 306 may indicate or designate particular instances of stored versions for the purposes of merging the versions, or merging the revisions represented in those versions. These identifiers may also be used to resolve conflicts arising in various versions or revisions. For example, conflicts may arise when different users attempt to revise different portions of a shared file 130 to contain different or contradictory information.
Turning to the peer system 102n in more detail, the local storage elements 134n may store version history records 302b and 302n representing versions generated and stored on the peer system 102n. In the example shown in
The version history records 302b and 302n may also contain any number of representations of particular versions that are captured and stored on the peer system 102n. For example only, but not to limit possible implementations,
It is noted that version histories as stored on different peers may or may not be linear in nature. For example, version histories may be represented, or visualized, as having tree-like structures. These tree structures may include forks, branches, or other features, depending on from where in the version history a given peer branches its revisions.
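A version history that branches in this way can be held as a directed graph in which each version records the identifiers of the version or versions it was derived from; a fork is then any version with two or more children. The Python sketch below is illustrative only: the `Version` record and the `find_forks` helper are hypothetical, and short labels such as "S1" stand in for globally unique identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One captured version of a shared document (hypothetical record layout)."""
    version_id: str          # globally unique; the same on every peer holding it
    parent_ids: tuple = ()   # () for the root, one id for an edit, 2+ for a merge


def find_forks(history):
    """Return {version_id: sorted child ids} for versions with 2+ children."""
    children = defaultdict(list)
    for v in history.values():
        for p in v.parent_ids:
            children[p].append(v.version_id)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}


history = {v.version_id: v for v in (
    Version("S0"),
    Version("S1", ("S0",)),
    Version("S2", ("S0",)),
    Version("S3", ("S1",)),
)}
print(find_forks(history))   # {'S0': ['S1', 'S2']}
```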
Having described the data structures and hierarchies 300 and
For convenience of description only, the process flows 400 are discussed in connection with the peer systems 102a and 102n. However, it is noted that implementations of this description may perform these process flows in connection with other systems, without departing from the scope and spirit of this description.
As shown in
When two or more peer systems connect to one another to synchronize, these peer systems may each contain different version history graphs of the same file. However, despite the differences between the version history graphs, new versions are globally unique, and thus do not conflict. Some portions of these history graphs may be shared, but other portions of these graphs may be independent and not shared between the two peer systems. As represented in block 404, the peer systems may share or exchange representations of their version history graphs. In turn, block 406 represents combining these graphs to create a graph containing a complete version history, incorporating revisions made by either of the synchronized peer systems.
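Because version identifiers are globally unique, combining two peers' graphs (block 406) can reduce to a keyed union. The sketch below assumes each graph is a Python dict mapping a version id to a tuple of parent ids; `combine_histories` is a hypothetical name, and the consistency check is an added safeguard rather than something the description requires.

```python
def combine_histories(local, remote):
    """Union two version-history graphs keyed by globally unique version id.

    Each graph maps version_id -> tuple of parent ids.  An id present on both
    peers names the same version, so the union never has to reconcile two
    different versions that happen to share an id.
    """
    combined = dict(local)
    for vid, parents in remote.items():
        if vid in combined and combined[vid] != parents:
            raise ValueError(f"version {vid} has inconsistent parents")
        combined[vid] = parents
    return combined


peer_a = {"S0": (), "S1": ("S0",), "S3": ("S1",)}
peer_n = {"S0": (), "S2": ("S0",), "S6": ("S2",)}
graph = combine_histories(peer_a, peer_n)
print(sorted(graph))   # ['S0', 'S1', 'S2', 'S3', 'S6']
```

The combined graph still has two leaves, S3 and S6, so it records both peers' revisions without yet reconciling them.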
It is noted that up to this point in the process flows 400, the version history graph 408 is not yet merged. Put differently, although the synchronized peer systems 102a and 102n are now aware of what revisions have occurred locally on the other peer systems, these revisions have not yet been reconciled into a common version shared across these two peer systems. For an example visual representation of how an unmerged version history may appear,
The version history of shared files or documents (e.g., 130 in
In cases in which the peer systems store only state representations of the shared document, the combined version history graph 408 may contain these state representations. Block 412 represents extracting these state representations from the combined version history graph 408, as represented at 414. Block 412 may also include inferring the deltas associated with the various state representations contained within the version graph, assuming that those deltas are not already stored in the version graph. In turn, block 416 represents expressing these deltas in terms of one or more particular operations. For example, the peer systems may change the value of a given object within the shared document, with changes in the value of this object resulting in new states of the shared document. In this example, the version graph may track the values of this given object as associated with these different states. In such a scenario, block 416 may include identifying what operations at the peer systems resulted in the value of the given object at a given state.
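When only states are stored, the delta between two adjacent states can be inferred by comparing them object by object. This Python sketch assumes a state is a flat dict of object names to values and emits hypothetical `set` and `delete` operations; a real document format would likely need a richer, structure-aware comparison. The example uses the change Δ2 from the example later in this description, which changes object Y from 0 to 1.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def infer_delta(before, after):
    """Infer the operations that turn one stored state into the next.

    States are flat dicts of object name -> value (an assumed representation).
    Returns ("set", name, value) and ("delete", name, None) tuples, sorted by
    object name so every peer derives the same list.
    """
    ops = []
    for name, value in after.items():
        if before.get(name, _MISSING) != value:
            ops.append(("set", name, value))
    for name in before:
        if name not in after:
            ops.append(("delete", name, None))
    return sorted(ops, key=lambda op: (op[1], op[0]))


s0 = {"X": 0, "Y": 0}   # initial state S0
s2 = {"X": 0, "Y": 1}   # state S2 after Δ2
print(infer_delta(s0, s2))   # [('set', 'Y', 1)]
```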
Block 418 represents reducing the version graph to form a version tree. Assume, for example, that the version graph 408 is implemented as a directed acyclic graph. An example of the version graph 408 shown in
A state 502, designated as state S0, may represent an initial state of the shared document. In this initial state, the object X is assumed to have an initial value of 0. A first peer system (e.g., 102a) may change the shared document, as represented by a vector 504, designated as Δ1. The vector 504 represents a state transition from the state 502 to a new state 506, designated as state S1.
Another peer system (e.g., 102n) may also receive the shared document in its initial state 502, and users of this peer system may change the shared document, as represented by a vector 508 (designated Δ2). This vector 508 represents a transition from the initial state 502 to a new state 510, designated as state S2.
From the state 506 (S1), subsequent user changes made at the first peer system may transition the document from that state to a new state 512, designated as state S3. The changes (or deltas) between the states 506 and 512 are represented by a vector 514, designated as Δ3.
A user at another peer system may receive the document in the state 506 (S1), and may change the document, as represented generally by a vector 516 (designated as Δ4). These user changes may transition the document from the state 506 (S1) to a new state 518 (S4).
From the state 510 (S2), a user at the second peer system may change the document from that state to a new state 520 (S6). The user changes transitioning the document from state 510 to state 520 are represented by a vector 522 (Δ6).
Also from the state 510 (S2), a user at another peer system may receive the document in this state and change it, resulting in a transition from the state 510 to a new state 524 (S5). The user changes transitioning the document from state 510 to state 524 are represented by a vector 526 (Δ5).
In the example shown in
In another merge example, assume that the states 518 (S4) and 520 (S6) are to be merged into a new, system generated state 534 (S8). New vector 536 represents system-generated changes, designated at Δ9, transitioning from the state 518 (S4) to the new state 534 (S8). Similarly, new vector 538 represents system-generated changes, designated at Δ10, transitioning from the state 520 (S6) to the new state 534 (S8).
Having created the new merged states 528 (S7) and 534 (S8), another merge example may create a new, system-generated state 540 (S9). A system-generated vector 542 represents system-generated changes, designated at Δ11, transitioning from the state 528 (S7) to the new state 540 (S9), while a system-generated vector 544 represents system-generated changes, designated at Δ12, transitioning from the state 534 (S8) to the new state 540 (S9).
Taking the version history topology shown in
In visually inspecting the topology shown in
The foregoing example may suggest that all multi-master merges may be handled by a three-way merge, provided that the appropriate base for the three-way merge is carefully selected, as described in the comments in the last step. However, extending the above example illustrates that some three-way merges remain problematic regardless of which base is chosen. For example, in addition to the object X featured in the previous example, consider another object Y that has an initial value of 0 at state S0. Assume that the change vector Δ2 changes the object Y to have a value of 1, and that the change vector Δ6 resets the value of the object Y back to 0. In this example including both of the objects X and Y, when calculating the values of X and Y in connection with the merge represented at the state 540 (S9), the correct value for Y (i.e., 0) results only if the state 510 (S2) is chosen as the base for the merge. However, as indicated in the table above, a different state (i.e., the state 506 (S1)) was chosen to obtain the correct value for the object X.
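To make the Y half of this example concrete, the following Python sketch runs a conventional scalar three-way merge at S9 with each candidate base. It assumes that merge state S7 combines S3 and S5 (the paragraph describing S7 is incomplete here), that S8 combines S4 and S6 as stated above, and that no delta other than Δ2 and Δ6 touches Y. The helper name `three_way` is hypothetical.

```python
def three_way(base, ours, theirs):
    """Conventional scalar three-way merge: keep whichever side changed from base."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise ValueError("both sides changed differently: conflict")


# Value of Y at each user-created state.  Assumes only Δ2 (Y := 1) and
# Δ6 (Y := 0) touch Y.
Y = {"S0": 0, "S1": 0, "S2": 1, "S3": 0, "S4": 0, "S5": 1, "S6": 0}

# Intermediate merges, each based on the common ancestor S0.
# Assumption: S7 merges S3 and S5; S8 merges S4 and S6.
Y["S7"] = three_way(Y["S0"], Y["S3"], Y["S5"])   # -> 1
Y["S8"] = three_way(Y["S0"], Y["S4"], Y["S6"])   # -> 0

# Merge S7 and S8 into S9 using each candidate base.
results = {base: three_way(Y[base], Y["S7"], Y["S8"]) for base in ("S0", "S1", "S2")}
print(results)   # {'S0': 1, 'S1': 1, 'S2': 0}
```

Only base S2 gives the correct Y = 0, while the text above notes that X needs base S1, so no single base serves both objects.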
As the above example illustrates with the example topology shown in
The discussion now returns to describing processes for handling the merge to address this issue. As discussed above, the combined version history graph 408 may be implemented as a directed acyclic graph that may be reduced to a tree representation by removing some of the change or delta vectors. The solid and dashed arrows shown in
Turning to
From decision block 604, if the selected leaf node represents a system-calculated merge state, the process flows 600 may take Yes branch 606 to block 608, which represents removing the selected leaf node from the version graph. In turn, block 610 represents removing the delta vectors leading to the removed leaf node.
Decision block 612 represents determining whether the version graph contains any additional system-calculated leaf nodes. Put differently, decision block 612 represents determining whether all leaf nodes remaining in the version graph are fixed states that resulted from actual user input, as distinguished from leaf nodes generated by merge processes. In the notation used for
From decision block 612, if the version graph contains additional system-calculated leaf nodes, the process flows 600 may take Yes branch 614 to return to block 602. As described above, block 602 represents selecting another leaf node in the version graph. In turn, the process flows 600 may repeat decision block 604 for the newly-selected leaf node.
From decision block 604, if the leaf node does not represent a system-calculated merge state, the process flows 600 may take No branch 616, which bypasses blocks 608 and 610 to reach decision block 612. From decision block 612, if the version graph does not contain any additional system-calculated leaf nodes, the process flows 600 may take No branch 618 to block 620. Block 620 represents traversing from the leaf state nodes backward up the version graph. In the example tree topology shown in
Decision block 624 represents determining whether the selected node has two or more immediate parents. From decision block 624, if the selected node has two or more immediate parents, the process flows 600 may take Yes branch 626 to block 628, which represents removing all but one delta vector from the parent nodes. Put differently, block 628 represents reducing the number of parents associated with the selected node to one.
Implementations of the process flows 600 may use any number of techniques to determine which delta vector to retain in block 628. For example, assuming that unique identifiers are associated with the delta vectors, block 628 may include retaining the delta vector having the lowest unique identifier. In general, any approach may be suitable that is uniquely deterministic for all peer systems involved with collaboratively editing a given shared file or document.
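The pruning in process flows 600 can be sketched as follows, assuming the graph maps each state to its parent states and to the integer id of the delta vector on each edge (integers stand in for unique identifiers). The example encodes the topology above; the parents of merge state S7 (S3 and S5) and its delta numbers 7 and 8 are assumptions because that paragraph is incomplete here, and the extra user state "U" is hypothetical, modeled on state 904, a user edit of merge state S8. `reduce_to_tree` is an illustrative name, and because each node's choice is independent, the sketch does not depend on traversal order.

```python
def reduce_to_tree(parents, system_states):
    """Reduce a version DAG to a tree, in the spirit of process flows 600.

    parents: state -> {parent_state: delta_id}; integer delta ids stand in
    for unique identifiers.  system_states: merge states the system created.
    """
    graph = {s: dict(ps) for s, ps in parents.items()}

    def leaves():
        has_child = {p for ps in graph.values() for p in ps}
        return [s for s in graph if s not in has_child]

    # Blocks 602-618: remove system-generated leaves (and the deltas leading
    # to them) until every remaining leaf is a user-created state.
    while True:
        doomed = [s for s in leaves() if s in system_states]
        if not doomed:
            break
        for s in doomed:
            del graph[s]   # drops the node together with its incoming deltas

    # Blocks 620-630: keep one incoming delta per node, choosing the lowest
    # id so that every peer keeps the same one.
    for s, ps in list(graph.items()):
        if len(ps) > 1:
            keep = min(ps, key=ps.get)
            graph[s] = {keep: ps[keep]}
    return graph


example = {
    "S0": {},
    "S1": {"S0": 1}, "S2": {"S0": 2},
    "S3": {"S1": 3}, "S4": {"S1": 4},
    "S5": {"S2": 5}, "S6": {"S2": 6},
    "S7": {"S3": 7, "S5": 8},   # assumed parents of merge state S7
    "S8": {"S4": 9, "S6": 10},
    "S9": {"S7": 11, "S8": 12},
    "U": {"S8": 13},            # hypothetical user edit of S8 (cf. state 904)
}
tree = reduce_to_tree(example, {"S7", "S8", "S9"})
print(sorted(tree))   # ['S0', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S8', 'U']
print(tree["S8"])     # {'S4': 9}
```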
Regarding the state 904, this state represents a user-created edit of the merge state 534 (S8).
Referring to
Afterwards, the process flows 600 may return to block 622 to select another node. Returning briefly to decision block 624, if the selected node does not have two or more immediate parents, the process flows 600 may take No branch 630 to return to block 622.
Having described the process flows 600 shown in
Having described the process flows 600 in
Turning to the process flows 700 in more detail, block 702 represents aggregating a list of all user-created delta operations represented within the reduced version tree output from block 418. In turn, block 704 represents producing a single list of operations, excluding inferred merge deltas (e.g., 530, 532, 536, and 538 in
Block 706 represents ordering the list or table of operations. Some operations in this list or table may depend on earlier operations. For the purposes of this description, but not to limit possible implementations, a given operation is “dependent” on another operation if the given operation was performed with knowledge of the other operation; the other operation therefore occurs in the version history graph before the given operation. Representing the given operation and the other operation as Δx and Δy, respectively, if any path from Δx back to the root of the version history graph passes through Δy, then the operations in Δx are dependent on Δy. Because there may be multiple paths back through the graph, Δx may depend on several previous operations in addition to Δy.
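The dependency test above reduces to reachability: Δx depends on Δy exactly when the state Δy leads to is an ancestor of (or the same as) the state Δx starts from, since any backward path through that state can continue along Δy toward the root. The Python sketch below applies this to a few deltas from the example topology; the helper names and the (source, target) edge encoding are assumptions.

```python
def ancestors(state, parents):
    """Every state reachable backward from `state`, including `state` itself."""
    seen, stack = set(), [state]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(parents.get(s, ()))
    return seen


def depends_on(dx, dy, deltas, parents):
    """True if delta dx was made with knowledge of delta dy.

    deltas maps a delta name to its (source_state, target_state) edge.
    """
    source_x = deltas[dx][0]
    target_y = deltas[dy][1]
    return target_y in ancestors(source_x, parents)


parents = {"S1": ("S0",), "S2": ("S0",), "S3": ("S1",), "S6": ("S2",)}
deltas = {"Δ1": ("S0", "S1"), "Δ2": ("S0", "S2"),
          "Δ3": ("S1", "S3"), "Δ6": ("S2", "S6")}

print(depends_on("Δ6", "Δ2", deltas, parents))  # True: Δ6 follows Δ2
print(depends_on("Δ3", "Δ2", deltas, parents))  # False: independent branches
```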
Block 706 may include ordering the list or table of operations to account for such dependencies, such that operations dependent on previous operations appear in the list after such previous operations. Block 708 represents referring to the original version of the history graph (e.g., 408 in
Block 710 represents grouping together any independent operations that are performed on the same object or dependent objects. Assuming that these operations are independent and made without knowledge of one another, these operations may potentially conflict with one another. Returning to the previous definitional example involving operations represented at Δx and Δy, topologically, if no path from Δx passes through Δy back to the root of the version history graph, then the operations represented in Δx are independent of the operations represented in Δy. For example, referring back to the examples above regarding the values of the objects X and Y, operations performed on different peer systems may assign conflicting values to these objects. Block 710 may include referring to the original version history graph, as represented at block 708.
If these independent operations conflict with one another, implementations of this description may employ various approaches to resolve such conflicts, and the grouping performed in block 710 may make such conflict resolution more efficient. Block 710 may apply deterministic rules to order the operations included within different groupings, for example by ordering operations based on the unique identifiers associated with them. As detailed further below, one example of a globally deterministic rule for resolving conflicts states that the operation from the delta with the lowest unique identifier wins over an operation from a delta having a higher unique identifier.
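One way to realize the grouping of block 710 together with the lowest-identifier rule is to bucket independent operations by target object and sort each bucket so that the winning operation comes last; applying a bucket in order then leaves the winner's value in place, matching the table convention described later in this description. The sketch is hypothetical: the object names are illustrative, and short hex strings of equal length stand in for GUIDs so that string order matches numeric order.

```python
from collections import defaultdict


def group_conflicts(ops):
    """Group independent operations by target object, winner last.

    ops: iterable of (delta_id, object_name, value).  Sorting each group by
    descending id puts the lowest id (the winner) last, so applying a group
    in order leaves the winner's value in place.
    """
    groups = defaultdict(list)
    for op in ops:
        groups[op[1]].append(op)
    return {obj: sorted(group, key=lambda op: op[0], reverse=True)
            for obj, group in sorted(groups.items())}


ops = [("b7", "title", "Draft"), ("3f", "title", "Final"), ("c1", "color", "red")]
grouped = group_conflicts(ops)
print(grouped["title"])   # [('b7', 'title', 'Draft'), ('3f', 'title', 'Final')]
```

Applying the "title" group in order leaves "Final", the value from the delta with the lowest identifier.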
Having described the process flows 700 in
Turning to the process flows 800 in more detail, block 802 represents applying the operations in the order specified by the list output from block 420. In turn, block 804 represents generating the merged state as a result of performing the operations specified in the list.
Decision block 806 represents evaluating whether multiple operations are performed on the same object or group of objects. As described above, this scenario may result in conflicting operations being performed on these objects. From decision block 806, if conflicting operations are performed on such objects, the process flows 800 may take Yes branch 808 to block 810, which represents resolving any conflicts.
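Blocks 802 and 804 amount to replaying the ordered list onto the initial state. The minimal Python sketch below assumes simple `set` and `delete` operation tuples on a flat key-value document; `apply_operations` is a hypothetical name. Replaying the two Y operations from the earlier example in dependency order (Δ6 was made with knowledge of Δ2) gives Y = 0 without having to choose a merge base.

```python
def apply_operations(initial, ordered_ops):
    """Replay an ordered operation list onto a copy of the initial state."""
    state = dict(initial)
    for kind, name, value in ordered_ops:
        if kind == "set":
            state[name] = value
        elif kind == "delete":
            state.pop(name, None)
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return state


s0 = {"X": 0, "Y": 0}
ordered = [("set", "Y", 1),   # from Δ2
           ("set", "Y", 0)]   # from Δ6, which depends on Δ2 and so comes after it
print(apply_operations(s0, ordered))   # {'X': 0, 'Y': 0}
```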
Depending on the circumstances of particular implementations, any number of different conflict resolution techniques may be appropriate. Particular conflict resolution strategies are not detailed herein, aside from noting that in general, all peer systems participating in merge operations employ the same globally deterministic strategies for resolving conflicts.
In addition, the examples of conflict resolution algorithms described herein operate only on state information from the version history graph, and are deterministic when operating on version history data shared between the peer systems described herein. For example, if two or more delta changes edit the same object, then these edits may conflict. In some cases, the delta changes may be associated with unique identifiers (e.g., a globally unique identifier, or GUID). In such scenarios, the delta change that is associated with the lowest unique identifier may “win” the conflict. In another example of a globally deterministic rule for conflict resolution, “edit” operations may take precedence over “delete” operations. In other scenarios, conflicts may be queued for user resolution.
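The two example rules above can be combined into a single deterministic comparison. This Python sketch is one possible combination rather than a prescribed policy: it assumes each operation carries a `uuid.UUID`, applies the edit-over-delete rule first, and otherwise lets the lowest identifier win. Because it reads only identifiers and operation kinds, which all peers share, every peer picks the same winner regardless of argument order. `resolve` is a hypothetical name.

```python
import uuid


def resolve(op_a, op_b):
    """Choose the winner of two conflicting operations on the same object.

    Each op is (guid, kind, value) with guid a uuid.UUID and kind "edit" or
    "delete".  An edit beats a delete; otherwise the lowest GUID wins.
    """
    if {op_a[1], op_b[1]} == {"edit", "delete"}:
        return op_a if op_a[1] == "edit" else op_b
    return min(op_a, op_b, key=lambda op: op[0].int)


low = uuid.UUID("00000000-0000-0000-0000-000000000001")
high = uuid.UUID("00000000-0000-0000-0000-000000000002")

a = (high, "edit", "draft")
b = (low, "edit", "final")
c = (low, "delete", None)
print(resolve(a, b)[2])   # final  (lowest GUID wins)
print(resolve(a, c)[2])   # draft  (edit beats delete despite the higher GUID)
```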
From decision block 806, if the output of this decision is negative, the process flows 800 may take No branch 812 to block 814, which represents assigning a unique identifier to the merged state generated in block 804. Preferably, this unique identifier is identical across any peer systems calculating the merge state, and results from merge processes that generate the same identifiers deterministically when operating on different peer systems. These approaches may be more efficient than other approaches that generate different identifiers for merges occurring on different peer systems, and then investigate the lineage of these different merged versions to determine whether they are the same or equivalent. For example, referring briefly to
In some implementations, block 814 may include calculating the identifier for the merge state using a well-behaved hash function, which operates on identifiers associated with all states and/or delta changes participating in the merge. However, other techniques for calculating the identifier for the merge state may be appropriate in other applications, provided that the techniques are applied consistently and uniformly across the peer systems participating in the merge, and operate only on data or information shared between the peer systems. As shown in
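As one concrete choice of well-behaved hash for block 814, the sketch below feeds the sorted identifiers of the participating states or deltas into SHA-256 and keeps the first 16 bytes as a UUID. The use of SHA-256, the separator byte, and the name `merge_state_id` are assumptions; any scheme should work if every peer applies it identically and only to shared data. Sorting makes the result independent of the order in which a peer lists the merged states.

```python
import hashlib
import uuid


def merge_state_id(participant_ids):
    """Derive a merge-state identifier from the ids of participating states/deltas.

    Sorting makes the result independent of input order, so every peer that
    merges the same history computes the same identifier without coordinating.
    """
    h = hashlib.sha256()
    for pid in sorted(str(p) for p in participant_ids):
        h.update(pid.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return uuid.UUID(bytes=h.digest()[:16])


s4, s6 = uuid.uuid4(), uuid.uuid4()   # stand-ins for the GUIDs of S4 and S6
print(merge_state_id([s4, s6]) == merge_state_id([s6, s4]))   # True
```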
The above merge algorithm is now described with the following example operations performed on objects X, Y, and Z:
For the purposes of this example, assume that the other deltas contain no operations on the objects X, Y, and Z or their dependents (i.e. the other deltas are independent operations performed on other objects). Using the merge graph topology shown in
The creation of the merged state 910 (S11) is represented by the following notation, in which forks in the version graph are represented by commas, and user-created states that occur sequentially in the version graph also occur sequentially in the notation:
S11 (the merged result state)=(Δ1 (Δ3 Δ11, Δ4 Δ12), Δ2 (Δ5, Δ6))
Aggregating these delta operations into a table according to the algorithm described above results in the following table. More specifically, this table represents aggregating all of the operations from the deltas above. Afterwards, the delta operations are ordered such that any operations dependent on earlier operations appear after them in the table. Any independent conflicting operations are ordered such that operations having higher precedence (i.e., the operations that “win” the conflict) appear after operations having lower precedence. For convenience, the table below groups these operations by the object on which each operation was performed.
The merge process may then calculate the final merged state by traversing through the operations in the list, turning specifically to the three example objects X, Y, and Z:
Having provided the above description, several observations are now noted. The drawings and descriptions thereof are presented in certain orders only for the convenience of description, but not to limit possible implementations. Regarding flow diagrams, the individual processes shown within these diagrams may be performed in orders other than those shown herein without departing from the scope and spirit of this description.
Although the subject matter presented herein has been described in language specific to computer structural features, methodological acts, and computer readable media, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts and mediums are disclosed as example forms of implementing the claims.
The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the present invention, which is set forth in the following claims.