Claims
- 1. A method for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the method comprising:
at a Primary replica, piggybacking mutex ordering information onto regular multicast messages specifying the order in which threads in the Primary replica have been granted their claims to mutexes; and at a Backup replica, receiving said messages containing said mutex ordering information which determines the order in which threads in said Backup replica are granted mutexes.
- 2. A method as recited in claim 1, further comprising:
employing thread library interpositioning to intercept calls to functions in the operating system's thread library.
- 3. A method as recited in claim 1, wherein said messages are multicast according to a protocol that delivers messages reliably and in the same order from the Primary replica to said Backup replicas.
- 4. A method as recited in claim 1, wherein strong replica consistency is maintained without counting the number of instructions between non-deterministic events, without additional messages for claiming, granting and releasing each mutex, and without risk that a result might be communicated to a client but the Backup replicas might lack ordering information necessary for reproducing said result.
- 5. A method for replicating a multithreaded application program using the semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the method comprising:
employing thread library interpositioning to intercept calls to functions in the operating system's thread library to render said application program virtually deterministic.
- 6. A method as recited in claim 5, further comprising:
at a Primary replica, piggybacking mutex ordering information onto regular multicast messages specifying the order in which threads in the Primary replica have been granted their claims to mutexes; and at a Backup replica, receiving said messages that determine the order in which threads in said Backup replica are granted mutexes.
- 7. A method for replicating a multithreaded application program using the leader-follower strategy of semi-active or passive replication, wherein said application program executes under the control of an operating system having a thread library, the method comprising:
at a Primary replica, piggybacking mutex ordering information onto regular multicast messages specifying the order in which threads in the Primary replica have been granted mutexes; at a Backup replica, receiving said messages that determine the order in which threads in said Backup replica are to claim mutexes; and employing thread library interpositioning to intercept calls to functions in the operating system's thread library for performing said piggybacking and for controlling said order in which threads in said Backup replica are granted their claims to mutexes.
- 8. A method as recited in claim 7, wherein if the Primary replica does not have a regular message to multicast, it multicasts a control message containing said mutex ordering information.
- 9. A method as recited in claim 7, wherein a thread in said Backup replica is not allowed to claim a given mutex, for a given claim, until said Backup replica receives a multicast message that contains said mutex ordering information for said claim from said Primary replica.
- 10. A method for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the method comprising:
providing a consistent multithreading library that is interposed ahead of said operating system's thread library so that calls to functions of the operating system's thread library can be intercepted to render said application program virtually deterministic.
- 11. A method as recited in claim 10, wherein said virtual determinism enables strong replica consistency to be maintained.
- 12. A method as recited in claim 11, wherein said consistent multithreading library contains wrapper functions for intercepting calls to functions in said operating system's thread library.
- 13. A method as recited in claim 12, wherein said application program invokes said wrapper functions of said consistent multithreading library instead of the corresponding functions of said operating system's thread library.
- 14. A method as recited in claim 13, wherein in response to a Primary replica invoking a function of said consistent multithreading library to claim a mutual exclusion construct (mutex), said function invokes the corresponding function of the operating system's thread library to claim said mutex and piggybacks mutex ordering information onto regular messages multicast to Backup replicas.
- 15. A method as recited in claim 14, wherein the invocation of a claim function to claim a mutex by a thread in said Primary replica comprises invoking a claim function of said consistent multithreading library and subsequently piggybacking ordering information onto the next message multicast.
- 16. A method as recited in claim 11, wherein said consistent multithreading library mechanisms allow concurrency of threads that do not simultaneously acquire the same mutex.
- 17. A method as recited in claim 11:
wherein if the application program runs on an operating system that provides Dynamically Linked Libraries (DLL), the DLL mechanisms are used to interpose the consistent multithreading library ahead of said operating system's thread library; and wherein said interpositioning causes the application program to invoke said functions of said consistent multithreading library, instead of the corresponding functions of said operating system's thread library.
- 18. A method as recited in claim 17, further comprising inserting a command into the makefile for said application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 19. A method as recited in claim 10, further comprising:
communicating mutex ordering information as messages from said Primary replica to said Backup replica specifying the order in which threads in said Primary replica have been granted mutexes to establish the order in which threads in said Backup replica are to claim mutexes; wherein said messages are communicated using a reliable source-ordered multicast group communication protocol; wherein said multicast protocol delivers messages reliably and in the same source-order to the Backup replicas; wherein said mutexes are granted in the same source-order to the threads at said Backup replicas.
- 20. A method as recited in claim 19, wherein said communicating mutex ordering information comprises piggybacking mutex ordering information onto regular multicast messages.
- 21. A method as recited in claim 20, wherein if the Primary replica does not have a regular message to multicast, it multicasts a control message containing said mutex ordering information.
- 22. A method as recited in claim 19, wherein said communicating said mutex ordering information comprises multicasting two messages, one message multicast by a first Primary replica to the replicas of other processes, objects or components and another message multicast by a second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 23. A method of achieving strong replica consistency for a replicated multithreaded application program using the semi-active or passive replication strategy, comprising:
sanitizing multithreaded application programs by masking multithreading as a source of non-determinism to render said replicated multithreaded application program virtually deterministic.
- 24. A method as recited in claim 23, wherein said sanitizing comprises:
piggybacking mutex ordering information onto regular multicast messages from a Primary replica that specifies the order in which threads in the Primary replica have been granted mutexes; and delivering said messages to a Backup replica that determine the order in which threads in said Backup replica are granted the mutexes that they claim.
- 25. A method as recited in claim 24, wherein said delivering of messages comprises delivering of two multicast messages, one message multicast by a first Primary replica to the replicas of other processes, objects or components and one message multicast by a second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 26. A method as recited in claim 23, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said mutex ordering information.
- 27. A method as recited in claim 24, wherein said delivering uses a reliable source-ordered multicast group communication protocol to deliver messages from the Primary replica to the Backup replica.
- 28. A method as recited in claim 24, further comprising employing thread library interpositioning to intercept calls to functions of the operating system's thread library for performing said piggybacking and for controlling said order in which threads in said Backup replica are granted the mutexes that they claim.
- 29. A method as recited in claim 28, wherein a thread T in said Backup replica is not allowed to claim a given mutex M, for a given Nth time that thread T has claimed any mutex, until it receives said message from said Primary replica that contains the ordering information (T, M, N).
- 30. A method for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system, said method comprising:
using a multicast group communication protocol to render the multithreaded application program virtually deterministic.
- 31. A method as recited in claim 30, wherein said virtual determinism enables strong replica consistency to be maintained.
- 32. A method as recited in claim 31, further comprising:
using a mutex to protect shared resources accessed by threads in said application program; wherein said threads are granted access to said shared resources in the same order at the replicas of said application program.
- 33. A method as recited in claim 31, further comprising:
intercepting calls to the functions of the operating system's thread library; and multicasting ordering information from said Primary replica to said Backup replicas, regarding the order in which threads in the Backup replicas are to be granted their claims to mutexes.
- 34. A method as recited in claim 33, wherein said ordering information describes the order in which threads in said Primary replica are granted their claims to mutexes and which is delivered reliably and in the same order to said Backup replicas.
- 35. A method as recited in claim 33, wherein said multicasting of ordering information comprises piggybacking said ordering information onto regular messages that are multicast from said Primary replica to said Backup replicas.
- 36. A method as recited in claim 35, wherein said means of multicasting and piggybacking said ordering information uses two multicast messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 37. A method as recited in claim 35, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said ordering information.
- 38. A method as recited in claim 30, further comprising:
maintaining strong replica consistency and application transparency by interpositioning a consistent multithreading library ahead of the operating system's thread library and intercepting calls to functions in said operating system's thread library.
- 39. A method as recited in claim 38,
wherein functions of said operating system's thread library are wrapped by functions of said consistent multithreading library; wherein the application program invokes the wrapper functions of said consistent multithreading library, instead of the corresponding functions of said operating system's thread library, thereby maintaining strong replica consistency and application transparency.
- 40. A method as recited in claim 39, wherein said wrapping is performed by dynamically linking said consistent multithreading library to said application program.
- 41. A software mechanism for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the mechanism comprising:
control program code; said control program code at a Primary replica, being configured to piggyback mutex ordering information onto regular multicast messages for specifying the order in which threads in the Backup replicas are granted their claims to mutexes; said control program code configured to deliver said control messages using a multicast group communication protocol that delivers the messages in an order that determines the order in which the threads in different replicas are granted their claims to mutexes.
- 42. A software mechanism as recited in claim 41, further comprising a consistent multithreading library containing said control program code that is interpositioned for intercepting calls to functions of the operating system's thread library.
- 43. A software mechanism for replicating a multithreaded application program subject to a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the mechanism comprising:
a consistent multithreading library interpositioned to intercept calls to functions of the operating system's thread library.
- 44. A software mechanism as recited in claim 43, further comprising:
control program code within said consistent multithreading library; said control program code configured to cause mutex ordering information to be piggybacked on messages multicast from the Primary replica to the Backup replicas, which information specifies the order in which threads in said Primary replica claimed, and were granted, mutexes; said control program code configured to receive said messages by Backup replicas from said multicast group communication protocol that delivers the messages in an order that determines the order in which the corresponding threads in said Backup replicas are granted corresponding claims to mutexes.
- 45. A software mechanism for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the mechanism comprising:
a consistent multithreading library that is interposed ahead of the operating system's thread library so that calls to functions of said operating system's thread library can be intercepted to render said application program virtually deterministic.
- 46. A software mechanism as recited in claim 45, wherein said virtual determinism enables strong replica consistency to be maintained.
- 47. A software mechanism as recited in claim 46, wherein said consistent multithreading library contains wrapper functions for intercepting calls to functions of said operating system's thread library.
- 48. A software mechanism as recited in claim 47, wherein said application program invokes said wrapper functions of said consistent multithreading library instead of the corresponding functions of said operating system's thread library.
- 49. A software mechanism as recited in claim 48, wherein when a Primary replica invokes a function of the consistent multithreading library to claim a mutex, said consistent multithreading library function invokes the corresponding function of said operating system's thread library and subsequently piggybacks ordering information onto the next message that it multicasts.
- 50. A software mechanism as recited in claim 49, wherein said multicasting said ordering information uses two multicast messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 51. A software mechanism as recited in claim 49, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said ordering information.
- 52. A software mechanism as recited in claim 51, wherein said message is multicast using a reliable source-ordered multicast group communication protocol.
- 53. A software mechanism as recited in claim 51:
wherein said source-ordered multicast protocol delivers messages reliably and in the same source order from the Primary replica to the Backup replicas; and wherein the mutexes are granted in the same order to the threads in the Backup replicas as in the Primary replica, as dictated by the ordering information piggybacked within said messages.
- 54. A software mechanism as recited in claim 45:
wherein if the application program runs on an operating system that provides Dynamically Linked Libraries (DLL), the dynamic linking mechanisms are used to interpose the consistent multithreading library ahead of the operating system's thread library; and wherein said interpositioning causes the application program to invoke the functions of the consistent multithreading library, rather than the corresponding functions of the operating system's thread library.
- 55. A software mechanism as recited in claim 46, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library to maintain transparency to the application program and the operating system.
- 56. A software mechanism for achieving strong replica consistency using a semi-active or passive replication strategy for replicating multithreaded application programs, comprising:
control program code configured to sanitize multithreaded application programs by masking multithreading as a source of non-determinism.
- 57. A software mechanism as recited in claim 56, further comprising library interpositioning of said control program code to intercept calls to functions of the operating system's thread library for sanitizing said multithreaded application programs.
- 58. A software mechanism as recited in claim 57, wherein said control program code comprises a consistent multithreading library containing wrapper functions for said functions of said operating system's thread library that claim and release mutexes.
- 59. A software mechanism as recited in claim 57, wherein when a Primary replica invokes a wrapper function of said consistent multithreading library to claim a mutex, said consistent multithreading library function invokes the corresponding function of said operating system's thread library and then piggybacks ordering information onto the next message that it multicasts to the Backup replicas.
- 60. A software mechanism as recited in claim 59, wherein said multicasting uses two multicast messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 61. A software mechanism as recited in claim 59, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said ordering information.
- 62. A software mechanism for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system having a thread library, the mechanism comprising:
a consistent multithreading library that is interposed ahead of said operating system's thread library; wherein said consistent multithreading library contains wrapper functions for functions of said operating system's thread library; wherein said wrapper functions ensure that the threads in the replicas are granted their claims to mutexes in the same order, and similarly for releasing mutexes; and wherein said application program invokes the wrapper functions of said consistent multithreading library instead of the corresponding functions in said operating system's thread library.
- 63. A software mechanism as recited in claim 62, wherein when a Primary replica invokes a function of said consistent multithreading library to claim a mutex, said function invokes the claim function of the operating system's thread library and subsequently piggybacks mutex ordering information onto the next regular message that it multicasts.
- 64. A software mechanism as recited in claim 63, wherein said regular message is multicast using a reliable source-ordered multicast group communication protocol.
- 65. A software mechanism as recited in claim 64:
wherein said multicast protocol delivers messages reliably and in the same source order from the Primary replica to said Backup replicas; and wherein the mutexes are granted in the same order to the threads in said Backup replicas as dictated by the mutex ordering information piggybacked onto said multicast messages.
- 66. A software mechanism as recited in claim 65, wherein said granting of mutexes in the same order maintains strong replica consistency.
- 67. A software mechanism as recited in claim 66, wherein said multicasting uses two multicast messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 68. A software mechanism as recited in claim 67, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said mutex ordering information.
- 69. A software mechanism as recited in claim 62:
wherein if the application program runs on an operating system that provides Dynamically Linked Libraries, the dynamic linking mechanisms are used to interpose the consistent multithreading library ahead of the operating system's thread library; and wherein said interpositioning causes the application program to invoke the functions of said consistent multithreading library, rather than the corresponding functions of said operating system's thread library.
- 70. A software mechanism as recited in claim 62, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library to maintain transparency to the application program and the operating system.
- 71. A software mechanism for replicating a multithreaded application program using a semi-active or passive replication strategy, wherein said application program executes under the control of an operating system, said mechanism comprising:
control program code; said control program code configured to use mutex ordering information piggybacked on regular messages multicast by a source-ordered group communication protocol from the Primary replica, which dictates the order in which the threads in the Backup replicas are granted their claims to mutexes, to render the replicated multithreaded application program virtually deterministic.
- 72. A software mechanism as recited in claim 71, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said mutex ordering information.
- 73. A software mechanism as recited in claim 71, wherein said control program code is configured to intercept calls to the operating system's thread library.
- 74. A software mechanism as recited in claim 73, wherein strong replica consistency and application transparency are maintained by interpositioning said consistent multithreading library ahead of said operating system's thread library and intercepting calls to functions of said operating system's thread library.
- 75. A software mechanism as recited in claim 74,
wherein said functions of said operating system's thread library are wrapped by functions of said consistent multithreading library; and wherein said application program invokes the wrapper functions of said consistent multithreading library, instead of the corresponding functions of said operating system's thread library, thereby maintaining strong replica consistency and application transparency.
- 76. A software mechanism as recited in claim 71:
wherein said control program code is configured to allow concurrent processing of threads that do not claim the same mutex simultaneously and threads that claim different mutexes; wherein strong replica consistency is maintained.
- 77. A software mechanism as recited in claim 71:
wherein said control program code is configured to allow threads to communicate with each other by multicasting messages; wherein said control program code is configured to allow threads to use shared resources; and wherein strong replica consistency of the different replicas is maintained.
- 78. A system for executing threads that share resources, within a computing environment that supports semi-active or passive replication of multithreaded application programs, comprising:
means for identifying requests for accesses to shared resources by threads in the Primary replica; means for communicating to one or more Backup replicas the order in which said requests are granted to threads in said Primary replica; and means for ordering and granting requests for accesses to shared resources by threads in a Backup replica, in response to the order in which corresponding requests were granted to threads in said Primary replica and communicated by said Primary replica to said Backup replica.
- 79. A system as recited in claim 78, wherein said Primary replica dictates the order in which said threads in said Backup replicas are granted access to shared resources, as communicated by said Primary replica to said Backup replicas.
- 80. A system as recited in claim 78, wherein control programming for said means for communicating and said means for ordering and granting are contained in, or are invoked by, functions of a consistent multithreading library.
- 81. A system as recited in claim 80, wherein to render said application programs virtually deterministic in a transparent manner, said system employs library interpositioning to intercept calls to functions of the operating system's thread library and to direct them to said functions of said consistent multithreading library, which in turn invoke said functions of the operating system's thread library.
- 82. A system as recited in claim 81, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library.
- 83. A system as recited in claim 81, further comprising inserting a command into the makefile for the application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 84. A system as recited in claim 81, wherein functions of said consistent multithreading library are configured as a set of functions incorporated within the operating system's thread library.
- 85. A system as recited in claim 78, wherein said means for communicating comprises a reliable source-ordered multicast protocol which guarantees that said Backup replicas receive the messages from said Primary replica in an identical order, as multicast by said Primary replica.
- 86. A system as recited in claim 85, wherein said means of communicating information, about claims for shared resources by threads in said Primary replica and about the order in which said claims were granted, comprises piggybacking said information on a message, multicast by said Primary replica to its Backup replicas.
- 87. A system as recited in claim 85, wherein said means of communicating information comprises piggybacking information, about claims for shared resources by threads in first said Primary replica and about the order in which said claims were granted, on two or more messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 88. A system as recited in claim 85, wherein lacking regular multicast messages on which to piggyback ordering information, said means for communicating is configured to multicast a control message containing information about claims for shared resources by the threads in said Primary replica and about the order in which said claims were granted.
- 89. A system as recited in claim 78, wherein a shared resource comprises data configured for being shared between threads in a given replica or code sections configured for manipulating shared data or both.
- 90. A system as recited in claim 78, wherein said shared resource is configured for being accessed by a thread utilizing a mutual exclusion construct (mutex).
- 91. A system as recited in claim 90, wherein a request by a thread of said Backup replica to access a mutual exclusion construct is not granted until the message from the Primary replica that contains information about the ordering and granting of said request is delivered to said Backup replica.
- 92. A system as recited in claim 78, wherein said means for communicating to multiple replicas comprises a computing environment configured for providing reliable source-ordered multicasting of messages to Backup replicas in response to grants of requests to access shared resources by threads in said Primary replica.
- 93. A system as recited in claim 78, wherein said means for communicating, and said means for ordering and granting, comprise functions that maintain strong replica consistency.
- 94. A system as recited in claim 93, wherein said means for granting accesses to shared resources to threads in a Backup replica comprises a computing environment that grants said accesses, based on the availability of said resources and on the order in which accesses to corresponding resources were granted to threads in said Primary replica and were communicated by said Primary replica and received by said Backup replica from said means for communicating.
- 95. A system as recited in claim 94, wherein within said Backup replica said means for ordering and granting is configured to grant a specific thread access to a specific shared resource for a specific claim if said Primary replica has previously communicated that the corresponding thread in said Primary replica has been granted access to the corresponding shared resource for the corresponding claim.
- 96. A system as recited in claim 95:
wherein said access to said shared resource is controlled by a mutual exclusion construct; wherein a thread in said Backup replica is not granted said mutual exclusion construct for a given claim until said Primary replica has communicated that the corresponding thread in said Primary replica has been granted access to the corresponding shared resource for the corresponding claim.
- 97. A system for maintaining strong replica consistency of replicas of a multithreaded application program within a computing environment, using semi-active or passive replication, comprising:
means for communicating the order in which access to a shared resource is granted to a thread in the Primary replica; and means for ordering and granting access to a shared resource to threads in a Backup replica in response to the order of granting access to a corresponding shared resource by a corresponding thread in said Primary replica.
- 98. A system as recited in claim 97, wherein said Primary replica dictates the order in which said threads in said Backup replica are granted access to shared resources, as communicated by said means of communicating to said Backup replica.
- 99. A system as recited in claim 98, wherein said means for granting access to shared resources comprises a computing environment that grants said access to said shared resources, based on the availability of said shared resources and on the order in which corresponding accesses to shared resources were granted to the corresponding thread at said Primary replica and were communicated by said Primary replica and received by said Backup replica from said means for communicating.
- 100. A system as recited in claim 97, wherein said means for determining the order in which threads can access shared resources comprises a mutual exclusion construct that is granted to a thread in response to a claim to access the resource, and which is then later released by said thread allowing said mutual exclusion construct to be claimed by other threads.
- 101. A system as recited in claim 100, wherein said means for communicating is configured to communicate the order in which threads in said Primary replica are granted said mutual exclusion construct.
- 102. A system as recited in claim 100, wherein said means for ordering and granting of accesses to threads in a Backup replica is configured to grant a mutual exclusion construct to said Backup replica as determined by the order in which the corresponding mutual exclusion construct was granted to the corresponding thread in said Primary replica, which order was communicated by said Primary replica to said Backup replica.
- 103. A system as recited in claim 102, wherein a thread in said Backup replica is not granted a mutual exclusion construct for a given claim until said Primary replica has communicated that the corresponding thread in said Primary replica has been granted the corresponding mutual exclusion construct to the corresponding shared resource for the corresponding claim.
- 104. A system as recited in claim 97, wherein said means for communicating, and said means for ordering and granting, comprise functions that maintain strong replica consistency and are executed in response to calls to functions of a consistent multithreading library.
- 105. A system as recited in claim 104, wherein said functions of said consistent multithreading library are configured to intercept calls to corresponding functions of the operating system's thread library.
- 106. A system as recited in claim 105, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library.
- 107. A system as recited in claim 105, further comprising inserting a command into the makefile for the application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 108. A system as recited in claim 104, wherein functions of said consistent multithreading library are configured as a set of functions incorporated within the operating system's thread library.
- 109. A system as recited in claim 104, wherein said functions of said consistent multithreading library comprise functions configured for claiming or releasing shared resources in a manner in which corresponding threads in different replicas are granted access to the shared resource in an identical order.
- 110. A system as recited in claim 97, wherein said means for ordering and granting resource accesses in each said Backup replica is configured to order and grant said accesses in response to the order in which information about the claiming and granting of corresponding accesses to shared resources by corresponding threads in said Primary replica is communicated by said Primary replica to said Backup replica.
- 111. A system as recited in claim 97, wherein said means for ordering and granting access to shared resources by threads in said Backup replica is configured to prevent said granting of shared resources until information about the claiming and granting of corresponding accesses to shared resources by corresponding threads in said Primary replica has been communicated to said Backup replica by said Primary replica.
- 112. A system as recited in claim 111, wherein said means for communicating to multiple replicas comprises a computing environment configured for providing reliable source-ordered multicasting of messages by said Primary replica to said Backup replicas in response to the granting of accesses to shared resources by threads in said Primary replica.
- 113. A system as recited in claim 112, wherein said means for ordering and granting of accesses to shared resources is configured to maintain an order of granting said accesses to threads in said Backup replicas that is identical to the order in which corresponding accesses are granted to threads in said Primary replica and are communicated to said Backup replicas by said Primary replica.
- 114. A system as recited in claim 97, wherein said means for communicating to multiple replicas comprises a computing environment configured for providing reliable source-ordered multicasting of messages to Backup replicas in response to granting of accesses of shared resources by threads in said Primary replica.
- 115. A system as recited in claim 114, wherein said means of communicating information, about accesses to shared resources by threads in said Primary replica and about the order in which said accesses were granted, comprises piggybacking said information on a message, multicast by said Primary replica to its own Backup replicas.
- 116. A system as recited in claim 114, wherein said means of communicating said information comprises piggybacking information, about the granting of accesses to shared resources by threads in first said Primary replica and about the order in which said accesses were granted, on two or more messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 117. A system as recited in claim 114, wherein lacking regular multicast messages on which to piggyback ordering information, said means for communicating is configured to multicast a control message containing information about the granting of corresponding accesses to corresponding shared resources by corresponding threads in said Primary replica and about the order in which said accesses were granted.
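The piggybacking and control-message fallback recited in claims 115 to 117 can be sketched as follows. This is a minimal illustration only: the names `PiggybackQueue`, `Message`, `wrap_regular` and `flush_if_idle` are hypothetical, and the policy that decides when the Primary replica is considered to have no regular message to send (for example a timer) is an assumption left to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Ordering information as a (thread, mutex, claim number) tuple.
Grant = Tuple[str, str, int]


@dataclass
class Message:
    kind: str                      # "regular" or "control"
    payload: Optional[bytes]       # application data; None for control messages
    grants: List[Grant] = field(default_factory=list)


class PiggybackQueue:
    """Holds ordering information until some multicast message can carry it."""

    def __init__(self) -> None:
        self._pending: List[Grant] = []

    def record_grant(self, grant: Grant) -> None:
        # Called at the Primary each time a claim is granted.
        self._pending.append(grant)

    def _drain(self) -> List[Grant]:
        grants, self._pending = self._pending, []
        return grants

    def wrap_regular(self, payload: bytes) -> Message:
        # A regular message is about to be multicast: attach everything pending.
        return Message("regular", payload, self._drain())

    def flush_if_idle(self) -> Optional[Message]:
        # No regular message is available: build a control message, but only
        # when there is ordering information waiting to be sent.
        if not self._pending:
            return None
        return Message("control", None, self._drain())
```

In this sketch a control message is produced only when ordering information is actually pending, so an idle Primary with nothing to report sends nothing extra.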
- 118. A system as recited in claim 114, wherein said multicast messages comprise information about which shared resource is being claimed by a thread in said Primary replica, which thread is claiming the given shared resource, and which shared resource claim request of said thread is being made.
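The three items of information recited in claim 118 can be pictured as a small record. The sketch below is illustrative only: `OrderingInfo` and `ClaimNumbering` are hypothetical names, threads and resources are identified by plain strings for simplicity, and the claim number is assumed to count all claims made by a thread, so that repeated claims by the same thread can be told apart.

```python
from collections import defaultdict
from typing import Dict, NamedTuple


class OrderingInfo(NamedTuple):
    thread: str        # which thread is claiming
    resource: str      # which shared resource is being claimed
    claim_number: int  # which claim request of that thread this is


class ClaimNumbering:
    """Assigns each thread's claims consecutive numbers starting at 1."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)

    def next_record(self, thread: str, resource: str) -> OrderingInfo:
        self._counts[thread] += 1
        return OrderingInfo(thread, resource, self._counts[thread])
```

Because corresponding threads in different replicas issue the same sequence of claims, numbering claims per thread gives each replica the same identifier for the same claim without any extra coordination.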
- 119. A system as recited in claim 97: wherein said means for communicating, and said means for ordering and granting, are configured for being executed transparently to said application program; wherein said transparency comprises the inclusion of said means for communicating and said means for ordering and granting within said computing environment without modifying the code of said multithreaded application program.
- 120. A system as recited in claim 119, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library.
- 121. A system as recited in claim 119, further comprising inserting a command into the makefile for the application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 122. A system as recited in claim 119, wherein functions of said consistent multithreading library are configured as a set of functions incorporated within the operating system's thread library.
- 123. A system as recited in claim 97, wherein said means for communicating and said means for ordering and granting are provided by interposing a consistent multithreading library ahead of said operating system's thread library and by intercepting calls of functions of said operating system's thread library and by invoking instead corresponding functions of said consistent multithreading library, which in turn invoke functions of said operating system's thread library.
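The call flow recited in claim 123, in which a wrapper function is invoked instead of the thread library function and then invokes that function itself, can be illustrated by analogy. The sketch below is not linker-level interposition: it is a plain wrapper object, `InterposedMutex` and its `before_grant` hook are hypothetical names, and Python's `threading.Lock` merely stands in for the operating system's mutex.

```python
import threading
from typing import Callable


class InterposedMutex:
    """Wrapper that runs replication logic, then delegates to the real mutex."""

    def __init__(self, name: str, before_grant: Callable[[str, str], None]) -> None:
        self.name = name
        self._real = threading.Lock()      # stand-in for the OS thread library mutex
        self._before_grant = before_grant  # hook for ordering and communication logic

    def claim(self) -> None:
        # The application calls claim(); the wrapper sees the call first.
        self._before_grant(threading.current_thread().name, self.name)
        # Only then is the underlying (real) claim function invoked.
        self._real.acquire()

    def release(self) -> None:
        self._real.release()
```

The application code calls `claim` and `release` exactly as it would on an ordinary mutex, which is the sense in which the added behavior is transparent to it.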
- 124. A system as recited in claim 97, wherein strong replica consistency can be maintained without the need to count the number of instructions between nondeterministic events.
- 125. A system as recited in claim 97, wherein said computing environment comprises a client-server system or a fault-tolerant system or both.
- 126. A system as recited in claim 97, wherein said shared resource comprises data shared between said threads in a given replica, or code for accessing said shared data in a given replica or both.
- 127. A system for executing a replicated multithreaded application program within a computing environment, using a semi-active or passive replication strategy, comprising:
means for granting access to shared resources to threads in a Backup replica in response to information received about the order in which access to said shared resources was granted to corresponding threads in said Primary replica; and means for communicating the order of granting access of shared resources by the Primary replica to the Backup replicas.
- 128. A system as recited in claim 127, wherein said means for communicating information comprises a routine configured for multicasting messages from said Primary replica to said Backup replicas in response to the granting of accesses of shared resources to threads in said Primary replica.
- 129. A system as recited in claim 128, wherein said means for communicating information, about the order of granting accesses to shared resources, by said Primary replica to said Backup replicas, comprises piggybacking said information on a message, multicast by said Primary replica to its Backup replicas.
- 130. A system as recited in claim 128, wherein said means of communicating information, about the order of granting accesses to shared resources by first said Primary replica to said Backup replicas, comprises piggybacking said information on two or more messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 131. A system as recited in claim 129, wherein lacking regular multicast messages on which to piggyback ordering information, said means for communicating is configured to multicast a control message containing information about the order of granting accesses to shared resources by the Primary replica to the Backup replicas.
- 132. A system as recited in claim 127, wherein said means for communicating, and said means for ordering and granting, comprise functions that maintain strong replica consistency.
- 133. A system as recited in claim 132, further comprising means for transparently executing said ordering and granting, and said means for communicating, without modifying the code of said application program.
- 134. A system as recited in claim 133, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library.
- 135. A system as recited in claim 133, further comprising inserting a command into the makefile for said application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 136. A system as recited in claim 133, wherein functions of said consistent multithreading library are configured as a set of functions incorporated within the operating system's thread library.
- 137. A consistent multithreading library of functions for constraining the order of granting accesses to shared resources by threads in a Backup replica to match the order of granting accesses to shared resources by corresponding threads in the Primary replica, within a computing environment, using a semi-active or passive replication strategy, comprising:
a communication routine configured for communicating information, about the order in which said shared resources were granted to threads in said Primary replica, to said Backup replicas within said computing environment; and an ordering and granting routine in said Backup replica configured for granting access to shared resources by threads in said Backup replica in response to the order of granting access to shared resources by threads in said Primary replica, communicated by said Primary replica to said Backup replica.
- 138. A library as recited in claim 137, wherein said communication routine and said ordering and granting routine are configured as a consistent multithreading library containing functions that intercept calls to functions in the operating system's thread library.
- 139. A library as recited in claim 138, wherein said interception of calls to said operating system's thread library comprises:
performing said communication and said ordering and granting as functions of said consistent multithreading library to constrain the order of granting access to shared resources by threads in said Backup replicas to match the order of granting access to said shared resources by threads in said Primary replica; and invoking functions of said operating system's thread library to grant access to said shared resources, subject to said ordering constraints.
- 140. A library as recited in claim 138, wherein said interception comprises intercepting calls to functions of said operating system's thread library and diverting said calls to wrapper functions of said consistent multithreading library to constrain the granting of access to shared resources prior to invoking functions of said operating system's thread library.
- 141. A library as recited in claim 138, wherein said interception of calls to said operating system's thread library by said wrapper functions of said consistent multithreading library is performed in response to a dynamic linking process in which said consistent multithreading library is interposed ahead of said operating system's thread library.
- 142. An apparatus for maintaining strong replica consistency for a replicated multithreaded application program in a computer environment under the control of an operating system having a thread library and using the semi-active or passive replication strategy, comprising:
a computer configured for executing said multithreaded application programs; and programming associated with said computer for,
communicating the order of granting accesses to shared resources by threads in a Primary replica to the Backup replicas, and ordering and granting access to shared resources in said Backup replicas in response to the order of granting corresponding accesses to shared resource communicated from said Primary replica to said Backup replicas.
- 143. An apparatus as recited in claim 142, wherein said ordering and granting access comprises constraining the granting of access to the shared resources by the threads in said Backup replicas to match the order of granting the corresponding access to said shared resources, as communicated by said Primary replica to said Backup replicas.
- 144. A medium that is computer readable and includes a computer program which, when executed on a computer configured for multithreaded execution and communication with multiple program replicas, causes the computer to execute instructions, comprising:
communicating to multiple replicas the order of granting access to shared resources by threads in a Primary replica; and ordering the granting of accesses to shared resources by threads in a Backup replica in response to the order of granting the corresponding accesses in the Primary replica, communicated by said Primary replica to said Backup replica.
- 145. In a computer system configured for executing a replicated multithreaded application program that executes under the control of an operating system having a thread library, wherein the improvement comprises:
communicating the order of granting accesses to shared resources, to threads in the Primary replica, to the Backup replicas; and ordering the granting of accesses to shared resources, to threads in said Backup replicas, in response to the order communicated by said Primary replica to said Backup replicas.
- 146. A system as recited in claim 145, wherein said means for communicating, and said means for ordering and granting, comprise functions that maintain strong replica consistency.
- 147. An improvement as recited in claim 146, further comprising transparently executing functions of a consistent multithreading library to perform said communicating the order of granting accesses to shared resources from said Primary replica to the Backup replicas; and said ordering the granting of accesses to shared resources to threads in said Backup replicas.
- 148. An improvement as recited in claim 147, wherein said transparent execution comprises interposing said consistent multithreading library ahead of said operating system's thread library.
- 149. An improvement as recited in claim 148, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library.
- 150. An improvement as recited in claim 148, further comprising inserting a command into the makefile for said application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 151. An improvement as recited in claim 147, wherein functions of said consistent multithreading library are configured as a set of functions incorporated within said operating system's thread library.
- 152. A system for maintaining strong replica consistency within a computing environment, using a semi-active or passive replication strategy, wherein threads share resources and execute under the control of an operating system having a thread library, comprising:
a message multicasting mechanism configured for communicating information on the order of granting accesses to said shared resources by threads in the Primary replica to the Backup replicas; and means for ordering and granting accesses to shared resources to threads in said Backup replicas in response to the delivery of messages from said Primary replica to said Backup replicas containing said information.
- 153. A system as recited in claim 152, wherein said computing environment provides multithreading, distributed computing, fault tolerance, and a client-server paradigm.
- 154. A system as recited in claim 153, wherein said means for ordering and granting accesses to shared resources constrains the order of granting access to shared resources by threads in said Backup replicas to match the order of granting said accesses to shared resources in said Primary replica, as communicated from said Primary replica to said Backup replicas.
- 155. A system as recited in claim 152, wherein said shared resources are shared through claiming and releasing functions applied to mutual exclusion constructs for the shared resources.
- 156. A system as recited in claim 152, wherein said means for ordering and granting accesses to shared resources comprises functions of a consistent multithreading library invoked in response to requests to access shared resources.
- 157. A system as recited in claim 156: wherein calls to functions of said operating system's thread library are intercepted and diverted to calls to corresponding functions of said consistent multithreading library; wherein said functions of said consistent multithreading library invoke functions of said operating system's thread library.
- 158. A system as recited in claim 157, further comprising dynamically linking said consistent multithreading library to said application program and interposing said consistent multithreading library ahead of said operating system's thread library.
- 159. A system as recited in claim 157, further comprising inserting a command into the makefile for said application program directing the linker to interpose said consistent multithreading library ahead of said operating system's thread library.
- 160. A system as recited in claim 157, wherein functions of said consistent multithreading library are configured as a set of functions incorporated within said operating system's thread library.
- 161. A system as recited in claim 152, wherein said means for ordering and granting accesses to shared resources, comprises:
communication routines for communicating to said Backup replicas the order of granting access to shared resources by threads in said Primary replica; and ordering the granting of accesses to shared resources to threads in said Backup replicas in response to the order of granting accesses to shared resources, communicated by said Primary replica.
- 162. A system as recited in claim 161, wherein said order of granting accesses to shared resources comprises:
identifying the thread accessing the shared resource; identifying the shared resource being accessed; and identifying the particular access so that multiple accesses to a shared resource from each thread may be distinguished.
- 163. A method of maintaining strong replica consistency for a replicated multithreaded application program, using the semi-active or passive replication strategy, comprising:
granting access requests for shared resources to threads in the Backup replicas in response to the order in which corresponding requests were granted to corresponding threads in the Primary replica.
- 164. A method as recited in claim 163, wherein said granting of access requests is performed by employing library interpositioning to intercept calls to functions of the operating system's thread library.
- 165. A method as recited in claim 164, wherein said shared resources are accessed using a mutual exclusion construct.
- 166. A method as recited in claim 165, wherein said granting of access requests comprises:
piggybacking information, about the order of granting mutual exclusion constructs to threads in said Primary replica, onto regular messages that are multicast from said Primary replica to said Backup replicas; delivering said messages to said Backup replicas, that determine the order in which threads in said Backup replica are granted their claims to mutual exclusion constructs.
- 167. A method as recited in claim 166, wherein said piggybacking of information comprises piggybacking information about claims for shared resources by threads in first said Primary replica and about the order in which said claims were granted, on two or more messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 168. A method as recited in claim 166, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said information to said Backup replica.
- 169. A method as recited in claim 168, further comprising a consistent multithreading library executing said ordering and granting of access requests by intercepting calls to said operating system's thread library.
- 170. A method of replicating multithreaded application programs in which threads access shared resources, within a computing environment that uses semi-active or passive replication, comprising:
claiming shared resources by a thread in the Primary replica; granting said claim to said thread in said Primary replica; communicating to the Backup replicas the order of granting said claim; and granting the corresponding claim of a shared resource to a corresponding thread in each Backup replica, as determined by the order in which corresponding claims to shared resources were granted to corresponding threads in said Primary replica.
- 171. A method as recited in claim 170, wherein said claiming, said communicating, and said granting are controlled by the functions of a consistent multithreading library that is interposed ahead of said operating system's thread library so that calls to functions of the operating system's thread library are intercepted to render said application program virtually deterministic.
- 172. A method as recited in claim 171, wherein said shared resources are accessed by using a mutual exclusion construct.
- 173. A method as recited in claim 171, wherein said communicating of said claim comprises piggybacking information onto regular multicast messages specifying the order in which threads in the Primary replica have been granted their claims to said mutual exclusion constructs.
- 174. A method as recited in claim 173, wherein said multicasting comprises piggybacking information, about claims for shared resources by threads in first said Primary replica and about the order in which said claims were granted, on two or more messages, one message multicast by first said Primary replica to the replicas of other processes, objects or components and one message multicast by second Primary replica of said other processes, objects or components to first said Primary replica and its Backup replicas.
- 175. A method as recited in claim 173, wherein if said Primary replica does not have a regular message to multicast, it multicasts a control message containing said order of granting information.
- 176. A method as recited in claim 171: wherein when said thread T of said Primary replica has been granted a mutual exclusion construct M for its Nth claim of any mutual exclusion construct, a message is multicast that contains the ordering information (T, M, N); wherein said granting in said Backup replica comprises granting the corresponding mutual exclusion construct M to the corresponding thread T in said Backup replica for the corresponding claim N, only if said ordering information (T, M, N) from said Primary replica has been delivered to, and received by, said Backup replica.
- 177. A method as recited in claim 176, wherein said claiming of a shared resource by a thread at said Primary replica comprises:
invoking the claim function to claim mutex M for thread T; diverting the invocation from the claim function of the operating system's thread library to the corresponding claim function of the consistent multithreading library; determining the information (T, M, N) for a claim to a mutual exclusion construct by a thread in the Primary replica, wherein T represents the thread making said claim, M represents the mutual exclusion construct being claimed, and N represents the claim number by thread T to access any mutual exclusion construct; granting the mutual exclusion construct M to thread T in the Primary replica; appending the information (T, M, N) to the queue of claims to be multicast to the Backup replicas; and multicasting messages including the piggybacked claim (T, M, N) to the Backup replicas.
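The Primary-side steps of claim 177 can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: `PrimaryMutexTable` is a hypothetical name, thread and mutex identifiers are passed explicitly as strings, `threading.Lock` stands in for the operating system's mutex, and `piggyback` simply returns the payload together with the queued claims rather than performing a real multicast.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Tuple

Grant = Tuple[str, str, int]  # (T, M, N)


class PrimaryMutexTable:
    def __init__(self) -> None:
        self._mutexes: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._claim_counts: Dict[str, int] = defaultdict(int)
        self._outgoing: List[Grant] = []   # queue of claims awaiting multicast
        self._guard = threading.Lock()     # protects the bookkeeping above

    def claim(self, thread: str, mutex: str) -> Grant:
        with self._guard:
            # Determine (T, M, N): N counts every claim made by thread T.
            self._claim_counts[thread] += 1
            n = self._claim_counts[thread]
            real = self._mutexes[mutex]
        # At the Primary the underlying mutex decides who is granted next.
        real.acquire()
        with self._guard:
            # Appended while M is held, so the queue order for M matches
            # the order in which M was actually granted.
            self._outgoing.append((thread, mutex, n))
        return (thread, mutex, n)

    def release(self, mutex: str) -> None:
        with self._guard:
            real = self._mutexes[mutex]
        real.release()

    def piggyback(self, payload: bytes) -> Tuple[bytes, List[Grant]]:
        # Attach all queued claims to the next outgoing message.
        with self._guard:
            grants, self._outgoing = self._outgoing, []
        return payload, grants
```

Recording the claim only after the underlying mutex has been acquired is the design choice that matters here: the queue then reflects the grant order rather than the order in which threads happened to ask.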
- 178. A method as recited in claim 176, wherein said granting of a shared resource to a thread in said Backup replica comprises:
invoking the claim function to claim mutex M for thread T; diverting the invocation from the claim function of the operating system's thread library to the corresponding claim function of the consistent multithreading library; determining the information (T, M, N) for a claim to a mutual exclusion construct by a thread in said Backup replica, wherein T represents the thread in said Backup replica making said claim, M represents the mutual exclusion construct being claimed, and N represents the claim number by thread T to access any mutual exclusion construct; determining if (T, M, N) matches the next grant for mutex M, as directed by said communication from said Primary replica; if so, mutex M is granted to thread T when mutex M is available; and determining that (T, M, N) does not match the next grant for mutex M, according to the order of mutex granting dictated by said Primary replica, wherein said thread T is suspended until (T, M, N) is delivered to, and received by said Backup replica, and is the next grant of mutex M in the order dictated by said Primary replica.
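The Backup-side steps of claim 178 can be sketched as below. This is a simplified illustration: `BackupMutexTable` and its methods are hypothetical names, identifiers are plain strings, and mutex availability is tracked with a set guarded by a condition variable rather than with the operating system's own mutexes.

```python
import threading
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple

Grant = Tuple[str, str, int]  # (T, M, N)


class BackupMutexTable:
    def __init__(self) -> None:
        self._cond = threading.Condition()
        # Per mutex: grants delivered from the Primary, in Primary order.
        self._ordered: Dict[str, Deque[Grant]] = defaultdict(deque)
        self._held: Set[str] = set()
        self._claim_counts: Dict[str, int] = defaultdict(int)

    def deliver(self, grants: Iterable[Grant]) -> None:
        # Called when a message carrying ordering information is delivered.
        with self._cond:
            for grant in grants:
                self._ordered[grant[1]].append(grant)
            self._cond.notify_all()

    def claim(self, thread: str, mutex: str) -> Grant:
        with self._cond:
            self._claim_counts[thread] += 1
            wanted = (thread, mutex, self._claim_counts[thread])

            def is_next_and_free() -> bool:
                queue = self._ordered[mutex]
                return mutex not in self._held and bool(queue) and queue[0] == wanted

            # Suspend until (T, M, N) has been delivered, is the next grant
            # of M in the Primary's order, and M is available.
            self._cond.wait_for(is_next_and_free)
            self._ordered[mutex].popleft()
            self._held.add(mutex)
            return wanted

    def release(self, mutex: str) -> None:
        with self._cond:
            self._held.discard(mutex)
            self._cond.notify_all()
```

A thread whose claim is not yet at the head of the queue for its mutex simply waits, so the Backup replica reproduces the Primary's grant order regardless of how its own threads are scheduled.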
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from U.S. provisional application Ser. No. 60/367,615 filed on Mar. 25, 2002, incorporated herein by reference, and from U.S. provisional application Ser. No. 60/367,616 filed on Mar. 25, 2002, incorporated herein by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] This invention was made with Government support under Grant No. 70NANBOH3015, awarded by the U.S. Department of Commerce, National Institute of Standards and Technology. The Government may have certain rights in this invention.
Provisional Applications (2)

| Number | Date | Country |
|---|---|---|
| 60367615 | Mar 2002 | US |
| 60367616 | Mar 2002 | US |