This invention relates to performance data access.
Multiple redundant processor systems are implemented as fault-tolerant systems to prevent downtime and system outages and to avoid data corruption. A multiple redundant processor system provides continuous application availability and maintains data integrity, such as for stock exchange systems, credit and debit card systems, electronic funds transfer systems, travel reservation systems, and the like. In these systems, data processing computations can be performed on multiple, independent processing elements of a processor system.
Processors in a multiple redundant processor system can be loosely synchronized in a loose lock-step implementation such that processor instructions are executed at slightly different times. This loosely synchronized implementation provides that each of the processors can execute the same instruction set faster than a typical tight lock-step configuration because the processors are not restricted to synchronized code execution. The performance of a multiple redundant processor system can be monitored to determine optimizations for software processing and for hardware configurations, such as for cache management and configuration to optimize cache hit rates.
When performance data is requested, such as the processing time for a processor event, the loosely-synchronized processor elements all execute the same instruction set in response to the request, but may all return a different performance response because the performance data is likely asymmetric (e.g., different in each of the multiple processor elements). The different data responses will appear as an error to the performance monitoring application that has requested the data.
The same numbers are used throughout the drawings to reference like features and components.
The following describes embodiments of performance data access. Performance monitoring is implemented to obtain system performance data from loosely-synchronized processor elements. Examples of performance data for a redundant processor system include time intervals for performing instruction sequences and counts of various processor events.
Although embodiments of performance data access may be implemented in various redundant processor systems, performance data access is described with reference to the following processing environment.
Processor elements, one each from the processor groups 104(1-3), are implemented together as a logical processor 112(1-N). For example, a first logical processor 112(1) includes processor element 106(1) from processor group 104(1), processor element 108(1) from processor group 104(2), and processor element 110(1) from processor group 104(3). Similarly, logical processor 112(2) includes processor elements 106(2), 108(2), and 110(2), while logical processor 112(3) includes processor elements 106(3), 108(3), and 110(3). In an alternate embodiment, a logical processor 112 may be implemented to include only two processor elements 106. For example, a processor complex may be implemented with two processor groups such that each logical processor includes two processor elements, one from each of the two processor groups.
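By way of illustration only, the grouping described above can be sketched as follows. The names `ProcessorElement` and `build_logical_processors` are hypothetical and are not part of the described system; the sketch assumes only that logical processor i is formed from the i-th processor element of every processor group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorElement:
    """Hypothetical record identifying one processor element."""
    group: int  # processor group the element belongs to (1-based)
    index: int  # position of the element within its group (1-based)


def build_logical_processors(num_groups, elements_per_group):
    """Form logical processor i from the i-th element of every group."""
    return [
        tuple(ProcessorElement(group=g, index=i)
              for g in range(1, num_groups + 1))
        for i in range(1, elements_per_group + 1)
    ]


# Triplex: three groups, so each logical processor has three elements.
triplex = build_logical_processors(num_groups=3, elements_per_group=4)
# Duplex alternate embodiment: two groups, two elements per logical processor.
duplex = build_logical_processors(num_groups=2, elements_per_group=4)
```

The number of elements per group (four here) is an arbitrary choice for the sketch; the description leaves it as N.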
In the example shown in
Each processor group 104(1-3) has an associated memory component 114(1-3), respectively. A memory component 114 can be implemented as any one or more memory components, examples of which include random access memory (RAM), DRAM, SRAM, a disk drive, and the like. Although the memory components 114(1-3) are illustrated as independent components, each processor group 104 can include a respective memory component 114 as an integrated component in an alternate embodiment.
In this example, processor complex 102 is a triplex redundant processor system having triple modular redundancy in that each logical processor 112 includes three redundant processor elements. To maintain data integrity, a faulty processor element can be replaced and reintegrated into the system while the redundant processor system 100 remains on-line without a loss of processing capability. Similarly, in an alternate embodiment, a duplex redundant processor system has dual modular redundancy in that each logical processor includes two redundant processor elements.
The processor elements of a logical processor 112 are loosely synchronized in a loose lock-step implementation such that instructions may be executed, or processed, in each of the processor elements at a slightly different time. This implementation provides that the logical processors can execute instructions faster than a typical tight lock-step configuration because the processor elements and logical processors 112 are not restricted to synchronized code execution. This implementation also provides for non-deterministic execution among the processor elements in a logical processor, such as non-deterministic branch prediction, cache replacement algorithms, and the like. The individual processor elements can also perform independent error recovery without losing synchronization with the other processor elements.
Each of the logical processors 112(1-N) corresponds to one or more respective logical synchronization units 204(1-N). A logical synchronization unit 204 performs various rendezvous operations for an associated logical processor 112 to achieve agreement on data synchronization between the processor elements that cooperate to form a logical processor 112. For example, input/output operations and/or interprocessor communications can be communicated from each processor element of a logical processor 112 to an associated logical synchronization unit 204 to compare and vote on the input/output operations and/or interprocessor communications generated by the processor elements. Logical synchronization units and rendezvous operations are described in greater detail in U.S. patent application Ser. No. ______, which is Attorney Docket No. 200316143-1 entitled “Method and System of Executing User Programs on Non-Deterministic Processors” filed Jan. 25, 2005, to Bernick et al., the disclosure of which is incorporated by reference herein for the purpose of implementing performance data access.
A rendezvous operation may further be implemented by a logical synchronization unit 204 to exchange state information and/or data among the processor elements of a logical processor 112 to synchronize operations and responses of the processor elements. For example, a rendezvous operation may be implemented such that the processor elements deterministically respond to incoming asynchronous interrupts, to accommodate varying processing rates of the processor elements, to exchange software state information when performing operations that are distributed across the processor elements, and the like.
In this example, logical processor 302 includes processor elements 306(1-3) which are each a microprocessor that executes, or processes, computer executable instructions. The redundant processor system 300 includes the memory components 114(1-3) that are each associated with a respective processor group 104(1-3) as shown in
The memory regions 308(1-3) form a logical memory 310 that corresponds to logical processor 302. The processor elements 306(1-3) of the logical processor 302 each correspond to a respective partitioned memory region 308(1-3) of the logical memory 310. In practice, a logical processor 302 can communicate with a corresponding logical memory 310 via an input/output bridge memory controller (not shown).
The memory components 114(1-3) each include an instantiation of performance monitoring logic 312(1-3) that corresponds to a respective processor element 306(1-3) of the logical processor 302. Each of the processor elements 306(1-3) can execute the performance monitoring logic 312 to implement performance data access. In this example, the performance monitoring logic 312(1-3) is maintained by the memory components 114(1-3) as a software application.
As used herein, the term “logic” (e.g., the performance monitoring logic 312) can also refer to hardware, firmware, software, or any combination thereof that may be implemented to perform the logical operations associated with performance data access. Logic may also include any supporting circuitry utilized to complete a given task including supportive non-logical operations. For example, logic may also include analog circuitry, memory components, input/output (I/O) circuitry, interface circuitry, power providing/regulating circuitry, and the like.
Each of the processor elements 306(1-3) of logical processor 302 includes a high-frequency clock 314, a cache memory 316, and one or more accumulators 318. For illustration, only the clock 314, cache memory 316, and accumulator(s) 318 for processor element 306(1) are shown. The description of the processor element components, however, applies to each processor element 306(1-3). The one or more accumulators 318 of a processor element 306 can be implemented as memory to store, update, and/or maintain performance data corresponding to the respective processor element 306.
The performance monitoring logic 312(1-3) implements performance data access such that system performance data can be obtained from the non-synchronized processor elements 306(1-3) of the logical processor 302. The performance of the processor elements 306(1-3) can be monitored for time durations to execute processor events, such as a procedure, and for any number of other operational features, such as cache hit rates, interrupt handling, and the like. While the non-synchronized processor elements 306(1-3) all execute the same instruction set (e.g., a processor event or procedure), each may return a different performance response and the corresponding performance data is likely asymmetric (e.g., different in each of the multiple processor elements 306).
The different performance data responses from each of the processor elements 306(1-3) may appear as an error when the data is compared by the logical synchronization unit 304, such as when an output operation of the performance data response is performed. The different performance data responses may also appear as an error if the performance monitoring logic 312 makes a decision based on that data and branches in two (or three) different directions, causing different action sequences that can be detected by the logical synchronization unit 304.
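The two failure modes described above can be illustrated with a minimal sketch. The functions `vote` and `classify`, the unanimity rule, the threshold, and the duration values are all assumptions made for illustration; the description does not specify how the logical synchronization unit compares outputs.

```python
def vote(outputs):
    """Return the common output, or raise if the elements disagree.

    Assumption: unanimity is required; a real unit may vote differently.
    """
    first = outputs[0]
    if any(value != first for value in outputs[1:]):
        raise ValueError(f"voting mismatch: {outputs!r}")
    return first


def classify(duration_ns, threshold_ns=6000):
    """Hypothetical decision taken by monitoring code on a measured time."""
    return "slow" if duration_ns > threshold_ns else "fast"


# Assumed per-element durations for the same procedure, in nanoseconds.
raw_ns = [6300, 6400, 5900]

# Case 1: outputting the raw values directly makes the comparison fail.
try:
    vote(raw_ns)
    raw_vote_failed = False
except ValueError:
    raw_vote_failed = True

# Case 2: branching on the raw values sends the elements down different paths.
decisions = [classify(d) for d in raw_ns]
```

In this sketch the third element takes a different branch from the other two, which is the kind of divergence that a comparison of subsequent outputs would detect.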
In an embodiment of performance data access, the performance data requested by the performance monitoring logic 312 can be exchanged via a rendezvous operation with the logical synchronization unit 304 such that the performance monitoring logic 312 receives consistent data from the processor elements 306(1-3). For example, a procedure may take 6.3 microseconds for processor element 306(1) to execute, 6.4 microseconds for processor element 306(2) to execute, and 5.9 microseconds for processor element 306(3) to execute. The time duration for each processor element 306 to execute the procedure can be stored in an accumulator 318 for each respective processor element 306(1-3).
When the performance data for each of the processor elements 306(1-3) is requested by the performance monitoring logic 312, the logical synchronization unit 304 exchanges the performance data of each of the processor elements such that each processor element has a copy of all three processor elements' individual performance measurement. For example, processor element 306(1) will have the 6.3 microseconds to execute the procedure, the 6.4 microseconds for processor element 306(2) to execute the procedure, and the 5.9 microseconds for processor element 306(3) to execute the procedure.
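The effect of the exchange can be sketched as follows. The function `rendezvous_exchange` is a hypothetical stand-in for the rendezvous operation, and the ordering of values by element identifier is an assumption of the sketch; durations are expressed as integer nanoseconds.

```python
def rendezvous_exchange(local_values):
    """Give every element a copy of all elements' measurements.

    local_values maps an element identifier to that element's own
    measurement. The returned mapping gives each element the same tuple,
    ordered by element identifier so that all copies are identical.
    """
    ordered = tuple(local_values[element] for element in sorted(local_values))
    return {element: ordered for element in local_values}


# Durations from the example above: 6.3, 6.4, and 5.9 microseconds.
views = rendezvous_exchange({1: 6300, 2: 6400, 3: 5900})
```

After the call, `views[1]`, `views[2]`, and `views[3]` hold the same three values in the same order, so any deterministic function applied to them yields the same result on every element.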
Each of the processor elements 306(1-3) then conforms, or synchronizes, the performance data. In this example, the 6.3 microseconds, 6.4 microseconds, and 5.9 microseconds can be averaged as 6.2 microseconds to execute the procedure. The averaging operation is deterministic, and all three processor elements 306(1-3) will arrive at the same answer of 6.2 microseconds. The average 6.2 microseconds is then returned to the performance monitoring logic 312 as the synchronized performance data.
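A minimal sketch of the averaging step follows. The function name `conform_average` and the use of integer nanoseconds with integer division are assumptions of the sketch, chosen so that the result does not depend on floating-point rounding.

```python
def conform_average(durations_ns):
    """Average integer nanosecond durations with integer division."""
    return sum(durations_ns) // len(durations_ns)


# After the exchange, every element holds the same three values (in ns).
exchanged_ns = (6300, 6400, 5900)

# Each of the three elements performs the same computation independently.
results = [conform_average(exchanged_ns) for _element in range(3)]
```

Every entry of `results` is 6200 nanoseconds, corresponding to the 6.2 microseconds of the example.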
Other conforming operations or algorithms can be implemented to synchronize the performance data from the multiple processor elements 306(1-3). For example, the processor elements 306(1-3) can select a performance measurement from any one of the processor elements 306(1-3), such as the minimum performance measurement, the middle performance measurement, or the maximum performance measurement corresponding to a particular processor element 306. Alternatively, the processor elements 306(1-3) can discard the performance data value that is the farthest from the other two, and then average the two remaining performance data values (e.g., for a system with triple modular redundancy), or any other form of a deterministic algorithm can be implemented.
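The alternative conforming operations named above can be sketched as follows. The function names are hypothetical, values are integer nanoseconds, and the tie-breaking rule in `conform_drop_outlier` (discard the lowest value when both extremes are equally far from the middle) is an assumption added so that the operation remains deterministic.

```python
def conform_min(values):
    return min(values)


def conform_middle(values):
    # Middle value of an odd-length collection, e.g. three elements.
    return sorted(values)[len(values) // 2]


def conform_max(values):
    return max(values)


def conform_drop_outlier(values):
    """Discard the value farthest from the other two, average the rest.

    Applies to exactly three values (triple modular redundancy).
    """
    if len(values) != 3:
        raise ValueError("expected exactly three values")
    low, mid, high = sorted(values)
    if high - mid > mid - low:
        kept = (low, mid)   # the highest value is the outlier
    else:
        kept = (mid, high)  # the lowest value is the outlier (also on a tie)
    return sum(kept) // 2


exchanged_ns = (6300, 6400, 5900)
```

For the example values, 5900 is the farthest from the other two, so `conform_drop_outlier` averages 6300 and 6400.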
Alternatively, each processor element 306(1-3) can replicate the performance measurements from the other processor elements 306. For example, prior to the logical synchronization unit 304 exchange of data, processor element 306(1) will have value A, processor element 306(2) will have value B, and processor element 306(3) will have value C. After the data exchange, each processor element 306(1-3) will have all three values A, B, and C which are replicated as if each processor element generated the performance data three times rather than just the one time.
In an implementation, the time duration of a processor event can be determined by obtaining a first time from a clock 314 of the respective processor element 306 at the beginning of a processor event, and subtracting the first time from an accumulator 318 of the processor element 306. A second time can be obtained from the clock 314 after the processor event has been executed by the processor element. The second time is then added to the accumulator 318 such that a time difference between the first time and the second time is the time duration of the processor event. The time duration is maintained in the accumulator 318 as the performance data.
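The subtract-then-add technique can be sketched as follows. The class `Accumulator`, its method names, and the clock readings are hypothetical; the clock values are passed in as integers so that the sketch is deterministic.

```python
class Accumulator:
    """Hypothetical stand-in for one accumulator of a processor element."""

    def __init__(self):
        self.value_ns = 0

    def event_begin(self, clock_ns):
        # Subtract the first clock reading from the accumulator.
        self.value_ns -= clock_ns

    def event_end(self, clock_ns):
        # Add the second clock reading; the net change is the duration.
        self.value_ns += clock_ns


acc = Accumulator()
acc.event_begin(1_000_000)   # assumed clock reading at event start
acc.event_end(1_006_300)     # assumed clock reading after the event
duration_ns = acc.value_ns   # 6300 ns, i.e. 6.3 microseconds
```

In a running Python program the readings could be taken from `time.perf_counter_ns()`. A benefit of this arrangement is that no separate start-time variable needs to be kept between the two readings.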
For multiple performance data requests, alternate embodiments of performance data access can be implemented if it is not practicable to conform each individual performance data measurement of the processor elements 306(1-3). For example, the processor time required to accomplish each individual exchange and conforming operation may not be available within the implementation constraints of a redundant processor system.
In another embodiment of performance data access, the performance data is accumulated, or aggregated, in the accumulators 318 for the respective processor elements 306(1-3). For example, time durations for multiple executions of a repeated processor event can be stored and updated as the performance data in the accumulators 318 of each respective processor element 306(1-3). A procedure may be executed as a processor event multiple times by each of the processor elements 306(1-3). For a procedure that is executed ten-thousand times, and which takes on average 3 microseconds to execute, the accumulated time duration would be approximately 30 milliseconds. An accumulator 318 for processor element 306(1) can have stored performance data of 31.5 milliseconds, an accumulator 318 for processor element 306(2) can have stored performance data of 32.3 milliseconds, and an accumulator 318 for processor element 306(3) can have stored performance data of 29.7 milliseconds.
When the performance data for each of the processor elements 306(1-3) is requested by the performance monitoring logic 312, the logical synchronization unit 304 exchanges the data, and each processor element 306(1-3) conforms the exchanged performance data by averaging (or by another conforming operation). In this example, dividing each accumulated time by the ten-thousand executions gives an average of 3.15 microseconds for processor element 306(1), 3.23 microseconds for processor element 306(2), and 2.97 microseconds for processor element 306(3), which can be averaged, or conformed, to approximately 3.12 microseconds to execute the procedure each time. The approximate 3.12 microseconds is then returned to the performance monitoring logic 312 as the synchronized performance data.
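The arithmetic of this example can be reproduced with a short sketch. The function names and the use of integer nanoseconds with integer division are assumptions of the sketch.

```python
EXECUTIONS = 10_000

# Accumulated times from the example (31.5, 32.3, and 29.7 milliseconds),
# expressed in nanoseconds and keyed by processor element.
accumulated_ns = {1: 31_500_000, 2: 32_300_000, 3: 29_700_000}


def per_event_average(total_ns, count):
    """Average duration of one execution on a single element."""
    return total_ns // count


def conform_average(values):
    """Deterministic average across the processor elements."""
    return sum(values) // len(values)


per_element_ns = [
    per_event_average(accumulated_ns[element], EXECUTIONS)
    for element in sorted(accumulated_ns)
]
synchronized_ns = conform_average(per_element_ns)
```

Here `per_element_ns` is `[3150, 3230, 2970]` and `synchronized_ns` is 3116 nanoseconds, which is the approximately 3.12 microseconds of the example. Only one exchange is needed for the ten-thousand executions.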
This embodiment of performance data access avoids the extensive processing overhead of exchanging and conforming the performance data for each individual measurement, and provides performance data obtained for multiple processor events over a duration of time. The asymmetric performance data is maintained by the accumulators 318 in each respective processor element 306 such that the performance monitoring logic 312 cannot directly access the performance data. Rather, the performance monitoring logic interfaces with the processor elements 306(1-3) of the logical processor 302 via application program interfaces (APIs) for performance data access.
In an implementation of performance data access, code (e.g., software) executing in each of the processor elements 306(1-3) interfaces with an array of the accumulators 318. The performance monitoring logic 312 calls the code via APIs to register and have accumulator(s) allocated, and to request the performance data stored in the accumulator(s). The code communicates the requested performance data to the logical synchronization unit 304, and the performance data is conformed, or synchronized. In an embodiment, the code can be implemented as millicode which is software running as the lowest-level software in the operating system.
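One possible shape for such an interface is sketched below. The class `AccumulatorArray`, its methods, and the function `request_performance_data` are hypothetical; the description does not define the API. The exchange is modeled by simply reading every element's accumulator, and averaging is used as the conforming operation.

```python
class AccumulatorArray:
    """Hypothetical per-element array of accumulators."""

    def __init__(self, size):
        self._values = [0] * size
        self._next_free = 0

    def register(self):
        """Allocate an accumulator and return its handle."""
        if self._next_free >= len(self._values):
            raise RuntimeError("no free accumulators")
        handle = self._next_free
        self._next_free += 1
        return handle

    def add(self, handle, delta):
        self._values[handle] += delta

    def read(self, handle):
        return self._values[handle]


def request_performance_data(arrays, handle):
    """Return conformed data for one accumulator across all elements."""
    exchanged = [array.read(handle) for array in arrays]  # models the exchange
    return sum(exchanged) // len(exchanged)               # conform by averaging


# One array per processor element; each element runs the same code,
# so each registration returns the same handle.
arrays = [AccumulatorArray(size=8) for _ in range(3)]
handles = [array.register() for array in arrays]
for array, duration_ns in zip(arrays, (6300, 6400, 5900)):
    array.add(handles[0], duration_ns)
synchronized_ns = request_performance_data(arrays, handles[0])
```

The caller never sees the asymmetric per-element values; it receives only the conformed result.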
The exemplary redundant processor system 400 includes a remote computing device 402 configured for communication with components of the redundant processor system via a communication network 404. The remote computing device 402 includes a performance monitoring application 406 which implements performance data access as described above with reference to
The performance of the processor elements 306(1-3) can be monitored for time durations to execute processor events, such as a procedure, and for any number of other operational features, such as cache hit rates, interrupt handling, and the like. While the non-synchronized processor elements 306(1-3) all execute the same instruction set (e.g., a processor event), each may return a different performance response and the corresponding performance data is likely asymmetric (e.g., different in each of the multiple processor elements 306). The different performance data responses from each of the processor elements 306(1-3) may appear as an error to the performance monitoring application 406 when the performance data responses are compared (or “voted”) by the logical synchronization unit 304.
The performance data requested by the performance monitoring application 406 can be exchanged via a rendezvous operation with the logical synchronization unit 304 and synchronized in each of the processor elements 306(1-3) such that the performance monitoring application 406 receives consistent data from each of the processor elements 306(1-3). The performance monitoring application 406 calls code (e.g., software) executed by each of the processor elements 306(1-3) via APIs to register and have accumulator(s) allocated, and to request that the performance data be stored in the accumulator(s). The code communicates the requested performance data to the logical synchronization unit 304 which exchanges the performance data. The performance data is conformed, or synchronized, in the processor elements 306(1-3) before being returned to the remote computing device 402 and to the performance monitoring application 406 via the communication network 404.
Methods for performance data access, such as exemplary method 500 described with reference to
At block 502, processor events are processed with non-synchronized processor elements of a logical processor in a redundant processor system. For example, each processor element 306(1-3) of logical processor 302 (
In an embodiment of performance data access to determine a time duration of a processor event, a first time is obtained from a clock of a processor element at block 504(A). For example, a time is obtained from clock 314 of processor element 306(1) at the beginning of a processor event. At block 504(B), the first time is subtracted from a time stored in an accumulator of the processor element. For example, the time obtained from clock 314 is subtracted from accumulator 318 for the respective processor element 306(1).
If the time stored in the accumulator is initially zero, then the time obtained from clock 314 will be subtracted from zero and the accumulator will initially have a negative time. At block 504(C), a second time is obtained from the clock of the processor element after the processor event has been executed. At block 504(D), the second time is added to the accumulator such that a time difference between the first time and the second time is the time duration of the processor event. To accumulate multiple time durations for multiple executions of a repeated processor event or procedure, the method blocks 504(A-D) can be repeated to accumulate the performance data of processor elements 306(1-3). Each beginning time of a processor event is subtracted from the accumulator at block 504(B) and each time after the processor event has executed is added to the accumulator at block 504(D) such that a sum of all the time differences is accumulated.
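The accumulation performed by blocks 504(A-D) can be written out as a short derivation. The symbols are notation introduced here for illustration: $s_k$ is the clock reading at the start of the $k$-th execution, $e_k$ is the clock reading after it, and $A_k$ is the accumulator value after $k$ executions.

```latex
\begin{align*}
A_0 &= 0, \\
A_k &= A_{k-1} - s_k + e_k , \qquad k = 1, \dots, n, \\
A_n &= \sum_{k=1}^{n} \left( e_k - s_k \right).
\end{align*}
```

Between blocks 504(B) and 504(D) the accumulator holds the intermediate value $A_{k-1} - s_k$, which for the first execution is $-s_1$ and is therefore negative for any positive clock reading. The accumulator is meaningful as performance data only after the matching addition at block 504(D).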
At block 506, performance data associated with execution of the processor event(s) is stored in one or more accumulators corresponding to a respective processor element. For example, each processor element 306(1-3) includes one or more accumulators 318 to store, update, and maintain performance data associated with a respective processor element 306. Storing the performance data includes storing time duration(s) of a processor event as the performance data. For example, processor element 306(1) stores a first time duration of a processor event in an accumulator 318 of the processor element 306(1), processor element 306(2) stores a second time duration of the processor event in an accumulator 318 of the processor element 306(2), and processor element 306(3) stores a third time duration of the processor event in an accumulator 318 of the processor element 306(3). Performance data may also include counts of a repeated processor event, such as cache hits or misses.
At block 508, the performance data from each of the non-synchronized processor elements is conformed as synchronized performance data. Conforming the performance data includes averaging the time durations from each of the non-synchronized processor elements to generate the synchronized performance data. The logical synchronization unit 304 exchanges the performance data from each of the processor elements 306(1-3), and each of the processor elements conforms the performance data to generate the synchronized performance data.
At block 510, the synchronized performance data is communicated to a performance monitoring application or logic that requests the performance data from the logical processor (e.g., the performance data stored in the one or more accumulators of the non-synchronized processor elements). For example, the logical synchronization unit 304 communicates the synchronized performance data to the performance monitoring logic 312 (
Although embodiments of performance data access have been described in language specific to structural features and/or methods, it is to be understood that the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary implementations of performance data access.
This application claims priority to U.S. Provisional Application Ser. No. 60/557,812 filed Mar. 30, 2004, entitled “Nonstop Advanced Architecture”, the disclosure of which is incorporated by reference herein.
Number | Date | Country
---|---|---
60557812 | Mar 2004 | US