Method and Apparatus for Conditional Broadcast of Barrier Operations

Information

  • Patent Application
  • 20080059683
  • Publication Number
    20080059683
  • Date Filed
    August 31, 2006
  • Date Published
    March 06, 2008
Abstract
A weakly-ordered processing system implements an execution synchronization bus transaction, or “memory barrier” bus transaction, to enforce strongly-ordered data transfer bus transactions. A slave device that ensures global observability may “opt out” of the memory barrier protocol. In various embodiments, the opt-out decision may be made dynamically by each slave device asserting a signal, may be set system-wide during a Power-On Self Test (POST) by polling the slave devices and setting corresponding bits in a global observability register, or it may be hardwired by system designers so that only slave devices capable of performing out-of-order data transfer operations participate in the memory barrier protocol.
Description

BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a functional block diagram of a weakly-ordered processing system;



FIG. 2 is a functional block diagram of a bus interconnect in a weakly-ordered processing system;



FIG. 3 is a functional block diagram of one embodiment of a controller in a bus interconnect for a weakly-ordered processing system; and



FIG. 4 is a functional block diagram of another embodiment of a controller in a bus interconnect for a weakly-ordered processing system.





DETAILED DESCRIPTION

The detailed description set forth below in connection with the appended drawings is intended as a description of various embodiments of the invention and is not intended to represent the only embodiments in which the invention may be practiced. In particular, for the purpose of explanation, embodiments are described with respect to a processing system comprising one or more processors issuing memory access requests to two or more memory controllers, and a bus interconnect. However, the invention is applicable to any master devices issuing data transfer bus transactions to slave devices in a shared bus system, and is not limited to processors and memory controllers.



FIG. 1 is a functional block diagram of a weakly-ordered processing system. The processing system 100 may be disposed in a computer or other computational system, including a portable electronic device, embedded system, distributed system, or the like. The processing system 100 may be implemented as an integrated circuit, discrete components, or any combination thereof. Only those portions of the processing system 100 necessary for an explanation of embodiments of the present disclosure are depicted in FIG. 1. Those skilled in the art will recognize how best to implement the processing system 100 for each particular application.


The processing system 100, as depicted in FIG. 1, includes processors 102a-102c in communication with memory devices 104a-104c over a shared bus 106. The actual number of processors and memory devices required for any particular application may vary depending on the computational power required and the overall design constraints. A bus interconnect 108 may be used to manage bus transactions between the processors 102a-102c and memory devices 104a-104c using point-to-point switching connections. In at least one embodiment of the bus interconnect 108, multiple direct links may be provided to allow two or more bus transactions to occur simultaneously.


One or more of the processors 102a-102c may be configured to execute instructions under control of an operating system or other software. The instructions may reside in one or more of the memory devices 104a-104c. Data may also be stored in the memory devices 104a-104c, and retrieved by the processors 102a-102c to execute certain instructions. The new data resulting from the execution of these instructions may be written back into the memory devices 104a-104c. Each memory device 104a-104c may include a memory controller (not shown) and a storage medium (not shown), as known in the art.


Each processor 102a-102c may be provided with a dedicated channel 106a-106c on the bus 106 for communicating with the bus interconnect 108. Similarly, the bus interconnect 108 may use a dedicated channel 106d-106f on the bus to communicate with each memory device 104a-104c. By way of example, a first processor 102a can access a target memory device 104b by sending a data transfer bus transaction request over its dedicated channel 106a on the bus 106. The bus interconnect 108 determines the target memory device 104b from the address of the data transfer bus transaction request and issues a data transfer bus transaction to the target memory device 104b over the appropriate channel 106e on the bus 106. A data transfer bus transaction may be a write transaction, a read transaction, or any other bus transaction related to a data transfer. An originating processor 102a-102c may issue a write transaction to a target memory device 104a-104c by placing the appropriate address with a payload on the bus 106 and asserting a write enable signal. An originating processor 102a-102c may issue a read transaction to a target memory device 104a-104c by placing the appropriate address on the bus 106 and asserting a read enable signal. In response to the read request, the target memory device 104a-104c will send the payload back to the originating processor 102a-102c. An originating processor 102a-102c may also issue bus transactions that are not data transfer bus transactions, such as a memory barrier transaction.
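
By way of illustration only, the address decode and channel selection described above may be sketched as a range lookup. The address ranges and the names ADDRESS_MAP, CHANNEL_FOR_DEVICE, and decode_target are assumptions made for this sketch; the description does not define an address map.

```python
# Hypothetical address map: (base, limit, device). The ranges are invented
# for illustration; the description does not specify them.
ADDRESS_MAP = [
    (0x0000_0000, 0x0FFF_FFFF, "104a"),
    (0x1000_0000, 0x1FFF_FFFF, "104b"),
    (0x2000_0000, 0x2FFF_FFFF, "104c"),
]

# Channels 106d-106f connect the bus interconnect to memory devices 104a-104c.
CHANNEL_FOR_DEVICE = {"104a": "106d", "104b": "106e", "104c": "106f"}


def decode_target(address):
    """Return (device, channel) for an address, or raise if unmapped."""
    for base, limit, device in ADDRESS_MAP:
        if base <= address <= limit:
            return device, CHANNEL_FOR_DEVICE[device]
    raise ValueError(f"address {address:#x} maps to no memory device")
```

Under these assumed ranges, a request to address 0x10000040 resolves to memory device 104b over channel 106e, matching the example given above.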


In at least one embodiment of the processing system 100, the processors 102a-102c may transmit an attribute with each memory access request. The attribute may be any parameter that describes the nature of the data transfer bus transaction. The attribute may be transmitted with the address over the address channel. Alternatively, the attribute may be transmitted using sideband signaling or some other methodology. The attribute may be used to indicate whether or not the data transfer bus transaction request is strongly-ordered. A “strongly-ordered” request refers to a data transfer bus transaction request that cannot be executed out of order.


The bus interconnect 108 may monitor the attribute for each data transfer bus transaction request from the processors 102a-102c. If an attribute indicates a strongly-ordered data transfer bus transaction request, the bus interconnect 108 may enforce an ordering constraint on that transaction to every slave device that accepts bus transactions from that master and is capable of out-of-order execution of data transfer bus transactions, except for the slave device to which the strongly-ordered data transfer bus transaction is directed. By way of example, a data transfer bus transaction request from a first processor 102a to a target memory device 104a may include an attribute. The bus interconnect 108 may determine from the attribute whether the transaction is strongly-ordered. If the bus interconnect 108 determines that the transaction is strongly-ordered, it sends a memory barrier to every memory device 104b and 104c that the first processor 102a is capable of accessing and that may execute data transfer bus transactions out of bus transaction order, other than the target memory device 104a. The bus interconnect 108 also sends the strongly-ordered data transfer bus transaction to the target memory 104a without a memory barrier because the target memory device 104a will implicitly handle it as a strongly-ordered request due to the attribute associated with the data transfer bus transaction. Alternatively, the processor 102a may issue a memory barrier bus transaction prior to issuing the strongly-ordered data transfer bus transaction.
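
The selection rule in the preceding paragraph may be sketched as a set computation. This is a minimal model under stated assumptions, not the disclosed circuit: the function name barrier_targets and the representation of devices as strings are hypothetical.

```python
def barrier_targets(accessible, out_of_order_capable, target, strongly_ordered):
    """Return the slave devices that should receive a memory barrier.

    accessible:           devices the originating master can access
    out_of_order_capable: devices that may execute transactions out of order
    target:               device the strongly-ordered transaction is directed to
    strongly_ordered:     value of the attribute carried by the request
    """
    if not strongly_ordered:
        return set()  # weakly-ordered requests generate no barriers
    # The target is excluded: it orders the request implicitly via the attribute.
    return (set(accessible) & set(out_of_order_capable)) - {target}
```

With processor 102a able to access all three memory devices, each capable of out-of-order execution, a strongly-ordered request to 104a yields barriers to 104b and 104c only.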



FIG. 2 is a functional block diagram illustrating an example of a bus interconnect 108 in a weakly-ordered processing system. The manner in which the bus interconnect is actually implemented will depend upon design considerations. Those skilled in the art will recognize the interchangeability of various designs, and how best to implement the functionality described herein for each particular application.


Referring to FIG. 2, a bus register 202 may be used to receive and store information from the bus 106. The bus register 202 may be any type of storage device such as a First-In-First-Out (FIFO) memory, or other suitable storage device. The information received and stored by the bus register 202 may be any bus related information, but more specifically may include the address and attribute for each data transfer bus transaction request, and in the case of a write operation, the payload. The bus register 202 may also store the attribute in the case of a non-data transfer bus transaction such as a memory barrier operation issued by a master device. The address for each data transfer bus transaction request is also provided to a decoder 204. The decoder 204 may be used to determine the target memory device for each data transfer bus transaction request in the bus register 202. This determination is used to control a bus switch 206. The bus switch 206 is used to demultiplex each data transfer bus transaction from the bus register 202 to the appropriate channel of the bus 106 for its target memory device. A controller 208 may be used to control the timing of the data transfer bus transactions released from the bus register 202.



FIG. 3 is a functional block diagram of one embodiment of a controller 208 in a bus interconnect 108 for a weakly-ordered processing system. The controller 208 enforces ordering constraints on memory operations based on information it receives from the decoder 204. The information may include the attribute for each bus transaction, which may be stored in a first input register 302. The information may also include data that identifies each memory device, other than the target memory device, that the originating processor is capable of accessing. The particular memory devices accessible by each processor are preconfigured during the design stage, and therefore, can be programmed or hardwired into the decoder 204. In any event, a second input register 304 may be used to store this information. The first and second input registers 302, 304 may be separate registers as shown in FIG. 3, or alternatively a single register. In some embodiments of the controller 208, the information from the decoder 204 may be stored in registers shared with other bus interconnect functions. Each register may be a FIFO or any other suitable storage medium.


The controller 208 enforces ordering constraints on data transfer operations by controlling the timing of data transfer bus transactions released from the bus register 202. The process will first be described in connection with an attribute which indicates that a strongly-ordered memory data transfer bus transaction is ready to be released from the bus register 202. In this case, the attribute is provided from the first input register 302 to a memory barrier generator 306 as an enabling signal. At the same time, the data stored in the second input register 304 is provided to the input of the memory barrier generator 306. As indicated above, the data stored in the second input register 304 includes data that identifies each memory device, other than the target memory device, that the originating processor is capable of accessing. When the memory barrier generator 306 is enabled by the attribute, this information is used to generate a memory barrier for each memory device identified by the data. Each memory barrier may be provided to the appropriate memory device by issuing a memory barrier transaction directed to the identified memory devices, with an attribute identifying the originating processor which initiated the strongly-ordered request. Alternatively, the memory barriers may be provided to the appropriate memory devices using sideband signaling, or by other suitable means. The memory barrier generator 306 may also generate memory barrier bus transactions in response to memory barrier bus transaction requests from a master device, which are also stored in the bus register 202, in a manner similar to that described above.


According to one or more embodiments, the memory barrier generator 306 may be used to suppress unnecessary memory barriers. For example, a memory barrier for a memory device accessible by the originating processor is superfluous, and may be suppressed, if the memory device is inherently globally observable. Globally observable slave devices may be identified in a number of ways.


In one embodiment of the controller 208, a logical global observability register 307 includes a bit for every slave device in the system. The state of the global observability register bit indicates whether the associated slave device is globally observable, and hence may be exempted from a memory barrier transaction. The global observability register 307 is an input to the memory barrier generator 306. The global observability register 307 may comprise a physical register set by system software during a Power On Self Test (POST), following a poll of slave devices to ascertain their behavior and capabilities with respect to global observability of bus transactions, such as by reading configuration status registers (CSRs) within the respective slave devices.
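
The POST-time polling described above may be sketched as follows. The CSR layout is an assumption: this sketch treats bit 0 of each slave's configuration status register as a hypothetical "globally observable" flag, and the names GO_CAPABLE_BIT and build_observability_register are illustrative only.

```python
GO_CAPABLE_BIT = 0x1  # hypothetical CSR flag: device is globally observable


def build_observability_register(slave_csrs):
    """Build a bitmask with one bit per slave from polled CSR values.

    slave_csrs is a list of CSR values indexed by slave number. Bit i of the
    result is 1 when slave i reports global observability.
    """
    register = 0
    for index, csr in enumerate(slave_csrs):
        if csr & GO_CAPABLE_BIT:
            register |= 1 << index
    return register
```

In this model, system software would call the function once during initialization and write the result to the physical register.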


In one embodiment, which may be particularly advantageous in an ASIC or System On Chip (SOC) environment, one or more bits of a logical global observability register 307 may be hardwired by designers to a predetermined state indicating the known global observability of a corresponding slave device. This may reduce the complexity and execution time of the POST software.


In another embodiment, one or more bits of a logical global observability register 307 may comprise a dynamic, binary signal from a slave device. This allows the slave device to indicate periods of global observability. For example, a slave device may queue data transfer operations in a buffer, and execute the operations out of bus transaction order. When pending data transfer operations reside in the buffer, the slave device would indicate a lack of global observability, thus requiring memory barrier bus transactions be directed to the slave device if a processor issues a strongly-ordered data transfer bus transaction or memory barrier operation. However, if the buffer is empty, the slave device can guarantee global observability for at least the next occurring data transfer bus transaction (that is, the slave device guarantees that all data transfer operations previously issued to it have been executed). In this case, the slave device may indicate via the binary signal that it need not receive memory barrier transactions, and may maintain this indication only as long as its buffer is empty.
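
A toy model of such a slave device, offered only as a sketch, ties the binary signal to the state of the buffer. The class name BufferedSlave and its methods are hypothetical, and out-of-order execution is modeled simply by executing the most recently queued operation first.

```python
from collections import deque


class BufferedSlave:
    """Toy slave that queues operations and may execute them out of order."""

    def __init__(self):
        self._pending = deque()

    def accept(self, operation):
        """Queue a data transfer operation received from the bus."""
        self._pending.append(operation)

    def execute_one(self):
        """Execute one pending operation, newest first (assumed policy)."""
        return self._pending.pop() if self._pending else None

    @property
    def globally_observable(self):
        # The binary signal: asserted only while no operation is pending.
        return not self._pending
```

The signal is deasserted as soon as one operation is queued and reasserted only when the buffer has fully drained.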


In any given implementation, the logical global observability register 307 may comprise any mix of one or more physical registers set by system software, hardwired bits, or dynamic signals from slave devices, as required or desired in a particular application.
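
One way to picture the logical register as a mix of sources is sketched below. The precedence used here (hardwired bit, then dynamic signal, then software-set bit) is an assumption of the sketch, as are the function and parameter names; the description does not prescribe how the sources are combined.

```python
def logical_observability(software_bits, hardwired, dynamic_signals, num_slaves):
    """Resolve one observability bit per slave from three possible sources.

    software_bits:   integer bitmask set by system software
    hardwired:       dict mapping slave index -> fixed bool
    dynamic_signals: dict mapping slave index -> zero-argument callable
                     returning the slave's current signal
    """
    bits = []
    for i in range(num_slaves):
        if i in hardwired:
            bits.append(hardwired[i])
        elif i in dynamic_signals:
            bits.append(bool(dynamic_signals[i]()))
        else:
            bits.append(bool((software_bits >> i) & 1))
    return bits
```

Each slave is served by exactly one source, so the memory barrier generator sees a uniform bit per slave regardless of how that bit is produced.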


Referring to FIGS. 1-3, an example will now be provided to illustrate the manner in which the global observability register bits can be used to suppress memory barriers. In this example, the processing system may be configured such that the first processor 102a can access the first, second, and third memory devices 104a, 104b, 104c. When a strongly-ordered data transfer bus transaction issued by the first processor 102a to the first memory device 104a (or alternatively a memory barrier operation issued by the first processor 102a) is at the output of the bus register 202, the corresponding attribute from the first input register 302 enables the memory barrier generator 306. The data provided to the memory barrier generator 306 from the second input register 304 identifies the memory devices, other than the target memory device, that the first processor 102a can access. In this case, the data identifies the second and third memory devices 104b, 104c. The memory barrier generator 306 checks the bits 307b, 307c in the logical global observability register 307 corresponding to the second and third memory devices 104b, 104c to determine whether either of the memory devices 104b, 104c is globally observable. In this example, bit 307b indicates global observability, and bit 307c does not. Accordingly, a memory barrier bus transaction is issued to the third memory device 104c, and the memory barrier to the second memory device 104b is suppressed.
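
The example above may be reproduced with bitmasks, as a sketch only. The assignment of bit positions 0, 1, and 2 to memory devices 104a, 104b, and 104c is an assumption, as are the names DEVICES and barriers_to_issue.

```python
DEVICES = ["104a", "104b", "104c"]  # bit positions 0, 1, 2 (assumed ordering)


def barriers_to_issue(accessible_mask, target_index, observability_register):
    """Return the devices that must receive a memory barrier."""
    # Devices the master can access, other than the target (register 304).
    others = accessible_mask & ~(1 << target_index)
    # Suppress barriers for devices whose register 307 bit is set.
    needed = others & ~observability_register
    return [DEVICES[i] for i in range(len(DEVICES)) if (needed >> i) & 1]


# Processor 102a can access all three devices and targets 104a;
# bit 307b is set (globally observable) and bit 307c is clear.
issued = barriers_to_issue(0b111, 0, 0b010)
```

Here issued evaluates to ["104c"], so the barrier to memory device 104b is suppressed.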


Returning to FIG. 3, logic 308 in the controller 208 may be used to monitor feedback from the memory devices for memory barrier acknowledgements. A “memory barrier acknowledgement” is a signal from a memory device indicating that every data transfer operation from the processor requiring a strongly-ordered data transfer bus transaction or issuing a memory barrier operation, that preceded the memory barrier, has been executed. The data from the second input register 304 and the bits of the logical global observability register 307 are used by the logic 308 to determine which memory devices should be monitored for memory barrier acknowledgements. When the logic 308 determines that all necessary memory barrier acknowledgements have been received, it generates a trigger that is used to release the corresponding data transfer bus transaction from the bus register 202 (or the next pending data transfer bus transaction if the memory barrier operation was issued directly by the master device). More specifically, the attribute from the first input register 302 is provided to the input of a select multiplexer 310. The multiplexer 310 is used to couple the trigger generated by the logic 308 to the bus register 202 when the attribute indicates that the data transfer bus transaction is strongly-ordered. The release signal output from the multiplexer 310 is also coupled to the decoder to synchronize the timing of the bus switch 206 (see FIG. 2).
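
The acknowledgement tracking attributed to the logic 308 may be modeled as a shrinking set of outstanding devices. This is a behavioral sketch under assumed names (AckMonitor, acknowledge, release), not a description of the actual logic.

```python
class AckMonitor:
    """Tracks which memory barrier acknowledgements are still outstanding."""

    def __init__(self, accessible_others, observable):
        # Only devices that actually received a barrier are monitored:
        # accessible non-target devices minus globally observable ones.
        self.waiting = set(accessible_others) - set(observable)

    def acknowledge(self, device):
        """Record a memory barrier acknowledgement from a device."""
        self.waiting.discard(device)

    @property
    def release(self):
        # The trigger: true once every monitored device has acknowledged.
        return not self.waiting
```

If every other accessible device is globally observable, nothing is monitored and the trigger is available immediately.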


Once the data transfer bus transaction is released from the bus register, it is routed to the target memory device through the bus switch 206 (see FIG. 2). A second multiplexer 312 in the controller 208 may be used to delay the release of data from the first and second input registers 302, 304 until a data transfer acknowledgement is received from the target memory device when an attribute indicating a strongly-ordered data transfer bus transaction or master device-issued memory barrier operation is applied to the select input. As discussed above, the attribute included in the bus transaction enforces an ordering constraint on the target memory device. Namely, the target memory device executes all outstanding data transfer operations issued by the originating processor before executing the strongly-ordered data transfer operation. A data transfer acknowledgement is generated by the target memory device following the execution of the strongly-ordered data transfer operation. The data transfer acknowledgement is fed back to the multiplexer 312 in the controller 208, where it is used to generate a trigger to release new data from the first and second input registers 302, 304 corresponding to the next data transfer bus transaction in the bus register 202. If the new data includes an attribute indicating that the corresponding data transfer bus transaction in the bus register 202 is strongly-ordered or comprises a master device-issued memory barrier operation, then the same process is repeated. Otherwise, the data transfer bus transaction can be released immediately from the bus register 202.


The controller 208 is configured to immediately release a data transfer bus transaction from the bus register 202 when the corresponding attribute in the first input register 302 indicates that the request is neither strongly-ordered nor a master device-issued memory barrier operation. In that case, the attribute disables the memory barrier generator 306. In addition, the attribute forces the multiplexer 310 into a state which couples an internally generated trigger to the bus register 202 to release the data transfer bus transaction. The data transfer bus transaction is released from the bus register 202 and coupled to the target memory device through the bus switch 206 (see FIG. 2). The data corresponding to the next data transfer bus transaction is then released from the first and second input registers 302, 304 by an internally generated trigger output from the second multiplexer 312 in the controller 208.



FIG. 4 is a functional block diagram illustrating another embodiment of a controller in a bus interconnect for a weakly-ordered processing system. In this embodiment, a strongly-ordered data transfer bus transaction is released from the bus register 202 by the controller 208 at the same time the memory barriers are provided to the appropriate memory devices. More specifically, the first input register 302 is used to provide the attribute for a data transfer bus transaction to the memory barrier generator 306. If the attribute indicates that the corresponding data transfer bus transaction is strongly-ordered, then the memory barrier generator 306 is enabled. When the memory barrier generator 306 is enabled, the data from the second input register 304 is used to identify each memory device accessible by the originating processor, other than the target memory device. For each memory device identified, the memory barrier generator 306 checks the corresponding bit of the logical global observability register 307. A memory barrier is then generated for each memory device, other than the target memory device, that does not (at that time) indicate that it is globally observable.


With the memory barrier generator 306 enabled, logic 314 in the controller 208 may be used to prevent subsequent data transfer bus transactions from being released from the bus register 202 until the strongly-ordered data transfer bus transaction is executed by the target memory device. A delay 316 may be used to allow an internally generated trigger to release the strongly-ordered data transfer bus transaction from the bus register 202 before the trigger is gated off by the attribute. In this way, the data transfer bus transaction can be provided to the target memory device concurrently with the memory barriers for the remaining, non-globally observable memory devices accessible by the originating processor.


Logic 318 may be used to monitor feedback from the memory devices for the data transfer acknowledgement from the target memory device, and the memory barrier acknowledgements. The data from the second input register 304 and the bits of the logical global observability register 307 are used by the logic 318 to determine which memory devices need to be monitored for memory barrier acknowledgements. When the logic 318 determines that the various data transfer and/or memory barrier acknowledgements have been received, it generates a trigger to release new data from the first and second input registers 302, 304 corresponding to the next data transfer bus transaction in the bus register 202. The trigger is coupled through a multiplexer 320 which is forced into the appropriate state by the attribute from the first input register 302. If the new data includes an attribute indicating that the corresponding data transfer bus transaction in the bus register 202 is strongly-ordered, then the same process is repeated. Otherwise, the data transfer bus transaction can be released immediately from the bus register 202 with an internally generated trigger via the logic 314. An internally generated trigger may also be coupled through the multiplexer 320 to release the data from the first and second input registers 302, 304 for the next data transfer bus transaction in the bus register 202.
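
The behavior of the FIG. 4 embodiment may be summarized in a short sketch: the strongly-ordered transaction and the memory barriers are issued together, and the next transaction advances only after all corresponding acknowledgements arrive. The function names and the tuple encoding of bus events are assumptions of this sketch.

```python
def fig4_schedule(target, others, observable):
    """Return (issued, awaited) for one strongly-ordered transaction.

    issued:  events sent concurrently (the transfer plus any barriers)
    awaited: acknowledgements to collect before releasing the next
             entry of the input registers
    """
    barriers = sorted(set(others) - set(observable))
    issued = [("transfer", target)] + [("barrier", d) for d in barriers]
    awaited = {("transfer_ack", target)} | {("barrier_ack", d) for d in barriers}
    return issued, awaited


def may_advance(awaited, received):
    """True once every awaited acknowledgement has been received."""
    return awaited <= set(received)
```

Unlike the FIG. 3 embodiment, the transfer is not held back for barrier acknowledgements; only the transactions that follow it are delayed.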


Although the present invention has been described herein with respect to a controller 208 within the bus interconnect 108 of a shared bus system, those of skill in the art will readily recognize that the invention is not limited to such implementation. In particular, the global observability indicator for each slave device may be propagated to or accessible by each master device, which may determine whether a memory barrier bus transaction is required, and if so, to which slave devices it should be directed.


Although the present invention has been described herein with respect to particular features, aspects and embodiments thereof, it will be apparent that numerous variations, modifications, and other embodiments are possible within the broad scope of the present invention, and accordingly, all variations, modifications and embodiments are to be regarded as being within the scope of the invention. The present embodiments are therefore to be construed in all aspects as illustrative and not restrictive and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.

Claims
  • 1. A weakly-ordered processing system, comprising: a plurality of slave devices; at least one master device configured to direct data transfer bus transactions to one or more slave devices; and a bus interconnect configured to implement data transfer bus transactions between master and slave devices, and further configured to direct an execution synchronization bus transaction to one or more slave devices that are not globally observable in response to an execution synchronization bus transaction request from a master device.
  • 2. The system of claim 1 wherein the bus interconnect includes a logical global observability register comprising a plurality of bits, each bit corresponding to a slave device and indicating whether the slave device maintains global observability.
  • 3. The system of claim 2 wherein the global observability register comprises one or more physical registers set by system software during system initialization.
  • 4. The system of claim 3 wherein the system software polls configuration registers in slave devices to ascertain their global observability.
  • 5. The system of claim 2 wherein one or more bits of the logical global observability register are hard-wired.
  • 6. The system of claim 2 wherein one or more bits of the logical global observability register comprise dynamic binary signals from slave devices.
  • 7. The system of claim 6 wherein a slave device buffers data transfer operations prior to executing the data transfer operations, and wherein the slave device indicates it is globally observable via a dynamic binary signal to the logical global observability register when its buffer is empty.
  • 8. The system of claim 1 wherein data transfer bus transaction requests from the master devices include an attribute indicating whether the data transfer bus transaction is strongly-ordered.
  • 9. The system of claim 1 wherein each slave device receiving the execution synchronization bus transaction executes all previously received data transfer operations from at least the master device issuing the strongly-ordered data transfer bus transaction.
  • 10. The system of claim 1 wherein the slave device to which the strongly-ordered data transfer bus transaction is directed appears to master devices to have executed all previously received data transfer operations from at least the master device issuing the strongly-ordered data transfer bus transaction, prior to executing the strongly-ordered data transfer bus transaction.
  • 11. The system of claim 1 wherein the bus interconnect directs the execution synchronization bus transaction only to non-globally observable slave devices to which the master device issuing the strongly-ordered data transfer bus transaction request may direct data transfer bus transactions.
  • 12. The system of claim 1 wherein the bus interconnect is further configured to direct an execution synchronization bus transaction to one or more slave devices that are not globally observable in response to a strongly ordered data transfer bus transaction request.
  • 13. A bus interconnect operative to direct data transfer bus transactions from one or more master devices to two or more slave devices in a weakly-ordered processing system, comprising: a bus register operative to queue data transfer bus transaction requests; and a controller operative to control the issuance of data transfer bus transactions from the bus register and further operative to issue an execution synchronization bus transaction to one or more slave devices that are not globally observable in response to an execution synchronization bus transaction request from a master device.
  • 14. The bus interconnect of claim 13, wherein the controller includes a logical global observability register indicating which slave devices are globally observable.
  • 15. The bus interconnect of claim 14 wherein the logical global observability register comprises a physical register set by system software.
  • 16. The bus interconnect of claim 15 wherein the system software polls status registers in slave devices to ascertain their global observability, prior to setting the global observability register.
  • 17. The bus interconnect of claim 14 wherein one or more bits of the logical global observability register are hardwired by system designers.
  • 18. The bus interconnect of claim 14 wherein one or more bits of the logical global observability register comprise dynamic binary signals from slave devices.
  • 19. The bus interconnect of claim 18 wherein a slave device is operative to buffer data transfer operations prior to executing them, the slave device indicating global observability via a dynamic binary signal when the buffer is empty.
  • 20. The bus interconnect of claim 12, further comprising a decoder logically connected to the controller and operative to ascertain to which slave device a pending data transfer bus transaction is directed, and further operative to detect strongly-ordered data transfer bus transactions.
  • 21. The bus interconnect of claim 14, further comprising a bus switch receiving data transfer bus transactions from the bus register, the bus switch operative to direct the data transfer bus transactions to slave devices under the control of the decoder.
  • 22. The bus interconnect of claim 13 wherein the controller is further operative to issue an execution synchronization bus transaction to one or more slave devices that are not globally observable in response to a strongly ordered data transfer bus transaction request.
  • 23. A method of executing a strongly-ordered data transfer bus transaction in a weakly-ordered processing system including one or more master devices and two or more slave devices, comprising: maintaining an indication of which of the slave devices are globally observable; and issuing an execution synchronization bus transaction to one or more slave devices that are not globally observable in response to an execution synchronization bus transaction request from a master device.
  • 24. The method of claim 23 further comprising detecting a strongly-ordered data transfer bus transaction by decoding an attribute of each data transfer bus transaction request received from a master device.
  • 25. The method of claim 23 wherein the execution synchronization bus transaction is issued only to non-globally observable slave devices to which the master device issuing a strongly-ordered data transfer bus transaction request may direct data transfer bus transactions.
  • 26. The method of claim 23 wherein maintaining an indication of which of the slave devices are globally observable comprises maintaining a logical global observability status register, one bit of which corresponds to each slave device.
  • 27. The method of claim 26 further comprising: polling status registers in slave devices during initialization to ascertain each slave device's global observability; and setting a physical global observability status register.
  • 28. The method of claim 26 wherein maintaining an indication of which of the slave devices are globally observable comprises receiving a dynamic binary signal from one or more slave devices indicating the slave device's global observability.
  • 29. The method of claim 23 further comprising, for each slave device receiving an execution synchronization bus transaction, executing all pending data transfer operations from at least the master device issuing the strongly-ordered data transfer bus transaction request.
  • 30. The method of claim 23 further comprising, for the slave device receiving the strongly-ordered data transfer bus transaction, executing all pending data transfer operations from at least the master device issuing the strongly-ordered data transfer bus transaction request, prior to executing the strongly-ordered data transfer bus transaction.
  • 31. The method of claim 23 further comprising: receiving a strongly-ordered data transfer bus transaction request.