The present disclosure generally relates to a computer system and, more specifically, to a computer processor having improved system bus communication capabilities. In accordance with one embodiment, a system comprises a computer processor with a first processor bus interface unit and a second processor bus interface unit, each coupled to a system bus. The first processor bus interface unit makes requests to the memory system via the system bus to support instruction fetches, and the second processor bus interface unit makes requests to the memory system and peripherals to support data accesses. In computer systems whose system bus specification does not allow more than one split transaction for any one bus master, such as the Advanced High-Performance Bus (AHB) specification, the first and second processor bus interface units allow the computer processor to initiate a first split transaction on behalf of a first core pipeline stage and a second split transaction on behalf of a second core pipeline stage, regardless of whether the first split transaction has completed.
As is known in the art, a core pipeline can stall if, for example, a fetch stage requires a memory access in order to complete an instruction fetch; such a memory access may require more clock cycles to complete than if the requested instruction resided in the processor's instruction cache. A potential effect of this stalling is that a downstream core pipeline stage, such as the data-access pipeline stage, is also prevented from submitting a request to the memory system or peripherals while the fetch stage has a request outstanding, because a system bus specification disallowing multiple split transactions from a single bus master prevents it. In this situation, the data-access stage must wait for the completion of the request to the memory system made on behalf of the fetch pipeline stage. This situation can cause additional stalling of the core pipeline and reduced performance of the processor.
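By way of illustration only, the following C++ sketch models this constraint under assumed names (SplitBus, tryStartSplit, and the master IDs are illustrative, not part of the disclosure): a bus that, like AHB, permits at most one outstanding split transaction per bus master stalls a second request from the same master, whereas two bus interface units presenting two master IDs can have two split transactions pending at once.

```cpp
// Minimal behavioral sketch, not the disclosed hardware: a bus model
// that allows at most one outstanding split transaction per master ID.
#include <cstdio>
#include <set>

struct SplitBus {
    std::set<int> pendingSplits;  // master IDs with an unfinished split

    // A master may start a split only if it has none outstanding.
    bool tryStartSplit(int masterId) {
        if (pendingSplits.count(masterId)) return false;  // must stall
        pendingSplits.insert(masterId);
        return true;
    }
    void completeSplit(int masterId) { pendingSplits.erase(masterId); }
};

int main() {
    // One bus interface unit: the data access stalls behind the fetch.
    SplitBus bus;
    std::printf("fetch started: %d\n", bus.tryStartSplit(/*masterId=*/0));
    std::printf("data  started: %d\n", bus.tryStartSplit(0));  // 0: stalled

    // Two bus interface units (two master IDs): both can be pending.
    SplitBus bus2;
    std::printf("fetch started: %d\n", bus2.tryStartSplit(0));
    std::printf("data  started: %d\n", bus2.tryStartSplit(1));  // 1: proceeds
    return 0;
}
```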
An embodiment in accordance with the disclosure can reduce the effect of core pipeline stalling on the performance of the computer system by allowing the processor to submit more than one simultaneously pending request to a memory system or other component on the system bus.
Other systems, methods, features, and/or advantages will be or may become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and/or advantages be included within this description and be protected by the accompanying claims.
Having summarized various aspects of the present disclosure, reference will now be made in detail to the description as illustrated in the drawings. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed therein. On the contrary, the intent is to cover all alternatives, modifications, and equivalents included within the spirit and scope of this disclosure as defined by the appended claims. It should be emphasized that many variations and modifications may be made to the above-described embodiments. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the claims following this disclosure.
The system bus 308 can represent a system bus conforming to a specification supporting split transactions. As is depicted by the timing diagram of
However, as is depicted in
Each depicted component can be further coupled to a sideband channel 509, which can be used to communicate various control signals between the depicted components coupled to the system bus 508. For example, a “split” or an “unsplit” signal can be transmitted on the sideband channel 509 so that it is not necessary to occupy the system bus 508 during the transmission of such a signal.
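A minimal sketch of such a sideband channel follows, assuming illustrative names (SidebandChannel, SidebandSignal) and a simple publish/subscribe model; the actual signal set and electrical details are not specified here. The point illustrated is that control signals reach the listening components without occupying the system bus.

```cpp
// Illustrative model only; names and semantics are assumptions.
#include <cstdio>
#include <functional>
#include <vector>

enum class SidebandSignal { Split, Unsplit };

struct SidebandChannel {
    std::vector<std::function<void(int, SidebandSignal)>> listeners;

    void subscribe(std::function<void(int, SidebandSignal)> cb) {
        listeners.push_back(std::move(cb));
    }
    // Broadcast a control signal for a master without using the system bus.
    void raise(int masterId, SidebandSignal s) {
        for (auto& cb : listeners) cb(masterId, s);
    }
};

int main() {
    SidebandChannel sideband;
    sideband.subscribe([](int id, SidebandSignal s) {
        std::printf("master %d: %s\n", id,
                    s == SidebandSignal::Split ? "SPLIT" : "UNSPLIT");
    });
    sideband.raise(0, SidebandSignal::Split);    // slave defers master 0
    sideband.raise(0, SidebandSignal::Unsplit);  // slave is ready again
    return 0;
}
```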
The data cache 520 retains a cache of data that is in the memory system 510 for high-speed delivery to the core pipeline 516. The data cache 520, however, does not generally store all of the data that may be requested by the core pipeline 516. If the core pipeline 516 requests data that is not contained in the data cache 520, the data cache 520 will request that data from the memory system 510 via the second bus interface unit 538.
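The following C++ sketch illustrates this hit/miss behavior with assumed names (DataCache, BusInterfaceUnit, readWord); it is a behavioral model, not the disclosed hardware. On a miss, the cache fills itself through its bus interface unit and then serves subsequent reads locally.

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Stand-in for a bus interface unit performing a system-bus read.
struct BusInterfaceUnit {
    uint32_t readWord(uint32_t addr) { return addr ^ 0xDEADBEEFu; }  // dummy data
};

struct DataCache {
    std::unordered_map<uint32_t, uint32_t> lines;  // addr -> cached word
    BusInterfaceUnit& biu;

    explicit DataCache(BusInterfaceUnit& b) : biu(b) {}

    uint32_t read(uint32_t addr) {
        auto it = lines.find(addr);
        if (it != lines.end()) return it->second;  // hit: high-speed path
        uint32_t word = biu.readWord(addr);        // miss: request via BIU
        lines[addr] = word;                        // fill for future hits
        return word;
    }
};

int main() {
    BusInterfaceUnit biu;
    DataCache cache(biu);
    std::printf("miss fill: 0x%08x\n", cache.read(0x1000));  // goes to memory
    std::printf("cache hit: 0x%08x\n", cache.read(0x1000));  // served locally
    return 0;
}
```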
The data cache 520 can also submit a request from the core pipeline 516 to write data to the memory system 510 by delivering the request to the write-back buffer 522. The write-back buffer 522 retains the requests to write to the memory system 510 generated by the core pipeline 516 and delivers them when appropriate, and can use methods or algorithms known in the art for efficiently buffering and sending requests through the second bus interface unit 538 to write to the memory system 510.
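A short sketch of such a write-back buffer, under assumed names (WriteBackBuffer, drainOne), might look as follows; real implementations would add coalescing, ordering, and flow-control policies known in the art.

```cpp
#include <cstdint>
#include <cstdio>
#include <queue>

struct WriteRequest { uint32_t addr; uint32_t data; };

// Simple FIFO model; names are illustrative assumptions.
struct WriteBackBuffer {
    std::queue<WriteRequest> pending;

    void enqueue(const WriteRequest& r) { pending.push(r); }

    // Called when the bus interface unit may issue a write on the bus.
    void drainOne() {
        if (pending.empty()) return;
        const WriteRequest r = pending.front();
        pending.pop();
        std::printf("bus write: 0x%08x <- 0x%08x\n", r.addr, r.data);
    }
};

int main() {
    WriteBackBuffer wbb;
    wbb.enqueue({0x2000u, 42u});
    wbb.enqueue({0x2004u, 43u});
    wbb.drainOne();  // e.g., after winning arbitration for the bus
    wbb.drainOne();
    return 0;
}
```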
The system bus arbiter 514 arbitrates access to the system bus 508 and determines when it is appropriate for a system bus master to read or write data on the system bus 508. As noted above, if the system bus 508 conforms to a specification that does not allow more than one split transaction for each bus master residing on the system bus, such as the AHB specification, fetching and writing of data from the memory system 510 can cause stalling of the core pipeline 516, which can degrade system performance. By employing a first bus interface unit 526 and a second bus interface unit 538, a processor 502 in accordance with the disclosure can effectively appear to the system bus 508 and system bus arbiter 514 as more than one bus master. Consequently, the processor 502 can initiate more than one concurrent split transaction, which can reduce the effect of pipeline stalling, reduce memory idle time, and increase the performance of the computer system.
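As a rough illustration, the sketch below models an arbiter granting the bus round-robin among requesting master IDs; the Arbiter type and its grant policy are assumptions for exposition. Because the processor's two bus interface units request under distinct IDs, each can be granted, and split, independently.

```cpp
// Illustrative arbiter model; the real arbitration policy is not specified.
#include <cstdio>
#include <vector>

struct Arbiter {
    int nextGrant = 0;

    // Simple round-robin grant among requesting masters.
    int grant(const std::vector<bool>& requesting) {
        int n = static_cast<int>(requesting.size());
        for (int i = 0; i < n; ++i) {
            int id = (nextGrant + i) % n;
            if (requesting[id]) { nextGrant = (id + 1) % n; return id; }
        }
        return -1;  // no requests pending
    }
};

int main() {
    Arbiter arb;
    // Master 0: first BIU (fetch). Master 1: second BIU (data access).
    std::vector<bool> req = {true, true};
    std::printf("grant: master %d\n", arb.grant(req));  // fetch BIU
    std::printf("grant: master %d\n", arb.grant(req));  // data BIU
    return 0;
}
```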
The data-access pipeline stage 634 is coupled to a data cache 620, which retains a cache of data requested by the data-access pipeline stage 634. The data cache 620 retains a cache of data in the memory system 610 for high-speed delivery to the data-access pipeline stage 634. The data cache 620 is coupled to a second bus interface unit 638, which is coupled to the system bus 608. The second bus interface unit 638 communicates with components in the computer system coupled to the system bus 608 on behalf of the data cache 620. The data cache 620, however, does not generally store all of the data that may be requested by the data-access pipeline stage 634. If the data-access pipeline stage 634 requests data that is not contained in the data cache 620, the data cache 620 will request that data from the memory system 610 or peripherals 612 via the second bus interface unit 638.
The data cache 620 is configured to update data contained within the data cache 620 if the core pipeline requests to overwrite data in the memory system 610 that also resides in the data cache 620. This eliminates the need for the data cache 620 to re-request, from the memory system 610, data it is already caching simply because the core pipeline has submitted a request to update that data in the memory system 610.
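The sketch below illustrates this write-update behavior with an assumed DataCacheModel type: a core write to a cached address refreshes the cached copy in place, so no re-fetch from memory is needed.

```cpp
// Behavioral model only; names are illustrative assumptions.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct DataCacheModel {
    std::unordered_map<uint32_t, uint32_t> lines;  // addr -> cached word

    void onCoreWrite(uint32_t addr, uint32_t data) {
        auto it = lines.find(addr);
        if (it != lines.end()) it->second = data;  // refresh cached copy
        // The write request itself is still queued toward memory
        // (e.g., via the write-back buffer); no re-fetch is required.
    }
};

int main() {
    DataCacheModel cache;
    cache.lines[0x3000] = 1;       // address already cached
    cache.onCoreWrite(0x3000, 7);  // core overwrites that memory address
    std::printf("cached copy is now %u\n", cache.lines[0x3000]);  // prints 7
    return 0;
}
```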
The data cache 620 is also coupled to a write-back buffer 622, which retains a cache or buffer of data that the data-access pipeline stage 634 requests to write to the memory system 610. The write-back buffer 622 is also coupled to the second bus interface unit 638, which is coupled to the system bus 608. The write-back buffer 622 retains the requests to write to the memory generated by the data cache 620 and delivers the requests when appropriate to the memory system 610 via the second bus interface unit 638 and the system bus 608. The write-back buffer 622 can use methods or algorithms known in the art for efficiently buffering and sending requests to write to the memory system 610.
The fetch pipeline stage 728 is coupled to the instruction cache 718, which retains a cache of instructions for high-speed delivery to the fetch pipeline stage 728. As is known in the art, the instruction cache 718 can retain a cache of recently fetched instructions, apply algorithms to fetch and store frequently requested instructions, or predict instructions that will be requested by the fetch pipeline stage 728. The instruction cache 718, however, does not generally store all instructions that may be requested by the core pipeline 716. If the fetch pipeline stage 728 requests an instruction that is not contained in the instruction cache 718, the instruction cache 718 will request the instruction from the memory system 710 via the first bus interface unit 726.
The data-access pipeline stage 734 is coupled to a data cache 720, which retains a cache of data requested by the data-access pipeline stage 734. The data cache 720 retains a cache of data in the memory system 710 for high-speed delivery to the core pipeline 716. The data cache 720 is coupled to a second bus interface unit 738, which is coupled to the system bus 708. The second bus interface unit 738 communicates with components in the computer system coupled to the system bus 708 on behalf of the data cache 720. The data cache 720, however, does not generally store all of the data that may be requested by the data-access pipeline stage 734. If the data-access pipeline stage 734 requests data that is not contained in the data cache 720, the data cache 720 will request data from the memory system 710 or peripherals 712 via the second bus interface unit 738.
The data cache 720 is coupled to a write-back buffer 722, which retains a cache or buffer of write data that the data-access pipeline stage 734 requests to write to the memory system 710. The write-back buffer 722 is also coupled to a third bus interface unit 740, which is coupled to the system bus 708. The third bus interface unit 740 communicates with components of the computer system also coupled to the system bus 708 on behalf of the write-back buffer 722. The write-back buffer 722 retains write requests from the data-access pipeline stage 734 and delivers them to the memory system 710 when appropriate via the third bus interface unit 740. The write-back buffer 722 can use methods or algorithms known in the art for efficiently buffering and sending requests to write to the memory system 710.
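By way of example only, the following sketch shows one way the three traffic classes might be routed to distinct bus master IDs; the Traffic enumeration and masterIdFor helper are illustrative assumptions, not the disclosed interface.

```cpp
// Illustrative routing model: each traffic class uses its own BIU,
// and therefore its own bus master identity.
#include <cstdio>

enum class Traffic { InstructionFetch, DataAccess, WriteBack };

// Returns the assumed bus master ID used for a given traffic class.
int masterIdFor(Traffic t) {
    switch (t) {
        case Traffic::InstructionFetch: return 0;  // first BIU
        case Traffic::DataAccess:       return 1;  // second BIU
        case Traffic::WriteBack:        return 2;  // third BIU
    }
    return -1;
}

int main() {
    std::printf("fetch -> master %d\n", masterIdFor(Traffic::InstructionFetch));
    std::printf("read  -> master %d\n", masterIdFor(Traffic::DataAccess));
    std::printf("write -> master %d\n", masterIdFor(Traffic::WriteBack));
    return 0;
}
```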
The system bus arbiter 714 arbitrates access to the system bus 708 and determines when it is appropriate for a system bus master to read or write data on the system bus 708. As previously noted, if the system bus 708 conforms to a specification that does not allow more than one split transaction for each bus master residing on the system bus, such as the AHB specification, fetching and writing of data from the memory system 710 can cause pipeline stalling of the core pipeline 716, which can degrade system performance. By employing a first bus interface unit 726, a second bus interface unit 738, and a third bus interface unit 740, a processor in accordance with the disclosure can effectively appear to the system bus 708 and system bus arbiter 714 as more than one bus master. Consequently, because a processor 702 in accordance with the disclosure can effectively appear as three bus masters on the system bus 708, the processor 702 can initiate at least three concurrent split transactions, which can reduce the effect of pipeline stalling, reduce memory idle time, and increase the performance of the computer system. Additionally, each depicted component can be further coupled to a sideband channel 709, which can be used to communicate various control signals between the depicted components coupled to the system bus 708. For example, a “split” or an “unsplit” signal can be transmitted on the sideband channel 709 so that it is not necessary to occupy the system bus 708 during the transmission of such a signal.
The Memory Internal Status signal illustrates that, for example, the memory can begin servicing a data request before an instruction request has completed. The memory begins to access the data requested by a data request m immediately after it has accessed the instruction requested by an instruction request n. The access of the requested data occurs while the previously requested instruction is being read by the requesting bus interface unit. Subsequently, the memory can service a next instruction request while the data accessed in response to the data request is read by the requesting bus interface unit. This overlapping of processor memory requests results in improved performance and reduced memory idle time.
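The toy C++ timeline below illustrates this overlap under assumed, uniform cycle counts (three cycles per memory-array access and per readout phase): each request's access proceeds while the previous request's result is still being read out over the bus.

```cpp
// Toy timeline sketch; cycle counts are assumptions for illustration.
#include <cstdio>

int main() {
    const int kAccess = 3, kReadout = 3;  // assumed cycles per phase
    const char* names[] = {"instr n", "data m", "instr n+1"};
    int accessStart = 0;
    for (int i = 0; i < 3; ++i) {
        int accessEnd = accessStart + kAccess;
        std::printf("%-9s access cycles %d-%d, readout cycles %d-%d\n",
                    names[i], accessStart, accessEnd,
                    accessEnd, accessEnd + kReadout);
        accessStart = accessEnd;  // next access overlaps this readout
    }
    return 0;
}
```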