The invention generally relates to a technique and apparatus for combining partial write transactions.
For purposes of facilitating processing, such as graphics processing, a microprocessor may have write combining buffers. Write combining buffers may present various challenges. For example, write transactions to the write combining memory region may compete with other cacheable write transactions. Furthermore, such factors as serializing instructions, weak ordering, interrupts, context switches and entry into power saving modes may frequently evict the write combining buffers before they are full. Such premature eviction occurs before all write transactions to a write combining buffer are completed, resulting in a series of, for example, eight-byte partial bus transactions rather than a single sixty-four-byte write transaction. When partial write transactions occur on the bus, the effective rate at which data is communicated to system memory is significantly reduced. Therefore, avoiding partial write transactions may be quite important to ensure full bus bandwidth utilization.
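By way of illustration only, the transaction counts in the example above may be worked out as follows. The sizes are the ones named in the text; the helper name `bus_transactions` is hypothetical and is not part of the disclosure.

```python
CACHE_LINE_BYTES = 64  # full write size used in the example above
PARTIAL_BYTES = 8      # partial write size used in the example above


def bus_transactions(total_bytes: int, bytes_per_transaction: int) -> int:
    """Return how many bus transactions are needed to move total_bytes."""
    # Ceiling division: a final short transfer still costs one transaction.
    return -(-total_bytes // bytes_per_transaction)


# A prematurely evicted buffer drains as a series of partial writes.
partial_count = bus_transactions(CACHE_LINE_BYTES, PARTIAL_BYTES)
# A full buffer drains as a single full-line write.
full_count = bus_transactions(CACHE_LINE_BYTES, CACHE_LINE_BYTES)
```

Under these sizes, the same sixty-four bytes cost eight bus transactions instead of one.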
In conventional multi-bus server systems, it is possible for multiple processors to issue conflicting requests to the same cache line. The chipsets in these systems typically rely on address matching to prevent the concurrent servicing of multiple conflicting transactions in order to maintain cache coherency. Subsequent conflicting transactions may be processed only after the initial transaction is completed by, for example, retrying the subsequent conflicting transactions or queuing up the transactions in a finite queue structure. A disadvantage of the retry serialization is that valuable processor request bandwidth may be wasted. The queue structure, being finite, can accept no further conflicting transactions once it is full.
Thus, there is a continuing need for better ways to handle partial write transactions.
Referring to
In general, the north bridge 10 receives write transactions from the processors 30, which may include partial write transactions, i.e., write transactions in which the data written is less than a cache line. As described further below, the write combining hardware 20 combines the partial write transactions to preferably form full cache line, or full write, transactions, which are communicated over a memory bus 40 for purposes of storing the associated data in a memory, such as in an exemplary system memory 44.
Referring to
In general, each sub-window 60 is associated with a tracking register to track the partial write segments, or “chunks,” which are stored in corresponding entries 104 of a data buffer 100. The tracking registers store such information as the address, buffer identification and other transactional-related information. As depicted in
The write combining hardware 20 includes a partial merge write queue 90, which stores the partial data entries 92 to be preferably merged into full cache lines. The merged partial write data remains in the queue 90 until either an explicit flush is issued to the bridge 10 (
In general, a controller 70 of the write combining hardware 20 is designed to back-fill the remainder of a partial cache line before the actual write is transacted. In certain systems, the full cache line may be modified in other processor caches. The controller 70 resolves the coherency and provides the coherent cache line for the partial merge.
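The back-fill described above may be sketched in software as follows. This is a minimal illustration only, assuming a sixty-four byte cache line and assuming that a coherent copy of the line has already been obtained; the function name `back_fill_merge` and the `(offset, data)` representation of a partial write are hypothetical and do not describe the actual circuitry of the controller 70 or the data merge circuit 110.

```python
CACHE_LINE_BYTES = 64  # assumed line size, matching the sixty-four byte example


def back_fill_merge(coherent_line: bytes,
                    partial_writes: list[tuple[int, bytes]]) -> bytes:
    """Overlay partial writes on a coherent copy of the cache line.

    Each partial write is an (offset, data) pair. Bytes that no partial
    write covers keep the back-filled value from coherent_line.
    """
    if len(coherent_line) != CACHE_LINE_BYTES:
        raise ValueError("coherent_line must be exactly one cache line")
    line = bytearray(coherent_line)
    for offset, data in partial_writes:
        if offset < 0 or offset + len(data) > CACHE_LINE_BYTES:
            raise ValueError("partial write falls outside the cache line")
        # Later partial writes overwrite earlier ones at the same offset.
        line[offset:offset + len(data)] = data
    return bytes(line)


# One eight-byte chunk is written; the other 56 bytes are back-filled.
merged = back_fill_merge(bytes(CACHE_LINE_BYTES), [(8, b"\xff" * 8)])
```

The result is always a full line, so the subsequent write to memory may be issued as a single full cache line transaction.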
The write combining hardware 20 includes a write post buffer 94, which stores posted transaction entries 96 to be written to memory. In general, the controller 70 uses the partial merge write queue 90 and the write post buffer 94 to control the merging of the partial data in the buffer 100 (via a data merge circuit 110) in order to preferably form full cache line writes to the memory.
The write combining hardware 20 also includes a transaction table 80, which has entries 82 to track the accepted write transactions. In general, partial write transactions are accepted and generally handled pursuant to a technique 150 (
Referring to
If, however, the controller 70 determines (diamond 152) that the incoming partial write transaction does conflict with one of the transactions stored in the table 80, then the controller 70 determines (diamond 160) whether the partial write transaction matches one of the write combining windows 58. If a match has occurred, then the controller 70 records (block 165) the partial write data in the appropriate sub-window 60.
If the controller 70 determines (diamond 160) that the conflicting partial transaction does not match any of the windows 58, then the controller 70 determines pursuant to diamond 168 whether a write combining window 58 is available. If so, the controller 70 records (block 170) the partial write information in a previously unoccupied write combining window 58. Otherwise, the controller 70 generates (block 169) a retry on the front side bus 32 (see
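The decision flow of the technique 150 may be sketched in software as follows. This is an illustration under stated assumptions, not the hardware implementation: a "conflict" is assumed to mean an address match on the cache line against an accepted transaction, a window is assumed to be keyed by cache line address, and the handling of the non-conflicting branch (shown here as simply accepting the transaction into the table) is assumed. The names `Controller`, `Outcome` and `handle_partial_write` are hypothetical.

```python
from enum import Enum

LINE_BYTES = 64  # assumed cache line size


class Outcome(Enum):
    ACCEPTED = "accepted"      # no conflict (assumed handling)
    RECORDED = "recorded"      # block 165: stored in a matching window
    NEW_WINDOW = "new window"  # block 170: stored in a free window
    RETRY = "retry"            # block 169: retried on the front side bus


class Controller:
    def __init__(self, num_windows: int = 2):
        self.num_windows = num_windows
        self.in_flight = set()  # line addresses tracked in the table 80
        self.windows = {}       # line address -> list of (offset, data)

    def handle_partial_write(self, address: int, data: bytes) -> Outcome:
        offset = address % LINE_BYTES
        line = address - offset
        # Diamond 152: does the write conflict with an accepted transaction?
        if line not in self.in_flight:
            self.in_flight.add(line)
            return Outcome.ACCEPTED
        # Diamond 160: does the write match an occupied window?
        if line in self.windows:
            self.windows[line].append((offset, data))
            return Outcome.RECORDED
        # Diamond 168: is an unoccupied window available?
        if len(self.windows) < self.num_windows:
            self.windows[line] = [(offset, data)]
            return Outcome.NEW_WINDOW
        return Outcome.RETRY
```

With a single window, for example, a second and third partial write to an in-flight line are absorbed by that window, while a conflicting write to a different in-flight line is retried because no window is free.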
Due to the long latency of this memory back-fill process, the processor may issue subsequent partial writes within the same cache line (e.g., due to premature write combining evictions). The partial write optimization logic described herein is able to track the partial write transactions in the write combining windows 58 and is able to complete the partial write transactions without retry. In the meantime, the partial write data is merged with the back-filled cache lines. The optimization also provides a “merged data tracking queue” structure to hold on to the merged data entry without the actual write to memory. By holding on to the merged line in the data buffer, the data buffer entries function as a small cache. Any subsequent partial write that hits the merged data tracking queue can obtain the back-filled line immediately, without re-accessing memory. When the merged data tracking queue overflows, the cache line corresponding to the oldest merged data tracking queue entry is evicted (written) to memory.
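The behavior of the merged data tracking queue as a small cache may be sketched as follows. This is a software illustration only, assuming that entries are keyed by cache line address and that system memory can be modeled as a dictionary; the class and method names are hypothetical, and an explicit flush is not modeled.

```python
from collections import OrderedDict


class MergedDataTrackingQueue:
    """Holds merged lines without writing them to memory until overflow."""

    def __init__(self, capacity: int, memory: dict):
        self.capacity = capacity
        self.memory = memory          # stand-in for system memory
        self.entries = OrderedDict()  # line address -> line, oldest first

    def lookup(self, line_address: int):
        """Return the held line on a hit, or None on a miss."""
        return self.entries.get(line_address)

    def hold(self, line_address: int, merged_line: bytes) -> None:
        """Hold a merged line, evicting the oldest entry on overflow."""
        if line_address in self.entries:
            # Re-merging an already held line keeps its place in the queue.
            self.entries[line_address] = merged_line
            return
        if len(self.entries) >= self.capacity:
            # Overflow: the oldest entry is evicted (written) to memory.
            oldest_address, oldest_line = self.entries.popitem(last=False)
            self.memory[oldest_address] = oldest_line
        self.entries[line_address] = merged_line
```

A hit in `lookup` corresponds to a subsequent partial write obtaining the back-filled line without a memory access; only the overflow path touches the modeled memory.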
While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the invention.