This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2013-061889, filed Mar. 25, 2013, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate to a shared memory control unit having a lock transaction controller.
Conventionally, shared memory systems include a large number of master devices (for example, many core processors or hardware engines), a shared memory that is shared in common by the master devices, a large number of routers, and an interconnect that connects these devices to each other.
In such systems, when the master devices access the shared memory and issue a “test and set” (TAS) request for a portion of the shared memory via the interconnect, the TAS requests from the master devices often compete and interfere with each other. Hence, each master device must reissue a previously issued TAS request.
In other cases, one master device (hereinafter referred to as “locking master device”) reserves a locked region in the shared memory, while other master devices (hereinafter referred to as “non-locking master devices”) repeatedly reissue TAS requests until the locked region is released. In these cases, the interconnect is overburdened by the transfer of data relating to lock processing and, hence, data transfer efficiency is lowered. Moreover, even when the locking master device issues a write request to release the locked region, the transmission of the write request is obstructed by the transmission of TAS requests reissued by the non-locking master devices and, hence, the release of the locked region is delayed.
According to an embodiment, there is provided a system that includes a plurality of master devices and a shared memory, and that improves data transfer efficiency.
In one embodiment, a shared memory control unit that controls access to a shared memory by a plurality of master devices based on access requests received from the plurality of master devices is provided. The shared memory control unit comprises a memory access arbiter that receives a lock reading request to lock a portion of the shared memory, a waiting queue that stores the access requests, and a lock transaction controller. The lock transaction controller is configured to receive a plurality of access requests after the lock reading request is received by the memory access arbiter. The lock transaction controller is further configured to store the access requests in the waiting queue and to receive an unlock writing request to unlock the portion of shared memory. The lock transaction controller is further configured to release the access requests from the waiting queue after the portion of shared memory is unlocked.
According to an embodiment, there is provided a system having a plurality of master devices and a shared memory that improves data transfer efficiency.
The configuration of a multiprocessing system of a first embodiment is explained.
The respective master devices MD (for example, MD0 to MD3) are devices connected to each other via a first level router R1 (for example, R10), and comprise a processing unit or a hardware engine, for example. The first level router R1 (for example, R10) is connected to another first level router R1 (for example, R14) and the second level router R2 (for example, R20). The second level router R2 (for example, R20) is connected to other second level routers R2 (for example, R21 to R23) and to the shared memory control unit 10 via the interconnect IC.
The master devices MD perform data processing by obtaining access to the shared memory SM. The first level router R1 and the second level router R2 perform a routing control of data which a master device MD transmits or receives. The shared memory control unit 10 controls access to the shared memory SM. The shared memory SM stores data to be processed by the master devices MD therein.
The ports P are interfaces between the interconnect IC and the shared memory control unit 10. The lock transaction controller 12 controls lock transactions for the shared memory SM. Specifically, the lock transaction controller 12 controls a read access from and a write access to a locked region, and performs processing to reissue a new lock reading request that arrives during a locking operation, after putting the new lock reading request into a wait state. An example of a lock reading request is the test portion of a Test and Set (TAS) request.
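The wait-and-reissue behavior described above may be illustrated by the following minimal sketch. It is a simplified software model, not the circuit of the embodiment: the class name `LockControllerModel`, its methods, and the tuple layout of a request are hypothetical, and the model assumes that held requests are reissued in arrival order after the unlock writing request is processed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LockControllerModel:
    """Toy model (hypothetical names): holds requests while a region is locked."""
    locked_addr: Optional[int] = None   # address of the locked region, if any
    lock_owner: Optional[int] = None    # master that holds the lock
    waiting: deque = field(default_factory=deque)  # requests put into a wait state
    issued: list = field(default_factory=list)     # requests forwarded to memory

    def lock_read(self, master_id, addr):
        if self.locked_addr is None:
            self.locked_addr, self.lock_owner = addr, master_id
            self.issued.append(("lock_read", master_id, addr))
        else:
            # A new lock reading request waits while a lock is held.
            self.waiting.append(("lock_read", master_id, addr))

    def access(self, kind, master_id, addr):
        if self.locked_addr is not None and addr == self.locked_addr:
            self.waiting.append((kind, master_id, addr))  # targets the locked region
        else:
            self.issued.append((kind, master_id, addr))

    def unlock_write(self, master_id, addr):
        if master_id != self.lock_owner or addr != self.locked_addr:
            raise ValueError("only the locking master may unlock the locked region")
        self.issued.append(("unlock_write", master_id, addr))
        self.locked_addr = self.lock_owner = None
        # Reissue held requests in arrival order (an assumption of this sketch).
        pending, self.waiting = list(self.waiting), deque()
        for kind, mid, a in pending:
            if kind == "lock_read":
                self.lock_read(mid, a)
            else:
                self.access(kind, mid, a)
```

In this model the non-locking master devices issue each request once; the held requests are replayed by the controller, so no repeated TAS traffic is modeled on the interconnect.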
The memory access bridges 140 to 143 and the memory access parts 160 to 163 correspond to the ports P0 to P3, respectively. The memory access bridges 140 to 143 receive access requests from the master devices MD via the ports P0 to P3. When the access requests which the memory access bridges 140 to 143 receive are reading requests, the memory access parts 160 to 163 hold the reading requests, while, when an access request is a writing request, the memory access parts 160 to 163 hold the writing request and writing data accompanying the writing request.
The memory access arbiter 18 arbitrates an access request that the memory access bridges 140 to 143 receive. The memory access arbiter 18 includes a lock information storage part 182. The memory access arbiter 18 stores lock information used for locking the shared memory SM in the lock information storage part 182 when the access request is a lock reading request. Although the lock information storage part 182 is disposed within the memory access arbiter 18 in
An output (read access information RD of a reading request or write access information WR of a writing request) of the memory access arbiter 18 is first stored in the portal queue 120. The output of the memory access arbiter 18 contains: a request address that indicates an access destination in the shared memory SM; a lock request flag indicating the presence or absence of a lock request; an unlock flag indicating whether or not a request is an unlock writing request (e.g., the set portion of a Test and Set (TAS) request); an identification flag indicating a type of access request (a reading request or a writing request); a burst flag indicating whether or not a burst writing request is necessary; and a bridge identification (ID) indicating which of the memory access bridges 140 to 143 has received the access request.
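The fields of the arbiter output listed above may be pictured as a single record, as in the following sketch. The field names, the `AccessType` enumeration, and the validity checks are assumptions made for illustration; the description does not specify bit widths or an encoding, so none is shown.

```python
from dataclasses import dataclass
from enum import Enum


class AccessType(Enum):
    """Identification flag: type of access request."""
    READ = 0
    WRITE = 1


@dataclass(frozen=True)
class ArbiterOutput:
    """One entry as it might be placed in the portal queue 120 (hypothetical layout)."""
    request_address: int     # access destination in the shared memory SM
    lock_request: bool       # lock request present or absent
    unlock: bool             # True for an unlock writing request
    access_type: AccessType  # reading request or writing request
    burst: bool              # burst writing request necessary or not
    bridge_id: int           # which memory access bridge (140 to 143) received it

    def __post_init__(self):
        # Four bridges are described, so IDs 0 to 3 are assumed here.
        if not 0 <= self.bridge_id <= 3:
            raise ValueError("bridge_id must identify one of four bridges")
        # The unlock flag marks an unlock *writing* request.
        if self.unlock and self.access_type is not AccessType.WRITE:
            raise ValueError("unlock flag is only meaningful on a writing request")
```

For example, `ArbiterOutput(0x0001_0000, False, True, AccessType.WRITE, False, 1)` would represent an unlock writing request received through the second bridge.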
The reading queue 122 includes a plurality of read access information storage parts 1220. A plurality of read access information RD are stored in each read access information storage part 1220 as individual queue entries. Read pointers of the reading requests stored in the reading queue 122 are stored in the read pointer circuit 121. The read pointer indicates the queue position of the reading request stored in the reading queue 122.
The write-updating logic 123 extracts information (a byte enable BE and writing data WD) for updating the writing queue 124 for a writing request.
In the writing queue 124, a plurality of writing requests are stored such that the plurality of writing requests are integrated in one queue entry. The writing queue 124 includes: a write access information storage part 1240; a byte enable storage part 1242; and a writing data storage part 1244. In the write access information storage part 1240, write access information WR of a writing request (i.e., a standard writing request or an unlock writing request for the locked region), which is input to the lock transaction controller 12, is stored. In the byte enable storage part 1242, a byte enable BE of the writing request is stored. In the writing data storage part 1244, writing data WD of the writing request is stored.
With respect to the standard writing request for the locked region before the lock is released, the write-updating logic 123 does not update the write access information storage part 1240 and, hence, the content of the writing request is reflected only in the byte enable storage part 1242 and the writing data storage part 1244.
The output selector 128 selectively outputs an output of the reading queue 122 and an output of the writing queue 124 to the memory access bridges 140 to 143.
The writing queue 124 includes: a write access information storage part 1240; a plurality of byte enable storage parts 1242; and a plurality of writing data storage parts 1244. The write access information storage part 1240 shown in
In each byte enable storage part 1242, a byte enable BE of a writing request is stored. In each writing data storage part 1244, writing data WD of a standard writing request is stored.
The manner of operation of the write-updating logic 123 of this embodiment is explained. Hereinafter, an example is explained where the write-updating logic 123 updates the byte enable storage part 1242 and the writing data storage part 1244 within a period from a point in time t1 at which a single writing request for the locked region (address “0x0001_0000”) first arrives at the writing queue 124 to points in time t2 to t4, which follow the point in time t1. In the explanation made hereinafter, it is assumed that the byte enable BE is 8 bits and writing data WD is 64 bits.
First, when a standard writing request (write address “0x0001_0000”, writing data WD “0x1234_5678_9abc_def0” and byte enable BE “0xf0”) not accompanying unlocking is input at the time t1, the byte enable BE “0xf0” is stored in the byte enable storage part 1242, and the writing data WD “0x1234_5678_0000_0000” is stored in the writing data storage part 1244.
Next, when a normal writing request (write address “0x0001_0000”, byte enable BE “0x03” and writing data WD “0xdef0_9abc_5678_1234”) is input at the time t2 (>t1), the byte enable BE “0xf3” is stored in the byte enable storage part 1242, and the writing data WD “0x1234_5678_0000_1234” is stored in the writing data storage part 1244.
Next, when a normal writing request (write address “0x0001_0000”, byte enable BE “0x10” and writing data WD “0x9abc_def0_1234_5678”) is input at the time t3 (>t2), the byte enable BE “0xf3” is stored in the byte enable storage part 1242, and the writing data WD “0x1234_56f0_0000_1234” is stored in the writing data storage part 1244.
When the unlock writing request (write address “0x0001_0000”, byte enable BE “0x08” and writing data WD “0x5678_1234_def0_9abc”) is input at a time t4 (>t3), the new byte enable BE “0x08” input at the time t4 does not overlap with the existing byte enable BE “0xf3” which is stored in the byte enable storage part 1242 at the time t3 and, hence, a portion of new writing data WD “0x5678_1234_def0_9abc” corresponding to the byte enable BE is written in the writing data storage part 1244, whereby the writing data storage part 1244 stores “0x1234_56f0_de00_1234”, and the byte enable BE “0xfb” is stored in the byte enable storage part 1242.
However, if the unlock writing request (write address “0x0001_0000”, byte enable BE “0x01” and writing data WD “0x5678_1234_def0_9abc”) is input at the time t4, the new byte enable BE “0x01” overlaps with the existing byte enable BE “0xf3” and, hence, the portion of new writing data WD “0x5678_1234_def0_9abc” corresponding to the byte enable BE is not written in the writing data storage part 1244, so that the writing data at the time t3 is held. In this case, the byte enable storage part 1242 also holds the byte enable at the time t3.
In this manner, the write-updating logic 123 performs writing to the writing data storage part 1244 when the existing byte enable BE and the byte enable BE of the unlock writing request do not overlap with each other. However, when the existing byte enable BE and the byte enable BE of the unlock writing request do overlap, the write-updating logic 123 does not perform a write into the writing data storage part 1244 and, hence, the order of processing (where the standard writing request waits and is processed after the unlocking of the locked region) may be maintained.
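The merging rule of the worked example at the times t1 to t4 may be checked with the following minimal sketch. The function names `byte_mask` and `merge_write` are hypothetical, and the sketch assumes, as in the example, an 8-bit byte enable BE in which bit i enables byte i (bits 8i+7 to 8i) of the 64-bit writing data WD. An unlock writing request with any overlapping byte enable is treated as leaving the stored values unchanged, following the description above.

```python
BYTES = 8  # 8-bit byte enable BE over 64-bit writing data WD


def byte_mask(be):
    """Expand an 8-bit byte enable into a 64-bit mask (bit i -> byte i)."""
    mask = 0
    for i in range(BYTES):
        if be & (1 << i):
            mask |= 0xFF << (8 * i)
    return mask


def merge_write(stored_be, stored_wd, be, wd, unlock=False):
    """Return the updated (byte enable, writing data) pair for one queue entry."""
    if unlock and (stored_be & be):
        # Overlap with a waiting standard write: hold the existing contents.
        return stored_be, stored_wd
    mask = byte_mask(be)
    return stored_be | be, (stored_wd & ~mask) | (wd & mask)


# Replaying the times t1 to t4 of the example:
be, wd = merge_write(0x00, 0, 0xF0, 0x1234_5678_9ABC_DEF0)             # t1
be, wd = merge_write(be, wd, 0x03, 0xDEF0_9ABC_5678_1234)              # t2
be, wd = merge_write(be, wd, 0x10, 0x9ABC_DEF0_1234_5678)              # t3
be, wd = merge_write(be, wd, 0x08, 0x5678_1234_DEF0_9ABC, unlock=True) # t4
print(hex(be), hex(wd))  # 0xfb 0x123456f0de001234
```

Replacing the last call with a byte enable BE of 0x01 leaves the values held at the time t3 (0xf3 and 0x1234_56f0_0000_1234) unchanged, which corresponds to the overlapping case.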
Due to the processing described above, it is possible to guarantee the order in which the writing data WD contained in writing requests is written into the shared memory SM. When the shared memory SM is a shared cache memory, the write burst length is the cache line length, and a region to be locked is set for each cache line. Hence, with the use of the writing data storage part 1244 and the byte enable storage part 1242 each having the cache line length, it is possible to support a burst write while guaranteeing the order of writing the writing data WD into the shared memory SM using the same processing.
The flow of the lock transaction according to the first embodiment is explained.
As shown in
As shown in
Thereafter, when a new access request (a reading request, a writing request having a different lock master ID (a writing request not requesting releasing of lock) or a new lock reading request (regardless of address)) arrives at a region to be locked in the shared memory SM, as shown in
To facilitate the understanding of the description made hereinafter, it is assumed that a reading request to a region to be locked arrives at the shared memory control unit 10 as a new access request.
Then, as shown in
When the unlock writing request arrives at the lock transaction controller 12, the state machine 126 in
In this stage, the lock transaction controller 12 outputs a temporary stop instruction to temporarily stop the reception of new access requests to the memory access bridges 140 to 143 via a temporary stop line (indicated by symbol 126b in
Then, the lock transaction controller 12 waits until remaining requests in the memory access bridge 14 and the memory access part 16 arrive at the memory access arbiter 18.
As shown in
When access completion notification arrives at the lock transaction controller 12, the state machine 126 in
As a result of this transition of the state machine 126, the lock transaction controller 12 inputs again a writing request which is obtained by integrating an unlock writing request stored in the writing queue 124 and an existing writing request that waits before the unlock writing request is stored to the memory access bridge 141 (that is, a memory access bridge through which the unlock writing request passes) ((10) in
When the writing request that is input again arrives at the memory access arbiter 18, the memory access bridge 14 asserts a reception signal to the lock transaction controller 12 via a reception notification line (indicated by symbol 126d in
When the reception signal is asserted, the state machine 126 in
Then, the lock transaction controller 12 outputs writing data WD and byte enable BE to the memory access bridge 14. When the writing data WD and the byte enable BE arrive at the memory access bridge 14, the memory access bridge 14 asserts a reception signal to the lock transaction controller 12 via the reception notification line 126d.
When the reception signal is asserted, the state machine 126 in
When the re-input unlock writing request arrives at the memory access arbiter 18, information stored in the lock information storage part 182 is deleted ((132) in
When the access completion notification arrives at the lock transaction controller 12, since, in this embodiment, a reading request exists in the reading queue, the state machine 126 shown in
As shown in
The lock transaction controller 12 determines whether or not the reading request output in (14) in
When the reception signal for the last reading request is asserted, the state machine 126 in
When the access completion notification arrives at the lock transaction controller 12, the state machine 126 transitions from the WAIT_READ state (S112) to the IDLE state (S100).
When the state machine 126 returns to the IDLE state(S100), as shown in
According to this embodiment, it is possible to improve the data transfer efficiency of the system that includes a plurality of master devices and a shared memory.
At least some parts of the multiprocessing system 1 according to this embodiment may be implemented in hardware or software. When a part of the multiprocessing system 1 is implemented in software, a program for realizing at least some functions of the multiprocessing system 1 may be stored in a recording medium such as a flexible disc or a CD-ROM, and the program may be read and executed by a computer. The recording medium is not limited to a detachable recording medium such as a magnetic disc or an optical disc, and may be a fixed-type recording medium such as a hard disc device or a memory.
Further, a program which realizes at least some functions of the multiprocessing system 1 according to this embodiment may be distributed via communication lines (including wireless communication) such as the internet. Further, such a program may be distributed via a wire line or a wireless line such as the internet in a state where the program is encrypted, modulated or compressed, or may be distributed in a state where the program is stored in a recording medium.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2013-061889 | Mar 2013 | JP | national |