The present invention relates generally to integrated circuits (ICs). More particularly, the invention relates to an improved architecture with shared memory.
An SOC may be provided with multiple processors that execute, for example, the same program. Depending on the application, the processors can execute different programs or share the same program. Generally, each processor is associated with its own memory module to improve performance because a memory module can only be accessed by one processor during each clock cycle. Thus, with its own memory, a processor need not wait for memory to be free since it is the only processor that will be accessing its associated memory module. However, the improved performance is achieved at the sacrifice of chip size since duplicate memory modules are required for each processor.
As evidenced from the above discussion, it is desirable to provide systems in which the processors can share a memory module to reduce chip size without incurring the performance penalty of conventional designs.
The invention relates, in one embodiment, to a method of sharing a memory module between a plurality of processors. The memory module is divided into n banks, where n=at least 2. Each bank can be accessed by one or more processors at any one time. The memory module is mapped to allocate sequential addresses to alternate banks of the memory, where sequential data are stored in alternate banks due to the mapping of the memory. In one embodiment, the memory banks are divided into x blocks, where x=at least 1, wherein each block can be accessed by one of the plurality of processors at any one time. In another embodiment, the method further includes synchronizing the processors to access different blocks at any one time.
The processors are coupled to a memory module 260 via respective memory buses 218a and 218b. The memory bus, for example, is 16 bits wide. Other size buses can also be used, depending on the width of each data byte. Data bytes accessed by the processors are stored in the memory module. In one embodiment, the data bytes comprise program instructions, whereby the processors fetch instructions from the memory module for execution.
In accordance with one embodiment of the invention, the memory module is shared between the processors without noticeable performance degradation, eliminating the need to provide duplicate memory modules for each processor. Noticeable performance degradation is avoided by separating the memory module into n number of independently operable banks 265, where n is an integer greater than or equal to 2. Preferably, n=the number of processors in the system (i.e. n=m). Since the memory banks operate independently, processors can simultaneously access different banks of the memory module during the same clock cycle.
In another embodiment, a memory bank is subdivided into x number of independently accessible blocks 275a-p, where x is an integer greater than or equal to 1. In one embodiment, each bank is subdivided into 8 independently accessible blocks. Generally, the greater the number of blocks, the lower the probability of contention. The number of blocks, in one embodiment, is selected to optimize performance and reduce contention.
In one embodiment, each processor (210a or 210b) has a bus (218a or 218b) coupled to each bank. The blocks of the memory array, each have, for example control circuitry 278 to appropriately place data on the bus to the processors. The control circuitry comprises, for example, multiplexing circuitry or tri-state buffers to direct the data to the right processor. Each bank, for example, is subdivided into 8 blocks. By providing independent blocks within a bank, processors can advantageously access different blocks, irrespective of whether they are from the same bank or not. This further increases system performance by reducing potential conflicts between processors.
Furthermore, the memory is mapped so that contiguous memory addresses are rotated between the different memory banks. For example, in a two-bank memory module (e.g., bank 0 and bank 1), one bank (bank 0) would be assigned the even addresses while odd addresses are assigned to the other bank (bank 1). This would result in data bytes in sequential addresses being stored in alternate memory banks, such as data byte 1 in bank 0, data byte 2 in bank 1, data byte 3 in bank 0 and so forth. The data bytes, in one embodiment, comprise instructions in a program. Since program instructions are executed in sequence with the exception of jumps (e.g., branch and loop instructions), a processor would generally access different banks of the memory module after each cycle during program execution. By synchronizing or staggering the processors to execute the program so that the processors access different memory banks in the same cycle, multiple processors can execute the same program stored in memory module 260 simultaneously.
A flow control unit (FCU) 245 synchronizes the processors to access different memory blocks to prevent memory conflicts or contentions. In the event of a memory conflict (e.g. two processors accessing the same block simultaneously), the FCU locks one of the processors (e.g. inserts a wait state or cycle) while allowing the other processor to access the memory. This should synchronize the processors to access different memory banks in the next clock cycle. Once synchronized, both processors can access the memory module during the same clock cycle until a memory conflict caused by, for example, a jump instruction, occurs. If both processors (210a and 210b) tries to access block 275a in the same cycle, a wait state is inserted in, for example, processor 210b for one cycle, such that processor 210a first accesses block 275a. In the next clock cycle, processor 210a accesses block 275b and processor 210b accesses block 275a. The processors 210a and 210b are hence synchronized to access different memory banks in the subsequent clock cycles.
Optionally, the processors can be provided with respective critical memory modules 215. The critical memory module, for example, is smaller than the main memory module 260 and is used for storing programs or subroutines which are accessed frequently by the processors (e.g., MIPS critical). The use of critical memory modules enhances system performance by reducing memory conflicts without going to the extent of significantly increasing chip size. A control circuit 214 is provided. The control circuit is coupled to bus 217 and 218 to appropriately multiplex data from memory module 260 or critical memory module 215. In one embodiment, the control circuit comprises tri-state buffers to decouple and couple the appropriate bus to the processor.
In one embodiment, the FCU is implemented as a state machine.
If no conflict exists, the processors access the memory module at step 340 in the same cycle. If a conflict exists, the FCU determines the priority of access by the processors at step 350. If processor A has a higher priority, the FCU allows processor A to access the memory while processor B executes a wait state at step 360. If processor B has a higher priority, processor B accesses the memory while processor A executes a wait state at step 370. After step 340, 360, or 370, the FCU returns to step 320 to compare the addresses for the next memory access by the processors. For example, if a conflict exists, such as at step 360, a wait state is inserted for processor B while processor A accesses the memory at address AAdd. Hence, both processors are synchronized to access different memory blocks in subsequent cycles.
In one embodiment, the FCU compares the addresses of processor A and processor B in step 440 to determine if the processors are accessing the same memory block. In the event that the processors are accessing different memory blocks (i.e., no conflict), the FCU allows both processors to access the memory simultaneously at step 430. If a conflict exists, the FCU compares, for example, the least significant bits of the current and previous addresses of processor A to determine access priority in step 460. If the least significant bits are not equal (i.e. the current and previous addresses are consecutive), processor B may have caused the conflict by executing a jump. As such, the FCU proceeds to step 470, locking processor B while allowing processor A to access the memory. If the least significant bits are equal, processor A is locked and processor B accesses the memory at step 480.
A situation may occur where both processors performed a jump. In such a case, the FCU proceeds to step 580 and examines a priority register which contains the information indicating which processor has priority. In one embodiment, the priority register is toggled to alternate the priority between the processors. As shown in
In alternative embodiments, other types of arbitration schemes can be also be employed by the FCU to synchronize the processors. In one embodiment, the processors may be assigned a specific priority level vis-à-vis the other processor or processors.
Referring to
While the invention has been particularly shown and described with reference to various embodiments, it will be recognized by those skilled in the art that modifications and changes may be made to the present invention without departing from the spirit and scope thereof. The scope of the invention should therefore be determined not with reference to the above description but with reference to the appended claims along with their full scope of equivalents.
This application claims priority of provisional patent application U.S. Ser. No. 60/333,220, filed on Nov. 6, 2001, which is herein incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP02/12398 | 11/6/2002 | WO |
Number | Date | Country | |
---|---|---|---|
60333220 | Nov 2001 | US |