Features and advantages of embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the Drawings, wherein like numerals depict like parts, and in which:
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.
Network devices may utilize multiple threads to process data packets. These threads may use program counters to address instructions stored in program memory. The program memory may be a small, fixed resource that temporarily stores small program images. A larger pool of instructions may be stored in another, larger memory and copied into the program memory on a per-thread basis. For example, in some network devices, the program memory may be only 8k addressable, while the larger memory may be 128k, or more. At any given time, a thread's program counter may be active and used to fetch instructions stored in the program memory. As a thread requires more instructions, it may generate a copy request to the larger memory to copy instructions into the program memory.
In some conventional network devices, the program memory can be reloaded by forcing all threads to stop executing, and then instructions may be copied from the larger memory into the program memory. Yet other network devices permit “on-the-fly” reloading of the program memory from the larger memory while permitting other thread(s) to continue executing instructions. However, such “on-the-fly” processing may present problems. Each thread may be executing instructions independently of other threads, and thus each thread may be “unaware” of what part of the instructions may have been loaded into the program memory. For example, one thread could replace instructions that another thread needs to execute. Continual displacement of instructions, with little or no forward progress in execution, is known as “thrashing”.
Generally, this disclosure describes program memory that may be partitioned to provide access to instructions on a per-thread basis. For example, in a processing environment where eight threads execute instructions, an 8k program memory may be partitioned into a first 4k partition (e.g., 0-4k) and a second 4k partition (e.g., 4k-8k). The first partition may provide a common memory space to store instructions that are used frequently by two or more threads. The second partition may be further divided into 8 segments of 512 instructions per segment. Each segment may provide a dedicated memory space for each respective thread. Further, each segment may be accessed and reloaded frequently by respective threads (which may occur independently of other threads). By storing frequently-used instructions in the first partition, copy operations from a larger memory into the program memory may be reduced. Additionally, by segmenting the second partition to provide each thread its own program memory space, the possibility that other threads may displace instructions used by a given thread may be eliminated. Accordingly, efficiency of memory operations may be improved.
In this example, eight threads (Thread 0, Thread 1, . . . , Thread 7) may be utilized, although a greater or fewer number of threads may be used without departing from this embodiment. Also, in this example, the program memory 104 is an 8k memory space, and the first partition 106 is 4k of addressable memory space, defined as greater than or equal to 0k and less than 4k. The second partition 108 is also 4k of addressable memory space, defined as greater than or equal to 4k and less than 8k. Each segment of the second partition may be 512 instructions of addressable memory space, defined in sequence in the second partition 108. The address that divides the first partition 106 from the second partition 108 is referred to herein as K, and in this example is at address 4k. Of course, these are arbitrary values used in this embodiment for exemplary purposes only; thus, the present embodiment may be used with program memory of any size, and the partitions and segments may be defined to have any size and at any location within the program memory 104.
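The partition arithmetic of this example may be sketched in software as follows. This is a minimal illustration only; the constant and function names are not part of the embodiment, and the values simply mirror the example above (an 8k program memory, K at 4k, eight threads, and 512-instruction segments):

```python
# Example partition layout; all names and values are illustrative.
PROGRAM_MEMORY_SIZE = 8 * 1024   # 8k addressable instructions
K = 4 * 1024                     # boundary between the two partitions
NUM_THREADS = 8
SEGMENT_SIZE = (PROGRAM_MEMORY_SIZE - K) // NUM_THREADS  # 512 instructions

def segment_range(thread):
    """Return the [start, end) address range of a thread's dedicated segment
    in the second partition."""
    start = K + thread * SEGMENT_SIZE
    return start, start + SEGMENT_SIZE
```

Under these assumptions, Thread 0's segment would span addresses 4096-4607 and Thread 7's segment would span 7680-8191, with the common first partition occupying 0-4095.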
The first partition 106 may store instructions that are addressed by at least one thread via at least one program counter. In one example, the first partition 106 may store commonly-used and/or frequently-used instructions. For example, primary branch instructions (that may be accessed frequently by two or more threads) may be stored in the first partition 106. Such instructions may not require frequent replacement, since these types of instructions may be repeatedly used by two or more threads. Instructions stored in the second partition 108 may be frequently swapped out for other instructions, for example, secondary branch instructions which may be executed and then replaced with other secondary branch instructions. In general, the instructions stored in both the first and second partitions of the program memory 104 may be copied from a different, larger memory. For example, selected instructions may be copied into the first partition 106, and, during operation, each thread may generate a copy request to copy instructions from the larger memory into respective segments of the second partition 108.
For example,
Referring again to
As an overview, program memory access circuitry 110 may include decision circuitry 112 and decoder circuitry 114. The decision circuitry 112 may be configured to determine if the active PC 120 is greater than or equal to the address defined by K, or if the active PC 120 is less than the address defined by K. In other words, the decision circuitry 112 may be configured to compare the address of the active PC 120 to K to determine if the active PC address 120 is for addressing instructions stored in the first partition 106 or the second partition 108. If the active PC 120 defines an address for instructions stored in the first partition 106 (e.g., active PC<K), the decision circuitry may generate a first address 122 to address instructions stored in the first partition 106 of the program memory 104. If the active PC 120 defines an address for instructions stored in the second partition 108 (e.g., active PC>=K), the decoder circuitry 114 may generate a second address 124 to address instructions stored in one of the segments of the second partition 108 of the program memory, based, at least in part, on the thread number 116 associated with the active PC 120 and the address of K. Once the instructions are addressed in program memory 104, the instructions may be passed to decode and control logic circuitry 130 for processing.
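This compare-and-route behavior may be modeled in software as follows. The sketch assumes the example values of this embodiment (K at 4k, 512-instruction segments); `route_pc` is a hypothetical name, and the modulo offset reflects the per-thread segments described above:

```python
K = 4 * 1024          # partition boundary (address 4k in this example)
SEGMENT_SIZE = 512    # instructions per segment in the second partition

def route_pc(active_pc, thread_number):
    """Model of decision circuitry 112 and decoder circuitry 114: compare
    the active PC to K and generate an address into the program memory."""
    if active_pc < K:
        # First partition (common memory space): the PC addresses the
        # instructions directly, as first address 122.
        return active_pc
    # Second partition: form second address 124 from the thread number
    # (segment select) and the PC's offset within that thread's segment.
    return K + thread_number * SEGMENT_SIZE + (active_pc % SEGMENT_SIZE)
```

For instance, an active PC of 100 on any thread would address the first partition directly, while an active PC of 0x174F1 on Thread 5 would land at address 6897, inside Thread 5's segment.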
Access circuitry 110 may generate one or more segment bits 302 as the most significant bit(s) (MSB) of the address 124 if the active PC address 120 is addressing a location in the second partition 108 of the program memory 104 (
In this example, assume K=4k, the program memory 104 is 8k of addressable memory space (13-bit address) and the active PC 120 is a 17-bit address. Also, assume for this example that the active thread number 116 is Thread 5, represented by the binary sequence 101, and the active PC 120 address is represented by the binary sequence 1 0111 0100 1111 0001. Thus, in this example, there is a 4-bit difference between the active PC 120 address (17-bit) and the address for the program memory 104 (13-bit). Decision circuitry 112 may determine if any of the first 5 bits of the active PC 120 address are a binary “1”. This process may enable decision circuitry 112 to determine if the active PC address 120 is for instructions in the first partition 106 or the second partition 108. In other words, decision circuitry 112 may determine if the active PC address 120 is greater than or equal to, or less than, the address defined by K. If all of the first 5 bits are binary “0”, this may indicate that the active PC address 120 is for instructions with an address less than K and is therefore in the first partition 106, and decision circuitry 112 may truncate the first 4 bits of the active PC address 120 to form a 13-bit address (e.g., address 122) to fetch instructions from the first partition 106 of program memory 104.
However, and as stated in this example, the first five bits of the active PC 120 include at least one binary “1” (e.g., 1 0111). This may indicate that the active PC 120 of this example is addressing instructions in the second partition 108. In this case, decision circuitry 112 may forward the active PC address 120 to decoder circuitry 114. Decoder circuitry 114, in turn, may generate address 124, as depicted in
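The bit manipulation just described can be checked with a short sketch. The field widths (one partition bit, three segment bits taken from the thread number, and a 9-bit in-segment offset) follow from the sizes assumed in the example; the variable names are illustrative:

```python
# Worked example: K = 4k, 13-bit program memory address, 17-bit active PC.
active_pc = 0b1_0111_0100_1111_0001   # 17-bit active PC from the example
thread = 0b101                         # active thread number 116: Thread 5

# Decision circuitry 112: if any of the top 5 bits of the 17-bit PC is "1",
# the address is >= 4k and therefore targets the second partition 108.
in_second_partition = (active_pc >> 12) != 0   # top bits are 1 0111, so True
# (Had the top 5 bits all been "0", truncating the top 4 bits, i.e.,
# active_pc & 0x1FFF, would yield the 13-bit first address 122.)

# Decoder circuitry 114: 13-bit second address 124 = partition bit,
# segment bits (thread number), and 9-bit offset within the segment.
offset = active_pc & 0x1FF                     # low 9 bits: 0 1111 0001
address_124 = (1 << 12) | (thread << 9) | offset
# address_124 == 0b1_1010_1111_0001 == 6897
```

That is, the generated address falls at 4k (start of the second partition) plus five segments of 512 instructions (Thread 5's segment) plus the PC's 241-instruction offset into that segment.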
Of course, the foregoing example is provided to aid in understanding of the operative features of access circuitry 110, and it is not intended to limit the present disclosure to the aforementioned assumptions. It is to be understood that other values for K, the active PC address size, the size of the program memory 104, the relative sizes of the first partition 106, the second partition 108 and each segment in the second partition, as well as the size and address space of larger memory 202, are equally contemplated herein. Moreover, K may be selected to enable quicker decision processing. For example, whole number values of K (e.g., K=4k) may require fewer processing operations and may therefore enhance overall operations. However, as stated, any value of K is equally contemplated herein. Also, while the foregoing assumes that the first partition is less than K and the second partition is greater than or equal to K, in alternative embodiments the specific address of K could be included in either the first or second partition, in which case the matching operations described herein may instead determine whether the address is less than or equal to K, or greater than K.
The embodiments of
The IC 400 may include media/switch interface circuitry 402 (e.g., a CSIX interface) capable of sending and receiving data to and from devices connected to the integrated circuit such as physical or link layer devices, a switch fabric, or other processors or circuitry. The IC 400 may also include hash and scratch circuitry 404 that may execute, for example, polynomial division (e.g., 48-bit, 64-bit, 128-bit, etc.), which may be used during some packet processing operations. The IC 400 may also include bus interface circuitry 406 (e.g., a peripheral component interconnect (PCI) interface) for communicating with another processor such as a microprocessor (e.g., Intel Pentium®) or to provide an interface to an external device such as a public-key cryptosystem (e.g., a public-key accelerator) to transfer data to and from the IC 400 or external memory. The IC may also include core processor circuitry 408. In this embodiment, core processor circuitry 408 may comprise circuitry that may be compatible and/or in compliance with the Intel® XScale™ Core micro-architecture described in “Intel® XScale™ Core Developers Manual,” published December 2000 by the Assignee of the subject application. Of course, core processor circuitry 408 may comprise other types of processor core circuitry without departing from this embodiment. Core processor circuitry 408 may perform “control plane” tasks and management tasks (e.g., look-up table maintenance, etc.). Alternatively or additionally, core processor circuitry 408 may perform “data plane” tasks (which may be typically performed by the packet engines included in the packet engine array 418, described below) and may provide additional packet processing threads.
Integrated circuit 400 may also include a packet engine array 418. The packet engine array may include a plurality of packet engines 420a, 420b, . . . , 420n. Each packet engine 420a, 420b, . . . , 420n may provide multi-threading capability for executing instructions from an instruction set, such as a reduced instruction set computing (RISC) architecture. Each packet engine in the array 418 may be capable of executing processes such as packet verifying, packet classifying, packet forwarding, and so forth, while leaving more complicated processing to the core processor circuitry 408. Each packet engine in the array 418 may include, e.g., eight threads that interleave instructions, meaning that as one thread is active (executing instructions), other threads may retrieve instructions for later execution. Of course, one or more packet engines may utilize a greater or fewer number of threads without departing from this embodiment. The packet engines may communicate among each other, for example, by using neighbor registers in communication with an adjacent engine or engines or by using shared memory space.
In this embodiment, at least one packet engine, for example packet engine 420a, may include the operative circuitry of
In this embodiment, the larger memory 202 may comprise an external memory coupled to the IC (e.g., external DRAM). Integrated circuit 400 may also include DRAM interface circuitry 410. DRAM interface circuitry 410 may control read/write access to external DRAM 202. As stated, instructions (executed by one or more threads associated with a packet engine) may be stored in DRAM 202. When new instructions are requested by a thread (for example, when a branch occurs during processing), packet engine 420a may issue an instruction to DRAM interface circuitry 410 to copy the instructions into the control store memory 104. To that end, DRAM interface circuitry 410 may include mapping circuitry 414 that may be capable of mapping a DRAM address associated with the requested instruction into an address in the control store memory 104. Referring briefly again to
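One way to picture this copy operation in software is the sketch below. It is purely illustrative: the disclosure does not specify the exact mapping function used by mapping circuitry 414, so the modulo-based offset here is an assumption chosen to be consistent with the 512-instruction per-thread segments, and `copy_into_segment` is a hypothetical name:

```python
K = 4 * 1024          # start of the second partition in this example
SEGMENT_SIZE = 512    # instructions per thread segment

def copy_into_segment(dram, control_store, dram_addr, count, thread):
    """Hypothetical model of mapping circuitry 414: copy `count` instructions
    from the larger memory (DRAM 202) into the requesting thread's segment
    of the control store memory 104."""
    base = K + thread * SEGMENT_SIZE
    for i in range(count):
        # Map each DRAM address to an offset within the thread's own
        # segment, so no other thread's instructions are displaced.
        offset = (dram_addr + i) % SEGMENT_SIZE
        control_store[base + offset] = dram[dram_addr + i]
```

Because each copy lands only within the requesting thread's segment, a reload by one thread cannot displace instructions that another thread is executing, which is the thrashing-avoidance property described above.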
Memory 202 may comprise one or more of the following types of memory: semiconductor firmware memory, programmable memory, non-volatile memory, read only memory, electrically programmable memory, static random access memory (e.g., SRAM), flash memory, dynamic random access memory (e.g., DRAM), magnetic disk memory, and/or optical disk memory. Either additionally or alternatively, memory 202 may comprise other and/or later-developed types of computer-readable memory. Machine readable firmware program instructions may be stored in memory 202, and/or other memory. These instructions may be accessed and executed by the integrated circuit 400. When executed by the integrated circuit 400, these instructions may result in the integrated circuit 400 performing the operations described herein as being performed by the integrated circuit, for example, operations described above with reference to
As used in any embodiment described herein, “circuitry” may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry. It should be understood at the outset that any of the operative components described in any embodiment herein may also be implemented in software, firmware, hardwired circuitry and/or any combination thereof. A “network device”, as used in any embodiment herein, may comprise, for example, a switch, a router, a hub, and/or a computer node element configured to process data packets, a plurality of line cards connected to a switch fabric (e.g., a system of network/telecommunications enabled devices) and/or other similar device.
Additionally, the operative circuitry of
Accordingly, at least one embodiment described herein may provide an integrated circuit (IC) configured to execute instructions using a plurality of threads. The IC may include a program memory for storing the instructions. The IC may be further configured to partition the program memory into a first partition and a second partition. The IC may also be configured to store instructions in the first partition and to provide access to the first partition to at least two threads. The IC may be further configured to divide the second partition into a plurality of segments, store instructions in each respective segment corresponding to each respective thread, and provide access to each respective segment for each respective thread.
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents.