Features and advantages of embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the Drawings, wherein like numerals depict like parts, and in which:
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.
Generally, this disclosure describes program memory that may be configured to provide data store capabilities. For example, a multiple threaded processing environment may include a plurality of small data registers for storing data and a larger program memory (e.g., control store memory) for storing instruction images. Some processing environments are tailored to execute small instruction images, and thus such small instruction images may occupy only a portion of the program memory. As instructions are retrieved from the program memory and executed, data in the data registers may be loaded and reloaded to support data processing operations. To utilize unused memory space in the program memory, the present disclosure describes data write methodologies to write data stored in at least one of the data registers into the program memory. Additionally, the present disclosure provides data read methodologies to read data stored in the program memory and move that data into one or more data registers. Thus, unused space in the program memory may be used to store data that might otherwise be stored in registers and/or external, larger memory.
This embodiment may also include arithmetic logic unit (ALU) 108 configured to process one or more instructions from control circuitry 150. In addition, during processing of instructions, ALU 108 may fetch data stored in one or more data registers 106 and execute one or more arithmetic operations (e.g., addition, subtraction, etc.) and/or logical operations (e.g., logical AND, logical OR, etc.).
Control circuitry 150 may include decode circuitry 104 and one or more program counters (PC) 136. Decode circuitry 104 may be capable of fetching one or more instructions from program memory 102, decoding each instruction, and passing it to the ALU 108 for processing. In general, program memory 102 may store processing instructions (as may be used during data processing), data write instructions to enable a data write operation to move data from the data registers 106 into the program memory 102, and data read instructions to enable a data read from the program memory 102 (and, in some embodiments, store that data in one or more data registers 106). When the embodiment of
As an overview, control circuitry 150 may be configured to perform a data write operation to move data stored in one or more registers 106 into program memory 102. To write data from the data registers 106 into program memory 102, control circuitry 150 may be configured to schedule a data write operation. To prevent additional instructions from interfering with a scheduled data write operation, control circuitry 150 may also be configured to steal one or more cycles from one or more instruction fetch and/or decode operations to permit data to be written into the program memory 102. Additionally, control circuitry 150 may be further configured to read data from program memory 102, and write that data into one or more of the data registers 106. To read data from the program memory 102, control circuitry 150 may be configured to schedule a data read operation. To prevent additional instructions from interfering with a scheduled data read operation, control circuitry 150 may also be configured to steal one or more cycles from one or more instruction fetch and/or decode operations to permit data to be read from the program memory 102. These operations may enable, for example, the program memory 102 to be used as both an instruction memory space and a data memory space.
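By way of a non-limiting illustration, the dual use of a single memory array may be pictured with a brief software model. The following sketch is illustrative only; the class, method, and parameter names are assumptions introduced for explanation and are not taken from the embodiments described herein.

    # Minimal model of a program memory whose lower words hold a small
    # instruction image and whose remaining words are reused as a data store.
    # Illustrative sketch only; not the disclosed circuitry.
    class ProgramMemory:
        def __init__(self, size_words, image):
            self.words = list(image) + [0] * (size_words - len(image))
            self.image_end = len(image)   # first word not occupied by instructions

        def fetch(self, pc):
            """Instruction fetch, addressed by a program counter."""
            return self.words[pc]

        def read_data(self, addr):
            """Data read, addressed by a program memory address register."""
            return self.words[addr]

        def write_data(self, addr, value):
            """Data write into otherwise unused program memory space."""
            self.words[addr] = value

    if __name__ == "__main__":
        pm = ProgramMemory(4096, image=[0x01, 0x02, 0x03])  # small instruction image
        pm.write_data(1024, 0xDEAD)   # spill a data register value into unused space
        assert pm.read_data(1024) == 0xDEAD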
In operation, before a data write or data read instruction is read out of the program memory, decode circuitry 104 may receive an address load instruction, and may pass a value into at least one of the address registers 124 and/or 126 which may point to a specific location in the program memory 102. As will be described below, if a data write or data read instruction is later read from the program memory, the address registers 124 and/or 126 may be used for the data read and/or data write operations. Boot circuitry 140 may be provided to load instruction images (e.g., processing instructions, data write instructions and data read instructions) into program memory 102 upon initialization and/or reset of the circuitry depicted in
At least one of these instruction images stored on program memory 102 may include one or more instructions to move data stored in one or more data registers 106 into the program memory 102 (this instruction shall be referred to herein as a “program memory data write instruction”). When the program memory data write instruction is fetched by decode circuitry 104 and issued from memory 102, the program memory data write instruction may specify one of one or more program memory address registers to use as the “data write address” into the program memory 102. Or, the program memory data write instruction may include a specific address to use as the “data write address” in program memory 102 where the data is to be stored. Decode circuitry 104 may pass the data write address into at least one of the address registers 124 and/or 126. Upon receiving a program memory data write instruction, decode circuitry 104 may generate a request to program memory data write scheduler circuitry 114 to schedule a data write operation.
Data write scheduler circuitry 114 may be configured to schedule one or more data write operations to write data into the program memory 102. Upon receiving a request to schedule a data write into program memory 102, data write scheduler 114 may be configured to instruct the ALU 108 to pass the data output of one or more data registers 106 (as may be specified by the program memory data write instruction) into the program memory write data register 122. For example, data write scheduler circuitry 114 may be configured to schedule a data write to occur at a predetermined future instruction fetch cycle. To that end, data write scheduler circuitry 114 may control data access cycle steal circuitry 116 to “steal” at least one future instruction fetch cycle from the decode circuitry 104. When the stolen instruction fetch cycle occurs, data access cycle steal circuitry 116 may generate a control signal to decode circuitry 104 to abort instruction fetch and/or instruction decode operations to permit a data write into program memory 102 to occur.
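The scheduling behavior just described may be illustrated with a short sketch. The six-cycle latency, the class names, and the stub decode interface below are assumptions chosen for illustration and are not intended to describe the disclosed circuitry.

    # Sketch of scheduling a data write at a fixed future fetch cycle and
    # "stealing" that cycle from instruction fetch/decode.  Illustrative only.
    WRITE_LATENCY = 6   # assumed cycles from fetching the write instruction to the steal

    class DecodeStub:
        def __init__(self):
            self.fetch_enabled = True
        def abort_fetch(self):
            self.fetch_enabled = False   # fetch/decode suppressed for the stolen cycle
        def resume_fetch(self):
            self.fetch_enabled = True

    class DataWriteScheduler:
        def __init__(self):
            self.pending = {}            # stolen cycle -> (address, data)
        def schedule(self, issue_cycle, address, data):
            self.pending[issue_cycle + WRITE_LATENCY] = (address, data)
        def steal_if_due(self, cycle, memory, decode):
            if cycle in self.pending:
                address, data = self.pending.pop(cycle)
                decode.abort_fetch()     # steal the instruction fetch cycle
                memory[address] = data   # write the staged data into program memory
                decode.resume_fetch()

    if __name__ == "__main__":
        memory, decode, sched = [0] * 256, DecodeStub(), DataWriteScheduler()
        sched.schedule(issue_cycle=0, address=128, data=0xBEEF)
        for cycle in range(10):
            sched.steal_if_due(cycle, memory, decode)
        assert memory[128] == 0xBEEF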
During a data write operation, the address stored in register 124 and/or 126 may be used instead of, for example, an address defined by the program counters 136. To that end, the program counters 136 may be frozen during data write operations so that the program counters 136 do not increment until data write operations have concluded. Once the program memory 102 is addressed, the data stored in data register 122 may be written into memory, and data access cycle steal circuitry 116 may control decode circuitry 104 to resume instruction fetch and decode operations. Of course, multiple data write instructions may be issued sequentially. In that case, program memory data write scheduler circuitry 114 may schedule multiple data write operations by stealing multiple instruction fetch and/or decode cycles from decode circuitry 104. Further, for multiple data write operations, increment circuitry 138 may increment registers 124 and/or 126 to generate additional addresses to address the program memory 102.
A stolen instruction fetch cycle may be a fixed latency from when the data write instruction was fetched (e.g., issued), and may be based on, for example, the number of processing pipeline stages present. For example, decode circuitry 104 may use two cycles to fetch and a cycle to decode an instruction. A read of the data registers 106 may use another cycle. The ALU 108 may use another cycle to process the instruction and/or move data from or within the registers 106. Additional cycles may be used to store a data write address in register 124 and/or 126 and to move the data from one or more data registers 106 into register 122. Thus, in this example, data access cycle steal circuitry 116 may steal an instruction fetch cycle from decode circuitry 104 six or seven cycles after the data write instruction is fetched. Of course, these are only examples of processing cycles and it is understood that different implementations of the concepts provided herein may use a different number of cycles to process instructions. These alternatives are within the scope of the present disclosure.
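Using the example stage counts above, the latency to the stolen cycle may be tallied as follows (a back-of-the-envelope sketch only; the stage counts are the example values, not fixed requirements):

    # Tally of the example pipeline stages between fetching a data write
    # instruction and the stolen instruction fetch cycle.
    fetch_cycles   = 2        # instruction fetch
    decode_cycles  = 1        # instruction decode
    register_read  = 1        # read of the data registers
    alu_cycles     = 1        # ALU processes the instruction / moves the data
    staging_cycles = (1, 2)   # load the write address and the write data register

    base = fetch_cycles + decode_cycles + register_read + alu_cycles
    print(f"steal the fetch cycle {base + staging_cycles[0]} to "
          f"{base + staging_cycles[1]} cycles after the write instruction is fetched")
    # -> 6 to 7 cycles, consistent with the example above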
Data access cycle steal circuitry 116 may control decode circuitry 104 to suspend instruction fetching operations for a cycle prior to writing data (stored in register 122) to the program memory 102 to permit, for example, read-to-write turnaround. A read-to-write turnaround operation may enable control circuitry 150 to transition from a read state (during which, for example, instructions may be read out of memory 102) to a write state (to permit, for example, data to be written into program memory 102). Additionally, data access cycle steal circuitry 116 may control decode circuitry 104 to suspend instruction fetching operations and/or instruction decode operations for a cycle after the last data write to the program memory 102 to permit, for example, write-to-read turnaround. A write-to-read turnaround operation may enable control circuitry 150 to transition from a write state (during which data may be written into memory 102) to a read state (to permit, for example, additional instructions to be read out of program memory 102).
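The turnaround cycles may be pictured as extra bubble slots inserted around a burst of data writes. The sequence below is a hypothetical illustration; the slot labels and helper function are not part of the described embodiments.

    # Illustrative memory-cycle sequence for a burst of data writes: one
    # read-to-write turnaround bubble before the burst and one write-to-read
    # turnaround bubble after it, with instruction fetch suspended throughout.
    def write_burst_schedule(num_writes):
        slots = ["instruction fetch (read state)"]
        slots.append("turnaround: read -> write (fetch suspended)")
        slots += [f"data write {i}" for i in range(num_writes)]
        slots.append("turnaround: write -> read (fetch suspended)")
        slots.append("instruction fetch resumes (read state)")
        return slots

    for slot in write_burst_schedule(2):
        print(slot)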
Multiplexer circuitry 110, 118, 120, 128, 130, 132 and 134 depicted in
Before the data write occurs, the processor may read the contents of one or more data registers 210, and pass the data in the data register to a program memory data write register 212. To address the program memory for the data store location, the processor may load the data write address (as may be stored in one or more registers) 214. The processor may also abort instruction decode and/or instruction fetch operations 216, for example, during one or more stolen instruction fetch cycles. Before data is moved from the program memory data write register into the program memory, the processor may perform a read-to-write turnaround operation during one or more stolen instruction fetch cycles 218. The processor may then write the data into the program memory during one or more stolen instruction fetch cycles 220. After data write operations have concluded, the processor may perform a write-to-read turnaround operation during an additional stolen instruction fetch cycle 220.
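The sequence of operations above may be summarized in a procedural sketch. Every helper and dictionary key below is hypothetical and merely stands in for the corresponding hardware step; this is not the disclosed implementation.

    # Sketch of the data write sequence described above.  Illustrative only.
    def program_memory_data_write(state, src_reg, write_addr):
        data = state["data_registers"][src_reg]       # read the data register
        state["write_data_register"] = data           # pass it to the write data register
        state["address_register"] = write_addr        # load the data write address
        state["fetch_enabled"] = False                # abort fetch/decode (stolen cycles)
        # a read-to-write turnaround cycle would occur here
        state["program_memory"][write_addr] = state["write_data_register"]
        # a write-to-read turnaround cycle would occur here
        state["fetch_enabled"] = True                 # resume instruction fetch/decode

    state = {"data_registers": [7, 42], "program_memory": [0] * 256,
             "write_data_register": 0, "address_register": 0, "fetch_enabled": True}
    program_memory_data_write(state, src_reg=1, write_addr=128)
    assert state["program_memory"][128] == 42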
With continued reference to
Data read scheduler circuitry 112 may be configured to schedule one or more data read operations to read data from the program memory 102. Upon receiving a request to schedule a data read from program memory 102, data read scheduler 112 may be configured to schedule a data read to occur at a predetermined future instruction fetch cycle. To that end, data read scheduler circuitry 112 may control data access cycle steal circuitry 116 to “steal” a future instruction fetch cycle from the decode circuitry 104. When the stolen instruction fetch cycle occurs, data access cycle steal circuitry 116 may generate a control signal to decode circuitry 104 to abort instruction decode operations and/or instruction fetch operations so that a data read from program memory 102 may occur. The stolen instruction fetch cycle may occur, for example, at a fixed latency from when the data read instruction was fetched (e.g., issued). To that end, and similar to the description above, the fixed latency may be based on, for example, the number of pipeline stages present in a given processing environment.
During a data read operation, the address stored in register 124 and/or 126 may be used instead of the address defined by the program counters 136. To that end, the program counters 136 may be frozen so that the program counters 136 do not increment until data read operations have concluded. Once the program memory 102 is addressed, the data stored at the specified address in the program memory may be read out of the program memory. Data read scheduler circuitry 112 may also control the decode circuitry 104 to ignore the output of the program memory 102 while the data is read out. Data read scheduler circuitry 112 may also instruct ALU 108 to pass the data (from program memory 102) without modification and return the data to one or more data registers 106. Once data read operations have completed, data access cycle steal circuitry 116 may control decode circuitry 104 to resume instruction fetch and decode operations. Of course, multiple data read instructions may be issued sequentially. In that case, program memory data read scheduler circuitry 112 may schedule multiple data read operations by stealing multiple instruction fetch and/or decode cycles from decode circuitry 104. Further, for multiple data read operations, increment circuitry 138 may increment registers 124 and/or 126 to generate additional addresses to address the program memory 102.
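For comparison with the write case, a similarly hedged sketch of the data read path is shown below. The dictionary keys and helper function are hypothetical illustrations, not the disclosed circuitry.

    # Sketch of reading data out of program memory back into data registers
    # during stolen fetch cycles, with the program counter held frozen.
    def program_memory_data_read(state, read_addr, dest_reg, count=1):
        saved_pc = state["pc"]                        # program counter is frozen
        addr = read_addr
        for i in range(count):
            state["fetch_enabled"] = False            # stolen cycle: no fetch/decode
            word = state["program_memory"][addr]      # read the addressed word
            state["data_registers"][dest_reg + i] = word  # ALU passes it unmodified
            addr += 1                                 # increment circuitry: next address
        state["fetch_enabled"] = True                 # resume fetch and decode
        state["pc"] = saved_pc                        # PC unchanged by the data reads

    state = {"pc": 10, "fetch_enabled": True,
             "program_memory": [0] * 256, "data_registers": [0] * 8}
    state["program_memory"][128:130] = [0xAA, 0xBB]
    program_memory_data_read(state, read_addr=128, dest_reg=0, count=2)
    assert state["data_registers"][:2] == [0xAA, 0xBB] and state["pc"] == 10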
The embodiment of
The IC 400 may include media/switch interface circuitry 402 (e.g., a CSIX interface) capable of sending and receiving data to and from devices connected to the integrated circuit such as physical or link layer devices, a switch fabric, or other processors or circuitry. The IC 400 may also include hash and scratch circuitry 404 that may execute, for example, polynomial division (e.g., 48-bit, 64-bit, 128-bit, etc.), which may be used during some packet processing operations. The IC 400 may also include bus interface circuitry 406 (e.g., a peripheral component interconnect (PCI) interface) for communicating with another processor such as a microprocessor (e.g., Intel Pentium®, etc.) or to provide an interface to an external device such as a public-key cryptosystem (e.g., a public-key accelerator) to transfer data to and from the IC 400 or external memory. The IC may also include core processor circuitry 408. In this embodiment, core processor circuitry 408 may comprise circuitry that may be compatible and/or in compliance with the Intel® XScale™ Core micro-architecture described in “Intel® XScale™ Core Developers Manual,” published December 2000 by the Assignee of the subject application. Of course, core processor circuitry 408 may comprise other types of processor core circuitry without departing from this embodiment. Core processor circuitry 408 may perform “control plane” tasks and management tasks (e.g., look-up table maintenance, etc.). Alternatively or additionally, core processor circuitry 408 may perform “data plane” tasks (which may be typically performed by the packet engines included in the packet engine array 418, described below) and may provide additional packet processing threads.
Integrated circuit 400 may also include a packet engine array 418. The packet engine array may include a plurality of packet engines 420a, 420b, . . . , 420n. Each packet engine 420a, 420b, . . . , 420n may provide multi-threading capability for executing instructions from an instruction set, such as a reduced instruction set computing (RISC) architecture. Each packet engine in the array 418 may be capable of executing processes such as packet verifying, packet classifying, packet forwarding, and so forth, while leaving more complicated processing to the core processor circuitry 408. Each packet engine in the array 418 may include, e.g., eight threads that interleave instructions, meaning that as one thread is active (executing instructions), other threads may retrieve instructions for later execution. Of course, one or more packet engines may utilize a greater or fewer number of threads without departing from this embodiment. The packet engines may communicate among each other, for example, by using neighbor registers in communication with an adjacent engine or engines or by using shared memory space.
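The thread interleaving mentioned above may be pictured with a toy round-robin model. The eight-thread round-robin policy shown here is an assumption made only for illustration and is not intended to describe any particular packet engine.

    # Toy round-robin interleave: while one thread executes, the remaining
    # threads may retrieve instructions for later execution.  Illustrative only.
    NUM_THREADS = 8

    def interleave(num_cycles):
        for cycle in range(num_cycles):
            active = cycle % NUM_THREADS
            waiting = [t for t in range(NUM_THREADS) if t != active]
            print(f"cycle {cycle}: thread {active} executes; "
                  f"threads {waiting} may fetch instructions")

    interleave(3)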
In this embodiment, at least one packet engine, for example packet engine 420a, may include the operative circuitry of
Integrated circuit 400 may also include memory interface circuitry 410. Memory interface circuitry 410 may control read/write access to external memory 414. Memory 414 may comprise one or more of the following types of memory: semiconductor firmware memory, programmable memory, non-volatile memory, read only memory, electrically programmable memory, random access memory, flash memory, static random access memory (e.g., SRAM), dynamic random access memory (e.g., DRAM), magnetic disk memory, and/or optical disk memory. Either additionally or alternatively, memory 414 may comprise other and/or later-developed types of computer-readable memory. Machine readable firmware program instructions may be stored in memory 414, and/or other memory. These instructions may be accessed and executed by the integrated circuit 400. When executed by the integrated circuit 400, these instructions may result in the integrated circuit 400 performing the operations described herein as being performed by the integrated circuit, for example, operations described above with reference to
In addition to moving data from one or more data registers 106 into program memory 102, control circuitry 150 of this embodiment may be configured to move data stored in memory 414 into the program memory 102, in the manner described above. Also, during a data read operation, control circuitry 150 may read data from the program memory 102 and write that data into memory 414.
As used in any embodiment described herein, “circuitry” may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry. It should be understood at the outset that any of the operative components described in any embodiment herein may also be implemented in software, firmware, hardwired circuitry and/or any combination thereof. A “network device”, as used in any embodiment herein, may comprise, for example, a switch, a router, a hub, a computer node element configured to process data packets, a plurality of line cards connected to a switch fabric (e.g., a system of network/telecommunications enabled devices), and/or other similar devices. Also, the term “cycle” as used herein may refer to clock cycles. Alternatively, a “cycle” may be defined as a period of time over which a discrete operation occurs, which may take one or more clock cycles (and/or a fraction of a clock cycle) to complete.
Additionally, the operative circuitry of
Accordingly, at least one embodiment described herein may provide an integrated circuit (IC) that includes a program memory for storing instructions and at least one data register for storing data. The IC may be configured to perform one or more fetch operations to retrieve one or more instructions from the program memory. The IC may be further configured to schedule a write instruction to write data from said at least one data register into the program memory, and to steal one or more cycles from one or more fetch operations to move the data in at least one data register into the program memory.
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents.