Embodiments described herein are related to processors, and more particularly to context switching in processors.
Processors are designed to an instruction set architecture (ISA). The ISA defines a set of instructions, including the behavior of each instruction (i.e. the operands of the instruction, the operation(s) performed, the result, any exception conditions and how they are reported, etc.), the coding of the instruction in memory (i.e. so that the processor can distinguish between the instructions defined in the ISA for execution), and various other processor state that can affect the instruction execution (e.g. various modes, configuration register values, etc.). The ISA defines a set of processor state. The processor state can have a predefined set of values at reset (i.e., the values taken on by the various resources in the processor state at reset can be defined in the ISA), although some state may be considered undefined at reset (e.g. the reset may not force a particular value into that resource). Undefined state can be initialized through instruction execution. After the execution of one or more instructions defined in the ISA, generally the processor state has been modified to reflect the result of the one or more instructions. In some cases, an exception condition can result in undefined state or unpredictable state, as defined in the ISA. The unpredictable/undefined state can be reinitialized via further instruction execution. The ISA can serve as the interface between software (programmed using the instructions in the ISA) and processor hardware (which implements the ISA). Software written to the ISA can be executed correctly on various different implementations of the ISA.
The architected state of the processor is included in a context of the processor. The context at a given point in the execution of a program is the result of executing the instructions in the program prior to that point. A process is an instance of a program, and can have one or more threads of execution according to the program's design. If a process/thread is interrupted on the processor to execute another process/thread, the context can be saved to memory so that the process/thread can continue execution from the interrupted point, either on the same processor or another processor, by loading the context from memory to that processor.
The architected state includes a variety of registers that can be used to store operands and instruction execution results for instructions. In many ISAs, there are multiple sets of registers for different data types (e.g. integer, floating point, vector, etc.). Accordingly, the size of the context can be significant. The memory footprint (i.e. the amount of memory consumed) for saved contexts can be a significant portion of the available memory, especially for processors using a local memory that is separate from the main memory in a system. Additionally, reading and writing the context consumes power, which can be an issue in systems that operate (at least part of the time) from a finite energy supply such as a battery. Still further, the amount of time consumed by reading and writing contexts affects the performance of program execution in the processor. The performance impacts increase with the frequency of the context switching.
In an embodiment, a processor may include a register file including one or more sets of registers for one or more data types specified by the ISA implemented by the processor. The processor may have a processor mode in which the context is reduced, as compared to the full context. For example, for at least one of the data types, the registers included in the reduced context exclude one or more of the registers defined in the ISA for that data type. In an embodiment, one half or more of the registers for the data type may be excluded. When the processor is operating in a reduced context mode, the processor may detect instructions that use excluded registers, and may signal an exception for such instructions to prevent use of the excluded registers.
In an embodiment, the reduced context may reduce the memory footprint for processes by reducing the amount of memory consumed by the context. In an embodiment, the performance of context switches using the reduced context may increase, since the amount of data read and written is reduced. In an embodiment, power consumed by the context switches may also be reduced since the reading/writing of memory is reduced.
The following detailed description makes reference to the accompanying drawings, which are now briefly described.
While embodiments described in this disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to. As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless specifically stated.
Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation—[entity] configured to [perform one or more tasks]—is used herein to refer to structure (i.e., something physical, such as an electronic circuit). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “clock circuit configured to generate an output clock signal” is intended to cover, for example, a circuit that performs this function during operation, even if the circuit in question is not currently being used (e.g., power is not connected to it). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. The hardware circuits may include any combination of combinatorial logic circuitry, clocked storage devices such as flops, registers, latches, etc., finite state machines, memory such as static random access memory or embedded dynamic random access memory, custom designed circuitry, analog circuitry, programmable logic arrays, etc. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.”
The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform some specific function, although it may be “configurable to” perform that function. After appropriate programming, the FPGA may then be configured to perform that function.
Reciting in the appended claims a unit/circuit/component or other structure that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation for that claim element. Accordingly, none of the claims in this application as filed are intended to be interpreted as having means-plus-function elements. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for” [performing a function] construct.
In an embodiment, hardware circuits in accordance with this disclosure may be implemented by coding the description of the circuit in a hardware description language (HDL) such as Verilog or VHDL. The HDL description may be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to result in a final design database that may be transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits or portions thereof may also be custom-designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuits may include transistors and may further include other circuit elements (e.g. passive elements such as capacitors, resistors, inductors, etc.) and interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and may be implemented in the FPGA.
As used herein, the term “based on” or “dependent on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”
This specification includes references to various embodiments, to indicate that the present disclosure is not intended to refer to one particular implementation, but rather a range of embodiments that fall within the spirit of the present disclosure, including the appended claims. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
Turning now to
The context switch control circuit 24 is configured to perform context switch operations (or more briefly, context switches) for the processor 10. The context switch may generally include writing the context of the currently executing process to memory (e.g. a “context save area” in memory) and reading another context from memory (e.g. a “context restore area” for this particular context switch, although the context restore area may also be a context save area from a previous context switch, or a new context created for a process that is beginning its initial execution). A pointer (not shown in
Context switches may occur in response to certain types of external interrupts, for example. The interrupt may be sourced by a peripheral component that is requesting service. The interrupt may also be sourced by a timer circuit programmed by an operating system to switch out a process that has been executing for a period of time, in order to switch in another process to execute on the processor 10. Any mechanism for signaling context switches may be used. The portion of the context switch that stores the current context to memory may be referred to as a context save operation (or more briefly a context save); and the portion of the context switch that loads a different context from memory may be referred to as a context restore operation (or more briefly a context restore).
The context of the processor may generally include the processor state that reflects execution of instructions in a process. If the process is interrupted and the context is saved and later restored, the process may continue execution after the restoration at the next instruction in the process (i.e. the instruction following the instruction after which the process was interrupted) and the result of the process is the same as if the process executed from beginning to end without interruption. The context may include the architected state of the processor. The architected state is the state defined in the ISA implemented by the processor. The architected state may include various configuration/control registers. The configuration/control registers may include special purpose registers and/or model-specific registers that may be programmed with various processor modes. A processor mode may be any programmable configuration which affects the operation of the processor in a desired fashion. For example, a processor mode may impact the execution of all instructions, or all instructions that operate on a particular data type, or all instructions of another defined subset that includes multiple instructions. On the other hand, operands affect the operation (e.g. the result) of a single instruction, for example. The architected state may also include one or more sets of registers, each of a different data type defined in the ISA. A data type defines how the processor interprets the bits stored in the register. For example, an integer data type interprets the bits as an integer. A floating point data type interprets the bits as a floating point number (e.g. a sign bit, exponent bits, and mantissa bits). A vector data type interprets the bits as multiple independent numbers abutting each other in the register. The numbers may be various types, including integer and floating point, for example.
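As a non-limiting illustration of how a data type governs the interpretation of register bits, the following sketch reads one 32-bit pattern as an integer, as a floating point number, and as a vector of four 8-bit integers. The 32-bit register width, the little-endian layout, and the IEEE 754 single-precision format are assumptions chosen for the example and are not required by any embodiment.

```python
import struct

# Hypothetical 32-bit register contents (assumed width and value).
raw_bits = struct.pack("<I", 0x3F800000)

# Integer data type: the bits are read as one signed integer.
as_integer = struct.unpack("<i", raw_bits)[0]

# Floating point data type: sign bit, exponent bits, and mantissa bits
# (IEEE 754 single precision is assumed here).
as_float = struct.unpack("<f", raw_bits)[0]

# Vector data type: several independent numbers abutting each other,
# here four unsigned 8-bit integers.
as_vector = list(struct.unpack("<4B", raw_bits))

print(as_integer, as_float, as_vector)
```

The same stored bits yield three different values, which is why each set of registers may be associated with the instructions that operate on its data type.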
The registers in the sets of registers may be used as operands for the instructions defined in the ISA (e.g. explicitly coded into the instruction, implicitly referenced by the instruction, etc.). Thus, an instruction that operates on a particular data type may use operands from the corresponding set of registers.
In an embodiment, the register 26 is one of the configuration/control registers and stores an indication of a processor mode (e.g. one or more reduced context modes and a full context mode). The register 26 may be programmed to indicate if the processor 10 is operating with a reduced context or a full context. For example, in the embodiment of
In one embodiment, the reduced context may include fewer, but more than zero, registers for at least one of the data types supported by the processor 10. For example, the number of registers may be the architected number (as specified in the ISA) divided by a power of 2 (e.g. ½ of the architected registers, ¼ of the architected registers, etc.). Any amount of reduced context may be supported, and multiple levels of reduced context may be supported, in various embodiments. In an embodiment, reduced context may be supported for more than one data type. The reduction may be the same for each data type, or different amounts of reduction may be supported for different data types, in various embodiments.
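As a non-limiting illustration, the power-of-2 reduction may be sketched as follows. The function name `reduced_register_count` and the `reduction_shift` parameter are hypothetical and are used only for this example.

```python
def reduced_register_count(architected_count: int, reduction_shift: int) -> int:
    """Return the register count for a reduced context.

    The architected count is divided by 2**reduction_shift, so a shift of
    0 is the full context, 1 keeps one half, and 2 keeps one quarter.
    """
    if reduction_shift < 0:
        raise ValueError("reduction_shift must be non-negative")
    count = architected_count >> reduction_shift
    if count == 0:
        # The reduced context includes fewer, but more than zero, registers.
        raise ValueError("reduced context must keep at least one register")
    return count


# Example with a hypothetical ISA that defines 32 registers for a data type.
print(reduced_register_count(32, 1), reduced_register_count(32, 2))
```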
The reduced context allows for instructions using the data type to be executed, but reduces the registers that may be used for operands/results. If code being executed by the processor 10 uses the data type but not as frequently as other data types, the reduced context may provide sufficient state to support performance while also reducing the amount of data saved and restored for the contexts. In contrast, a full context may include all of the architected registers for each data type.
Since the reduced context excludes some architected registers, the values in those excluded registers are not saved or restored in the context save/restore operations. Thus, the values in the registers may be unpredictable and should not be used. Particularly, the data in the excluded registers may differ between a context save for a given context and the ensuing context restore of the given context. In an embodiment, the processor 10 may generate an exception if the reduced context is enabled and one of the excluded registers is used in an instruction (e.g. as a source operand or a destination). In particular, the exception generation circuit 28 may receive the enable indication from the register 26, and may examine the operands used by each instruction. If an excluded register is used, the exception generation circuit 28 may signal the exception for the instruction. While the exception generation circuit 28 is illustrated in
The front end circuit 12 may generally include the hardware to fetch instructions, decode the instructions, perform register renaming (for embodiments that implement register renaming) and issue instruction operations for execution. In an embodiment, the front end circuit 12 may include an instruction cache configured to store instructions fetched (or prefetched) by the processor 10. The front end circuit 12 may include various branch prediction mechanisms to predict branch instructions (e.g. taken or not taken, and/or the branch target address for indirect branch instructions, call/return instructions, etc.). If the front end circuit 12 detects misspeculation or other exception conditions, the front end circuit 12 may flush the incorrectly fetched instructions and redirect fetch to the correct instructions (or may fetch instructions at the exception vector, in the case of an exception). The front end circuit 12 may indicate the exception or redirect to the retire circuit 20, which may track the in-order sequence of instructions and ensure the correct retirement of the instructions when execution is complete and the exception conditions have been cleared. The retirement of an instruction may include committing the results of the instruction to architected state, and thus the instruction's effect on the processor state may be complete and any subsequent redirect or exception may not undo the effect.
In one embodiment, the exception conditions detected by the front end circuit 12 may include the use of a register by an instruction (as a source operand or a destination for results) if the register is not included in the reduced context and the reduced context is enabled in the register 26. As mentioned previously, the exception generation circuit 28 may detect this exception. In other embodiments the execution circuits 22A-22D may detect the exception (and thus the exception generation circuit 28, or multiple instances of the circuit 28, may be included in the execution circuits 22A-22D). The execution circuits 22A-22D may also detect other exceptions/redirects (e.g. branch mispredictions, exceptions on load/store operations, etc.), which the execution circuits 22A-22D may report to the retire circuit 20 and the front end circuit 12.
The front end circuit 12 may decode the instructions. In an embodiment, the front end circuit 12 may decode each instruction into one or more instruction operations. Generally, an instruction operation may be an operation that the execution circuits 22A-22D are designed to perform. In some embodiments, a given instruction may be decoded into one or more instruction operations, depending on the complexity of the instruction. Particularly complex instructions may be microcoded, in some embodiments. In such embodiments, the microcode routine for the instruction may be coded in instruction operations. In other embodiments, each instruction in the instruction set architecture implemented by the processor 10 may be decoded into a single instruction operation, and thus the instruction operation may be essentially synonymous with instruction (although it may be modified in form by the decoder). The term “instruction operation” may be more briefly referred to herein as “op.”
The architected registers determined, by the decoders, to be referenced by a given op may be mapped to physical registers via register renaming. That is, there may be more physical registers of a given data type than the number of architected registers defined in the ISA for the given data type, and the results of speculative instructions may be written to the register files 18A-18B speculatively. A current speculative copy of the mapping of architected registers to physical registers may be represented in the speculative register map 14. As ops that update registers have those registers renamed, the speculative register map 14 may be updated to indicate the mappings assigned by the renamer. Additionally, source registers for each op may be renamed in the op by reading the speculative register map for each architected source register. Ops that are renamed in parallel may override the speculative register map 14 if an older instruction (in program order) that is being renamed in parallel writes a register that is a source of a younger instruction. The architected register map 16, on the other hand, may store the mapping of physical registers to architected registers based on the most recently retired instruction. Accordingly, as ops are retired, the architected register map 16 may be updated to reflect the destination registers that have been written by the retired ops, associating the physical register written by the ops with the architected register. Accordingly, when an exception or other interrupt occurs, the ops prior to the op on which the exception/interrupt is taken (and the op on which the interrupt is taken, for interrupts and some exceptions) may be retired. The architected register map 16 at that point may indicate the current architected state of the processor 10 for the registers. The exception/interrupt may be taken and the architected register map 16 may be copied to the speculative register map 14.
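As a non-limiting illustration, the interaction of the speculative register map 14 and the architected register map 16 may be modeled in software as follows. The class and method names are hypothetical, ops are renamed one at a time (parallel renaming is omitted), and returning the previously mapped physical register to the free list at retirement is an assumption of this sketch rather than a requirement of any embodiment.

```python
from typing import List, Tuple


class RenameSketch:
    """Toy model of a speculative map, an architected map, and a free list."""

    def __init__(self, num_architected: int, num_physical: int) -> None:
        self.num_physical = num_physical
        # Initially architected register i maps to physical register i.
        self.architected_map: List[int] = list(range(num_architected))
        self.speculative_map: List[int] = list(self.architected_map)
        self.free_list: List[int] = list(range(num_architected, num_physical))

    def rename(self, dest: int, sources: List[int]) -> Tuple[int, List[int]]:
        # Sources are read from the speculative map before the destination
        # is remapped, so an op that reads its own destination sees the old value.
        physical_sources = [self.speculative_map[s] for s in sources]
        physical_dest = self.free_list.pop(0)
        self.speculative_map[dest] = physical_dest
        return physical_dest, physical_sources

    def retire(self, dest: int, physical_dest: int) -> None:
        # Retirement makes the mapping architected; the old physical
        # register is assumed to be reusable at this point.
        previous = self.architected_map[dest]
        self.architected_map[dest] = physical_dest
        self.free_list.append(previous)

    def take_exception(self) -> None:
        # Speculative mappings are discarded by copying the architected map.
        self.speculative_map = list(self.architected_map)
        self.free_list = [p for p in range(self.num_physical)
                          if p not in self.architected_map]


maps = RenameSketch(num_architected=4, num_physical=8)
first_dest, first_sources = maps.rename(dest=1, sources=[0, 1])
second_dest, second_sources = maps.rename(dest=2, sources=[1])
maps.retire(dest=1, physical_dest=first_dest)
maps.take_exception()  # the second op never retired, so its mapping is dropped
```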
Additionally, in the case of a context switch, the ops up to and including the op on which the context switch occurs may be retired. The architected register map 16 at that point may indicate which physical registers in the register files 18A-18B store the architected state of the processor 10. The context switch control circuit 24 may use the architected register map to read the corresponding physical registers for each architected register in the full context or reduced context. In an embodiment, the context restore operation may also write the same physical registers with restored context. In another embodiment, the rename circuit in the front end circuit 12 may assign different physical registers to the restored context. In an embodiment, the assignment of different physical registers may allow the context save and restore operations to occur in parallel, and execution in the restored context may even begin prior to the completion of the context save operation. For example, if the rename circuit assigns physical registers from a free list, the physical registers storing the context being saved may not be added to the free list until the values are stored to the context save area in memory.
The ops may be issued by the front end circuit 12 for execution. In an embodiment, the front end circuit 12 may include a centralized scheduler that determines when each op has its dependencies satisfied, and may schedule the op at any point after the dependencies are satisfied. The dependencies may be satisfied if the source operands are available in the register files 18A-18B or if the source operands will be available for forwarding to the op prior to the op reaching the execution circuits 22A-22D. Alternatively, there may be reservation stations for each execution circuit 22A-22D, either before the register files 18A-18B in the pipeline or after the register files 18A-18B.
As mentioned above, the register files 18A-18B may include physical registers for various data types. For example, the register file 18A may include integer physical registers, while the register file 18B may include floating point physical registers or vector physical registers. Any set of data types may be supported in various embodiments, based on the ISA implemented by the processor 10. In embodiments that implement register renaming, the physical registers may be the rename registers and the maps 14 and 16 may map the architected registers to the physical registers. In other embodiments, the processor 10 may use a reorder buffer to store speculative results and the architected registers may have a one-to-one, fixed mapping to registers in the register files 18A-18B. In still other embodiments, the processor 10 may employ in-order execution and the architected registers may have a one-to-one, fixed mapping to registers in the register files 18A-18B. In such embodiments, the context switch control circuit need not consult a register map to read the context from the register files 18A-18B and write the context to the register files 18A-18B. In an embodiment, the register files 18A-18B may be implemented as independent memory arrays or other storage devices (e.g. registers, latches, flip-flops, etc.). Alternatively, one or more register files 18A-18B may be implemented as one memory array or other storage devices.
The execution circuits 22A-22D may each include circuitry to execute one or more ops. The execution circuits 22A-22D may be arranged by data type. For example, the execution circuits 22A-22B may be integer execution circuits; the execution circuits 22C-22D may be floating point execution circuits; other execution circuits (not shown) may be vector execution circuits; etc. The number of execution circuits may differ for different data types. The execution circuits may be symmetrical (e.g. each execution circuit of a given data type may be configured to execute the same set of ops) or asymmetrical (e.g. different execution circuits may be configured to execute different subsets of ops that operate on the data type).
The retire circuit 20 may manage the in-order retirement of instructions/ops, for embodiments that implement out-of-order execution. The retire circuit 20 may ensure that ops prior to an exception/interrupt are completed and retired prior to the interrupt/exception being taken (and may also ensure that no subsequent ops are retired). Similarly, the retire circuit 20 may ensure that the instruction/ops prior to a context switch have retired prior to performing the context switch (and may also ensure that no subsequent ops are retired). In an embodiment, the retire circuit 20 may implement a reorder buffer-like structure to update the architected register map 16 as instructions/ops are completed and retired. In-order embodiments need not include a retire circuit 20.
In the embodiment shown, the full processor context 30 includes areas storing the values from the architected registers of each data type (e.g. data types 1 to N in
The reduced processor context 32 includes areas for architected registers of each data type, and the other architected state. However, ½ of the registers for data type 1 are included (e.g. the other half of the registers are excluded). For example, if there are M registers of data type 1, the register numbers 0 to M/2−1 may be included, and register numbers M/2 to M−1 may be excluded. In other embodiments, the register numbers 0 to M/2−1 may be excluded and M/2 to M−1 may be included, the odd-numbered registers may be excluded and the even-numbered registers may be included, or the odd-numbered registers may be included and the even-numbered registers may be excluded. Any mechanism for identifying which registers are included or excluded may be used.
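As a non-limiting illustration, the four one-half selection schemes mentioned above may be expressed as follows for a data type having M architected registers. The policy names are hypothetical labels introduced only for this sketch.

```python
from typing import List


def included_registers(m: int, policy: str) -> List[int]:
    """Return the register numbers kept in a one-half reduced context."""
    half = m // 2
    if policy == "lower_half":   # registers 0 to M/2-1 included
        return list(range(0, half))
    if policy == "upper_half":   # registers M/2 to M-1 included
        return list(range(half, m))
    if policy == "even":         # even-numbered registers included
        return list(range(0, m, 2))
    if policy == "odd":          # odd-numbered registers included
        return list(range(1, m, 2))
    raise ValueError(f"unknown policy: {policy}")


# Example with a hypothetical M of 8.
for name in ("lower_half", "upper_half", "even", "odd"):
    print(name, included_registers(8, name))
```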
As can be seen visually in
In various embodiments, more than one data type may have reduced context, and/or different data types may be reduced by different amounts. The reduced context 32, for example, reduces the context for data type 2 to ¼ of the full context for that data type, and data type N by ½. Any amount of reduction for any number of data types may be supported. The determination of which data types to reduce, and the size of the reduction, may be based on the frequency of use of the data types in expected workloads, the expected footprint reduction for the context (which may affect the amount of time required to save/restore the context, power expended saving and restoring context, the memory consumed for saved contexts, etc.), etc.
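As a non-limiting illustration of the footprint reduction, the following sketch totals the register storage for a full context and for a reduced context. The register counts and register widths are hypothetical values chosen for the arithmetic; only the reduction factors (½ for data type 1, ¼ for data type 2, ½ for data type N) follow the example above, and the other architected state is omitted.

```python
from typing import List, Tuple

# (name, architected register count, bytes per register, reduction divisor)
# Counts and widths are assumed for illustration only.
RegisterSet = Tuple[str, int, int, int]

EXAMPLE_SETS: List[RegisterSet] = [
    ("data type 1", 32, 8, 2),
    ("data type 2", 32, 16, 4),
    ("data type N", 32, 8, 2),
]


def context_register_bytes(sets: List[RegisterSet], reduced: bool) -> int:
    """Sum the bytes occupied by the registers saved in a context."""
    total = 0
    for _name, count, register_bytes, divisor in sets:
        kept = count // divisor if reduced else count
        total += kept * register_bytes
    return total


full_bytes = context_register_bytes(EXAMPLE_SETS, reduced=False)
reduced_bytes = context_register_bytes(EXAMPLE_SETS, reduced=True)
print(full_bytes, reduced_bytes)  # 1024 384
```

Under these assumed sizes, the reduced context occupies 384 bytes of register storage instead of 1024 bytes, and each context save or restore moves correspondingly less data.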
The context switch control circuit 24 may select a data type for which to save the registers (block 30). If the reduced context is enabled (via register 26) (decision block 32, “yes” leg), the context switch control circuit 24 may read the reduced register set, excluding the registers that are not included in the reduced context, and may write the values from the reduced register set to the context save area (block 34). If the reduced context is not enabled (decision block 32, “no” leg), or if the data type is one for which the reduced context and the full context are the same, the context switch control circuit 24 may read all the architected registers for that data type and write the values to the context save area (block 36). If there are additional data types to save (decision block 38, “yes” leg), the context switch control circuit 24 may repeat blocks 30, 32, 34, and 36 for the next data type. Thus, for an ISA that specifies N data types, blocks 30, 32, 34, and 36 may be repeated N times. In some embodiments, data types may be processed in parallel. Once the data types have been processed (decision block 38, “no” leg), the context switch control circuit 24 may read the other architected state (e.g. configuration/control registers that are part of the context) and write the values to the context save area (block 40).
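As a non-limiting illustration, the context save flow described above may be modeled in software as follows. The data structures are hypothetical: register files are modeled as lists keyed by data type, `reduced_counts` gives the number of registers kept for each data type that has a reduced form, and the kept registers are assumed to be the lowest-numbered ones.

```python
from typing import Dict, List, Tuple

SaveArea = Tuple[Dict[str, List[int]], Dict[str, int]]


def context_save(register_files: Dict[str, List[int]],
                 other_state: Dict[str, int],
                 reduced_enabled: bool,
                 reduced_counts: Dict[str, int]) -> SaveArea:
    """Copy register values and other architected state to a save area."""
    saved_registers: Dict[str, List[int]] = {}
    for data_type, registers in register_files.items():       # block 30
        if reduced_enabled and data_type in reduced_counts:   # decision block 32
            kept = reduced_counts[data_type]
            saved_registers[data_type] = list(registers[:kept])  # block 34
        else:
            # Full context, or a data type with no reduced form.
            saved_registers[data_type] = list(registers)      # block 36
    saved_other = dict(other_state)                           # block 40
    return saved_registers, saved_other


files = {"integer": [10, 11, 12, 13], "vector": [20, 21, 22, 23]}
other = {"reduced_context_enable": 1}
saved, saved_other = context_save(files, other, True, {"vector": 2})
print(saved)  # {'integer': [10, 11, 12, 13], 'vector': [20, 21]}
```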
The context switch control circuit 24 may also perform the context restore of the new context. The context pointer in the context switch control circuit may be changed to point to the context restore area storing the new context (e.g. specified by a pointer in an ISA-dependent fashion as part of the context switch). The context restore operation may include selecting each data type (block 42), determining if the reduced context is enabled (decision block 44), reading the values for the reduced register set from the context restore area if the reduced context is enabled and writing the reduced register set in the register file 18A-18B (block 46), or restoring all the architected registers if the reduced context is not enabled or the reduced context is the same as the full context for the data type (block 48), and repeating blocks 42, 44, 46, and 48 for each data type (e.g. N times, per decision block 50, which may be performed in parallel in other embodiments), followed by reading the other architected state from the context restore area and storing it into the appropriate registers (block 52).
In the case of the context restore, the determination of whether or not the reduced context is enabled (decision block 44) may be based on the contents of the register 26 in the new context. That is, the context switch control circuit 24 may be configured to read the value of the register 26 from the new context prior to beginning the restore process (e.g. prior to block 42). Alternatively, the reduced context enable/disable (or selection from multiple forms of reduced context, in some embodiments) may be considered to be a relatively static choice programmed into the processor 10 during initialization and remaining the same across contexts.
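As a non-limiting illustration, the context restore flow, including reading the mode indication from the new context before any registers are restored, may be modeled as follows. The data structures and the `reduced_context_enable` key are hypothetical, and the kept registers are assumed to be the lowest-numbered ones.

```python
from typing import Dict, List, Tuple

RestoreArea = Tuple[Dict[str, List[int]], Dict[str, int]]


def context_restore(register_files: Dict[str, List[int]],
                    restore_area: RestoreArea,
                    reduced_counts: Dict[str, int]) -> Dict[str, int]:
    """Load a context into the register files and return the other state."""
    saved_registers, saved_other = restore_area
    # The mode of the NEW context is read first, since it determines how
    # many registers the restore area holds for each data type.
    reduced_enabled = bool(saved_other.get("reduced_context_enable", 0))
    for data_type, values in saved_registers.items():         # block 42
        if reduced_enabled and data_type in reduced_counts:   # decision block 44
            kept = reduced_counts[data_type]
            # Excluded registers are left untouched; their contents are
            # unpredictable from the point of view of the restored process.
            register_files[data_type][:kept] = values[:kept]  # block 46
        else:
            register_files[data_type][:] = values             # block 48
    return dict(saved_other)                                  # block 52


files = {"integer": [9, 9, 9, 9], "vector": [7, 7, 7, 7]}
area = ({"integer": [1, 2, 3, 4], "vector": [5, 6]},
        {"reduced_context_enable": 1})
restored_other = context_restore(files, area, {"vector": 2})
print(files)  # {'integer': [1, 2, 3, 4], 'vector': [5, 6, 7, 7]}
```

In this example the upper two vector registers keep the stale values left by the previous process, which is why use of excluded registers may be prevented by an exception.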
If the reduced context is not enabled in the processor 10 (decision block 60, “no” leg), the processor 10 may check for any other exceptions, if any other exceptions are defined in the ISA for the instruction (block 62). If an exception is detected (decision block 64, “yes” leg), the processor 10 may report the exception (block 66). If an exception is not detected (decision block 64, “no” leg), the processor 10 may execute the instruction (e.g. one or more ops representing the instruction) and subsequently retire the instruction assuming no preceding instructions cause a redirect or exception (block 68).
On the other hand, if the reduced context is enabled (decision block 60, “yes” leg), the processor may check the register operands of the instruction to determine if any operand (source or destination) is outside the range of registers that are useable in the reduced context mode (decision block 70). If so (decision block 70, “yes” leg), the processor 10 may report the exception (block 66). If not (decision block 70, “no” leg), the processor 10 may check for any other exceptions and proceed as described above (blocks 62, 64, 66, and 68).
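The decision flow of blocks 60 through 70 may be sketched in software as follows. This is an illustrative model under stated assumptions, not a description of the hardware: the function name process_instruction, the string results, and the representation of the useable registers as a Python range are all hypothetical.

```python
def process_instruction(operands, reduced_enabled, usable_range, other_exception=None):
    """Model of decision blocks 60, 64, and 70 for one instruction.

    operands        -- register numbers used as sources or destinations
    usable_range    -- registers useable in the reduced context mode (assumed contiguous)
    other_exception -- any other ISA-defined exception detected, or None
    """
    if reduced_enabled:  # decision block 60
        # Decision block 70: any operand outside the useable range?
        if any(reg not in usable_range for reg in operands):
            return "exception: register outside reduced context"  # block 66
    if other_exception is not None:  # blocks 62 and 64
        return "exception: " + other_exception  # block 66
    return "executed"  # block 68

# Hypothetical example: only registers 0..15 are useable in the reduced context.
usable = range(16)
in_range = process_instruction([1, 2, 3], True, usable)
out_of_range = process_instruction([1, 2, 20], True, usable)
full_context = process_instruction([1, 2, 20], False, usable)
```

The same operand list that raises an exception in the reduced context mode executes normally when the reduced context is not enabled, which matches the two legs of decision block 60.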
The memory controller 102 may generally include the circuitry for receiving memory operations from the other components of the SOC 90 and for accessing the memory 92 to complete the memory operations. The memory controller 102 may be configured to access any type of memory 92. For example, the memory 92 may be static random access memory (SRAM), dynamic RAM (DRAM) such as synchronous DRAM (SDRAM) including double data rate (DDR, DDR2, DDR3, DDR4, etc.) DRAM. Low power/mobile versions of the DDR DRAM may be supported (e.g. LPDDR, mDDR, etc.). The memory controller 102 may include queues for memory operations, for ordering (and potentially reordering) the operations and presenting the operations to the memory 92. The memory controller 102 may further include data buffers to store write data awaiting write to memory and read data awaiting return to the source of the memory operation. In some embodiments, the memory controller 102 may include a memory cache to store recently accessed memory data. In SOC implementations, for example, the memory cache may reduce power consumption in the SOC by avoiding reaccess of data from the memory 92 if it is expected to be accessed again soon. In some cases, the memory cache may also be referred to as a system cache, as opposed to private caches such as the shared cache or caches in the processors, which serve only certain components. Additionally, in some embodiments, a system cache need not be located within the memory controller 102.
The CPU cluster 88 may be configured to store CPU contexts in the memory 92 (e.g. the contexts 84 shown in
The workload of the processors 10A-10B may be characterized as having more frequent context switches than the workload of the CPU processors in the cluster 88. In some cases, the context switches may be much more frequent (e.g. one or more orders of magnitude more frequent). Additionally, the workload of processors 10A-10B may also be characterized by infrequent, but non-zero, use of one or more data types specified in the ISA. For example, in an embodiment, the workload may include infrequent, but non-zero use of vector registers. Accordingly, reducing the context saved and restored in the processors 10A-10B may be significant in terms of improved performance, reduced power consumption, and memory footprint. Improving performance is generally useful for any workload. Reducing power consumption may be desirable in SOCs that will be used in mobile devices or other devices that may operate from a limited power supply such as a battery. Additionally, reducing power consumption may reduce heat generation, which may be helpful in thermally-constrained systems. The size of the local memories 100A-100B may be limited, e.g. compared to the memory 92, and storage in the local memories 100A-100B may be used for other data besides the contexts 82A-82B, so reducing the context memory footprint may improve performance as well since more local memory space may be available for process data other than context save data.
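The memory footprint argument may be made concrete with a short calculation. All of the numbers below are hypothetical and chosen only for illustration (register counts, register widths, the size of the other architected state, and the local memory size are not taken from any embodiment); the helper name context_bytes is likewise an assumption.

```python
def context_bytes(register_sets, other_state_bytes=0):
    """Bytes needed to save one context.

    register_sets -- list of (register_count, bytes_per_register) per data type
    """
    return sum(count * width for count, width in register_sets) + other_state_bytes

# Hypothetical full context: 32 integer registers of 8 bytes, 32 vector registers of 16 bytes.
full = context_bytes([(32, 8), (32, 16)], other_state_bytes=64)
# Hypothetical reduced context: 16 integer registers and no vector registers.
reduced = context_bytes([(16, 8), (0, 16)], other_state_bytes=64)

# Number of contexts that would fit in a hypothetical 16 KiB region of local memory.
local_bytes = 16 * 1024
full_fit = local_bytes // full
reduced_fit = local_bytes // reduced
```

Under these assumed numbers the full context occupies 832 bytes and the reduced context 192 bytes, so the same region holds 19 full contexts or 85 reduced contexts.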
The peripherals 98A-98B may be any set of additional hardware functionality included in the SOC 90. For example, the peripherals 98A-98B may include video peripherals such as an image signal processor configured to process image capture data from a camera or other image sensor, display controllers configured to display video data on one or more display devices, graphics processing units (GPUs), video encoder/decoders, scalers, rotators, blenders, etc. The peripherals may include audio peripherals such as microphones, speakers, interfaces to microphones and speakers, audio processors, digital signal processors, mixers, etc. The peripherals may include interface controllers for various interfaces external to the SOC 90 (e.g. the peripheral 98B) including interfaces such as Universal Serial Bus (USB), peripheral component interconnect (PCI) including PCI Express (PCIe), serial and parallel ports, etc. The peripherals may include networking peripherals such as media access controllers (MACs). Any set of hardware may be included.
The communication fabric 86 may be any communication interconnect and protocol for communicating among the components of the SOC 90. The communication fabric 86 may be bus-based, including shared bus configurations, cross bar configurations, and hierarchical buses with bridges. The communication fabric 86 may also be packet-based, and may be hierarchical with bridges, cross bar, point-to-point, or other interconnects.
The SOC PMGR 96 may be configured to control the supply voltage magnitudes requested from the PMU in the system. There may be multiple supply voltages generated by the PMU for the SOC 90. For example, a voltage may be generated for the CPU cluster 88, and another voltage may be generated for other components in the SOC 90. In an embodiment, the other voltage may serve the memory controller 102, the peripherals 98A-98B, the SOC PMGR 96, and the other components of the SOC 90, and power gating may be employed based on power domains. There may be multiple supply voltages for the rest of the SOC 90, in some embodiments. In some embodiments, there may also be a memory supply voltage for various memory arrays in the CPU cluster 88 and/or the SOC 90. The memory supply voltage may be used with the voltage supplied to the logic circuitry, which may have a lower voltage magnitude than that required to ensure robust memory operation.
It is noted that the number of components of the SOC 90 may vary from embodiment to embodiment. There may be more or fewer of each component/subcomponent than the number shown in
Turning next to
The peripherals 154 may include any desired circuitry, depending on the type of system 150. For example, in one embodiment, the system 150 may be a computing device (e.g., personal computer, laptop computer, etc.), a mobile device (e.g., personal digital assistant (PDA), smart phone, tablet, etc.), or an application specific computing device. In various embodiments of the system 150, the peripherals 154 may include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, etc. The peripherals 154 may also include additional storage, including RAM storage, solid state storage, or disk storage. The peripherals 154 may include user interface devices such as a display screen, including touch display screens or multitouch display screens, keyboard or other input devices, microphones, speakers, etc. In other embodiments, the system 150 may be any type of computing system (e.g. desktop personal computer, laptop, workstation, net top, etc.).
The external memory 158 may include any type of memory. For example, the external memory 158 may be SRAM, dynamic RAM (DRAM) such as synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM, RAMBUS DRAM, low power versions of the DDR DRAM (e.g. LPDDR, mDDR, etc.), etc. The external memory 158 may include one or more memory modules to which the memory devices are mounted, such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. Alternatively, the external memory 158 may include one or more memory devices that are mounted on the IC 152 in a chip-on-chip or package-on-package implementation. The memory 158 may include the memory 92 shown in
Generally, the electronic description 162 of the IC 152 stored on the computer accessible storage medium 160 may be a database which can be read by a program and used, directly or indirectly, to fabricate the hardware comprising the IC 152. For example, the description may be a behavioral-level description or register-transfer level (RTL) description of the hardware functionality in a high level design language (HDL) such as Verilog or VHDL. The description may be read by a synthesis tool which may synthesize the description to produce a netlist comprising a list of gates from a synthesis library. The netlist comprises a set of gates which also represent the functionality of the hardware comprising the IC 152. The netlist may then be placed and routed to produce a data set describing geometric shapes to be applied to masks. The masks may then be used in various semiconductor fabrication steps to produce a semiconductor circuit or circuits corresponding to the IC 152. Alternatively, the description 162 on the computer accessible storage medium 160 may be the netlist (with or without the synthesis library) or the data set, as desired.
While the computer accessible storage medium 160 stores a description 162 of the IC 152, other embodiments may store a description 162 of any portion of the IC 152, as desired (e.g. the processor 10, as mentioned above).
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.