This disclosure pertains to computing systems, and in particular (but not exclusively) to low latency communication between modules.
As electronic apparatuses become more complex and ubiquitous in the everyday lives of users, more and more diverse requirements are placed upon them. To satisfy many of these requirements, many electronic apparatuses comprise many different devices, such as a CPU, a communication device, a graphics accelerator, etc. In many circumstances, there may be a large amount of communication between these devices. Furthermore, many users have high expectations regarding apparatus performance. Users are becoming less tolerant of waiting for operations to be performed by their apparatuses. In addition, many apparatuses are performing increasingly complex and burdensome tasks that may involve a large amount of inter-device communication. Therefore, there may be some communication between these devices that would benefit from rapid communication.
In the following description, numerous specific details are set forth, such as examples of specific types of processors and system configurations, specific hardware structures, specific architectural and microarchitectural details, specific register configurations, specific instruction types, specific system components, specific measurements/heights, specific processor pipeline stages and operation, etc., in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well known components or methods, such as specific and alternative processor architectures, specific logic circuits/code for described algorithms, specific firmware code, specific interconnect operation, specific logic configurations, specific manufacturing techniques and materials, specific compiler implementations, specific expression of algorithms in code, specific power down and gating techniques/logic, and other specific operational details of a computer system have not been described in detail in order to avoid unnecessarily obscuring the present invention.
Although the following embodiments may be described with reference to energy conservation and energy efficiency in specific integrated circuits, such as in computing platforms or microprocessors, other embodiments are applicable to other types of integrated circuits and logic devices. Similar techniques and teachings of embodiments described herein may be applied to other types of circuits or semiconductor devices that may also benefit from better energy efficiency and energy conservation. For example, the disclosed embodiments are not limited to desktop computer systems or Ultrabooks™, and may also be used in other devices, such as handheld devices, tablets, other thin notebooks, systems on a chip (SOC) devices, and embedded applications. Some examples of handheld devices include cellular phones, Internet protocol devices, digital cameras, personal digital assistants (PDAs), and handheld PCs. Embedded applications typically include a microcontroller, a digital signal processor (DSP), a system on a chip, network computers (NetPC), set-top boxes, network hubs, wide area network (WAN) switches, or any other system that can perform the functions and operations taught below. Moreover, the apparatuses, methods, and systems described herein are not limited to physical computing devices, but may also relate to software optimizations for energy conservation and efficiency. As will become readily apparent in the description below, the embodiments of methods, apparatuses, and systems described herein (whether in reference to hardware, firmware, software, or a combination thereof) are vital to a ‘green technology’ future balanced with performance considerations.
As computing systems are advancing, the components therein are becoming more complex. As a result, the interconnect architecture to couple and communicate between the components is also increasing in complexity to ensure bandwidth requirements are met for optimal component operation. Furthermore, different market segments demand different aspects of interconnect architectures to suit the market's needs. For example, servers require higher performance, while the mobile ecosystem is sometimes able to sacrifice overall performance for power savings. Yet, it's a singular purpose of most fabrics to provide highest possible performance with maximum power saving. Below, a number of interconnects are discussed, which would potentially benefit from aspects of the invention described herein.
Referring to
In one embodiment, a processing element refers to hardware or logic to support a software thread. Examples of hardware processing elements include: a thread unit, a thread slot, a thread, a process unit, a context, a context unit, a logical processor, a hardware thread, a core, and/or any other element, which is capable of holding a state for a processor, such as an execution state or architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor (or processor socket) typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.
A core often refers to logic located on an integrated circuit capable of maintaining an independent architectural state, wherein each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to any logic located on an integrated circuit capable of maintaining an independent architectural state, wherein the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the line between the nomenclature of a hardware thread and core overlaps. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to individually schedule operations on each logical processor.
Physical processor 100, as illustrated in
As depicted, core 101 includes two hardware threads 101a and 101b, which may also be referred to as hardware thread slots 101a and 101b. Therefore, software entities, such as an operating system, in one embodiment potentially view processor 100 as four separate processors, i.e., four logical processors or processing elements capable of executing four software threads concurrently. As alluded to above, a first thread is associated with architecture state registers 101a, a second thread is associated with architecture state registers 101b, a third thread may be associated with architecture state registers 102a, and a fourth thread may be associated with architecture state registers 102b. Here, each of the architecture state registers (101a, 101b, 102a, and 102b) may be referred to as processing elements, thread slots, or thread units, as described above. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts are capable of being stored for logical processor 101a and logical processor 101b. In core 101, other smaller resources, such as instruction pointers and renaming logic in allocator and renamer block 130, may also be replicated for threads 101a and 101b. Some resources, such as re-order buffers in reorder/retirement unit 135, I-TLB 120, load/store buffers, and queues may be shared through partitioning. Other resources, such as general purpose internal registers, page-table base register(s), low-level data-cache and data-TLB 115, execution unit(s) 140, and portions of out-of-order unit 135 are potentially fully shared.
Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements. In
Core 101 further includes decode module 125 coupled to fetch unit 120 to decode fetched elements. Fetch logic, in one embodiment, includes individual sequencers associated with thread slots 101a, 101b, respectively. Usually core 101 is associated with a first ISA, which defines/specifies instructions executable on processor 100. Often machine code instructions that are part of the first ISA include a portion of the instruction (referred to as an opcode), which references/specifies an instruction or operation to be performed. Decode logic 125 includes circuitry that recognizes these instructions from their opcodes and passes the decoded instructions on in the pipeline for processing as defined by the first ISA. For example, as discussed in more detail below, decoders 125, in one embodiment, include logic designed or adapted to recognize specific instructions, such as a transactional instruction. As a result of the recognition by decoders 125, the architecture or core 101 takes specific, predefined actions to perform tasks associated with the appropriate instruction. It is important to note that any of the tasks, blocks, operations, and methods described herein may be performed in response to a single instruction or multiple instructions, some of which may be new or old instructions. Note that decoders 126, in one embodiment, recognize the same ISA (or a subset thereof). Alternatively, in a heterogeneous core environment, decoders 126 recognize a second ISA (either a subset of the first ISA or a distinct ISA).
In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, where allocator and renamer block 130 also reserves other resources, such as reorder buffers to track instruction results. Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and later in-order retirement of instructions executed out-of-order.
Scheduler and execution unit(s) block 140, in one embodiment, includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating point instruction is scheduled on a port of an execution unit that has an available floating point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.
Lower level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear to physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.
Here, cores 101 and 102 share access to higher-level or further-out cache, such as a second level cache associated with on-chip interface 110. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution unit(s). In one embodiment, higher-level cache is a last-level data cache (the last cache in the memory hierarchy on processor 100), such as a second or third level data cache. However, higher level cache is not so limited, as it may be associated with or include an instruction cache. A trace cache (a type of instruction cache) instead may be coupled after decoder 125 to store recently decoded traces. Here, an instruction potentially refers to a macro-instruction (i.e. a general instruction recognized by the decoders), which may decode into a number of micro-instructions (micro-operations).
In the depicted configuration, processor 100 also includes on-chip interface module 110. Historically, a memory controller, which is described in more detail below, has been included in a computing system external to processor 100. In this scenario, on-chip interface 110 is to communicate with devices external to processor 100, such as system memory 175, a chipset (often including a memory controller hub to connect to memory 175 and an I/O controller hub to connect peripheral devices), a memory controller hub, a northbridge, or other integrated circuit. And in this scenario, bus 105 may include any known interconnect, such as a multi-drop bus, a point-to-point interconnect, a serial interconnect, a parallel bus, a coherent (e.g. cache coherent) bus, a layered protocol architecture, a differential bus, and a GTL bus.
Memory 175 may be dedicated to processor 100 or shared with other devices in a system. Common examples of types of memory 175 include DRAM, SRAM, non-volatile memory (NV memory), and other known storage devices. Note that device 180 may include a graphic accelerator, processor or card coupled to a memory controller hub, data storage coupled to an I/O controller hub, a wireless transceiver, a flash device, an audio controller, a network controller, or other known device.
Recently, however, as more logic and devices are being integrated on a single die, such as in an SOC, each of these devices may be incorporated on processor 100. For example, in one embodiment, a memory controller hub is on the same package and/or die with processor 100. Here, a portion of the core (an on-core portion) 110 includes one or more controller(s) for interfacing with other devices such as memory 175 or a graphics device 180. The configuration including an interconnect and controllers for interfacing with such devices is often referred to as an on-core (or un-core) configuration. As an example, on-chip interface 110 includes a ring interconnect for on-chip communication and a high-speed serial point-to-point link 105 for off-chip communication. Yet, in the SOC environment, even more devices, such as the network interface, co-processors, memory 175, graphics processor 180, and any other known computer devices/interface may be integrated on a single die or integrated circuit to provide small form factor with high functionality and low power consumption.
In one embodiment, processor 100 is capable of executing a compiler, optimization, and/or translator code 177 to compile, translate, and/or optimize application code 176 to support the apparatus and methods described herein or to interface therewith. A compiler often includes a program or set of programs to translate source text/code into target text/code. Usually, compilation of program/application code with a compiler is done in multiple phases and passes to transform high-level programming language code into low-level machine or assembly language code. Yet, single pass compilers may still be utilized for simple compilation. A compiler may utilize any known compilation techniques and perform any known compiler operations, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, code transformation, and code optimization.
Larger compilers often include multiple phases, but most often these phases are included within two general phases: (1) a front-end, i.e. generally where syntactic processing, semantic processing, and some transformation/optimization may take place, and (2) a back-end, i.e. generally where analysis, transformations, optimizations, and code generation takes place. Some compilers refer to a middle, which illustrates the blurring of delineation between a front-end and back end of a compiler. As a result, reference to insertion, association, generation, or other operation of a compiler may take place in any of the aforementioned phases or passes, as well as any other known phases or passes of a compiler. As an illustrative example, a compiler potentially inserts operations, calls, functions, etc. in one or more phases of compilation, such as insertion of calls/operations in a front-end phase of compilation and then transformation of the calls/operations into lower-level code during a transformation phase. Note that during dynamic compilation, compiler code or dynamic optimization code may insert such operations/calls, as well as optimize the code for execution during runtime. As a specific illustrative example, binary code (already compiled code) may be dynamically optimized during runtime. Here, the program code may include the dynamic optimization code, the binary code, or a combination thereof.
Similar to a compiler, a translator, such as a binary translator, translates code either statically or dynamically to optimize and/or translate code. Therefore, reference to execution of code, application code, program code, or other software environment may refer to: (1) execution of a compiler program(s), optimization code optimizer, or translator either dynamically or statically, to compile program code, to maintain software structures, to perform other operations, to optimize code, or to translate code; (2) execution of main program code including operations/calls, such as application code that has been optimized/compiled; (3) execution of other program code, such as libraries, associated with the main program code to maintain software structures, to perform other software related operations, or to optimize code; or (4) a combination thereof.
In the example of
In circumstances where cache 204 and cache 214 relate to corresponding memory addresses, there are various cache coherence techniques that may be utilized to allow module 202 to sufficiently rely on the coherence of the data of cache 204, and for module 212 to sufficiently rely on the coherence of the data of cache 214. For example, if module 202 writes data to a memory address that corresponds with a memory address related to cache 214, the coherence technique allows for operation of module 212 without referencing the data of the memory address as represented by the non-updated information of cache 214. For simplicity, caches 204 and 214 will be discussed generally as functional units; however, in at least one example embodiment, operations discussed pertaining to caches 204 and 214 may be attributable to a subpart of the cache, such as a cache controller, cache memory, and/or the like.
In at least one example embodiment, the components of
In at least one example embodiment, in circumstances where a cache relates to a memory address, the cache receives a snoop notification when another component performs a write to the memory address. In at least one example embodiment, snoop notification is a signal that provides an indication to the cache that the information stored at the memory location may have been written to, and that the information in the cache may no longer be a valid representation of the information stored at the memory address. For example, if cache 204 and cache 214 both relate to the same memory address, performance of a write by module 202 will cause a snoop notification to be sent to cache 214. Therefore, receipt of a snoop notification indicates a write to a memory address.
In at least one example embodiment, caching of a memory address is a prerequisite for receiving a snoop notification indicating a write to the memory address. For example, if a single module is reading from a memory address, a write to that memory address from the module will not cause a snoop notification. Management of the cache coherency technique may comprise monitoring which caches (i.e. which modules) are relying on a memory address. This reliance may be determined based on read and write activity. For example, if a module performs a read on the memory address, the management of the cache coherency technique may recognize the dependency of the cache on the memory address, for example by setting a shared bit associated with the memory address. Therefore, performing a read to the memory address causes enablement of a subsequent snoop notification associated with the next write performed to the memory address. Consequently, a module may cause enablement of a subsequent snoop notification by performing a read to the memory address.
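By way of illustration only, the read-enables-snoop behavior described above may be sketched as a toy software model. The class `ToyDirectory`, the module names, and the address value are hypothetical and are not part of the disclosure; an actual coherency mechanism is implemented in hardware and may track sharers differently.

```python
from collections import defaultdict


class ToyDirectory:
    """Toy model (not real hardware) of which modules rely on an address."""

    def __init__(self):
        self.sharers = defaultdict(set)  # address -> modules that have read it
        self.snoops = defaultdict(list)  # module -> addresses it was snooped on

    def read(self, module, address):
        # A read registers the module's reliance on the address, which
        # enables a snoop notification for the next write by another module.
        self.sharers[address].add(module)

    def write(self, module, address):
        # Every other module relying on the address is notified; afterwards
        # the writer is the only module associated with the address.
        for other in sorted(self.sharers[address] - {module}):
            self.snoops[other].append(address)
        self.sharers[address] = {module}


directory = ToyDirectory()
directory.write("module_202", 0x1000)  # nobody has read 0x1000: no snoop
directory.read("module_212", 0x1000)   # module 212 enables a snoop
directory.write("module_202", 0x1000)  # snoop delivered to module 212
directory.write("module_202", 0x1000)  # not re-enabled: no second snoop
print(directory.snoops["module_212"])  # [4096]
```

In this model, a second notification requires module 212 to read the address again, which mirrors the enablement step described above.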
As caching has become more prevalent, there have been many architectural and design advances that have resulted in high efficiency and low latency for caching and cache coherency mechanisms. For example, cache coherency signaling is often faster than communication by way of shared memory. Therefore, it may be desirable to utilize cache coherency techniques as a communication mechanism.
As described in
Knowledge that a specific module, or set of modules, is performing the write to a specific address that causes the snoop notification provides an inference that receipt of a snoop notification signifies a write by the specific module or set of modules. Furthermore, if a different specific module, or set of modules, is monitoring (for example has cached) the specific address, writing to the specific memory address provides an inference that writing to the specific memory address will cause the different specific module or set of modules to receive the snoop notification. Relating back to
The example of
In the example of
Even though the example of
In at least one example embodiment, memory 310 comprises address space associated with cache based messaging, such as memory address space 400 of
In at least one example embodiment, at least one of sender 302 or receiver 304 implements cache management signaling, absent cache memory, such that cache coherency information associated with address space 312 will be applied regarding the cache management signaling. For example, in absence of cache memory, the module may comprise a reading agent to perform reads to memory addresses within address space 312 and a monitoring agent to receive and act upon snoop notifications received in association with address space 312. For example, even though such a module has no cache to keep coherent with address space 312, the module may utilize a monitoring agent to receive snoop notifications that signify a write to a memory address comprised by address space 312. Furthermore, after receiving a snoop notification associated with a memory address, a receiver, such as receiver 304, may utilize a reading agent to enable receipt of a subsequent snoop notification associated with the memory address. For example, the reading agent may preclude exclusive ownership of the memory address by another component. For example, the reading agent may perform a read to the memory address. In at least one example embodiment, the reading agent performs a read to the memory address under circumstances where receiver 304 has no regard for the information stored at the memory address. For example, receiver 304 may perform a read to the memory address without regard for the information retrieved by way of the read. Without limiting the claims in any way, at least one technical advantage associated with the receiver causing enablement of snoop notifications and receiving snoop notifications is to allow the receiver to utilize the low latency benefits of the cache coherency system. For example, even if the receiver fails to include a cache, the receiver may be able to receive the low latency snoop notifications as a low latency communication mechanism.
In this manner, a memory address may be associated with a cache from the perspective of the cache coherency mechanisms, even in the absence of actual cache memory.
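By way of illustration only, the cooperation of a monitoring agent and a reading agent in a module without cache memory may be sketched as follows. The classes `ToyCoherence` and `CachelessReceiver`, the callback interface, and the address value are hypothetical stand-ins for hardware behavior and are not part of the disclosure.

```python
class ToyCoherence:
    """Toy stand-in for the coherency mechanism (hypothetical interface)."""

    def __init__(self):
        self.armed = set()    # (module, address) pairs that will be snooped
        self.listeners = {}   # module -> callback taking an address

    def attach(self, module, callback):
        self.listeners[module] = callback

    def read(self, module, address):
        self.armed.add((module, address))

    def write(self, writer, address):
        # sorted() copies the set, so callbacks may safely re-arm addresses.
        for module, armed_address in sorted(self.armed):
            if armed_address == address and module != writer:
                self.armed.discard((module, armed_address))
                self.listeners[module](address)


class CachelessReceiver:
    """Receiver with no cache memory, only a reading and a monitoring agent."""

    def __init__(self, fabric, name, addresses):
        self.fabric = fabric
        self.name = name
        self.received = []
        fabric.attach(name, self.monitoring_agent)
        for address in addresses:
            self.reading_agent(address)  # initial enablement

    def reading_agent(self, address):
        # The read is performed only for its coherency side effect.
        self.fabric.read(self.name, address)

    def monitoring_agent(self, address):
        self.received.append(address)
        self.reading_agent(address)  # enable the subsequent snoop notification


fabric = ToyCoherence()
receiver = CachelessReceiver(fabric, "receiver_304", [0x4000])
fabric.write("sender_302", 0x4000)
fabric.write("sender_302", 0x4000)
```

Because the monitoring agent invokes the reading agent after each notification, both writes in this sketch reach the receiver.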
In at least one example embodiment, address space 312 is allocated such that a memory address represents a message. For example, a write to the memory address, without regard for the information written to the memory address, may represent a message to invoke an operation by receiver 304, such as a buffer flush. In at least one example embodiment, the memory address represents a message such that a write to the memory address by sender 302, such as shown in interaction 322, serves as a message to receiver 304 by way of a resulting snoop notification. Therefore, receiver 304 may interpret a snoop notification, itself, as a message based on the memory address associated with the snoop notification. In at least one example embodiment, receiver 304 determines that the snoop notification signifies receipt of a message based, at least in part, on the memory address. For example, receiver 304 may associate the memory address with the message. In such an example, a memory address may be associated with a message, and a different memory address may be associated with a different message, such that the receiver determines a snoop notification associated with the memory address to signify the message and determines a snoop notification associated with the different memory address to signify the different message. In at least one example embodiment, information indicating the association between a message and a memory address is referred to as message memory allocation information. Message memory allocation information may be based on predetermined information, such as information provided in a configuration file, determined at compile time, and/or the like. Message memory allocation information may be based on information received during operation of receiver 304, such as information received from sender 302.
In at least one example embodiment, determination that a snoop notification signifies a message is based, at least in part, on correlation between the memory address and the message memory allocation information.
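By way of illustration only, message memory allocation information may be represented as a table that correlates memory addresses with messages. The addresses, the message names, and the 64-byte spacing (an assumed cache line size) in the following sketch are hypothetical.

```python
from typing import Optional

# Hypothetical message memory allocation information: address -> message.
# Addresses are spaced 64 bytes apart on the assumption of 64-byte lines.
MESSAGE_MEMORY_ALLOCATION = {
    0x4000: "FLUSH_BUFFER",
    0x4040: "START_TRANSFER",
    0x4080: "STOP_TRANSFER",
}


def message_for_snoop(address: int) -> Optional[str]:
    """Return the message a snoop on `address` signifies, or None."""
    return MESSAGE_MEMORY_ALLOCATION.get(address)


print(message_for_snoop(0x4040))  # START_TRANSFER
print(message_for_snoop(0x5000))  # None: address is not allocated to a message
```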
In at least one example embodiment, sender 302 sends message memory allocation information to receiver 304. For example, sender 302 may send message memory allocation information by way of MMCFG address space, input/output cycle communication, and/or the like. Receiver 304 may utilize the received message memory allocation information as, at least part of, a basis to determine that a snoop notification associated with a memory address signifies a particular message. In at least one example embodiment, sender 302 determines the message memory allocation information. For example, the message memory allocation information may be determined based on predetermined information, such as compile time information, a configuration file, and/or the like, or may be determined dynamically, such as by way of a request for allocation of address space 312. In at least one example embodiment, sender 302 causes allocation of memory 310 to address space 312. Allocation of memory 310 to address space 312 may relate to reserving address space 312 within memory 310 such that another program does not receive allocation of memory overlapping with address space 312.
In at least one example embodiment, the receiver may prepare for communication with the sender by way of causing enablement of receiving a subsequent snoop notification. For example, receiver 304 may preclude exclusive ownership of address space 312 by sender 302. In such an example, receiver 304 may perform reads to the memory addresses comprised by address space 312. Such reads may enable a subsequent write to a memory address comprised by address space 312 to cause a snoop notification associated with the memory address.
In at least one example embodiment, sender 302 may determine to send a message to receiver 304. Such determination may be caused by execution of software 306. For example, software 306 may desire receiver 304 to perform an operation indicated by the message. In at least one example embodiment, determination to send the message causes the sender to determine a memory address to trigger a snoop notification that signifies the message. For example, the sender may utilize message memory allocation information to determine which memory address is associated with the determined message, such that a write to the memory address will cause a snoop notification associated with the address to be sent to receiver 304. In at least one example embodiment, sender 302 performs a write to the memory address to cause the snoop notification to communicate the determined message. In this manner, the performance of the write, itself, may serve as sending the message. In at least one example embodiment, the information written to the memory address is not pertinent to the message. For example, the message may relate to a notification, a directive, and/or the like, that does not rely on any accompanying information, such as a message without any payload. Such a message may be utilized to invoke a known operation without conveying additional information to govern the operation. In at least one example embodiment, the determined message may have a message payload associated with the message. A message payload may relate to information that provides additional information regarding the message, such as a parameter. For example, the message may rely on a variable, a buffer, and/or the like. In such an example, the payload may comprise the variable, the buffer, and/or the like. In at least one example embodiment, sender 302 performs the write of the payload to the memory address to cause the snoop notification to communicate the determined message.
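By way of illustration only, the sender-side steps described above (determining the memory address from the message, then performing the write) may be sketched as follows. The message names, the addresses, and the `memory` and `snoops_sent` stand-ins are hypothetical, and the sketch assumes that the receiver has already enabled snoop notifications for each address.

```python
# Hypothetical allocation: message -> memory address within the address space.
ADDRESS_FOR_MESSAGE = {"FLUSH_BUFFER": 0x4000, "SET_THRESHOLD": 0x4040}

memory = {}        # stand-in for the memory: address -> stored value
snoops_sent = []   # stand-in for snoop notifications reaching the receiver


def send_message(message, payload=0):
    """Send `message` by writing to its address; the write itself is the send."""
    address = ADDRESS_FOR_MESSAGE[message]
    memory[address] = payload     # value is not pertinent without a payload
    snoops_sent.append(address)   # assumes the receiver enabled this address
    return address


send_message("FLUSH_BUFFER")                 # message without any payload
send_message("SET_THRESHOLD", payload=128)   # payload is the written value
```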
Consequently, receiver 304 may perform an operation based, at least in part, on the message conveyed by the snoop notification. The operation may pertain to any action that receiver 304 performs based on receipt of the message. For example, the operation may involve storing information, sending a signal to hardware, starting a set of operations, terminating a set of operations, and/or the like. In at least one example embodiment, the operation is based on the message without regard for information stored in the memory address. For example, the message may relate to a notification, a directive, and/or the like, that does not rely on any accompanying information, such as a message without any payload. In at least one example embodiment, the message may have a message payload associated with the message. A message payload may relate to information that provides additional information regarding the message, such as a parameter. For example, the message may rely on a variable, a buffer, and/or the like. In such an example, the read to the memory address may provide the variable, the buffer, and/or the like.
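By way of illustration only, the receiver-side handling described above may be sketched as a dispatch from the snooped memory address to an operation. The handler names, the addresses, and the `state` and `memory` dictionaries are hypothetical; one handler ignores the stored information and the other treats it as a payload.

```python
def flush_buffer(state, _payload):
    # Message without a payload: the information at the address is ignored.
    state["buffer"].clear()


def set_threshold(state, payload):
    # Message with a payload: the value read from the address is a parameter.
    state["threshold"] = payload


# Hypothetical association between memory addresses and operations.
HANDLERS = {0x4000: flush_buffer, 0x4040: set_threshold}


def on_snoop(state, memory, address):
    """Perform the operation signified by a snoop notification on `address`."""
    handler = HANDLERS.get(address)
    if handler is None:
        return False                        # address signifies no message
    handler(state, memory.get(address))     # read supplies the payload, if any
    return True


state = {"buffer": [1, 2, 3], "threshold": 0}
memory = {0x4000: 0, 0x4040: 128}
on_snoop(state, memory, 0x4000)
on_snoop(state, memory, 0x4040)
print(state)  # {'buffer': [], 'threshold': 128}
```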
In at least one example embodiment, it may be desirable to provide for operations associated with messages to be performed in a specific order. For example, sender 302 may perform a first write to serve as sending of a first message and perform a second write to serve as sending of a second message. In such an example, the sender may desire that the receiver performs an operation associated with the first message before performance of an operation associated with the second message. In at least one example embodiment, the sender may provide enablement of sequence preservation. In at least one example embodiment, the sender may provide message sequence information in the payload of a message. For instance, in the previously discussed example, the first message payload may comprise message sequence information indicating that the first message is associated with an ordering before the second message. For example, the first message payload may comprise a message sequence number that is lower than a message sequence number comprised by the second message payload. In at least one example embodiment, the sender may await acknowledgement of a message from the receiver before sending another message. In at least one example embodiment, the receiver may provide an acknowledgement by way of a message, a function call, and/or the like. For example, a similar mechanism may be used to allow the sender to receive communication from the receiver. In at least one example embodiment, the sender enables receipt of a snoop notification associated with the memory address after performing a write to the memory address that signifies the sending of the message. In such an embodiment, after receiving the message, the receiver may perform a write to the memory address to serve as an acknowledgement to the received message. In such an embodiment, the sender may predicate sending of the other message on receipt of the snoop notification associated with the acknowledgement.
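By way of illustration only, sequence preservation by way of message sequence numbers may be sketched as follows. The class `OrderedReceiver` is hypothetical, and the sketch assumes consecutive sequence numbers starting at zero, which the description above does not require.

```python
import heapq


class OrderedReceiver:
    """Performs operations in sequence-number order, buffering early arrivals."""

    def __init__(self):
        self.next_seq = 0     # assumed: sequence numbers are 0, 1, 2, ...
        self.pending = []     # min-heap of (sequence number, message)
        self.performed = []   # messages whose operations have been performed

    def on_message(self, seq, message):
        heapq.heappush(self.pending, (seq, message))
        # Release every buffered message that is next in the ordering.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            self.performed.append(ready)
            self.next_seq += 1


receiver = OrderedReceiver()
receiver.on_message(1, "second message")  # arrives early, is held back
receiver.on_message(0, "first message")   # releases both, in order
print(receiver.performed)  # ['first message', 'second message']
```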
In at least one example embodiment, a cache prefetch operation may cause a snoop notification that does not correspond to a write to the memory address from sender 302. For example, when sender 302 performs a write to a memory address, the write may cause a prefetch of information associated with a different memory address adjacent to the memory address. In such an example, the prefetch of the information at the different memory address may cause a snoop notification associated with the different memory address. It may be desirable to avoid circumstances where the prefetch of the different memory address causes the receiver to determine that the snoop notification associated with the prefetch indicates a message associated with the different memory address. Therefore, it may be desirable for the receiver to determine that the snoop notification was not caused by a write performed to the memory address. For example, the receiver may store the information associated with the memory address so that, upon receiving a later snoop notification, the receiver may compare the stored information with the information at the memory address after the later snoop notification. In such circumstances, a lack of difference between the stored information and the information associated with the later snoop notification is indicative of a snoop notification that was not caused by a write to the memory address. Under such circumstances, the receiver may determine that the snoop notification fails to signify receipt of a message.
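The comparison described above may be illustrated with a minimal sketch in which the receiver keeps the last value it observed at the memory address and treats a notification as a message only when the value has changed. The class name PrefetchFilter and the dictionary used to simulate memory are hypothetical assumptions.

```python
class PrefetchFilter:
    """Hypothetical receiver-side check for notifications not caused by a write."""

    def __init__(self, memory, address):
        self.memory = memory
        self.address = address
        self.last_seen = memory.get(address)  # stored copy of the information

    def signifies_message(self):
        """Call on each snoop notification for the address."""
        current = self.memory.get(self.address)
        changed = current != self.last_seen
        self.last_seen = current  # keep the stored copy up to date
        return changed


memory = {0x1000: 0}
check = PrefetchFilter(memory, 0x1000)

memory[0x1000] = 5                     # sender writes to the address
first = check.signifies_message()      # value differs: treated as a message
second = check.signifies_message()     # e.g. prefetch-induced: value unchanged
```

One consequence of this approach is that a write of the same value as before cannot be distinguished from a prefetch, so a sender relying on it might write a value that changes with each message, such as a counter. That is an observation about the sketch rather than a requirement stated in the description.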
Without limiting the scope of the claims in any way, at least one technical advantage associated with the communication represented by interactions 322 and 324 is a large reduction in latency over other forms of communication, such as the communication represented by interactions 332 and 334. For example, the communication represented by interactions 332 and 334 may relate to a latency time of 284 nanoseconds, and the communication represented by interactions 322 and 324 may relate to a latency time of 85 nanoseconds.
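For reference, the example figures above can be compared with simple arithmetic. The variable names below are illustrative only, and the computed values apply solely to the two example latencies given, not to any measured system.

```python
slow_ns = 284   # example latency for interactions 332 and 334
fast_ns = 85    # example latency for interactions 322 and 324

saved_ns = slow_ns - fast_ns          # absolute difference in nanoseconds
reduction = saved_ns / slow_ns        # fraction of the original latency removed
ratio = slow_ns / fast_ns             # how many times lower the latency is

print(f"saved {saved_ns} ns, about {reduction:.0%} lower, about {ratio:.1f}x")
```

With these example values the difference is 199 nanoseconds, which is roughly a 70 percent reduction.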
In at least one example embodiment, the memory allocation examples of
The example of
The example of
As described above, it may be desirable to avoid having a receiver determine that a snoop notification caused by a cache prefetch signifies a message. In at least one example embodiment, the message memory allocation information designates memory addresses to be associated with messages such that there is only a single memory address associated with a message within a memory region that is the size of a prefetch page. The example of
At block 502, the apparatus receives a snoop notification indicating a write to a memory address associated with a cache. The receiving of the snoop notification may be similar as described regarding
At block 552, the apparatus determines to send a message. The determination to send the message may be similar as described regarding
As described in
At block 602, the apparatus receives message memory allocation information, similar as described regarding
As described in
At block 652, the apparatus determines message memory allocation information, similar as described regarding
As described in
At block 672, the apparatus determines message memory allocation information, similar as described regarding block 652 of
As described in
At block 702, the apparatus receives a snoop notification indicating a write to a memory address associated with a cache, similar as described regarding block 502 of
In at least some circumstances, it may be desirable to perform enablement of subsequent snoop notifications prior to performing an operation based on a message. For example, the causation of enablement of the subsequent snoop notification may comprise performance of a read of the memory address. In such circumstances, the message may relate to a payload associated with information written to the memory address. Therefore, in such circumstances, it may be desirable to read the memory address both to obtain the message payload and to cause enablement of the subsequent snoop notification.
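The dual role of the read may be illustrated with a simplified software model. The class MonitoredAddress is hypothetical, and the model assumes that a notification is raised at most once until the address is read again. This is a simplification chosen for illustration and does not describe any particular cache coherence protocol.

```python
class MonitoredAddress:
    """Hypothetical model of one monitored memory address."""

    def __init__(self):
        self.value = 0
        self.armed = True        # whether the next write raises a notification
        self.notifications = 0   # count of notifications raised so far

    def write(self, value):
        self.value = value
        if self.armed:
            self.armed = False   # assumption: one notification until re-armed
            self.notifications += 1

    def read(self):
        # A single read both returns the payload and re-enables notification.
        self.armed = True
        return self.value


line = MonitoredAddress()
line.write(11)           # armed: a notification is raised
line.write(12)           # not yet re-armed: no notification in this model
payload = line.read()    # obtains the payload and re-arms the address
line.write(13)           # armed again: a notification is raised
```

In this model, performing the read before the operation means that a write arriving while the operation runs still raises a notification.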
At block 802, the apparatus receives a snoop notification indicating a write to a memory address associated with a cache, similar as described regarding block 502 of
As described in
At block 902, the apparatus receives a snoop notification indicating a write to a memory address associated with a cache, similar as described regarding block 502 of
As described in
At block 952, the apparatus determines to send a message, similar as described regarding block 552 of
As described in
At block 1002, the apparatus receives a snoop notification indicating a write to a memory address associated with a cache, similar as described regarding block 502 of
Note that the apparatuses, methods, and systems described above may be implemented in any electronic device or system as aforementioned. As specific illustrations, the figures below provide exemplary systems for utilizing the invention as described herein. As the systems below are described in more detail, a number of different interconnects are disclosed, described, and revisited from the discussion above. And as is readily apparent, the advances described above may be applied to any of those interconnects, fabrics, or architectures.
Turning to
Embodiments are not limited to computer systems. Alternative embodiments of the present invention can be used in other devices such as handheld devices and embedded applications. Some examples of handheld devices include cellular phones, Internet Protocol devices, digital cameras, personal digital assistants (PDAs), and handheld PCs. Embedded applications can include a micro controller, a digital signal processor (DSP), system on a chip, network computers (NetPC), set-top boxes, network hubs, wide area network (WAN) switches, or any other system that can perform one or more instructions in accordance with at least one embodiment.
In this illustrated embodiment, processor 1102 includes one or more execution units 1108 to implement an algorithm that is to perform at least one instruction. One embodiment may be described in the context of a single processor desktop or server system, but alternative embodiments may be included in a multiprocessor system. System 1100 is an example of a ‘hub’ system architecture. The computer system 1100 includes a processor 1102 to process data signals. The processor 1102, as one illustrative example, includes a complex instruction set computer (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing a combination of instruction sets, or any other processor device, such as a digital signal processor, for example. The processor 1102 is coupled to a processor bus 1110 that transmits data signals between the processor 1102 and other components in the system 1100. The elements of system 1100 (e.g. graphics accelerator 1112, memory controller hub 1116, memory 1120, I/O controller hub 1124, wireless transceiver 1126, Flash BIOS 1128, Network controller 1134, Audio controller 1136, Serial expansion port 1138, I/O controller 1140, etc.) perform their conventional functions that are well known to those familiar with the art.
In one embodiment, the processor 1102 includes a Level 1 (L1) internal cache memory 1104. Depending on the architecture, the processor 1102 may have a single internal cache or multiple levels of internal caches. Other embodiments include a combination of both internal and external caches depending on the particular implementation and needs. Register file 1106 is to store different types of data in various registers including integer registers, floating point registers, vector registers, banked registers, shadow registers, checkpoint registers, status registers, and instruction pointer register.
Execution unit 1108, including logic to perform integer and floating point operations, also resides in the processor 1102. The processor 1102, in one embodiment, includes a microcode (ucode) ROM to store microcode, which when executed, is to perform algorithms for certain macroinstructions or handle complex scenarios. Here, microcode is potentially updateable to handle logic bugs/fixes for processor 1102. For one embodiment, execution unit 1108 includes logic to handle a packed instruction set 1109. By including the packed instruction set 1109 in the instruction set of a general-purpose processor 1102, along with associated circuitry to execute the instructions, the operations used by many multimedia applications may be performed using packed data in a general-purpose processor 1102. Thus, many multimedia applications are accelerated and executed more efficiently by using the full width of a processor's data bus for performing operations on packed data. This potentially eliminates the need to transfer smaller units of data across the processor's data bus to perform one or more operations, one data element at a time.
Alternate embodiments of an execution unit 1108 may also be used in micro controllers, embedded processors, graphics devices, DSPs, and other types of logic circuits. System 1100 includes a memory 1120. Memory 1120 includes a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory device, or other memory device. Memory 1120 stores instructions and/or data represented by data signals that are to be executed by the processor 1102.
Note that any of the aforementioned features or aspects of the invention may be utilized on one or more interconnect illustrated in
Turning next to
Here, SOC 1200 includes 2 cores—1206 and 1207. Similar to the discussion above, cores 1206 and 1207 may conform to an Instruction Set Architecture, such as an Intel® Architecture Core™-based processor, an Advanced Micro Devices, Inc. (AMD) processor, a MIPS-based processor, an ARM-based processor design, or a customer thereof, as well as their licensees or adopters. Cores 1206 and 1207 are coupled to cache control 1208 that is associated with bus interface unit 1209 and L2 cache 1210 to communicate with other parts of system 1200. Interconnect 1210 includes an on-chip interconnect, such as an IOSF, AMBA, or other interconnect discussed above, which potentially implements one or more aspects of the described invention.
Interface 1210 provides communication channels to the other components, such as a Subscriber Identity Module (SIM) 1230 to interface with a SIM card, a boot rom 1235 to hold boot code for execution by cores 1206 and 1207 to initialize and boot SOC 1200, a SDRAM controller 1240 to interface with external memory (e.g. DRAM 1260), a flash controller 1245 to interface with non-volatile memory (e.g. Flash 1265), a peripheral control Q1650 (e.g. Serial Peripheral Interface) to interface with peripherals, video codecs 1220 and Video interface 1225 to display and receive input (e.g. touch enabled input), GPU 1215 to perform graphics related computations, etc. Any of these interfaces may incorporate aspects of the invention described herein.
In addition, the system illustrates peripherals for communication, such as a Bluetooth module 1270, 3G modem 1275, GPS 1285, and WiFi 1285. Note that, as stated above, a UE includes a radio for communication. As a result, these peripheral communication modules are not all required. However, in a UE, some form of a radio for external communication is to be included.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.
A design may go through various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of manners. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally, a circuit level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. In the case where conventional semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce the integrated circuit. In any representation of the design, the data may be stored in any form of a machine readable medium. A memory or a magnetic or optical storage such as a disc may be the machine readable medium to store information transmitted via optical or electrical wave modulated or otherwise generated to transmit such information. When an electrical carrier wave indicating or carrying the code or design is transmitted, to the extent that copying, buffering, or re-transmission of the electrical signal is performed, a new copy is made. Thus, a communication provider or a network provider may store on a tangible, machine-readable medium, at least temporarily, an article, such as information encoded into a carrier wave, embodying techniques of embodiments of the present invention.
A module as used herein refers to any combination of hardware, software, and/or firmware. As an example, a module includes hardware, such as a micro-controller, associated with a non-transitory medium to store code adapted to be executed by the micro-controller. Therefore, reference to a module, in one embodiment, refers to the hardware, which is specifically configured to recognize and/or execute the code to be held on a non-transitory medium. Furthermore, in another embodiment, use of a module refers to the non-transitory medium including the code, which is specifically adapted to be executed by the microcontroller to perform predetermined operations. And as can be inferred, in yet another embodiment, the term module (in this example) may refer to the combination of the microcontroller and the non-transitory medium. Often module boundaries that are illustrated as separate commonly vary and potentially overlap. For example, a first and a second module may share hardware, software, firmware, or a combination thereof, while potentially retaining some independent hardware, software, or firmware. In one embodiment, use of the term logic includes hardware, such as transistors, registers, or other hardware, such as programmable logic devices.
Use of the phrase ‘to’ or ‘configured to,’ in one embodiment, refers to arranging, putting together, manufacturing, offering to sell, importing and/or designing an apparatus, hardware, logic, or element to perform a designated or determined task. In this example, an apparatus or element thereof that is not operating is still ‘configured to’ perform a designated task if it is designed, coupled, and/or interconnected to perform said designated task. As a purely illustrative example, a logic gate may provide a 0 or a 1 during operation. But a logic gate ‘configured to’ provide an enable signal to a clock does not include every potential logic gate that may provide a 1 or 0. Instead, the logic gate is one coupled in some manner that during operation the 1 or 0 output is to enable the clock. Note once again that use of the term ‘configured to’ does not require operation, but instead focuses on the latent state of an apparatus, hardware, and/or element, where in the latent state the apparatus, hardware, and/or element is designed to perform a particular task when the apparatus, hardware, and/or element is operating.
Furthermore, use of the phrases ‘capable of/to,’ and/or ‘operable to,’ in one embodiment, refers to some apparatus, logic, hardware, and/or element designed in such a way to enable use of the apparatus, logic, hardware, and/or element in a specified manner. Note as above that use of to, capable to, or operable to, in one embodiment, refers to the latent state of an apparatus, logic, hardware, and/or element, where the apparatus, logic, hardware, and/or element is not operating but is designed in such a manner to enable use of an apparatus in a specified manner.
A value, as used herein, includes any known representation of a number, a state, a logical state, or a binary logical state. Often, the use of logic levels, logic values, or logical values is also referred to as 1's and 0's, which simply represents binary logic states. For example, a 1 refers to a high logic level and 0 refers to a low logic level. In one embodiment, a storage cell, such as a transistor or flash cell, may be capable of holding a single logical value or multiple logical values. However, other representations of values in computer systems have been used. For example, the decimal number ten may also be represented as a binary value of 1010 and a hexadecimal letter A. Therefore, a value includes any representation of information capable of being held in a computer system.
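The equivalence of these representations can be checked directly. The short sketch below uses only standard Python literals and formatting, and the variable names are illustrative.

```python
ten = 10

# The same value written as binary and hexadecimal literals.
same_value = (ten == 0b1010 == 0xA)

# The same value rendered as text in each base.
binary_text = format(ten, "b")
hex_text = format(ten, "X")

print(same_value, binary_text, hex_text)
```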
Moreover, states may be represented by values or portions of values. As an example, a first value, such as a logical one, may represent a default or initial state, while a second value, such as a logical zero, may represent a non-default state. In addition, the terms reset and set, in one embodiment, refer to a default and an updated value or state, respectively. For example, a default value potentially includes a high logical value, i.e. reset, while an updated value potentially includes a low logical value, i.e. set. Note that any combination of values may be utilized to represent any number of states.
The embodiments of methods, hardware, software, firmware or code set forth above may be implemented via instructions or code stored on a machine-accessible, machine readable, computer accessible, or computer readable medium which are executable by a processing element. A non-transitory machine-accessible/readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a non-transitory machine-accessible medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage medium; flash memory devices; electrical storage devices; optical storage devices; acoustical storage devices; other forms of storage devices for holding information received from transitory (propagated) signals (e.g., carrier waves, infrared signals, digital signals); etc., which are to be distinguished from the non-transitory mediums that may receive information therefrom.
Instructions used to program logic to perform embodiments of the invention may be stored within a memory in the system, such as DRAM, cache, flash memory, or other storage. Furthermore, the instructions can be distributed via a network or by way of other computer readable media. Thus a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memory (CD-ROMs), magneto-optical disks, Read-Only Memory (ROMs), Random Access Memory (RAM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic or optical cards, flash memory, or a tangible, machine-readable storage used in the transmission of information over the Internet via electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Accordingly, the computer-readable medium includes any type of tangible machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. Furthermore, the foregoing use of embodiment and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.
The following examples pertain to embodiments in accordance with this Specification. One or more embodiments may provide an apparatus, a system, a machine readable storage, a machine readable medium, and a method for receiving a snoop notification indicating a write to a memory address associated with a cache, determining that the snoop notification signifies receipt of a message based, at least in part, on the memory address, and performing an operation based, at least in part, on the message.
One or more example embodiments further provide causation of a subsequent snoop notification based, at least in part, on determination that the snoop notification signifies the message.
In at least one example embodiment, causation of subsequent snoop notifications comprises performing a read of the memory address.
In at least one example embodiment, receipt of the snoop notification is performed by a monitoring agent.
One or more example embodiments further provide receiving message memory allocation information, such that determination that the snoop notification denotes receipt of the message is further based, at least in part, on correlation between the memory address and the message memory allocation information.
In at least one example embodiment, the operation is based, at least in part, on the message without regard for information stored at the memory address.
In at least one example embodiment, the operation is based, at least in part, on the message and information stored in association with the memory address.
One or more example embodiments further provide performance of a write to the memory address to acknowledge receipt of the message.
In at least one example embodiment, determination that the snoop notification signifies a message is further based, at least in part, on determination that the snoop notification was not caused by a cache prefetch.
In at least one example embodiment, determination that the snoop notification was not caused by the cache prefetch comprises determining that a write was performed at the memory address.
One or more embodiments may provide an apparatus, a machine readable storage, a machine readable storage medium, and a method for determining to send a message, determining a memory address to trigger a snoop notification that signifies the message, and performing a write to the memory address to cause the snoop notification to communicate the message.
In at least one example embodiment, information written to the memory address is not pertinent to the message.
One or more example embodiments further provide determining message payload information, wherein performance of the write to the memory address comprises performance of the write of the message payload information to the memory address.
One or more example embodiments further provide determining message memory allocation information that indicates, at least, that the memory address is associated with the message.
In at least one example embodiment, determination of the message memory address to trigger a snoop notification is based, at least in part, on the memory allocation information.
One or more example embodiments further provide sending the message memory allocation information.
In at least one example embodiment, the message allocation information is configured to preclude the memory address being within a cache prefetch page of another address that is associated with a different message.
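One way such an allocation might be produced is to space message addresses one prefetch page apart, so that no two message addresses fall within the same page. The sketch below is an assumption-laden illustration: the function names, the message names, the base address, and the 4096-byte page size are all hypothetical, and the actual prefetch page size depends on the platform.

```python
PREFETCH_PAGE_BYTES = 4096  # assumed value; the real size is platform-specific


def allocate_message_addresses(base, message_names, page_bytes=PREFETCH_PAGE_BYTES):
    """Assign each message its own page-aligned address (hypothetical helper)."""
    if base % page_bytes != 0:
        raise ValueError("base must be aligned to the prefetch page size")
    return {name: base + index * page_bytes
            for index, name in enumerate(message_names)}


def shares_page(address_a, address_b, page_bytes=PREFETCH_PAGE_BYTES):
    """Return True when both addresses fall within the same prefetch page."""
    return address_a // page_bytes == address_b // page_bytes


allocation = allocate_message_addresses(0x10000, ["start", "stop", "flush"])
```

With one message address per page, a prefetch of a neighboring address within the same page does not land on an address that is associated with a different message.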
One or more example embodiments further provide determining to send another message, such that the other message is received subsequent to the message, determining another memory address to trigger a snoop notification that signifies the other message, receiving a snoop notification associated with the memory address, and performing a write to the other memory address to cause the snoop notification to communicate the other message, based at least in part on receipt of the snoop notification associated with the memory address.