This disclosure pertains to computing systems, and in particular (but not exclusively) to on-die interconnection of devices.
In various embodiments, a burst command encoding for a packet-based communication may be used that is independent of data (e.g., word) width. A packet may thus be sent throughout a system such as a network on chip (NoC) having links of different widths without any change to the command encoding when link widths are upsized or downsized within the NoC. As a consequence, a packet traversing a network on chip from source to destination can be upsized and downsized with regard to link widths by merging or dissecting packet units (e.g., flow control units or flits). Embodiments enable such width changes without any re-encoding and without additional storage or latency to adapt to the changed link width, although inherent storage and latency may occur when multiple smaller flits are to be collected to form a wider flit.
According to embodiments, a burst command encoding can include at a minimum two orthogonal pieces of information, both independent of the link width. The first information is the size of the burst. This burst size remains in a static or fixed quantity (bits, bytes, nibbles, etc.), independent of the actual size of a word, flit, or particular link width for a segment of the network communication. The second information is a wrapping border information (if any), where an address sequence stops incrementing at the end border and jumps back to the start border, again using a fixed quantity (address bits, byte address, . . . ) independent of flit width to express the borders. In this way, reduced complexity of upsizing and downsizing functions for link width changes within a network on chip (or other interconnect solutions) is realized, because command re-coding is not needed inside the network on any width change. Still further, a reduced area of a design for a packet-based NoC, e.g., that uses wormhole routing, may be realized because there is no need to collect several first flits of a packet in a storage inside the upsize or downsize function to be able to change packet contents. Also, reduced latency for packet-based NoCs may be realized because no time is spent on packet inspection and modification of command encoding during any upsize or downsize functions. Note that in some embodiments, the smaller the link widths, the more prominent the savings are, because at small link widths, the number of flits to store and inspect to obtain access to the command encoding is higher than for wide link widths, where all of that information may be present in a first flit.
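The relationship between link width and the number of flits that would otherwise need to be stored can be illustrated with simple arithmetic. The following is a minimal sketch only; the function name and the 96-bit size of the header-plus-command prefix are hypothetical values chosen for illustration and are not taken from this disclosure.

```python
def flits_to_buffer(prefix_bits: int, link_width_bits: int) -> int:
    """Flits a resizing component would have to collect before the whole
    header-plus-command prefix is visible, if it needed to inspect it."""
    if prefix_bits <= 0 or link_width_bits <= 0:
        raise ValueError("sizes must be positive")
    return -(-prefix_bits // link_width_bits)  # ceiling division


PREFIX_BITS = 96  # hypothetical prefix size, for illustration only

for width in (16, 32, 64, 128):
    print(f"{width:3d}-bit link: {flits_to_buffer(PREFIX_BITS, width)} flit(s)")
```

Under these assumed numbers, a 16-bit link would require six flits to be collected while a 128-bit link would require one, which is why avoiding inspection may save the most on narrow links.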
In embodiments described herein, the encoding of a burst is independent of the current word width. Different approaches may be used to realize such encoding. Although the scope of the present invention is not limited in this regard, example encodings include: basing a burst description on the smallest data transfer quantity used in a system (e.g., bit, byte, half-word, or word); basing a burst description on the native word width of an initiator agent (and transmitting the original word width as part of the burst description); basing a burst description on a native word width of a target or destination component and potentially performing an encoding conversion at the initiator agent; or basing a burst description on a configurable quantity and transmitting that quantity information along with the message.
In all these cases, any size conversion operation for a link width change on the path from initiator to target can avoid inspecting the packet and modifying the burst encoding. Instead, such a size conversion operation takes the form of a rearrangement of the flits (e.g., in parallel or serially at every resizing component). Any encoding or decoding conversion is performed only at the initiator side, at the target side, or at both.
Generally, a burst can be described in an industry standard bus or other bus by a start address, the amount of data (also referred to as beat size) transferred in a single clock cycle (one beat), the number of beats, and an address sequence mechanism. In different cases, an address sequence mechanism may be incrementing, in which every beat is incrementally adjacent to its predecessor in memory space; fixed, in which every beat is sent to the same address (the start address); or wrapping, in which, once an address that is a multiple of the beat size times the number of beats has been reached, the burst continues at the next lower integer multiple of the beat size times the number of beats. Thus typical bus burst descriptions are based on the beat size, which is a function of the word width of the links.
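The three address sequence mechanisms can be sketched as follows. This is an illustrative sketch under the assumption that the wrap window equals the beat size times the number of beats, as described above; the function name and the string labels for the burst kinds are hypothetical.

```python
from typing import List


def beat_addresses(start: int, beat_size: int, num_beats: int, kind: str) -> List[int]:
    """Return the address of every beat of a bus-style burst."""
    if kind == "fixed":
        # Every beat targets the start address.
        return [start] * num_beats
    if kind == "incrementing":
        # Every beat is adjacent to its predecessor.
        return [start + i * beat_size for i in range(num_beats)]
    if kind == "wrapping":
        # The window spans beat_size * num_beats bytes; on reaching its
        # upper border the sequence continues at its lower border.
        window = beat_size * num_beats
        lower = (start // window) * window
        offset = start - lower
        return [lower + (offset + i * beat_size) % window for i in range(num_beats)]
    raise ValueError(f"unknown burst kind: {kind}")


print([hex(a) for a in beat_addresses(0x28, 4, 4, "wrapping")])
# ['0x28', '0x2c', '0x20', '0x24']
```

Note that every result here depends on the beat size, which is the dependency on link word width that the width-independent encodings are intended to remove.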
A command portion of a packet can be encoded to be independent of link width in different manners as mentioned above. In Table 1, an implementation example is shown in which a smallest data transfer granularity for the system is assumed to be a byte. As shown in Table 1, a burst description includes a start address as a byte address; a total length in bytes (calculated from beat size times number of beats); a wrap border specified as a power-of-two multiple of bytes; and a means to distinguish different burst types (e.g., wrapping, fixed and incrementing bursts). For a fixed address burst, the wrap border can be set to the beat size of the initiator, such that a fixed burst is a special case of a wrapping burst that spans a constrained address space multiple times. For an incrementing burst, the wrap border may be set to a special size that does not make sense for fixed or wrapping bursts (e.g., the maximum encodable wrap border value, which may be bigger than the maximum encodable burst length). Otherwise an encoding for a command portion of a header may include a fourth parameter value to distinguish among incrementing, wrapping, and fixed bursts.
In the approach of Table 1, the density of the encoding can be improved where the wrap border is not directly given as an integer number counting the byte address when to wrap around, but as an integer exponent only representing boundaries that are restricted to power-of-two values. Note that such restriction is reasonable as most components in modern SoCs have bus interfaces of a width of a power of two and limit wrapping burst lengths to powers of two.
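A byte-based burst description with the wrap border carried as a power-of-two exponent might be sketched as below. The class and function names, the 5-bit exponent field, and the use of its largest value to mark an incrementing burst are assumptions made for illustration; the disclosure leaves field widths open. The sketch also assumes that fixed bursts start at an address aligned to the initiator beat size.

```python
from dataclasses import dataclass
from typing import Iterator

# Assumption: a hypothetical 5-bit exponent field whose largest value is
# reserved to mean "incrementing, never wraps".
INCR_WRAP_EXP = 31


@dataclass(frozen=True)
class ByteBurst:
    start_address: int  # byte address
    length_bytes: int   # beat size times number of beats
    wrap_exp: int       # wrap border is 2**wrap_exp bytes


def encode_burst(start: int, beat_size: int, num_beats: int, kind: str) -> ByteBurst:
    """Translate a bus-style burst into the width-independent form."""
    length = beat_size * num_beats
    if kind == "incrementing":
        return ByteBurst(start, length, INCR_WRAP_EXP)
    if kind == "fixed":
        border = beat_size   # fixed burst: wrap within a single beat
    elif kind == "wrapping":
        border = length      # wrapping burst: wrap within the whole burst
    else:
        raise ValueError(f"unknown burst kind: {kind}")
    if border <= 0 or border & (border - 1):
        raise ValueError("wrap border must be a power of two")
    return ByteBurst(start, length, border.bit_length() - 1)


def byte_addresses(burst: ByteBurst) -> Iterator[int]:
    """Yield the byte address of every payload byte, in order."""
    if burst.wrap_exp == INCR_WRAP_EXP:
        yield from range(burst.start_address, burst.start_address + burst.length_bytes)
        return
    window = 1 << burst.wrap_exp
    lower = burst.start_address & ~(window - 1)
    offset = burst.start_address - lower
    for i in range(burst.length_bytes):
        yield lower + (offset + i) % window


print(encode_burst(0x28, 4, 4, "wrapping"))
# ByteBurst(start_address=40, length_bytes=16, wrap_exp=4)
```

In this sketch a fixed burst of two 4-byte beats visits the same four byte addresses twice, which matches the description of a fixed burst as a wrapping burst that spans a constrained address space multiple times.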
In Table 2, an implementation example is shown for a burst description based on initiator data width. As shown in Table 2, this encoding includes a start address in a known quantity (not necessarily bytes); a beat size of the initiator (Init-size); a number of beats; and an indicator of burst kind (e.g., incrementing, wrapping, or fixed).
An encoding as in Table 2 closely resembles common bus protocol encoding. As such, transaction encoding on an initiator side may be comparably simple. In this case, a target agent may be configured to be capable of converting different incoming encodings to the target bus width, which may increase complexity of decoding logic of a target network interface.
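One part of the conversion a target network interface may perform for an initiator-based encoding is to determine how many target-width beats cover the burst. The sketch below is illustrative only: it assumes an incrementing burst and a start address expressed in bytes, and the function name is hypothetical.

```python
def target_beats(start_byte: int, init_size: int, num_beats: int, target_size: int) -> int:
    """Number of target-width beats needed to cover an incrementing burst
    described in initiator beats (all sizes in bytes)."""
    if min(init_size, num_beats, target_size) <= 0:
        raise ValueError("sizes and beat count must be positive")
    first = start_byte // target_size
    last = (start_byte + init_size * num_beats - 1) // target_size
    return last - first + 1


# Four 4-byte initiator beats starting at byte 4, on an 8-byte target bus:
print(target_beats(4, 4, 4, 8))  # 3 (bytes 4..19 touch three 8-byte words)
```

Because the result depends on both the initiator beat size and the target width, a target that accepts several initiator widths would need this kind of logic for each combination, which is the added decoding complexity noted above.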
In Table 3, an implementation example is shown in which a burst encoding is based on a target data width. As shown in Table 3, a burst description includes a start address in a known quantity (not necessarily bytes); a beat size of a target (which may be omitted (this is shown in italics in Table 3) in transmission, because the target knows implicitly this size and all other components on the path do not need such information, except the unit computing the burst encoding); a number of beats; and an indicator of burst kind (e.g., incrementing, wrapping, or fixed).
Note that the beat size may be larger than the current data width and that the address need not be properly aligned; any misalignment is resolved at the target network interface. Transmission of the target data width in the message is optional, since it is used only in the target for translation (and potential transaction splitting for alignment) to the bus protocol.
An encoding as in Table 3 makes it possible to remove encoding of a target bus size from the packet format and thus from the bandwidth requirements for message transmission through the network. However, initiator network interfaces may be configured to comprehend target interface bus width and have proper encoding logic for all these cases, which may add complexity to initiator interfaces. Note that with such encoding, a target interface may be configured to segment or split misaligned bursts originating from initiators with reduced data width. Such bursts may ultimately be split on the target side into a sequence of smaller bursts.
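One possible way for a target interface to split a misaligned burst is sketched below. This is an assumption-laden illustration rather than the disclosed mechanism: it handles only incrementing byte ranges, the function name is hypothetical, and the policy of producing an unaligned head, an aligned body, and an unaligned tail is one choice among several.

```python
from typing import List, Tuple


def split_for_target(start: int, length: int, width: int) -> List[Tuple[int, int]]:
    """Split the byte range [start, start + length) into (address, nbytes)
    segments: an unaligned head, a body aligned to the target width, and a tail."""
    if length <= 0 or width <= 0:
        raise ValueError("length and width must be positive")
    segments = []
    end = start + length
    addr = start
    if addr % width:
        # Head: run up to the next alignment boundary (or the end of the burst).
        head_end = min(end, (addr // width + 1) * width)
        segments.append((addr, head_end - addr))
        addr = head_end
    body = (end - addr) // width * width
    if body:
        # Body: whole target-width words.
        segments.append((addr, body))
        addr += body
    if addr < end:
        # Tail: remaining bytes that do not fill a whole word.
        segments.append((addr, end - addr))
    return segments


print(split_for_target(6, 20, 8))  # [(6, 2), (8, 16), (24, 2)]
```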
In Table 4, an implementation example is shown for a burst description based on a configured data width. As shown in Table 4, this encoding includes a start address in a known quantity (not necessarily bytes); a beat size of a selected granularity; a number of beats; and an indicator of burst kind (e.g., incrementing, wrapping, or fixed).
An encoding as in Table 4 having a configurable data width is very close to the approach of Table 1 for those cases where the configured data width is the same throughout all the network, except that the granularity is not bytes but another unit, which can improve encoding density. If on the other hand, every initiator agent uses a private configurable data width, the effect on the target protocol translators may be about the same as in the approach of Table 2 in which all combinations that exist may be supported in translation.
To achieve the objective to prevent packet inspection and modification for upsizing and downsizing components, packet format encoding may further have a payload size encoding that is independent of flit width, and a packet end detection that is independent of flit width.
In various embodiments, payload size encoding may be achieved by adding a utilized-payload parameter to a message, whose size is the size of the burst (e.g., the Length parameter in the examples above that specifies the amount of transmitted data as a multiple of the fixed quantity size). The utilized-payload parameter gives the size of the actual payload data only, not including header, command or padding. Header and command length are known from the packet format and are either a fixed or a packet-format dependent quantity. Then even potentially required padding of the payload to fill a wide flit in upsizing would not matter, because the utilized-payload parameter indicates which payload bytes are padding bytes that can be ignored. An end of message encoding may be achieved by inclusion of end of packet indicators, which may be implemented using one or more marker bits that accompany every flit to distinguish the last flit of a message from the others, or by encoding packet flits in a manner such that a specifically encoded tail flit can complete a message uniquely.
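The role of the utilized-payload parameter can be sketched as follows. The helper names are hypothetical and zero bytes are assumed as the padding value; the sketch only shows that padding added to fill a wider flit can be discarded by the receiver from the parameter alone.

```python
def pad_payload(payload: bytes, flit_bytes: int) -> bytes:
    """Pad the payload with zero bytes up to a whole number of flits."""
    if flit_bytes <= 0:
        raise ValueError("flit size must be positive")
    return payload + bytes(-len(payload) % flit_bytes)


def strip_padding(padded: bytes, utilized_payload: int) -> bytes:
    """Recover the payload using only the utilized-payload parameter."""
    if utilized_payload > len(padded):
        raise ValueError("utilized payload exceeds received data")
    return padded[:utilized_payload]


payload = b"abcde"
padded = pad_payload(payload, 4)            # 8 bytes: 5 data bytes + 3 padding
assert strip_padding(padded, len(payload)) == payload
```

The receiver does not need to know the flit width at which the padding was added, only the payload size carried in the message.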
Tables 5 and 6 below describe further example burst descriptions.
In this example, a downsize function manipulates only the EOP column of the flit without need for packet inspection.
In packet formats that use specifically encoded tail flits after a message (often carrying additional information like a checksum), instead of an EOP column the existing flit-type designator can be manipulated in the up- and downsize functions.
In this example, the upsize function manipulates only the Flit-type column of the flit without need for packet inspection. If there is a checksum in the tail flit, it may be recomputed, depending on checksum algorithm.
In this example, the downsize function manipulates only the Flit-type column of the flit without need for packet inspection. If there is a checksum in the tail flit, it may be recomputed, depending on checksum algorithm.
Although specific encodings and ordering of encoding information are shown in Tables 1-6, understand that a command encoding in accordance with an embodiment of the present invention can take many different forms.
Note that in all implementation alternatives described above, the current data width of the link is not part of the burst encoding. As such, burst encoding remains unchanged over upsizing and downsizing components and no packet inspection occurs for any upsizing or downsizing operation.
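The following sketch illustrates this property by regrouping a packet across several flit widths and checking that the command bytes arrive unchanged. The command layout (a 4-byte start address, a 2-byte length, and a 1-byte wrap exponent) and all names are hypothetical; the regrouping function deliberately treats the packet as an opaque byte stream.

```python
import struct
from typing import List

COMMAND_FORMAT = ">IHB"  # hypothetical layout: start address, length, wrap exponent


def to_flits(stream: bytes, flit_bytes: int) -> List[bytes]:
    """Cut a byte stream into flits, zero-padding the last one."""
    stream += bytes(-len(stream) % flit_bytes)
    return [stream[i:i + flit_bytes] for i in range(0, len(stream), flit_bytes)]


def regroup(flits: List[bytes], new_flit_bytes: int) -> List[bytes]:
    """Resize without parsing: only the grouping of bytes changes."""
    return to_flits(b"".join(flits), new_flit_bytes)


command = struct.pack(COMMAND_FORMAT, 0x1000, 10, 4)
payload = bytes(range(10))

flits = to_flits(command + payload, 4)   # injected on a 4-byte link
for width in (8, 2, 16):                 # upsize, downsize, upsize
    flits = regroup(flits, width)

received = b"".join(flits)
size = struct.calcsize(COMMAND_FORMAT)
start, length, wrap_exp = struct.unpack(COMMAND_FORMAT, received[:size])
recovered_payload = received[size:size + length]

assert received[:size] == command
assert recovered_payload == payload
```

Padding accumulates at the end of the stream as the widths change, and the receiver discards it using the length carried in the command.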
Referring now to
In the context of
Still referring to
Due to the NoC arrangement in which different links may operate at different link widths, appropriate conversion circuitry such as upconversion and/or downconversion logic may be included within and/or associated with individual routers 20 and/or NIs 30 as appropriate for a particular configuration.
Referring now to
With further reference to router 20, a plurality of ingress ports 220-227 are provided, each configured to receive incoming information from a source location and provide the information to a destination location via a crossbar 28 which, in an embodiment may be a sparsely populated crossbar to enable appropriate connections between the different ports of the router. A plurality of egress ports 240-247 are also provided, each configured to receive information from a corresponding arbiter 250-257 and output such information to a destination location. Note that each arbiter 25 may receive multiple information streams and select an appropriate stream for output from the corresponding egress port. Note that each pair of ingress port 22 and egress port 24 may be associated with a particular network interface, interconnect or other agent (or remain unconnected).
In certain cases, link width changes may occur due to changes in native bit width at which different agents operate. Accordingly as shown in
Referring now to
Each instantiation of downconversion logic 50 within an NoC may have a configurable input bit width and output bit width, as appropriate for interconnection between different logics, agents and/or interconnects operating at different native bit widths. Note that control logic 54 need not inspect the incoming information or any portion of it (such as a header portion or command portion) to determine a type of packet or to perform any type of re-coding of any information therein. Instead, control logic 54 simply controls selection logic 52 to re-size incoming information of one flow control unit size to outgoing information of a different flow control unit size.
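The behavior of such downconversion can be modeled in software as below. This is a behavioral sketch only, not a description of the circuit: the Flit type and function name are hypothetical, an end-of-packet (EOP) marker bit per flit is assumed, and the wide flit size is assumed to be an integer multiple of the narrow flit size.

```python
from typing import Iterable, Iterator, NamedTuple


class Flit(NamedTuple):
    data: bytes
    eop: bool  # end-of-packet marker accompanying the flit


def downsize(flits: Iterable[Flit], narrow_bytes: int) -> Iterator[Flit]:
    """Split each wide flit into narrow flits without looking at its contents."""
    for flit in flits:
        if len(flit.data) % narrow_bytes:
            raise ValueError("wide flit size must be a multiple of the narrow size")
        count = len(flit.data) // narrow_bytes
        for i in range(count):
            chunk = flit.data[i * narrow_bytes:(i + 1) * narrow_bytes]
            # Only the EOP marker is adjusted: it moves to the last narrow flit.
            yield Flit(chunk, flit.eop and i == count - 1)


wide = [Flit(bytes(range(8)), False), Flit(bytes(range(8, 16)), True)]
narrow = list(downsize(wide, 4))
print([f.eop for f in narrow])  # [False, False, False, True]
```

The loop selects byte lanes in order, which loosely corresponds to a control logic stepping a selection logic through the portions of a wide flit.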
Referring now to
Still with reference to
Each instantiation of upconversion logic 60 within an NoC may have a configurable input bit width and output bit width, as appropriate for interconnection between different logics, agents and/or interconnects operating at different native bit widths. Note that control logic 64 need not inspect the incoming information or any portion of it (such as a header portion or command portion) to determine a type of packet or to perform any type of re-coding of any information therein. Instead, control logic 64 simply controls selection logic 62 to re-size incoming information of one flow control unit size to outgoing information of a different flow control unit size. Of the flit information, only the EOP or flit-type bit column(s) may be manipulated to produce legal packet frames.
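A corresponding behavioral model of upconversion is sketched below. As before, the Flit type and function name are hypothetical, an EOP marker bit per flit is assumed, zero bytes are assumed as padding, and narrow flits are assumed to tile the wide flit exactly.

```python
from typing import Iterable, Iterator, NamedTuple


class Flit(NamedTuple):
    data: bytes
    eop: bool  # end-of-packet marker accompanying the flit


def upsize(flits: Iterable[Flit], wide_bytes: int) -> Iterator[Flit]:
    """Merge narrow flits into wide flits without looking at their contents."""
    buf = bytearray()  # models the buffer that collects narrow flits
    for flit in flits:
        buf += flit.data
        if len(buf) > wide_bytes:
            raise ValueError("narrow flits must tile the wide flit exactly")
        if flit.eop or len(buf) == wide_bytes:
            # On EOP, pad a partially filled wide flit; the receiver can
            # ignore the padding using the payload size in the message.
            buf += bytes(wide_bytes - len(buf))
            yield Flit(bytes(buf), flit.eop)
            buf = bytearray()
    # A stream that ends without EOP leaves buffered data unemitted here.


narrow = [
    Flit(b"\x01\x02\x03\x04", False),
    Flit(b"\x05\x06\x07\x08", False),
    Flit(b"\x09\x0a\x0b\x0c", True),
]
wide = list(upsize(narrow, 8))
print([(f.data.hex(), f.eop) for f in wide])
# [('0102030405060708', False), ('090a0b0c00000000', True)]
```

The storage and latency visible in this model are the inherent cost of collecting several narrow flits, not a cost of inspecting or re-encoding the command.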
Referring now to
Still referring to
Referring now to
Method 150 begins by extracting and processing header information of the packet (block 160). Understand that such header information may include various information, including source and destination information, among other such information, such as packet format, if there are multiple formats of different length and command information content, quality of service (QoS) indications and so forth. Next control passes to block 170 where command information of the packet may be extracted and processed. This command information may correspond to the encoded burst command as generated above with regard to
Referring now to
Referring now to
Otherwise, when the appropriate number of flow control units have been received, control passes to block 285, where the multiple flits (e.g., 4) can be merged into a single flow control unit of the second width. Note however that no analysis of the incoming packet or re-coding is performed. Control next passes to block 290 where the flow control unit of the second width can be output. At this point, method 260 concludes for the particular packet to be upconverted and output. This upconversion process thus is performed without any re-coding of command or other information; rather, the packet is simply upsized from one width to another, and only the end-of-packet information is given with the last flit of the upsized packet.
Referring now to
In the high level illustration of
For the embodiments described herein, encoding logic 322 may encode a burst command as described above into a command portion of the packet. In turn, packet generation logic 324 receives the various portions from encoding logic 322 and generates a packet therefrom. The packet then may be communicated via network interface 330 to a desired destination location. Understand that in a receive direction, incoming packets received from network interface 330 are provided to a packet parsing logic portion of logic 324 where the packet can be parsed into its constituent portions, which are provided to decoding logic portion of logic 322 so that the portions may be decoded and appropriate information provided to IP logic 310 (e.g., a transaction within the packet). Understand while shown at this high level in the embodiment of
Because IP block 300 and its associated network interface may operate at a native bit width different than other portions of the SoC, understand that interposed between network interface 330 and a router or other destination may be appropriate upconversion and/or downconversion logic as described herein. Of course in other implementations, such conversion logic may be present within the network interface itself, within a router or other network elements of the SoC, and in some cases such conversion logic may be present within an IP block itself.
Referring now to
As further shown in
With further reference to
Referring now to
Understand that SoCs (or other integrated circuits) including a NoC as described herein can be used in many different systems, ranging from small portable devices to high performance computing systems and networks. Referring now to
In the high level view shown in
Core unit 910 may also include an interface such as a network interface to enable interconnection to additional circuitry of the SoC. In an embodiment, core unit 910 couples to a coherent fabric formed of an on-die interconnect that implements the command encoding described herein and which may act as a primary cache coherent on-die interconnect that in turn couples to a memory controller 935. In turn, memory controller 935 controls communications with a memory such as a DRAM (not shown for ease of illustration in
In addition to core unit 910, additional processing engines are present within the processor, including a modem 915 (which may include an NoC that implements the described command encoding) and at least one graphics unit 920, which may include one or more graphics processing units (GPUs) to perform graphics processing as well as to possibly execute general purpose operations on the graphics processor (so-called GPGPU operation). In addition, at least one image signal processor 925 may be present. Image signal processor 925 may be configured to process incoming image data received from one or more capture devices, either internal to the SoC or off-chip.
Other accelerators also may be present. In the illustration of
In some embodiments, SoC 900 may further include a non-coherent fabric coupled to the coherent fabric to which various peripheral devices may couple. One or more interfaces 960a-960d enable communication with one or more off-chip devices. Such communications may be according to a variety of communication protocols such as PCIe™, GPIO, USB, I2C, UART, MIPI, SDIO, DDR, SPI, HDMI, among other types of communication protocols. Although shown at this high level in the embodiment of
Referring now to
In turn, application processor 1210 can couple to a user interface/display 1220, e.g., a touch screen display. In addition, application processor 1210 may couple to a memory system including a non-volatile memory, namely a flash memory 1230 and a system memory, namely a dynamic random access memory (DRAM) 1235. As further seen, application processor 1210 further couples to a capture device 1240 such as one or more image capture devices that can record video and/or still images.
Still referring to
As further illustrated, a near field communication (NFC) contactless interface 1260 is provided that communicates in an NFC near field via an NFC antenna 1265. While separate antennae are shown in
A power management integrated circuit (PMIC) 1215 couples to application processor 1210 to perform platform level power management. To this end, PMIC 1215 may issue power management requests to application processor 1210 to enter certain low power states as desired. Furthermore, based on platform constraints, PMIC 1215 may also control the power level of other components of system 1200.
To enable communications to be transmitted and received, various circuitry may be coupled between baseband processor 1205 and an antenna 1290. Specifically, a radio frequency (RF) transceiver 1270 and a wireless local area network (WLAN) transceiver 1275 may be present. In general, RF transceiver 1270 may be used to receive and transmit wireless data and calls according to a given wireless communication protocol, such as a 3G or 4G wireless communication protocol, e.g., in accordance with code division multiple access (CDMA), global system for mobile communication (GSM), long term evolution (LTE) or another protocol. In addition a GPS sensor 1280 may be present. Other wireless communications such as receipt or transmission of radio signals, e.g., AM/FM and other signals may also be provided. In addition, via WLAN transceiver 1275, local wireless communications, such as according to a Bluetooth™ standard or an IEEE 802.11 standard such as IEEE 802.11a/b/g/n can also be realized.
Referring now to
A variety of devices may couple to SoC 1310. In the illustration shown, a memory subsystem includes a flash memory 1340 and a DRAM 1345 coupled to SoC 1310. In addition, a touch panel 1320 is coupled to the SoC 1310 to provide display capability and user input via touch, including provision of a virtual keyboard on a display of touch panel 1320. To provide wired network connectivity, SoC 1310 couples to an Ethernet interface 1330. A peripheral hub 1325 is coupled to SoC 1310 to enable interfacing with various peripheral devices, such as may be coupled to system 1300 by any of various ports or other connectors.
In addition to internal power management circuitry and functionality within SoC 1310, a PMIC 1380 is coupled to SoC 1310 to provide platform-based power management, e.g., based on whether the system is powered by a battery 1390 or AC power via an AC adapter 1395. In addition to this power source-based power management, PMIC 1380 may further perform platform power management activities based on environmental and usage conditions. Still further, PMIC 1380 may communicate control and status information to SoC 1310 to cause various power management actions within SoC 1310.
Still referring to
As further illustrated, a plurality of sensors 1360 may couple to SoC 1310. These sensors may include various accelerometer, environmental and other sensors, including user gesture sensors. Finally, an audio codec 1365 is coupled to SoC 1310 to provide an interface to an audio output device 1370. Of course understand that while shown with this particular implementation in
Turning next to
Here, SoC 2000 includes 2 cores—2006 and 2007. Similar to the discussion above, cores 2006 and 2007 may conform to an Instruction Set Architecture, such as an Intel® Architecture Core™-based processor, an Advanced Micro Devices, Inc. (AMD) processor, a MIPS-based processor, an ARM-based processor design, or a customer thereof, as well as their licensees or adopters. Cores 2006 and 2007 are coupled to cache control 2008 that is associated with bus interface unit 2009 and L2 cache 2010 to communicate with other parts of system 2000. Interconnect 2010 includes an on-chip interconnect, which may implement the command encoding described herein.
Interconnect 2010 provides communication channels to the other components, such as a boot ROM 2035 to hold boot code for execution by cores 2006 and 2007 to initialize and boot SOC 2000, a SDRAM controller 2040 to interface with external memory (e.g. DRAM 2060), a flash controller 2045 to interface with non-volatile memory (e.g. Flash 2065), a peripheral controller 2050 (e.g. Serial Peripheral Interface) to interface with peripherals, video codecs 2020 and Video interface 2025 to display and receive input (e.g. touch enabled input) via one of MIPI or HDMI/DP interface, GPU 2015 to perform graphics related computations, etc.
In addition, the system illustrates peripherals for communication, such as a Bluetooth module 2070, 3G modem 2075, GPS 2080, and WiFi 2085. Also included in the system is a power controller 2055.
The following examples pertain to further embodiments.
In one example, a NoC comprises: an IP logic including at least one processing element to perform operations on data; a protocol logic to generate a transaction to be sent from the IP logic; and a packet insertion logic to insert the packet into a network. In an embodiment, the protocol logic comprises: an encoding logic to encode a command portion of a packet associated with the transaction, the command portion having a width independent encoding and including a first field to indicate a burst size and a second field to indicate a wrapping border to delineate between a start border and an end border; and a packet generation logic to generate the packet including a header portion, the command portion and a payload portion.
In an example, the NoC further comprises an upconversion logic to receive a second packet at a first width and to upconvert the second packet to a second width, where the upconversion logic is to maintain an original encoding of a command portion of the second packet, the second width greater than the first width.
In an example, the upconversion logic is to merge a plurality of flow control units of the second packet, the plurality of flow control units of the first width, into one or more flow control units of the second width, and maintain the original encoding of the command portion.
In an example, the upconversion logic comprises: a selection logic to receive the plurality of flow control units; a control logic to control the selection logic; and a buffer coupled to the selection logic to store the one or more flow control units of the second width.
In an example, the NoC of one or more of the above Examples further comprises a downconversion logic to receive the packet at a second width and to downconvert the packet to a first width, where the downconversion logic is to maintain an original encoding of the command portion of the packet, the second width greater than the first width.
In an example, the downconversion logic is to separate a flow control unit of the packet, the flow control unit of the second width, into a plurality of flow control units of the first width, and maintain the original encoding of the command portion.
In an example, the downconversion logic comprises: a selection logic to receive the plurality of flow control units; and a control logic to control the selection logic.
In an example, the encoding logic is further to encode an address sequence type into the command portion.
In an example, a packet may be re-packetized one or more times in transmission from the packet insertion logic to a destination network interface coupled to a destination IP logic, the re-packetization to occur in which the width independent encoding of the command portion is maintained.
Note that the above NoC can be implemented using various means.
In an example, the NoC may be implemented in a SoC in turn incorporated in a user equipment touch-enabled device.
In another example, a system comprises a display and a memory, and includes a processor having the NoC of one or more of the above examples.
In another example, an apparatus comprises: a source agent including at least one logic unit to perform instructions; an encoder to encode a burst command portion of a packet having a first field to indicate a burst size and a second field to indicate a data width of one of the source agent and a destination agent, where the burst size and the data width are to remain fixed when the packet is to be re-sized one or more times during transmission from the source agent to the destination agent; and transmission logic to transmit the packet including the burst command portion, where the packet is to further include a data portion.
In an example, the apparatus further comprises a first router coupled to the source agent via a first link having a first link width and to route the packet from the source agent to the destination agent via a second link having a second link width, the first link width different than the second link width.
In an example, the apparatus further comprises a first upconversion logic to receive the packet at the first link width and to output the packet at the second link width, the second link width greater than the first link width, where the first upconversion logic is coupled to an ingress port of the first router and comprises a selection logic to receive a plurality of flow control units of the first link width, a control logic to control the selection logic, and a buffer coupled to the selection logic to store one or more flow control units of the second link width.
In an example, the apparatus further comprises a first downconversion logic to receive the packet at the second link width and to output the packet at a third link width, the third link width less than the second link width, where the first downconversion logic is coupled to an egress portion of the first router and comprises a selection logic to receive at least one flow control unit of the second link width, and a control logic to control the selection logic.
In an example, the encoder of one of the above Examples is to encode the burst command portion to be independent of a native data width of the source agent.
In an example, the encoder is to encode the first field according to a smallest data transfer quantity of an agent within a network.
In an example, the encoder is to encode the first field according to a smallest data transfer quantity of a selected one of the source agent and the destination agent.
In another example, a machine-readable medium has stored thereon data, which if used by at least one machine, causes the at least one machine to fabricate at least one integrated circuit to perform a method comprising: encoding a command portion of a packet having an independent burst size information and an independent wrapping border information; generating a packet of a first width including a header portion, the command portion and a payload portion; and injecting the packet into a network as one or more flow control units, where the independent burst size information and the independent wrapping border information are to be unmodified when a width of the packet is modified one or more times in communication from a source agent to a destination agent.
In an example, the method further comprises receiving a second packet at a first width and upconverting the second packet to a second width, the second width greater than the first width, including merging a plurality of flow control units of the second packet, the plurality of flow control units of the first width, into one or more flow control units of the second width.
In an example, the method further comprises: receiving the packet at a second width and downconverting the packet to a first width, the second width greater than the first width, including separating a flow control unit of the packet, the flow control unit of the second width, into a plurality of flow control units of the first width.
In an example, the method further comprises encoding the independent wrapping border information as a power of two.
In an example, the method further comprises encoding the independent burst size information according to a smallest data transfer quantity of a selected one of the source agent and the destination agent.
In another example, a computer readable medium including instructions is to perform the method of any of the above examples.
In another example, an apparatus comprises means for performing the method of any one of the above examples.
Understand that various combinations of the above examples are possible.
Embodiments may be used in many different types of systems. For example, in one embodiment a communication device can be arranged to perform the various methods and techniques described herein. Of course, the scope of the present invention is not limited to a communication device, and instead other embodiments can be directed to other types of apparatus for processing instructions, or one or more machine readable media including instructions that in response to being executed on a computing device, cause the device to carry out one or more of the methods and techniques described herein.
Embodiments may be implemented in code and may be stored on a non-transitory storage medium having stored thereon instructions which can be used to program a system to perform the instructions. Embodiments also may be implemented in data and may be stored on a non-transitory storage medium, which if used by at least one machine, causes the at least one machine to fabricate at least one integrated circuit to perform one or more operations. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.