This application claims the benefit of Korean Patent Application No. 10-2012-0088680, filed on Aug. 14, 2012, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to a method and apparatus for improving performance and energy efficiency in an on-chip network. This research was supported by the SW Computing R&D Program of KEIT (2011-10041313, UX-oriented Mobile SW Platform) funded by the Ministry of Knowledge Economy.
2. Description of the Related Art
An on-chip network router may function to receive, from an input port, a flit (flow control digit), that is, a flow control unit of a packet, and to transfer the received flit to an output port along a routing path of the packet. Flow control manages the allocation of resources to packets along their routes and resolves contentions. Flow control mechanisms may be classified as buffered or bufferless. When contention occurs, buffered flow control temporarily stores blocked packets in a buffer, whereas bufferless flow control misroutes, that is, deflects, these packets.
For on-chip networks with either flow control mechanism, when a high load is applied, the on-chip network may experience congestion, and packets may frequently contend for shared network resources, which may reduce overall performance.
For a bufferless on-chip network, when a high load is applied, a number of contentions between packets increases, and a large number of packets may be deflected, which may lead to a reduction in performance of the bufferless on-chip network. Additionally, due to the deflected packets, the energy reduction effect that may be obtained by the bufferless on-chip network may be diminished.
An aspect of the present invention provides a credit-based flow control method and apparatus that may improve performance of a router by reducing a number of contentions in an on-chip network.
According to an aspect of the present invention, there is provided a credit-based flow control method, including: generating, in a core, a memory access request; throttling an injection of the memory access request until credits become available; and injecting the memory access request into a memory controller (MC) via an on-chip network, when the credits become available.
The credits may represent approximate availability of a destination buffer at a destination of the memory access request, and the destination buffer may represent a memory access request queue of the MC.
The term "clumsy" indicates that the present invention may use an inexact or approximate credit count for the destination buffer to improve performance and energy efficiency. A credit count may be set automatically or manually based on a required performance.
The generating of the memory access request may include generating a memory read request and a memory write request during execution of a program. A credit for the memory read request and a credit for the memory write request may be individually maintained.
A credit count may be decremented once a memory access request is injected into the MC, and a number of available credits may be increased once a reply to the memory access request is generated and transferred from the MC to the core.
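The credit accounting described above may be sketched as follows. This is an illustrative model only; the class and method names are hypothetical and not drawn from the disclosed apparatus.

```python
class CreditCounter:
    """Illustrative credit counter for one (core, MC) pair.

    A credit here approximates destination-buffer availability,
    as described above; all names are hypothetical.
    """

    def __init__(self, initial_credits):
        self.available = initial_credits

    def can_inject(self):
        # Injection is throttled while no credit is available.
        return self.available > 0

    def on_inject(self):
        # The credit count is decremented once a memory access
        # request is injected into the MC.
        assert self.available > 0, "injection must be throttled"
        self.available -= 1

    def on_reply(self):
        # A reply transferred from the MC to the core completes
        # the request and frees one credit.
        self.available += 1
```

In this sketch, a core with one initial credit may inject a single request, is then throttled, and may inject again only after the reply returns.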
According to another aspect of the present invention, there is provided a credit-based flow control apparatus, including: a core; and an MC. When a memory access request is generated in the core, an injection of the memory access request may be throttled until credits become available. When the credits become available, the memory access request may be injected into an MC via an on-chip network.
According to embodiments of the present invention, in a manycore accelerator architecture in which a high load is applied to an on-chip network, the invention may be applied to a bufferless on-chip network, and accordingly it is possible to obtain performance similar to that of a buffered on-chip network while simultaneously improving the energy efficiency of the bufferless on-chip network.
Additionally, according to embodiments of the present invention, the invention may also be applied to a buffered on-chip network, and accordingly it is possible to improve performance by reducing contention in the network.
Additionally, according to embodiments of the present invention, it is possible to provide a credit-based flow control method and apparatus that may be applied to a design of an on-chip network of a manycore processor.
These and/or other aspects, features, and advantages of the invention will become apparent and more readily appreciated from the following description of exemplary embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. Exemplary embodiments are described below to explain the present invention by referring to the figures.
Hereinafter, a clumsy flow control method and apparatus will be described in detail with reference to the accompanying drawings.
In an existing buffered on-chip network, a credit typically represents the exact availability of a buffer at a downstream node.
In the present invention, a credit may represent approximate availability of a buffer at a destination, and accordingly the present invention may provide a clumsy flow control method and apparatus.
In embodiments of the present invention, a number of memory access requests may be limited by a proposed credit, and accordingly a number of memory access requests that may be transferred in a network may be limited. Thus, a number of contentions in an on-chip network and a number of times deflection routing occurs may be reduced.
In operation 210, a memory access request may be generated in a core. The memory access request may be injected into a memory controller (MC). However, when a credit used to transfer the memory access request is unavailable, injection of the memory access request into the MC may be throttled until the credit becomes available in operation 220.
Conversely, when the credit is available, the memory access request may be injected into the MC via an on-chip network in operation 230. In a structure of the present invention, traffic from the core to the MC may occur, and the memory access request may be adjusted by setting a credit count and by adjusting an amount of traffic in the on-chip network.
The credit count may be set based on circumstances. In an example, when it is difficult to quickly process requests with a small number of credits due to a large number of memory access requests, the credit count may be incremented. In another example, when performance of the credit-based flow control is regarded as more important, the credit count may be limited to a low value. The credit count may be set automatically or manually based on a required performance.
As described above, a credit in the present invention may represent approximate availability of a destination buffer at a destination of a memory access request, that is, each credit may represent ability for each core to inject a memory access request into a network. Additionally, the destination buffer may represent a memory access request queue of the MC.
According to embodiments, the memory access request generated in operation 210 may be classified into a memory read request and a memory write request, and accordingly each core may classify credits into two credit types for each MC that is a destination, and may maintain a credit count.
Additionally, a credit is consumed every time a memory access request is injected into an MC, and accordingly a number of available credits may be reduced. Conversely, when a reply to an injected memory access request is generated and transferred from the MC to a core, the memory access request may be completed, and the number of available credits may be increased.
In other words, when a credit is unavailable, injection of a memory access request may be throttled until the credit becomes available. When the credit becomes available since the memory access request is completed, the memory access request may be injected into an MC.
Referring to the accompanying drawing, rij denotes a read credit count maintained by a core i for an MC j, and wij denotes a corresponding write credit count.
For a memory read request, when rij>0, a core may inject the memory read request into an on-chip network, and rij may be decremented by ‘1.’ When rij=0, injection of the memory read request may be throttled until a credit becomes available, that is, until rij becomes greater than ‘0.’ When a reply to the memory read request returns from the MC j to the core i, rij may be incremented again by ‘1.’
For a memory write request, the above-described example may be applied. When wij>0, a core may inject the memory write request into an on-chip network, and wij may be decremented by ‘1,’ since a credit is available. Additionally, when wij=0, injection of the memory write request may be throttled until the credit becomes available, that is, until wij becomes greater than ‘0.’ When a reply to the memory write request returns from the MC j to the core i, wij may be incremented again by ‘1,’ and accordingly a number of available credits may be increased.
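The per-core credit tables rij and wij described above may be sketched as follows. The initial values R and W and the number of MCs are assumed design parameters for illustration, not values taken from the disclosure.

```python
R, W = 2, 2     # assumed initial read/write credits per MC
NUM_MCS = 2     # assumed number of memory controllers


class Core:
    """Illustrative core i maintaining separate read (rij) and
    write (wij) credit counts for each destination MC j."""

    def __init__(self):
        self.r = {j: R for j in range(NUM_MCS)}  # rij
        self.w = {j: W for j in range(NUM_MCS)}  # wij

    def try_inject(self, j, is_read):
        """Inject a request to MC j if a credit is available.

        Returns True on injection (the credit is decremented by 1);
        False means rij (or wij) == 0 and the request is throttled
        until a reply returns a credit.
        """
        table = self.r if is_read else self.w
        if table[j] > 0:
            table[j] -= 1
            return True
        return False

    def on_reply(self, j, is_read):
        # A reply returning from MC j increments the credit by 1.
        table = self.r if is_read else self.w
        table[j] += 1
```

As in the description above, read and write credits are exhausted and replenished independently: a core out of read credits for MC j may still inject write requests to the same MC.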
As the initial values of r and w decrease, contention may be reduced, but a number of memory access requests that may be transferred by each core is also reduced, and accordingly overall performance may be limited. Conversely, as the values of r and w increase, an amount of traffic input to the on-chip network increases, and accordingly more contention may occur. In embodiments, a case in which the values of r and w approach infinity corresponds to an existing bufferless router without credit-based flow control.
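The bound that the initial credit value places on in-flight traffic may be illustrated with a minimal sketch; the function is hypothetical and models only the worst case in which no reply has yet returned.

```python
def max_in_flight(credit, num_requests):
    """Peak number of outstanding requests when num_requests
    back-to-back requests are issued before any reply returns.

    Illustrates how the initial credit value bounds the amount
    of traffic a core may inject into the on-chip network.
    """
    in_flight = 0
    for _ in range(num_requests):
        if in_flight < credit:   # credit available: inject
            in_flight += 1
        # else: throttled until a reply frees a credit
    return in_flight
```

With a small credit value the core can hold only that many requests in the network at once, whereas an effectively unbounded credit value leaves every request in flight, corresponding to a bufferless router without credit-based flow control.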
The credit-based flow control apparatus 400 may be used as an apparatus to perform the above-described credit-based flow control method, and each component of the credit-based flow control apparatus 400 may be replaced or changed by similar components. Additionally, in embodiments of the present invention, an effect and performance of the credit-based flow control apparatus 400 may be similarly exhibited, despite a change in each component of the credit-based flow control apparatus 400.
In embodiments of the present invention, a number of memory access requests may be limited by a proposed credit, and accordingly a number of memory access requests that may be transferred in a network may be limited. Thus, a number of contentions in the on-chip network may be reduced, and for a bufferless on-chip network, a number of times deflection routing occurs may also be reduced.
A credit proposed by the present invention may represent approximate availability of a destination buffer at a destination of a memory access request, and the destination buffer may represent a memory access request queue of an MC that is the destination of the memory access request. A queue may refer to a waiting line in which input requests are held for processing, since the input requests arrive at random times.
A memory access request may be generated in the core 410, and may be injected into the MC 420. However, when a credit used to transfer the memory access request is unavailable, injection of the memory access request into the MC 420 may be throttled until the credit becomes available.
Conversely, when the credit is available, or when a state of the credit is changed from an unavailable state to an available state, the core 410 may inject the memory access request into the MC 420 via an on-chip network. In a structure of the present invention, traffic from the core 410 to the MC 420 may occur, and the memory access request may be adjusted by setting a credit count and by adjusting an amount of traffic in the on-chip network.
The memory access request generated in the core 410 may be classified into a memory read request and a memory write request, and accordingly the core 410 may maintain two credit counts for each MC 420 that is a destination.
For example, when a memory read request is generated, and when credits are unavailable, transmission of the memory read request may be throttled until the credits become available. When a reply to the memory read request is generated, the credits may become available, and the memory read request may be transmitted. Similarly, a memory write request may be processed.
In embodiments, an initial value of each of a credit for a memory read request and a credit for a memory write request, that is, an amount of traffic input to an on-chip network may be set. The initial value may be set for each of the memory read request and the memory write request, or may be set based on performance required by the credit-based flow control apparatus 400.
For example, when a low initial credit value is set, contention may be reduced, and a number of memory access requests that may be transferred by the core 410 may be reduced, and accordingly overall performance may be limited. Conversely, when a high initial credit value is set, an amount of traffic input to an on-chip network may be increased, and accordingly more contention may occur. In embodiments, a case in which a credit value approaches infinity may correspond to an existing bufferless router without a credit-based flow control.
As described above, according to embodiments of the present invention, it is possible to provide a credit-based flow control method and apparatus that may increase performance of a router by reducing a number of contentions in an on-chip network. When the present invention is applied to a bufferless router, reducing the number of times deflection occurs, through adjustment of an amount of on-chip network traffic, may also increase energy efficiency.
The clumsy flow control method according to the embodiments of the present invention may be recorded in non-transitory computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of the embodiments, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVDs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments of the present invention, or vice versa.
Although a few exemplary embodiments of the present invention have been shown and described, the present invention is not limited to the described exemplary embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these exemplary embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
10-2012-0088680 | Aug. 14, 2012 | KR | national