Embodiments of the present invention generally relate to allocating one or more memory buffers in a computing system with a plurality of memory channels.
Due to the demand for increasing processing speed and volume, many computer systems employ multiple client devices (e.g., computing devices). In typical computer systems with multiple client devices, each of the client devices can communicate with multiple memory devices via a system bus. A source of inefficiency in the system bus relates to a recovery time period of a memory device when the client devices request successive data transfers from the same memory bank of the memory device (also referred to herein as “memory bank contention”). The recovery time period refers to a delay time exhibited by the memory device between a first access and an immediately subsequent second access to the memory device. While the memory device accesses data, no data can be transferred on the system bus during the recovery time period, thus leading to inefficiency in the system bus.
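The cost of the recovery time period can be illustrated with a toy timing model. The following is only a sketch under stated assumptions and is not part of the described embodiments: the function name `total_bus_cycles`, the one-cycle transfer cost, and the `recovery_cycles` value are hypothetical and chosen purely for illustration.

```python
from typing import Optional, Sequence


def total_bus_cycles(bank_accesses: Sequence[int], recovery_cycles: int = 3) -> int:
    """Count bus cycles for a series of accesses, each identified by its bank.

    Assumption: every transfer occupies the bus for one cycle, and an access
    to the same bank as the immediately preceding access must first wait
    `recovery_cycles` idle cycles.
    """
    cycles = 0
    previous_bank: Optional[int] = None
    for bank in bank_accesses:
        if bank == previous_bank:
            cycles += recovery_cycles  # bus sits idle while the bank recovers
        cycles += 1  # the data transfer itself
        previous_bank = bank
    return cycles


same_bank = total_bus_cycles([0, 0, 0, 0])   # every repeat access stalls
spread_out = total_bus_cycles([0, 1, 2, 3])  # no two successive accesses share a bank
```

Under these assumed numbers, four successive accesses to one bank cost more bus cycles than four accesses spread over four banks, which is the inefficiency the passage above describes.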
Since the system bus can only be used by one client device at a time, one approach to improve bus efficiency involves interleaving memory addresses within the multiple memory devices on the system bus. When the memory addresses are interleaved on the system bus, successive memory storage locations (e.g., memory locations having consecutive addresses) are placed in separate memory devices. By placing successive memory locations in separate memory devices, the effects from the recovery time period for a given memory device, and thus memory bank contention, can be reduced.
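One common way to realize such interleaving is a modulo mapping from a linear address to a device. The sketch below assumes that simple scheme; the function name `interleave` and the device count of four are hypothetical and are not taken from the description.

```python
from typing import List, Tuple


def interleave(address: int, num_devices: int) -> Tuple[int, int]:
    """Return (device index, offset inside the device) for a linear address.

    Assumption: addresses are interleaved one location at a time, so
    consecutive addresses rotate through the devices.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return address % num_devices, address // num_devices


placements: List[Tuple[int, int]] = [interleave(a, 4) for a in range(8)]
devices = [device for device, _ in placements]
```

With this mapping, no two consecutive addresses land in the same device, so a client that walks memory sequentially never issues back-to-back accesses to one device.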
However, in a computer system with multiple client devices, interleaving memory addresses within the multiple memory devices may not lead to an optimal use of the system bus. In particular, the system bus typically enters an arbitration state to determine which of the client devices can access the system bus and interleaved memory addresses within the multiple memory devices. For instance, the arbitration state can allow a first client device to access the system bus and successive memory locations within the multiple memory devices prior to a second client device. However, the arbitration state cannot guarantee that the second client device will immediately access the same successive memory locations as the first client device, thus compromising the benefits of the interleaved memory architecture (e.g., reduction of memory bank contention).
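The effect of arbitration on an interleaved layout can be seen in a small example. This is a sketch only: the alternating grant policy, the address ranges, and the names `device_of` and `back_to_back_conflicts` are assumptions made for illustration, not a description of any particular bus arbiter.

```python
from itertools import chain
from typing import List

NUM_DEVICES = 4  # hypothetical number of interleaved memory devices


def device_of(address: int) -> int:
    """Modulo interleaving: consecutive addresses map to consecutive devices."""
    return address % NUM_DEVICES


def back_to_back_conflicts(addresses: List[int]) -> int:
    """Count successive access pairs that hit the same device."""
    devices = [device_of(a) for a in addresses]
    return sum(1 for prev, cur in zip(devices, devices[1:]) if prev == cur)


client_a = [0, 1, 2, 3]    # first client walks consecutive addresses
client_b = [8, 9, 10, 11]  # second client does the same in another region

# Assumed arbiter: grants the bus to the two clients alternately.
granted = list(chain.from_iterable(zip(client_a, client_b)))

conflicts_alone = back_to_back_conflicts(client_a)
conflicts_shared = back_to_back_conflicts(granted)
```

Each client's own stream is conflict-free, yet the merged stream produced by the assumed arbiter repeatedly places two accesses to the same device one after another.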
Methods and systems are needed to reduce, or eliminate, memory bank contention in computer systems with multiple client devices.
Embodiments of the present invention include a method for allocating one or more memory buffers in a computing system with a plurality of memory channels. The method can include the following: allocating a first memory buffer to a first plurality of memory banks, where the first plurality of memory banks spans over a first set of one or more memory channels; allocating a second memory buffer to a second plurality of memory banks, where the second plurality of memory banks spans over a second set of one or more memory channels; associating a first sequence identifier and a second sequence identifier with the first memory buffer and the second memory buffer, respectively; and, accessing the first and second memory buffers based on the first and second sequence identifiers. The method can also include executing a first memory operation associated with the first memory buffer at a first operating frequency. Similarly, the method can include executing a second memory operation associated with the second memory buffer at a second operating frequency, where the first operating frequency is different from the second operating frequency.
Embodiments of the present invention additionally include a computer program product that includes a computer-usable medium having computer program logic recorded thereon for enabling a processor to allocate one or more memory buffers in a computing system with a plurality of memory channels. The computer program logic can include the following: first computer readable program code that enables a processor to allocate a first memory buffer to a first plurality of memory banks, where the first plurality of memory banks spans over a first set of one or more memory channels; second computer readable program code that enables a processor to allocate a second memory buffer to a second plurality of memory banks, where the second plurality of memory banks spans over a second set of one or more memory channels; third computer readable program code that enables a processor to associate a first sequence identifier and a second sequence identifier with the first memory buffer and the second memory buffer, respectively; and, fourth computer readable program code that enables a processor to access the first and second memory buffers based on the first and second sequence identifiers. The computer program logic can also include the following: fifth computer readable program code that enables a processor to execute a first memory operation associated with the first memory buffer at a first operating frequency; and, sixth computer readable program code that enables a processor to execute a second memory operation associated with the second memory buffer at a second operating frequency, where the first operating frequency is different from the second operating frequency.
Embodiments of the present invention further include a computing system. The computing system can include a first client device, a second client device, a plurality of memory channels, and a memory controller. The plurality of memory channels can include a plurality of memory devices (e.g., Dynamic Random Access Memory (DRAM) devices). The memory controller is configured to communicatively couple the first and second client devices to the plurality of memory channels. The memory controller is also configured to perform the following functions: allocate a first memory buffer to a first plurality of memory banks, where the first plurality of memory banks spans over a first set of one or more memory channels; allocate a second memory buffer to a second plurality of memory banks, where the second plurality of memory banks spans over a second set of one or more memory channels; associate a first sequence identifier and a second sequence identifier with the first memory buffer and the second memory buffer, respectively; and, access the first and second memory buffers based on the first and second sequence identifiers. Further, the memory controller is also configured to execute a first memory operation associated with the first memory buffer at a first operating frequency and to execute a second memory operation associated with the second memory buffer at a second operating frequency, where the first operating frequency is different from the second operating frequency.
Further features and advantages of the invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art based on the teachings contained herein.
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the relevant art to make and use the invention.
The following detailed description refers to the accompanying drawings that illustrate exemplary embodiments consistent with this invention. Other embodiments are possible, and modifications can be made to the embodiments within the spirit and scope of the invention. Therefore, the detailed description is not meant to limit the invention. Rather, the scope of the invention is defined by the appended claims.
It would be apparent to one of skill in the art that the present invention, as described below, can be implemented in many different embodiments of software, hardware, firmware, and/or the entities illustrated in the figures. Thus, the operational behavior of embodiments of the present invention will be described with the understanding that modifications and variations of the embodiments are possible, given the level of detail presented herein.
Based on the description herein, a person skilled in the relevant art will recognize that multi-client computing system 100 can include more or fewer than two computing devices, more than one memory controller, more or fewer than four memory devices, or a combination thereof. These different configurations of multi-client computing system 100 are within the scope and spirit of the embodiments described herein. However, for ease of explanation, the embodiments contained herein will be described in the context of the system architecture depicted in
In an embodiment, each of computing devices 110 and 120 can be, for example and without limitation, a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC) controller, other similar types of processing units, or a combination thereof. Computing devices 110 and 120 are configured to execute instructions and to carry out operations associated with multi-client computing system 100. For instance, multi-client computing system 100 can be configured to render and display graphics. Multi-client computing system 100 can include a CPU (e.g., computing device 110) and a GPU (e.g., computing device 120), where the GPU can be configured to render two- and three-dimensional graphics and the CPU can be configured to coordinate the display of the rendered graphics onto a display device (not shown in
In reference to
In an embodiment, one or more memory buffers are allocated to, or associated with, a plurality of memory banks, where the plurality of memory banks can span over one or more memory channels.
In reference to
A function of memory management unit 310, among others, is to allocate, or associate, one or more memory buffers to operations associated with computing devices 110 and 120. In an embodiment, memory management unit 310 allocates (or associates) memory buffers at a memory channel/memory bank granularity. This granularity refers to a number of memory channels and a number of memory banks (within the memory channels) that are allocated to the one or more memory buffers. In an embodiment, the granularity can be dictated by computing devices 110 and 120, as described in further detail below.
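Allocation at a memory channel/memory bank granularity can be pictured as choosing a set of (channel, bank) pairs for each buffer. The sketch below is a hypothetical illustration and not the memory management unit's actual logic: the name `allocate_buffer`, the channel and bank indices, and the assumed layout of four channels are invented for the example.

```python
from typing import FrozenSet, List, Tuple

BankLocation = Tuple[int, int]  # (channel index, bank index)


def allocate_buffer(channels: List[int], banks: List[int]) -> FrozenSet[BankLocation]:
    """Return the set of (channel, bank) pairs backing one memory buffer.

    The granularity is the number of channels and the number of banks
    per channel handed to the buffer.
    """
    if not channels or not banks:
        raise ValueError("a buffer needs at least one channel and one bank")
    return frozenset((c, b) for c in channels for b in banks)


# Hypothetical requests from two client devices.
wide_buffer = allocate_buffer(channels=[0, 1, 2, 3], banks=[0, 1])
narrow_buffer = allocate_buffer(channels=[2], banks=[2, 3, 4, 5])

# Buffers that share no (channel, bank) pair cannot contend for a bank.
shares_banks = not wide_buffer.isdisjoint(narrow_buffer)
```

Representing a buffer as a set of bank locations makes it straightforward to check whether two buffers overlap in any bank.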
In an embodiment, memory management unit 310 is configured to allocate, or associate, a first memory buffer to a first plurality of memory banks, where the first plurality of memory banks spans over a first set of one or more memory channels. An example of the first memory buffer is memory buffer 220 of
As would be understood by a person skilled in the relevant art, memory buffers in computing systems (e.g., multi-client computing system 100) are typically used when moving data between operations or processes executed by computing devices (e.g., computing devices 110 and 120 of
In an embodiment, computing device 120 is a GPU and the second memory buffer (e.g., memory buffer 250 of
In another embodiment, the first and second memory buffers can be used in the execution of operations by computing device 110 or computing device 120. In an embodiment, computing device 110 is a GPU and the first and second memory buffers can be used in the execution of operations by computing device 110. For instance, memory buffer 210 of
A benefit, among others, in allocating memory buffers 210-250 across all of the memory channels in multi-client computing system 100 of
In reference to
In reference to
Each of memory buffers 220, 230, 240, and 250 can be assigned a sequence identifier, according to an embodiment of the present invention. In an embodiment, the sequence identifier provides a reference for memory controller 130 and memory devices 140, 150, 160, and 170 of
For a portion of the video decode pipeline operation, memory controller 130 and memory devices 140-170 may address/access memory buffers 220, 230, 240, and 250 in a particular sequence, according to an embodiment of the present invention. The sequence identifiers of memory buffers 220, 230, 240, and 250 can be used as parameters for the particular sequence. For example, if the particular sequence is ‘1’, ‘2’, and ‘4’, memory buffer 250 will be addressed/accessed first, memory buffer 240 will be addressed/accessed second, and memory buffer 220 will be addressed/accessed last. In another example, if the particular sequence is ‘1’, ‘3’, and ‘4’, memory buffer 250 will be addressed/accessed first, memory buffer 230 will be addressed/accessed second, and memory buffer 220 will be addressed/accessed last. In both of these examples, the particular sequences do not have ‘2’ and ‘3’ occurring one after another. As a result, memory bank contention issues are not only reduced, or avoided, in memory channel 160, but the full bandwidth of the memory channels in multi-client computing system 100 can also be utilized.
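The two example sequences can be traced with a small lookup. In the sketch below, the mapping from sequence identifiers to memory buffers is the one implied by the two examples above (‘1’ to 250, ‘2’ to 240, ‘3’ to 230, ‘4’ to 220); the names `access_order` and `has_adjacent` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List

# Sequence identifier -> memory buffer reference numeral, as implied by the
# two example sequences in the text.
BUFFER_BY_SEQUENCE_ID: Dict[int, int] = {1: 250, 2: 240, 3: 230, 4: 220}


def access_order(sequence: List[int]) -> List[int]:
    """Translate a sequence of identifiers into the buffers accessed, in order."""
    return [BUFFER_BY_SEQUENCE_ID[s] for s in sequence]


def has_adjacent(sequence: List[int], first: int, second: int) -> bool:
    """True if `first` and `second` occur one after another, in either order."""
    pairs = zip(sequence, sequence[1:])
    return any({a, b} == {first, second} for a, b in pairs)


order_a = access_order([1, 2, 4])
order_b = access_order([1, 3, 4])
conflict_a = has_adjacent([1, 2, 4], 2, 3)
conflict_b = has_adjacent([1, 3, 4], 2, 3)
```

A check such as `has_adjacent` is one possible way to confirm that a candidate sequence never places ‘2’ and ‘3’ one after another.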
In instances where memory management unit 310 does not have information on the workload expectation of computing devices 110 and 120, a default memory buffer arrangement can be used for operations associated with computing devices 110 and 120, according to an embodiment of the present invention. In an embodiment, the default memory buffer arrangement can span across all memory banks of and across all memory channels. An example of this memory buffer arrangement is illustrated as memory buffer 210 of
In addition to assessing the workload expectation of computing devices 110 and 120, memory management unit 310 is configured to operate each of memory channels 140, 150, 160, and 170 at a particular operating frequency. As a result, the bandwidth per memory channel can be assessed based on the allocated memory buffers across one or more of the memory channels. For instance, based on a particular arrangement of memory buffers across memory channels 140, 150, 160, and 170 (e.g., memory buffers 210, 220, 230, 240, and 250 of
In reference to
In an embodiment, scheduler 320 operates in conjunction with memory management unit 310 to sort threads of arbitration between computing devices 110 and 120 of
In an embodiment, after an operation associated with computing devices 110 and 120 of
In step 510, a first memory buffer is allocated to, or associated with, a first plurality of memory banks, where the first plurality of memory banks spans over a first set of one or more memory channels. Memory management unit 310 of
In step 520, a second memory buffer is allocated to, or associated with, a second plurality of memory banks, where the second plurality of memory banks spans over a second set of one or more memory channels. In an embodiment, the second plurality of memory banks is different from the first plurality of memory banks (in step 510). In another embodiment, the second plurality of memory banks is the same as the first plurality of memory banks. Memory management unit 310 of
In step 530, a first sequence identifier and a second sequence identifier are associated with the first memory buffer and the second memory buffer, respectively. Memory management unit 310 of
In step 540, the first and second memory buffers are accessed based on the first and second sequence identifiers. In an embodiment, the first and second memory buffers are accessed in sequence to avoid memory bank contention and to utilize a full bandwidth of the plurality of memory channels. Memory management unit 310 and scheduler 320 of
Further, in an embodiment, when executing a first memory operation associated with the first memory buffer and a second memory operation associated with the second memory buffer, the first and second memory operations are executed at a first operating frequency and a second operating frequency, respectively. The first and second operating frequencies are different from one another, according to an embodiment of the present invention.
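To give a rough sense of why two memory operations might run at different operating frequencies, the sketch below relates frequency to a simplified peak transfer rate. Everything here is an assumption made for illustration: the `MemoryOperation` type, the frequency values, the 8-byte transfer width, and the one-transfer-per-clock model are hypothetical and do not come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOperation:
    buffer_name: str
    frequency_mhz: int  # hypothetical operating frequency for this operation


def peak_bandwidth_mb_per_s(op: MemoryOperation, bytes_per_transfer: int = 8) -> int:
    """Simplified peak rate: one transfer per clock, no protocol overhead."""
    return op.frequency_mhz * bytes_per_transfer


first_op = MemoryOperation("first memory buffer", frequency_mhz=800)
second_op = MemoryOperation("second memory buffer", frequency_mhz=400)

first_rate = peak_bandwidth_mb_per_s(first_op)
second_rate = peak_bandwidth_mb_per_s(second_op)
```

In this simplified model, an operation with a lower bandwidth need can be served at a lower frequency than a more demanding one.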
In step 550, after the first and second memory operations associated with the first and second memory buffers, respectively, are executed, the first and second memory buffers are de-allocated from their respective memory spaces. With the de-allocation of the first and second memory buffers, memory buffers associated with other memory operations can be allocated to the free memory space.
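Steps 510 through 550 can be summarized as simple bookkeeping. The sketch below is a hypothetical illustration rather than the described memory controller: the `BufferTable` class, its method names, the buffer labels, and the ascending-identifier access policy are all assumptions.

```python
from typing import Dict, List, Set, Tuple

BankLocation = Tuple[int, int]  # (channel index, bank index)


class BufferTable:
    """Hypothetical bookkeeping for steps 510-550."""

    def __init__(self) -> None:
        self.banks: Dict[str, Set[BankLocation]] = {}
        self.sequence_ids: Dict[str, int] = {}

    def allocate(self, name: str, banks: Set[BankLocation]) -> None:
        # Steps 510 and 520: associate a buffer with a plurality of banks.
        # Overlap is not rejected, since the two pluralities may be the same.
        self.banks[name] = set(banks)

    def associate(self, name: str, sequence_id: int) -> None:
        # Step 530: attach a sequence identifier to an allocated buffer.
        if name not in self.banks:
            raise KeyError(name)
        self.sequence_ids[name] = sequence_id

    def access_order(self) -> List[str]:
        # Step 540: assumed policy of visiting buffers by ascending identifier.
        return sorted(self.sequence_ids, key=self.sequence_ids.__getitem__)

    def deallocate(self, name: str) -> Set[BankLocation]:
        # Step 550: release the buffer and return the banks that became free.
        self.sequence_ids.pop(name, None)
        return self.banks.pop(name)


table = BufferTable()
table.allocate("first", {(0, 0), (1, 0)})
table.allocate("second", {(2, 1), (3, 1)})
table.associate("first", 2)
table.associate("second", 1)
order = table.access_order()
freed = table.deallocate("first")
```

The banks returned by `deallocate` are the free memory space that buffers for other memory operations could then be allocated to.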
Various aspects of the present invention may be implemented in software, firmware, hardware, or a combination thereof.
It should be noted that the simulation, synthesis and/or manufacture of various embodiments of this invention may be accomplished, in part, through the use of computer readable code, including general programming languages (such as C or C++), hardware description languages (HDL) such as, for example, Verilog HDL, VHDL, Altera HDL (AHDL), or other available programming and/or schematic capture tools (such as circuit capture tools). This computer readable code can be disposed in any known computer-usable medium including a semiconductor, magnetic disk, optical disk (such as CD-ROM, DVD-ROM). As such, the code can be transmitted over communication networks including the Internet. It is understood that the functions accomplished and/or structure provided by the systems and techniques described above can be represented in a core (such as a GPU core) that is embodied in program code and can be transformed to hardware as part of the production of integrated circuits.
Computer system 600 includes one or more processors, such as processor 604. Processor 604 may be a special purpose or a general purpose processor. Processor 604 is connected to a communication infrastructure 606 (e.g., a bus or network).
Computer system 600 also includes a main memory 608, preferably random access memory (RAM), and may also include a secondary memory 610. Secondary memory 610 can include, for example, a hard disk drive 612, a removable storage drive 614, and/or a memory stick. Removable storage drive 614 can include a floppy disk drive, a magnetic tape drive, an optical disk drive, a flash memory, or the like. Removable storage drive 614 reads from and/or writes to a removable storage unit 618 in a well-known manner. Removable storage unit 618 can comprise a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 614. As will be appreciated by persons skilled in the relevant art, removable storage unit 618 includes a computer-usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 610 can include other similar devices for allowing computer programs or other instructions to be loaded into computer system 600. Such devices can include, for example, a removable storage unit 622 and an interface 620. Examples of such devices can include a program cartridge and cartridge interface (such as those found in video game devices), a removable memory chip (e.g., EPROM or PROM) and associated socket, and other removable storage units 622 and interfaces 620 which allow software and data to be transferred from the removable storage unit 622 to computer system 600.
Computer system 600 can also include a communications interface 624. Communications interface 624 allows software and data to be transferred between computer system 600 and external devices. Communications interface 624 can include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, or the like. Software and data transferred via communications interface 624 are in the form of signals, which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 624. These signals are provided to communications interface 624 via a communications path 626. Communications path 626 carries signals and can be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, or other communications channels.
In this document, the terms “computer program medium” and “computer-usable medium” are used to generally refer to media such as removable storage unit 618, removable storage unit 622, and a hard disk installed in hard disk drive 612. Computer program medium and computer-usable medium can also refer to memories, such as main memory 608 and secondary memory 610, which can be memory semiconductors (e.g., DRAMs, etc.). These computer program products provide software to computer system 600.
Computer programs (also called computer control logic) are stored in main memory 608 and/or secondary memory 610. Computer programs may also be received via communications interface 624. Such computer programs, when executed, enable computer system 600 to implement embodiments of the present invention as discussed herein. In particular, the computer programs, when executed, enable processor 604 to implement processes of embodiments of the present invention, such as the steps in the methods illustrated by flowchart 500 of
Embodiments of the present invention are also directed to computer program products including software stored on any computer-usable medium. Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein. Embodiments of the present invention employ any computer-usable or -readable medium, known now or in the future. Examples of computer-usable mediums include, but are not limited to, primary storage devices (e.g., any type of random access memory), secondary storage devices (e.g., hard drives, floppy disks, CD-ROMs, ZIP disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage devices, etc.), and communication mediums (e.g., wired and wireless communications networks, local area networks, wide area networks, intranets, etc.).
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention as defined in the appended claims. It should be understood that the invention is not limited to these examples. The invention is applicable to any elements operating as described herein. Accordingly, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application is a continuation of U.S. patent application Ser. No. 12/881,663, filed Sep. 14, 2010, which is incorporated by reference as if fully set forth.
Number | Date | Country | |
---|---|---|---|
20180239722 A1 | Aug 2018 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12881663 | Sep 2010 | US |
Child | 15958805 | | US |