The present disclosure relates generally to memory devices and relates more particularly to scheduling memory accesses for a memory device.
Memory latency and bandwidth are significant performance bottlenecks in many processing systems. These performance factors typically are impacted by memory cell access times and the propagation delays of signals traversing various interconnects within the memory. In conventional memory controller implementations, a worst-case approach is taken for scheduling memory accesses to a memory subsystem, whereby the memory controller assumes fixed worst-case memory timing parameters that apply to the entire memory subsystem.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
Stacked-die, or three-dimensional (3D), dynamic random access memory (DRAM)-based memory subsystems often are considered for use as large on-chip caches. However, due to certain aspects of the DRAM architecture, due to die-to-die process variation, and due to the interconnect configurations resulting from the stacked arrangement of the memory dies in a stacked-die memory, such memory subsystems often exhibit significant region-to-region variations in memory timing. This, in conjunction with the ability to locally store memory timing information on a region-by-region basis, makes stacked-die memory well suited for region-based memory access scheduling. Accordingly, example region-based memory access scheduling techniques are described herein in the example context of stacked-die DRAM memory. However, these techniques are not limited to this example context, but also are equally applicable to any variety of stacked or non-stacked memory architectures including, but not limited to, static random access memory (SRAM), ferroelectric RAM (F-RAM), magneto-resistive RAM (MRAM), Phase-Change RAM, resistive RAM, Flash memory, and the like.
In the depicted example, the processing system 100 includes one or more processors 101, one or more memory controllers 102, and a memory array 104. The processor 101 includes one or more processor cores 116, such as one or more central processing units (CPU), graphics processing units (GPU), digital signal processors (DSP), and the like. The processor 101 further includes an integrated northbridge 120, an input/output (I/O) controller 122, one or more levels of cache 118, and the like. The northbridge 120 is connected to the memory controller 102 via host bus 126. The host bus 126 can be implemented in accordance with any of a variety of conventional interconnect or bus architectures, such as a Peripheral Component Interconnect-Express (PCI-E) architecture, a HyperTransport architecture, a QuickPath Interconnect (QPI) architecture, and the like. Alternatively, host bus 126 can be implemented in accordance with a proprietary bus architecture. The I/O controller 122 can be connected to one or more external devices 124, such as one or more external interfaces to display components, storage devices, input devices (e.g., a mouse or keyboard), and the like.
The processing components of system 100 are coupled to a memory subsystem comprising the memory controller 102 and the memory array 104. The memory array 104 may implement any variety of storage devices including, but not limited to, memory architectures such as dynamic random access memory (DRAM), static random access memory (SRAM), ferroelectric RAM (F-RAM), magneto-resistive RAM (MRAM), and the like. For the purposes of this disclosure, various embodiments are described in the context of DRAM arrays and DRAM memory controllers. In this example context, the memory array 104 comprises a plurality of DRAM arrays 106. As described in detail below, the memory subsystem may be implemented as stacked-die memory devices and thus the DRAM arrays 106 may be implemented as, for example, separate memory dies within the stacked-die memory.
In operation, the memory controller 102 services memory read and memory write requests issued by the processing elements of the processing system 100. To this end, the memory controller 102 implements controller logic 108 for accessing the DRAM arrays 106 and for interfacing with the host bus 126. In response to memory access requests, the controller logic 108 issues commands to the DRAM arrays 106. In response to a read request for a corresponding address, the controller logic 108 transmits signaling to the DRAM arrays 106 to read stored data at the addressed location. In response to a write request, the controller logic 108 transmits signaling to the DRAM arrays 106 to write data to the addressed location.
The correct operation of each memory transaction is predicated on the controller logic 108 adhering to certain memory timing parameters, such as the RAS precharge, or “tRP” timing parameter, which reflects the time needed between disabling access to one line of data and enabling access to another line of data. In implementation, some or all of the memory timing parameters may vary depending on the region of the memory array 104 being accessed. Accordingly, in some embodiments, the controller logic 108 is configured to buffer, reorder, or schedule memory requests based on region-specific memory timing parameters so as to improve performance and efficiency. To this end, the memory controller 102 also implements profiling logic 110 and a timing data store 112 to determine and store region-specific memory timing information. As described in more detail below with reference to
In response to a trigger event, the profiling logic 110 initiates a test process to measure the corresponding memory timing parameters for each region of the applicable regions of the memory array 104. This trigger event can include, for example, a power-on reset, a lapse of a timer, a command from the processor 101 or other controller, a sensed change in temperature that exceeds a threshold (thereby indicating a likely temperature-induced change in the timing parameters), a trigger initiated during a dynamic voltage and frequency scaling (DVFS) operation, and the like. The test process can include, for example, repeatedly writing a test pattern to the corresponding region and subsequently reading it back while varying the values for the applicable memory timing parameters. The timing parameters also could be updated by performing a lookup to a table that maintains latency values under different temperatures or service times, or that stores representations of formulae that may be used to calculate the latency using temperature and service time as inputs. When reliable minimum memory timing parameters are identified for the region via this test process, the profiling logic 110 can store data representative of these identified memory timing parameters to the timing data store 112. In some embodiments, the timing data store 112 is implemented as a set of registers, a cache, or a content addressable memory (CAM), and the data is stored in a location associated with the region. In other embodiments, the timing data store 112 is implemented as a bloom filter or other compressed data structure, and this data is incorporated into the compressed data structure accordingly.
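The test process described above can be sketched, under stated assumptions, as a downward sweep of one timing parameter starting from its worst-case value. The function name `find_min_reliable_timing`, the callback `passes_at`, and the simulated region are hypothetical and are used for illustration only; in hardware, the callback would correspond to writing a test pattern to the region and reading it back at the candidate timing value.

```python
from typing import Callable


def find_min_reliable_timing(
    passes_at: Callable[[int], bool],
    worst_case: int,
    floor: int = 1,
    trials: int = 8,
) -> int:
    """Return the smallest timing value (in cycles) at which every trial passes.

    passes_at(value) is a stand-in for one write/read-back test of the region
    with the timing parameter set to `value`.
    """
    best = worst_case  # assumption: the worst-case value is always safe
    for value in range(worst_case - 1, floor - 1, -1):
        if all(passes_at(value) for _ in range(trials)):
            best = value  # this tighter setting was reliable in every trial
        else:
            break  # stop at the first unreliable setting
    return best


# Hypothetical region model: accesses are reliable only at 4 cycles or more.
def simulated_region(value: int) -> bool:
    return value >= 4


print(find_min_reliable_timing(simulated_region, worst_case=6))  # prints 4
```

Repeating each candidate value over several trials is one simple way to favor reliable settings over settings that merely pass once.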
In the course of typical operation, the memory timing parameter information is accessible from the timing data store 112 by the controller logic 108. In response to receiving a memory access request from the processor 101, the controller logic 108 uses the address targeted by the memory access request to access the memory timing information associated with the region containing the targeted address. With this region-specific memory timing information, the controller logic 108 can manage the timing of memory commands sent to the memory array 104 according to the stored memory timing parameters for a given region of memory so as to more efficiently schedule memory accesses. For example, when a memory read request is received by the memory controller 102, the controller logic 108 identifies the region of memory to be accessed based on the address of the memory read request and then communicates with the profiling logic 110 and timing data store 112 to determine the memory timing parameters for that region. The controller logic 108 then schedules and transmits commands to the DRAM arrays 106 according to the stored timing parameters. As described in greater detail below with reference to
Significant variations in memory timing parameters are often encountered in stacked arrangements of memory dies. Memory controllers implementing region-based memory access scheduling techniques are able to exploit these variations to more efficiently schedule memory requests and are thus well suited to stacked arrangements of memory dies.
The die 202 implements the processing components of the processor 101 as described above, including one or more processor cores 221 and 222 and a cache 224, and is thus referred to herein as the “processor die 202”. The die 204 implements the memory controller 102, which includes the profiling logic 110. The memory controller 102 implements hardwired logic for accessing the memory circuitry of stacked die 206-212 as well as interfacing to the die 202 to service read and write memory access requests. The dies 202 and 204 also can include any of a variety of peripheral devices such as northbridges, southbridges, input/output controllers, network interfaces, and the like. The dies 202 and 204 thus are referred to herein as “logic dies.”
The timing data store 112 may be implemented as a cache, CAM, bloom filter, or register set on a logic die of the stacked-die processing system 200. To illustrate, the timing data store 112 may be implemented as non-volatile memory (e.g., flash memory) on die 204 along with the other components of the memory controller 102 to facilitate timely access to the memory timing information by the profiling logic 110 and controller logic 108 of the memory controller 102. In other embodiments, the stacked-die processing system 200 can take advantage of the sizeable storage capacity of the memory array 104 by implementing the timing data store 112 in the memory cell circuitry 230 of one or more of the memory dies. In yet other embodiments, the timing data store 112 may be implemented using both a logic die and a memory die. To illustrate, the timing data store 112 may be implemented as a small cache on a logic die and a larger cache on a memory die, with the small cache storing the memory timing information for the most recently accessed regions of memory and the larger cache storing the memory timing information for all of the memory regions.
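The two-level arrangement described above can be illustrated with the following minimal sketch, which assumes a least-recently-used replacement policy for the small cache. The class name `TwoLevelTimingStore`, the capacity, and the replacement policy are assumptions made for illustration; the disclosure does not specify them.

```python
from collections import OrderedDict


class TwoLevelTimingStore:
    """Small recently-used cache in front of a full per-region timing table."""

    def __init__(self, full_table, small_capacity=2):
        self.full_table = dict(full_table)  # models the larger store on a memory die
        self.small = OrderedDict()          # models the small cache on a logic die
        self.small_capacity = small_capacity
        self.hits = 0
        self.misses = 0

    def lookup(self, region):
        if region in self.small:
            self.small.move_to_end(region)  # mark as most recently used
            self.hits += 1
            return self.small[region]
        self.misses += 1
        timing = self.full_table[region]    # slower access to the full table
        self.small[region] = timing
        if len(self.small) > self.small_capacity:
            self.small.popitem(last=False)  # evict the least recently used entry
        return timing


# Hypothetical per-region tRP values, in cycles.
store = TwoLevelTimingStore({0: 5, 1: 4, 2: 6}, small_capacity=2)
results = [store.lookup(r) for r in (0, 1, 0, 2, 1)]
print(results, store.hits, store.misses)  # prints [5, 4, 5, 6, 4] 1 4
```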
In some embodiments, the components of the processor 101 and the memory controller 102 may be partitioned in other combinations across multiple dies or implemented on a single die. To illustrate, the memory controller 102 and the processor 101 may be implemented on the same die. As another example, the processor 101 may be implemented on one die, the controller logic 108 (
Stacked memory subsystems typically employ a separate logic layer to implement the memory controller and related logic functions so that a different semiconductor fabrication process can be used to fabricate the logic die 204 than that used to fabricate the memory layers 206-212. Accordingly, the vertical die stack of processing system 200 may be fabricated using any of a variety of 3D integrated circuit fabrication processes. In one approach, the dies 202-212 each are implemented as a separate substrate (e.g., bulk silicon) with active devices and one or more metal routing layers formed at an active surface. This approach can include a wafer-on-wafer process whereby a wafer comprising a matrix of dice is fabricated and thinned, and TSVs are etched through the bulk silicon. Multiple wafers are then stacked to achieve the illustrated layer configuration (e.g., a stack of four wafers comprising memory circuitry die for the three memory layers and a wafer comprising the logic die for a logic layer), aligned, and then joined via thermocompression. The resulting stacked wafer set is singulated to separate the individual 3D IC devices.
In a die-on-die process, the wafer implementing each corresponding layer is first singulated, and then the die are separately stacked and joined to fabricate the 3D IC devices. In a die-on-wafer approach, wafers for one or more layers are singulated to generate the die for one or more layers, and these die are then aligned and bonded to the corresponding die areas of another wafer, which is then singulated to produce the individual 3D IC devices. One benefit of fabricating the die 202-212 on separate wafers is that a different fabrication process can be used to fabricate the logic layers (die 202-204) than that used to fabricate the memory die (die 206-212). Thus, a fabrication process that provides improved performance and lower power consumption may be used to fabricate dies 202 and 204 (and thus provide faster and lower-power interface logic and circuitry for the memory controller 214 and profiler 216), whereas a fabrication process that provides improved cell density and improved leakage control may be used to fabricate the dies 206-212 (and thus provide more dense, lower-leakage bitcells for the stacked memory).
In another approach, the layers 202-212 are fabricated using a monolithic 3D fabrication process whereby a single substrate is used and each die layer is formed on a preceding die layer using a layer transfer process, such as an ion-cut process. The stacked memory device also may be fabricated using a combination of techniques. For example, the logic layers (dies 202 and 204) may be fabricated using a monolithic 3D technique, the memory layers (dies 206-212) may be fabricated using a die-on-die or wafer-on-wafer technique, or vice versa, and the resulting logic layer stack and memory layer stack then may be bonded together and then bonded to the interposer substrate.
Although the processing system 100 and the processing system 200 each implement a single instance of memory controller 102, in other embodiments multiple memory controllers may be implemented in order to increase system performance, with each memory controller controlling separate portions of system memory. Each of the multiple memory controllers may implement separate instances of the region-based memory access scheduling techniques described herein.
The depicted processing system 300 may be implemented with a variety of packaging techniques. In some embodiments, the memory subsystem implemented as dies 304-312 may be implemented as a separate vertical stack, horizontally disposed on the surface of an interposer along with die 302. In this embodiment, the host bus 322 can include conductors implemented in the metal layers of the interposer. In another embodiment, the processor die 302 and the stack of dies 304-312 may be contained in separate integrated circuit (IC) packages. In such implementations, the IC packages may be placed on a circuit board or similar substrate and the host bus 322 implemented using conductors of the circuit board, including pads, pins, traces, vias, solder bumps, and the like.
The scheduler 420 selects memory access requests from queues 413 and 414 and translates those requests into DRAM command sequences. In some embodiments, the scheduler 420 implements a set of algorithms and heuristics for reordering queued memory access requests. These algorithms and heuristics may consider DRAM structure such as banks, rows and columns, as well as region-specific memory timing parameters in order to generate a schedule of access to the memory array 104 (
Each DRAM operation generates different demands and claims on various DRAM resources such as the internal banks, address lines and data lines. When creating a schedule of memory access requests, the scheduler 420 takes into account potential resource conflicts and ensures that the required resources are available at the appropriate time. In addition to considering resource constraints, the scheduler 420 also generates a schedule that conforms to the DRAM timing constraints and timing parameters in order to ensure correct operation. Examples of DRAM timing parameters considered by the scheduler 420 include, but are not limited to: the row to column command delay (tRCD) parameter, the time between column command and data out (tCL) parameter, the time between column commands (tCCD) parameter, the precharge time (tRP) parameter, and the minimum row open time (tRAS) parameter.
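As a simplified illustration of how such timing parameters constrain a schedule, the sketch below computes the earliest legal cycle for each step of a single read to a closed row. It is not the scheduler 420 itself: the names `Timing` and `schedule_read` are hypothetical, tCCD and resource conflicts are omitted, and the rule that precharge waits for the data to be returned is an assumption made to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    tRCD: int  # row to column command delay
    tCL: int   # time between column command and data out
    tRP: int   # precharge time
    tRAS: int  # minimum row open time


def schedule_read(activate_cycle: int, t: Timing) -> dict:
    """Earliest legal cycle for each step of one read to a closed row."""
    column_read = activate_cycle + t.tRCD
    data_out = column_read + t.tCL
    # Assumption: precharge waits for both tRAS and the returned data.
    precharge = max(activate_cycle + t.tRAS, data_out)
    next_activate = precharge + t.tRP
    return {
        "activate": activate_cycle,
        "column_read": column_read,
        "data_out": data_out,
        "precharge": precharge,
        "next_activate": next_activate,
    }


# Hypothetical parameter values, in cycles.
plan = schedule_read(0, Timing(tRCD=5, tCL=5, tRP=5, tRAS=12))
print(plan)
```

With these example values, the column read issues at cycle 5, data returns at cycle 10, precharge is held until cycle 12 by tRAS, and the same bank can be activated again at cycle 17.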
In conventional systems, these memory timing parameters are typically loaded into the memory controller and applied for scheduling memory access requests without specific regard to the region of memory being accessed. However, as described above, the memory controller 102, in some embodiments, considers the memory timing parameters on a region-by-region basis when scheduling memory access requests. To this end, the profiling logic 410 implements logic to test and measure one or more of the memory timing parameters for each region of a set of regions of the memory array 104. For a DRAM implementation, DRAM timing parameters measured may include, but are not limited to, tRCD (row to column command delay), tCL (time between column command and data out), tCCD (time between column commands), tRP (precharge time), tRAS (minimum row open time), tFAW (multi-bank activation window), tWTR (write-to-read delay), tWR (write recovery time), and the like. For arrays utilizing memory technologies other than DRAM, the timing information may include other timing parameters as appropriate for those types of memory. The profiling logic 410 may accomplish measurement of timing parameters by generating sequences of memory read and memory write commands and observing the results while varying the parameter being tested. In the depicted example, the profiling logic 410 generates sequences of commands and sends them to the arbiter 406 to be eventually placed into the queues 413 and 414. In other embodiments, the memory controller 102 may implement separate queues for the testing sequences generated by the profiling logic 410. The regions of the memory array 104 may be cell-based, row-based, bank-based, rank-based, chip-based, and the like. Moreover, the profiling logic 410 may maintain different levels of region granularity.
For example, the profiling logic 410 may determine DRAM timing parameters that apply to each row of one bank, but determine DRAM timing parameters on sub-regions of another bank (e.g., on a row-by-row basis for the other bank). In such instances, a sub-region is a smaller unit of granularity compared to the corresponding region (e.g., a sub-region of a bank region could be a row, a sub-region of a row region could be a bit cell or a subset of the bit cells of the row, etc.). Moreover, the granularity of the memory region may depend on the memory timing parameter. For example, the profiling logic 410 may track the tRP parameters on a bank-by-bank basis while tracking the tRCD parameters on a row-by-row basis.
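The parameter-dependent granularity in the last example can be sketched as a lookup whose key depends on the parameter being requested. The table contents, the names `GRANULARITY`, `region_key`, and `lookup`, and the fallback to a memory-wide default are all assumptions for illustration.

```python
# Assumed granularity per parameter, following the example in the text:
# tRP is tracked per bank, tRCD is tracked per row.
GRANULARITY = {"tRP": "bank", "tRCD": "row"}

# Hypothetical profiled values, in cycles.
timing_table = {
    ("tRP", 0): 5,        # bank 0, all rows
    ("tRCD", 0, 7): 4,    # bank 0, row 7
    ("tRCD", 0, 8): 5,    # bank 0, row 8
}


def region_key(param, bank, row):
    """Build the lookup key at the granularity used for this parameter."""
    if GRANULARITY[param] == "bank":
        return (param, bank)
    return (param, bank, row)


def lookup(param, bank, row, default):
    """Return the region-specific value, or the memory-wide default if unprofiled."""
    return timing_table.get(region_key(param, bank, row), default)


print(lookup("tRP", 0, 7, default=6))   # bank-level value applies to row 7
print(lookup("tRCD", 0, 7, default=6))  # row-level value for row 7
print(lookup("tRCD", 1, 3, default=6))  # unprofiled region falls back to the default
```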
The region-by-region timing parameter analysis may be triggered by any of a variety of trigger events. For example, the profiling logic 410 may include one or more static or configurable trigger timers to initiate periodic reevaluation. As another example, the trigger events can include certain system events or signaling directed by other system components, such as reevaluation signaling initiated by an operating system, hypervisor, basic input output system (BIOS), firmware or a combination thereof. To illustrate, at system boot-up the BIOS may direct the profiling logic 410 to perform an initial set of measurements early in the boot process and then periodically at some pre-determined interval. In another example, detection of a correctable or uncorrectable memory error may cause the operating system or hypervisor to trigger the profiling logic 410. As circuit operation often is influenced by temperature, the profiling logic 410 may use a temperature threshold or a temperature change threshold trigger to initiate reevaluation of the memory timing parameters.
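The timer-based and temperature-based triggers described above can be combined as in the following sketch. The class `ProfilingTrigger`, its method names, and the specific interval and threshold values are hypothetical; treating the first call as a power-on trigger is likewise an assumption.

```python
class ProfilingTrigger:
    """Decides when region-by-region profiling should be re-run."""

    def __init__(self, interval_cycles, temp_delta_threshold):
        self.interval_cycles = interval_cycles
        self.temp_delta_threshold = temp_delta_threshold
        self.last_cycle = None
        self.last_temp = None

    def should_profile(self, cycle, temperature):
        if self.last_cycle is None:
            return True  # nothing profiled yet, e.g. after a power-on reset
        if cycle - self.last_cycle >= self.interval_cycles:
            return True  # periodic re-evaluation timer has lapsed
        if abs(temperature - self.last_temp) >= self.temp_delta_threshold:
            return True  # temperature has drifted since the last measurement
        return False

    def mark_profiled(self, cycle, temperature):
        """Record when, and at what temperature, profiling last ran."""
        self.last_cycle = cycle
        self.last_temp = temperature


# Hypothetical settings: re-profile every 1000 cycles or on a 10-degree change.
trigger = ProfilingTrigger(interval_cycles=1000, temp_delta_threshold=10)
print(trigger.should_profile(cycle=0, temperature=40))
trigger.mark_profiled(cycle=0, temperature=40)
print(trigger.should_profile(cycle=500, temperature=45))
print(trigger.should_profile(cycle=600, temperature=52))
```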
The timing data store 412 receives and stores the DRAM timing parameters generated by the profiling logic 410 so as to be subsequently accessible by the scheduler 420. In some embodiments, this timing information may be stored as the actual value of the parameter. For example, the tRP parameter stored for a given row may be the actual number of cycles required to precharge the given row. Alternatively, the timing information may be categorized into several levels or bins, and an indication of the bin number is stored. For example, the tRP parameter could be classified into 4 levels or bins, such as: <=7 cycles; 8-9 cycles; 10-11 cycles; and 12-13 cycles.
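The four-bin tRP classification given above can be sketched as follows. The function names `trp_bin` and `conservative_cycles` are hypothetical, and using the upper bound of a bin as the value applied by the scheduler is an assumption chosen so that a binned value is never shorter than the measured one.

```python
import bisect

# Upper bound of each bin, in cycles: <=7, 8-9, 10-11, 12-13.
BIN_UPPER_BOUNDS = [7, 9, 11, 13]


def trp_bin(cycles):
    """Return the 1-based bin number for a measured tRP value."""
    if cycles > BIN_UPPER_BOUNDS[-1]:
        raise ValueError("measured tRP exceeds the largest bin")
    return bisect.bisect_left(BIN_UPPER_BOUNDS, cycles) + 1


def conservative_cycles(bin_number):
    """Return a tRP value that is safe for every row in the bin."""
    return BIN_UPPER_BOUNDS[bin_number - 1]


print(trp_bin(8))                            # prints 2
print(conservative_cycles(trp_bin(10)))      # prints 11
```

With four bins, each stored entry needs only two bits rather than a full cycle count.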
The timing data store 412 may be implemented by any number of storage mechanisms, including but not limited to bloom filters, content addressable memories (CAMs), caches, registers, and the like. The storage mechanism may be chosen based on the size and number of stored parameters in order to reduce storage space. In some embodiments, the timing data store 412 may implement different types of storage (bloom filters, CAM arrays, and such) for each timing parameter tracked by the profiling logic 410. To illustrate, a bloom filter may be used to store a binned tRP value as described above. One bloom filter is implemented for each tRP bin. Bloom filter 1 represents bin 1, bloom filter 2 represents bin 2, and so on. In operation, the profiling logic 410 determines that the tRP for, say, row 5 is classified into the second bin. The profiling logic 410 sends the row number and bin number to the timing data store 412. The timing data store 412 enters the index of row 5 into the second tRP bloom filter. Later, in subsequent operations, the scheduler 420 may consult the timing data store 412 to determine the tRP for row 5. It sends the row number (5) and an indication of the requested parameter (tRP) to the timing data store 412. The timing data store 412 consults the tRP bloom filters and returns the tRP bin for row 5 to the scheduler 420.
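The row 5 example above can be sketched with one bloom filter per tRP bin. The filter size, the hash construction, and the class names `BloomFilter` and `BinnedTimingStore` are assumptions for illustration. Because a bloom filter can report false positives, this sketch also assumes a policy the text does not state: if a row matches several filters, the slowest matching bin is returned, and if it matches none, the slowest bin overall is returned.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter over integer keys, using an int as the bit array."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BinnedTimingStore:
    """One bloom filter per tRP bin; bin numbers are 1-based."""

    def __init__(self, num_bins=4):
        self.filters = [BloomFilter() for _ in range(num_bins)]

    def record(self, row, bin_number):
        self.filters[bin_number - 1].add(row)

    def query(self, row):
        matches = [i + 1 for i, f in enumerate(self.filters) if row in f]
        # Assumed policy: on ambiguity or absence, fall back to the slowest bin.
        return max(matches) if matches else len(self.filters)


store = BinnedTimingStore()
store.record(row=5, bin_number=2)  # profiling logic reports row 5 in bin 2
print(store.query(5))              # prints 2
```

One limitation of this structure is that entries cannot be removed, so if a row is later re-profiled into a different bin, the filters would need to be rebuilt.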
With the region-by-region memory timing parameters identified and stored in the timing data store 412, the scheduler 420 may utilize the stored region-based memory timing parameters to more optimally schedule memory access requests based on the regions they target. Thus, in response to receiving a memory access request to a target address, the scheduler 420 accesses from the timing data store 412 the DRAM timing parameters associated with the region containing the target address, and uses the accessed DRAM timing parameters rather than (or in addition to) memory-wide timing parameters to generate more optimized memory access schedules. For example, the tRCD (row to column command delay) timing parameter is a fundamental portion of the overall latency of a DRAM read operation. tRCD represents the minimum delay required between an ‘activate row’ DRAM command and the subsequent ‘column read’ DRAM command. Suppose that, for a given bank, tRCD is 5 cycles for some rows and is 4 cycles for other, faster rows. A conventional memory controller would use the most conservative timing of tRCD=5 for all rows. In contrast, the present invention, having stored data representative of tRCD for each row in the timing data store 412, allows the scheduler 420 to utilize tRCD=4 for the faster rows, thus reducing by one cycle the latency of a DRAM read operation to those rows. The profiling logic 410 implements logic to test and measure one or more of the memory timing parameters for each region of a set of regions of the memory array 104.
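The tRCD example above, in which some rows of a bank need 5 cycles and faster rows need 4, can be sketched as an address-to-row lookup followed by a comparison against the conservative setting. The address layout (`COLUMN_BITS`), the per-row table, and the function names are hypothetical and serve only to make the arithmetic concrete.

```python
COLUMN_BITS = 10       # assumption: the low 10 address bits select the column
WORST_CASE_TRCD = 5    # conservative value a conventional controller would use

# Hypothetical profiled tRCD per row of one bank, in cycles.
row_trcd = {0: 5, 1: 4, 2: 4, 3: 5}


def row_of(address):
    """Extract the row index from an address under the assumed layout."""
    return address >> COLUMN_BITS


def trcd_for(address):
    """Row-specific tRCD, falling back to the worst case for unprofiled rows."""
    return row_trcd.get(row_of(address), WORST_CASE_TRCD)


def cycles_saved(addresses):
    """Total activate-to-read cycles saved versus using the worst case everywhere."""
    return sum(WORST_CASE_TRCD - trcd_for(a) for a in addresses)


reads = [0x0000, 0x0400, 0x0800, 0x0C00, 0x0404]  # rows 0, 1, 2, 3, 1
print([trcd_for(a) for a in reads])  # prints [5, 4, 4, 5, 4]
print(cycles_saved(reads))           # prints 3
```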
At block 502, the profiling logic 410 initiates the region-by-region memory profiling in response to a trigger event, such as an interval timeout or other system event as described earlier. At block 504, the profiling logic 410 selects a DRAM row buffer of a DRAM array 106 (
With the memory timing parameters identified for a given region and stored in the timing data store 412, the scheduler 420 may utilize the stored region-based memory timing parameters to more optimally schedule memory access requests. The resulting improvement in memory latency and bandwidth may be better understood by examining a timing diagram for a DRAM ‘read’ operation.
At clock cycle T0, the memory controller 102 issues an ‘activate row’ command 600 to row X of the memory array 104 (
If the memory controller 102 were to use the conventional worst-case timing parameters across the entire memory, the tRCD parameter for the entire memory would have been set to 6 and thus the ‘column read’ command 608 would not have been issued until 6 cycles after the ‘activate row’ command 606. Likewise, the memory controller 102 would have also assumed that the memory data 610 would not be returned until 6 cycles after the ‘column read’ command 608 was issued. However, by using profiling logic 410 to exploit the variations in the tRCD and tCL timing parameters on a row-by-row basis, the memory controller 102 is able to issue the ‘column read’ command 608 to row Y two cycles earlier and access the resulting memory data 610 one cycle earlier, thereby reducing the latency of the memory data 610 returned for the ‘column read’ command 608 to row Y by three cycles, and thereby increasing the overall processing performance of the system.
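The three-cycle saving stated above can be checked with simple arithmetic. The worst-case values of 6 cycles for both tRCD and tCL come from the text; the row Y values of 4 and 5 cycles are inferred from the statements that the column read issues two cycles earlier and the data returns one cycle earlier, and should be read as an assumption.

```python
def read_latency(tRCD, tCL):
    """Cycles from the 'activate row' command to returned data."""
    return tRCD + tCL


worst_case = read_latency(tRCD=6, tCL=6)
row_y = read_latency(tRCD=6 - 2, tCL=6 - 1)  # inferred row Y values: 4 and 5

print(worst_case, row_y, worst_case - row_y)  # prints 12 9 3
```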
In some embodiments, the apparatus and techniques described above are implemented in a system comprising one or more integrated circuit (IC) devices (also referred to as integrated circuit packages or microchips), such as the processing systems of
A computer readable storage medium may include any storage medium, or combination of storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium may be embedded in the computer system (e.g., system RAM or ROM), fixedly attached to the computer system (e.g., a magnetic hard drive), removably attached to the computer system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
At block 702, a functional specification for the IC device is generated. The functional specification (often referred to as a micro architecture specification (MAS)) may be represented by any of a variety of programming languages or modeling languages, including C, C++, SystemC, Simulink, or MATLAB.
At block 704, the functional specification is used to generate hardware description code representative of the hardware of the IC device. In some embodiments, the hardware description code is represented using at least one Hardware Description Language (HDL), which comprises any of a variety of computer languages, specification languages, or modeling languages for the formal description and design of the circuits of the IC device. The generated HDL code typically represents the operation of the circuits of the IC device, the design and organization of the circuits, and tests to verify correct operation of the IC device through simulation. Examples of HDL include Analog HDL (AHDL), Verilog HDL, SystemVerilog HDL, and VHDL. For IC devices implementing synchronous digital circuits, the hardware description code may include register transfer level (RTL) code to provide an abstract representation of the operations of the synchronous digital circuits. For other types of circuitry, the hardware description code may include behavior-level code to provide an abstract representation of the circuitry's operation. The HDL model represented by the hardware description code typically is subjected to one or more rounds of simulation and debugging to pass design verification.
After verifying the design represented by the hardware description code, at block 706 a synthesis tool is used to synthesize the hardware description code to generate code representing or defining an initial physical implementation of the circuitry of the IC device. In some embodiments, the synthesis tool generates one or more netlists comprising circuit device instances (e.g., gates, transistors, resistors, capacitors, inductors, diodes, etc.) and the nets, or connections, between the circuit device instances. Alternatively, all or a portion of a netlist can be generated manually without the use of a synthesis tool. As with the hardware description code, the netlists may be subjected to one or more test and verification processes before a final set of one or more netlists is generated.
Alternatively, a schematic editor tool can be used to draft a schematic of circuitry of the IC device and a schematic capture tool then may be used to capture the resulting circuit diagram and to generate one or more netlists (stored on a computer readable medium) representing the components and connectivity of the circuit diagram. The captured circuit diagram may then be subjected to one or more rounds of simulation for testing and verification.
At block 708, one or more EDA tools use the netlists produced at block 706 to generate code representing the physical layout of the circuitry of the IC device. This process can include, for example, a placement tool using the netlists to determine or fix the location of each element of the circuitry of the IC device. Further, a routing tool builds on the placement process to add and route the wires needed to connect the circuit elements in accordance with the netlist(s). The resulting code represents a three-dimensional model of the IC device. The code may be represented in a database file format, such as, for example, the Graphic Database System II (GDSII) format. Data in this format typically represents geometric shapes, text labels, and other information about the circuit layout in hierarchical form.
At block 710, the physical layout code (e.g., GDSII code) is provided to a manufacturing facility, which uses the physical layout code to configure or otherwise adapt fabrication tools of the manufacturing facility (e.g., through mask works) to fabricate the IC device. That is, the physical layout code may be programmed into one or more computer systems, which may then control, in whole or part, the operation of the tools of the manufacturing facility or the manufacturing operations performed therein.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed.
Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2013/075692 | 5/16/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/183287 | 11/20/2014 | WO | A |
Number | Date | Country |
---|---|---|
20160124873 A1 | May 2016 | US |