Processing elements send memory access requests to memory systems. The processing elements may resend the same memory access request if the data received back from the memory system is invalid, incorrect, corrupted, etc. The processing element will fail if the requested data is still invalid or incorrect after multiple memory access attempts. The processing element may then need to be restarted or rebooted.
In many computer architectures different devices, applications, or elements request data from the same memory system. The memory accesses from these different devices and applications can also be abstracted by other devices or processing elements. For example, an operating system may break a read request from a software application into a plurality of different individual read operations.
Memory access requests from different sources and the abstractions made by other processing elements make it difficult to correctly identify repeated memory access requests that are associated with incorrect or corrupted data.
Referring to
The initiators 100 and targets 300 can be directly connected, or connected to each other through a network or fabric. In some embodiments, the initiators 100 are servers, server applications, routers, switches, client computers, personal computers, Personal Digital Assistants (PDA), smart phones, or any other wired or wireless computing device that needs to access the data in targets 300.
In one embodiment, the initiators 100 may be stand-alone appliances, devices, or blades, and the targets 300 are stand-alone storage arrays. In some embodiments, the initiators 100, storage proxy 200, and targets 300 are each coupled to each other via wired or wireless Internet connections 12. In other embodiments, the initiators 100 may be a processor or applications in a personal computer or server that accesses one or more targets 300 over an internal or external data bus. The targets 300 in this embodiment could be located in the personal computer or server 100, or could also be a stand-alone device coupled to the computer/initiators 100 via a computer bus or packet switched network connection.
The storage proxy 200 could be hardware and/or software located in a storage appliance, wireless or wired router, gateway, firewall, switch, or any other computer processing system. The storage proxy 200 provides an abstraction of physical disks 500 in targets 300 as virtual disks 400. In one embodiment, the physical disks 500 and the virtual disks 400 may be identical in size and configuration. In other embodiments the virtual disks 400 could consist of stripes of data or volumes of data that extend across multiple different physical disks 500.
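As a loose illustration of the striping just described, a virtual disk block can be mapped round-robin across several physical disks. This is only a sketch; the stripe size, function name, and round-robin layout are assumptions of the example, not details of the storage proxy 200.

```python
STRIPE_BLOCKS = 64  # blocks per stripe unit (an assumed value)

def virtual_to_physical(virtual_block: int, num_disks: int):
    """Map a virtual disk block to a (physical disk, physical block) pair."""
    stripe_unit = virtual_block // STRIPE_BLOCKS   # which stripe unit holds the block
    offset = virtual_block % STRIPE_BLOCKS         # offset inside that stripe unit
    disk = stripe_unit % num_disks                 # round-robin choice of physical disk
    physical_block = (stripe_unit // num_disks) * STRIPE_BLOCKS + offset
    return disk, physical_block
```

With this layout, consecutive stripe units of the virtual disk land on consecutive physical disks, which is one way a virtual disk 400 can extend across multiple physical disks 500.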
Different communication protocols can be used over connections 12 between initiators 100 and targets 300. Typical protocols include Fibre Channel Protocol (FCP), Small Computer System Interface (SCSI), Advanced Technology Attachment (ATA) and encapsulated protocols such as Fibre Channel over Ethernet (FCoE), Internet Small Computer System Interface (iSCSI), Fibre Channel over Internet Protocol (FCIP), ATA over Ethernet (AoE) and others. In one embodiment, the communication protocol is a routed protocol such that any number of intermediate routing or switching agents may be used to abstract connection 12.
The initiators 100 conduct different storage operations with the physical disks 500 in targets 300 through the storage proxy 200. The storage operations may include write operations and read operations that have associated storage addresses. These interactions with the storage proxy 200 may be normalized to block-level operations such as “reads” and “writes” of an arbitrary number of blocks.
Storage proxy 200 contains a cache resource 16 used for accelerating accesses to targets 300. The cache resource in
A prefetch controller 18 includes any combination of software and/or hardware within storage proxy 200 that controls cache resource 16. For example, the prefetch controller 18 could be a processor 22 executing software instructions that conduct the analysis and invalidation operations described below.
During a prefetch operation, prefetch controller 18 performs one or more reads to targets 300 and stores the data in cache resource 16. If subsequent reads from initiators 100 request the data in cache resource 16, storage proxy 200 returns the data directly from cache resource 16. Such a direct return is referred to as a “cache hit” and reduces the read time for applications on initiators 100 accessing targets 300. For example, a memory access to targets 300 can take several milliseconds while a memory access to cache resource 16 may be on the order of microseconds.
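The cache-hit path just described can be sketched as follows. The class and method names are illustrative assumptions; `read_target` stands in for the slower access to targets 300.

```python
class StorageProxyCache:
    """Minimal sketch of the read path through a caching storage proxy."""

    def __init__(self, read_target):
        self.lines = {}                  # cache line address -> data
        self.read_target = read_target   # slow path to the backing targets

    def read(self, address):
        if address in self.lines:
            # "cache hit": return directly from the cache resource
            return self.lines[address]
        data = self.read_target(address)  # miss: fetch from the targets
        self.lines[address] = data        # populate the cache line
        return data
```

A second read of the same address is then served from `self.lines` without touching the backing store, which is the latency saving the text describes.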
Prefetch controller 18 can operate in both a monitoring mode and an active mode. When operating in the monitoring mode, the prefetch controller 18 monitors and records read and write operations from initiators 100 to targets 300. The prefetch controller 18 uses the monitored information when performing subsequent caching operations.
As mentioned above, an initiator 100 may resend a memory access request several times when the data received back from the storage proxy 200 is incorrect or corrupted. It would be advantageous for the storage proxy 200 to identify these invalid error conditions and then invalidate the cache lines 207 that contain the erroneous data before the initiator 100 fails. Memory accesses from different originating computing elements have to be correctly identified in order to correctly anticipate and avoid these failure conditions. However, memory access requests may come from different initiators and may be abstracted by different software and hardware elements.
For example, the storage proxy 200 may receive memory access requests that are broken up into different portions and sent at different times. Further, the different portions of the broken up memory access requests may overlap with other broken up memory access requests from other computing elements. These disjointed overlapping memory access requests make it difficult for the storage proxy 200 to accurately identify the processing elements that originated the memory access requests.
The HBA card 110 asserts signals on a Fibre Channel bus connection 12 in
Some of the guest applications 115 and guest operating systems 113 may be the same as the applications 114 and operating system 112, respectively, in
The virtualization and abstractions comprise the differences between the memory access requests originally issued by the applications 114 and/or 115 in
The operating system 112 in
The HBA card/initiator 110 may have yet another buffer size or a particular configuration or state that further abstracts the OS reads 122A, 122B, and 122C. For example, the HBA card 110 may break the first OS read 122A into two separate initiator reads 124A and 124B. The first initiator read 124A has the same starting address 130 as application read 120 and OS read 122A. The second initiator read 124B has a starting address that starts at the ending address of initiator read 124A and has the same ending address as OS read 122A.
The HBA card/initiator 110 may not dissect or abstract the second OS read 122B. In other words, the third initiator read 124C may have the same starting address as OS read 122B and the same ending address as OS read 122B. The HBA card/initiator 110 separates the third OS read 122C into two separate initiator reads 124D and 124E. The starting address of the fourth initiator read 124D starts at the starting address of OS read 122C. The fifth initiator read 124E starts at the ending address of initiator read 124D and has the same ending address 132 as application read 120 and OS read 122C.
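The successive splitting described above amounts to chunking a read first by the operating-system buffer size and then again by the initiator buffer size. A minimal sketch, with buffer sizes chosen only for illustration:

```python
def split_read(start, length, max_len):
    """Break one read into contiguous sub-reads of at most max_len."""
    subreads = []
    while length > 0:
        n = min(length, max_len)
        subreads.append((start, n))
        start += n
        length -= n
    return subreads

# An application read is first split by the OS buffer size, then each OS
# read is split again by the HBA/initiator buffer size (sizes assumed).
app_read = (100, 70)  # start address 100, length 70
os_reads = split_read(*app_read, max_len=30)
initiator_reads = [r for os_r in os_reads
                   for r in split_read(*os_r, max_len=20)]
```

The resulting initiator reads still cover the original address range end to end, but their boundaries no longer reveal which application read they came from.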
It can be seen that the operating system 112 and the initiator 110 in
In one embodiment of the present invention, the interpretation of application reads by the system is bypassed such that the initiator or OS level read commands are used. The subsequently described method of identifying invalid cache data performs correctly regardless of the read operation layer (application, operating system or initiator as shown in
In
For example, responsive to the application read 120 in
Identifying Invalid Data Conditions
For any of a variety of reasons, the data in a particular cache line 207 may not be the correct data requested by one of the initiators 100. For example, a particular state of the storage proxy 200 may load the wrong data from the targets 300 into a particular cache line 207, or may load invalid data into a particular cache line 207 at the wrong time. In another situation, a hardware glitch may corrupt the data in a particular cache line 207 even though the correct data was loaded from the targets 300. A variety of other hardware or software conditions may result in one or more of the cache lines 207 containing corrupted or incorrect data. In some cases, the incorrect data is due to initiator errors, such as an inconsistent multipath configuration, rather than an error within the storage proxy 200.
Certain applications 114 or other processing elements in the initiators 100 may initiate a read operation to the targets 300 and receive back data from one of the cache lines 207 in storage proxy 200. The application or processing element determines if the correct data is received back from the storage proxy 200. For example, data values, checksums, configuration bits, etc. may be compared with the data received back from the storage proxy 200.
The application 114 may resend the same application read 120 a number of times if the data received back from the storage proxy 200 is incorrect. If the wrong data continues to be received back from the storage proxy 200, the application 114 may simply fail. For example, after five read retries, the application 114 may go into a failure state and require a restart or reboot to start operating again. The number of retries that will be performed may be a fixed number based on the application programming, or the decision to retry may be configurable.
The detection system described below identifies these incorrect data conditions and automatically invalidates the data in the identified cache line. Because the cache line 207 is invalidated, a subsequent read retry operation from the application 114 causes the storage proxy 200 to access the data from the targets 300. One of the cache lines 207 may then be reloaded with the valid data read from targets 300 so that the storage proxy 200 can then start providing the correct data to the application 114. The application 114 can then continue operating and processing data without going into a failure state.
The storage proxy 200 may normally consider each application read 120A, 120B, 120C, and/or 120D in
These repeated hits to the same cache line 207 could appear to the storage proxy 200 as an invalid data retry condition from one of the applications 114. However, the repeated hits to the same cache line 207 may actually be from multiple different initiators 110, 130, and 150 in
The storage proxy 200 distinguishes multiple cache line hits caused by repeated abstracted read operations from multiple cache line hits caused by invalid cache line data. The storage proxy 200 identifies the cache lines 207 with bad data and automatically invalidates them. This automatic invalidation avoids application failures.
Referring to
Using
Of course, the start of application read 120B may be the next sequential address after address 150 and the start of cache line 207B may be the next sequential address after address 120. However, for illustration purposes the ending address of application read 120A is also shown as the starting address of application read 120B. Similarly, the ending address of cache line 207A is shown as the starting address of cache line 207B.
The last read start register 272 in
For example, after receiving application read 120A, the address value in last read start register 272 for cache line 207A is 100 and the value in last read length register 274 is 20. If this is the first hit on cache line 207A, the repetition counter 276 will be set to one and the last timestamp register 278 will be set to a current time associated with the application read 120A.
After receiving application read 120A, the address value in last read start register 272 for cache line 207B is set to 120 and the value in last read length register 274 for cache line 207B is set to 30. This corresponds to the starting address of cache line 207B at address 120 and a read length that starts at cache line address 120 and extends to the end of application read 120A at address 150. If this is the first hit on cache line 207B, the value in repetition counter 276 is set to one and the value in last timestamp register 278 is set to a current time associated with application read 120A.
Since there is no hit on cache line 207A after receiving the second application read 120B, there is no change to the control registers 270 associated with cache line 207A. However after receiving application read 120B, the address value in last read start register 272 for cache line 207B is set to 150 and the value in last read length register 274 for cache line 207B is set to 20. This corresponds to the starting address of the application read 120B at address 150 and a read length that starts at address 150 and extends to the end of cache line 207B at address 170. Since this is the latest hit on cache line 207B with this particular start address and read length, the value in repetition counter 276 for cache line 207B is reset to one and the value in last timestamp register 278 is reset to a current time of application read 120B.
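The register values in the worked example amount to clipping each read against the address range of every cache line it hits. A sketch, assuming fixed-size cache lines for simplicity (the lines 207A and 207B in the example are not equal in size, so the sizes here are illustrative):

```python
LINE_SIZE = 20  # assumed fixed cache line size for this sketch

def line_hits(read_start, read_length):
    """Yield (line_start, overlap_start, overlap_length) for each cache
    line a read touches, i.e. the values loaded into the last read start
    register 272 and last read length register 274 for that line."""
    read_end = read_start + read_length
    line_start = (read_start // LINE_SIZE) * LINE_SIZE
    while line_start < read_end:
        line_end = line_start + LINE_SIZE
        overlap_start = max(read_start, line_start)
        overlap_length = min(read_end, line_end) - overlap_start
        yield line_start, overlap_start, overlap_length
        line_start = line_end
```

For a read starting at address 100 with length 50, the first line records start 100 and length 20, while the next line records only the portion of the read that falls inside its own address range, mirroring the register updates described above.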
Compare this to a second application read 120A sent immediately after the first application read 120A. For example, instead of receiving application read 120A in
In this case there is a second hit on cache line 207A after receiving the first application read 120A. The second application read 120A will have the same start address as the last read start register 272 and will have the same length as the last read length register 274 for cache line 207A. Accordingly, the processor 22 in
If the time difference is outside of a given time threshold in operation 304, the start address value and address length value for the received read request are loaded or set into the registers 272 and 274, respectively, for the associated cache line 207 in operation 306. The count value in repetition counter 276 for the associated cache line 207 is cleared or reset to one in operation 307. The timestamp value in last timestamp register 278 for the associated cache line 207 is set to the current time associated with the received read request in operation 308.
The time threshold checked in operation 304 is used to distinguish repeat back-to-back hits to the same cache line 207 caused by invalid cache line data from normal hits to the same cache line that are not due to bad cache line data. For example, if the same cache line hit for the same start address and address length happens outside of the given time threshold, the second cache line hit may be associated with a second valid data request for the same data.
Operation 305 compares the start address value in last read start register 272 for the hit cache line 207 with the start address of the received read request. Operation 305 also compares the address length value in last read length register 274 for the associated cache line 207 with the address length of the received read request.
A different start address or different read length in the registers 272 or 274 than the received read request indicates two different read operations that just happen to hit the same cache line 207. The storage proxy 200 considers this a normal read operation that is not associated with back-to-back read requests that may be associated with invalid cache line data.
Accordingly, in operation 306 the processor 22 in storage proxy 200 sets the registers 272 and 274 with the start address and read length, respectively, of the latest received read request. The value in counter register 276 is reset in operation 307 and the value in timestamp register 278 is updated to the time associated with the new read request in operation 308.
The time of the latest read request may be within the predetermined time threshold of the timestamp value in last timestamp register 278 in operation 304. And, the start address and read length of the received read request may also match the values in registers 272 and 274 in operation 305. This indicates that an application 114 may have made the same two back-to-back read requests for the same data, possibly because the data returned from the cache line 207 in the previous read was incorrect.
Accordingly, the storage proxy 200 in operation 310 increments the value in repetition counter register 276 for the hit cache line after receiving the second read request. The storage proxy 200 in operation 312 determines if the value in repetition counter 276 is above a predetermined limit. For example, a database application 114 may be known to repeat the same read request five times if the data received pursuant to the read request continues to be incorrect. After five read retries, the database application 114 may fail and require rebooting or resetting. Of course any number of retries may be used by a particular application 114.
The counter limit in operation 312 is set to some number below the database failure retry number. For example, the counter limit in operation 312 may be set to three. If the value in repetition counter 276 is less than three in operation 312, the storage proxy 200 returns to receive the next read request in operation 302. If the value in repetition counter 276 reaches the limit of three in operation 312, the storage proxy 200 invalidates the cache line 207 in operation 314. The cache line is invalidated by setting an invalid bit in the cache resource 16 associated with the hit cache line 207. Other schemes may also be used for invalidating cache lines.
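The flow of operations 302 through 314 can be sketched as follows. The repeat limit of three follows the example above; the time threshold value and all names are illustrative assumptions.

```python
TIME_THRESHOLD = 1.0  # seconds between identical hits (assumed value)
REPEAT_LIMIT = 3      # invalidate before the application's five-retry failure

class LineState:
    """Per-cache-line control registers 270 from the description."""
    def __init__(self):
        self.last_start = None   # last read start register 272
        self.last_length = None  # last read length register 274
        self.repetitions = 0     # repetition counter 276
        self.last_time = None    # last timestamp register 278

def on_cache_hit(state, start, length, now):
    """Process one cache line hit; return True if the line should be
    invalidated (operation 314)."""
    same_read = (start == state.last_start and length == state.last_length)
    in_window = (state.last_time is not None and
                 now - state.last_time <= TIME_THRESHOLD)
    if same_read and in_window:
        state.repetitions += 1                   # operation 310
    else:
        state.last_start = start                 # operations 306-307:
        state.last_length = length               # record new read, reset count
        state.repetitions = 1
    state.last_time = now                        # operation 308
    return state.repetitions >= REPEAT_LIMIT     # operation 312
```

When `on_cache_hit` returns True, the proxy would mark the line invalid and clear its state, so the application's next retry is serviced from the targets and reloads the line with valid data.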
Invalidating the cache line 207 causes the storage proxy 200 to send the next read request associated with that cache line address to the targets 300 in
If the data read from the targets 300 is valid, the application 114 can continue operating without any more read request retries. Thus, the storage proxy 200 invalidates the corrupted or incorrect data in the cache line 207 and provides the correct data for the read request from targets 300. This prevents a failure condition in the application 114.
Hardware and Software
Several examples have been described above with reference to the accompanying drawings. Various other examples are also possible and practical. The systems and methodologies may be implemented or applied in many different forms and should not be construed as being limited to the examples set forth above. Some systems described above may use dedicated processor systems, micro controllers, programmable logic devices, or microprocessors that perform some or all of the operations. Some of the operations described above may be implemented in software or firmware and other operations may be implemented in hardware.
For the sake of convenience, the operations are described as various interconnected functional blocks or distinct software modules. This is not necessary, however, and there may be cases where these functional blocks or modules are equivalently aggregated into a single logic device, program or operation with unclear boundaries. In any event, the functional blocks and software modules or features of the flexible interface can be implemented by themselves, or in combination with other operations in either hardware or software.
Digital Processors, Software and Memory Nomenclature
As explained above, embodiments of this disclosure may be implemented in a digital computing system, for example a CPU or similar processor. More specifically, the term “digital computing system,” can mean any system that includes at least one digital processor and associated memory, wherein the digital processor can execute instructions or “code” stored in that memory. (The memory may store data as well.)
A digital processor includes but is not limited to a microprocessor, multi-core processor, Digital Signal Processor (DSP), Graphics Processing Unit (GPU), processor array, network processor, etc. A digital processor (or many of them) may be embedded into an integrated circuit. In other arrangements, one or more processors may be deployed on a circuit board (motherboard, daughter board, rack blade, etc.). Embodiments of the present disclosure may be variously implemented in a variety of systems such as those just mentioned and others that may be developed in the future. In a presently preferred embodiment, the disclosed methods may be implemented in software stored in memory, further defined below.
Digital memory, further explained below, may be integrated together with a processor, for example Random Access Memory (RAM) or FLASH memory embedded in an integrated circuit Central Processing Unit (CPU), network processor or the like. In other examples, the memory comprises a physically separate device, such as an external disk drive, storage array, or portable FLASH device. In such cases, the memory becomes “associated” with the digital processor when the two are operatively coupled together, or in communication with each other, for example by an I/O port, network connection, etc. such that the processor can read a file stored on the memory. Associated memory may be “read only” by design (ROM) or by virtue of permission settings, or not. Other examples include but are not limited to WORM, EPROM, EEPROM, FLASH, etc. Those technologies are often implemented in solid state semiconductor devices. Other memories may comprise moving parts, such as a conventional rotating disk drive. All such memories are “machine readable” in that they are readable by a compatible digital processor. Many interfaces and protocols for data transfers (data here includes software) between processors and memory are well known, standardized and documented elsewhere, so they are not enumerated here.
Storage of Computer Programs
As noted, some embodiments may be implemented or embodied in computer software (also known as a “computer program” or “code”; we use these terms interchangeably). Programs, or code, are most useful when stored in a digital memory that can be read by one or more digital processors. The term “computer-readable storage medium” (or alternatively, “machine-readable storage medium”) includes all of the foregoing types of memory, as well as new technologies that may arise in the future, as long as they are capable of storing digital information in the nature of a computer program or other data, at least temporarily, in such a manner that the stored information can be “read” by an appropriate digital processor. The term “computer-readable” is not intended to limit the phrase to the historical usage of “computer” to imply a complete mainframe, mini-computer, desktop or even laptop computer. Rather, the term refers to a storage medium readable by a digital processor or any digital computing system as broadly defined above. Such media may be any available media that is locally and/or remotely accessible by a computer or processor, and it includes both volatile and non-volatile media, removable and non-removable media, embedded or discrete.
Having described and illustrated a particular example system, it should be apparent that other systems may be modified in arrangement and detail without departing from the principles described above. Claim is made to all modifications and variations coming within the spirit and scope of the following claims.
The present application is a continuation in part of U.S. patent application Ser. No. 12/846,568 filed on Jul. 29, 2010 which is herein incorporated by reference in its entirety.