Storage systems include functionality to service write requests and read requests. More specifically, traditional storage systems include functionality to write data to persistent storage and then immediately read this data from the persistent storage.
Specific embodiments of the technology will now be described in detail with reference to the accompanying figures. In the following detailed description of embodiments of the technology, numerous specific details are set forth in order to provide a more thorough understanding of the technology. However, it will be apparent to one of ordinary skill in the art that the technology may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In the following description of
In general, embodiments of the technology relate to a method and system for storing and reading data from persistent storage. More specifically, embodiments of the technology relate to a method and system for storing data in persistent storage, where the data written to the persistent storage is not immediately accessible in the persistent storage (i.e., during the unavailability period). In such instances, embodiments of the technology provide a method and system to enable the storage system to service read requests for the data using a primary cache entry table (PCET) and an overflow table.
In one embodiment of the technology, a host (100) is any system or process executing on a system that includes functionality to issue read requests and/or write requests to the control module. In one embodiment of the technology, the hosts (100) may each include a processor (not shown), memory (not shown), and persistent storage (not shown). In one embodiment of the technology, the control module is configured to receive write requests and read requests from one or more hosts (100) and to transmit the received requests to the appropriate storage module(s) (104A, 104N). Additional detail about the storage modules is provided below in
In one embodiment of the technology, the hosts (100) are configured to communicate with the control module (102) using one or more of the following protocols: Peripheral Component Interconnect (PCI), PCI-Express (PCIe), PCI-eXtended (PCI-X), Non-Volatile Memory Express (NVMe), NVMe over a PCI-Express fabric, NVMe over an Ethernet fabric, and NVMe over an InfiniBand fabric. Those skilled in the art will appreciate that the technology is not limited to the aforementioned protocols.
In one embodiment of the technology, the control module (102) is configured to communicate with the storage modules (104A, 104N) using one or more of the following protocols: Peripheral Component Interconnect (PCI), PCI-Express (PCIe), PCI-eXtended (PCI-X), Non-Volatile Memory Express (NVMe), NVMe over a PCI-Express fabric, NVMe over an Ethernet fabric, and NVMe over an InfiniBand fabric. Those skilled in the art will appreciate that the technology is not limited to the aforementioned protocols.
Those skilled in the art will appreciate that the technology is not limited to the architecture shown in
In one embodiment of the technology, the FPGA (202) is an integrated circuit that is configured to perform all or a portion of the methods described in
In one embodiment of the technology, the external memory (204) is volatile memory, which includes an overflow table (212), a bitmap (214), and a data cache (216). Each of these components is described below. The overflow table (212), like the PCET (210), includes table entries. However, the size of the overflow table (212) is typically significantly larger than the size of the PCET (210). More specifically, the size of the PCET (210) is limited by the size of the internal memory that may be located on the FPGA (202); however, the size of the overflow table (212) is determined such that there is a sufficient number of table entries to ensure that all read requests issued to the storage module for data that cannot be read during the unavailability period can be serviced using the PCET (210) and/or the overflow table (212). If there is not sufficient space to store an appropriate number of table entries in the PCET and the overflow table, then the storage module may not be able to service all read requests for data issued during the unavailability period (see e.g.,
In one embodiment of the technology, the bitmap (214) includes an entry for each logical address that may be used by hosts issuing read and/or write requests to the storage module (200). Additional detail about the use of the bitmap is described in
In one embodiment of the technology, the data cache (216) temporarily stores data that has been written to the storage units. The data stored in the data cache (216) is used to service read requests for the data when the data cannot be retrieved from the storage units (i.e., during the unavailability period).
In one embodiment of the technology, each of the storage units (206A, 206M) includes persistent storage. The persistent storage may include magnetic storage media, optical storage media, solid state storage media, phase change storage media, any other suitable type of persistent storage media, or any combination thereof. In one embodiment of the technology, the persistent storage media may have an unavailability period. More specifically, when data is written to such persistent storage media, the data may not be read from the persistent storage media for a period of time (referred to as an unavailability period). The unavailability period may vary depending on the specific implementation of the persistent storage media.
Those skilled in the art will appreciate that the technology is not limited to the architecture shown in
The valid (302) bit is used to determine whether the given table entry may be removed from the PCET or the overflow table. More specifically, when the data with which the table entry is associated may not be read from any of the storage units, the valid (302) bit may be set in order to signify that the table entry (300) is valid and may not be removed from the PCET or the overflow table. Further, when the data with which the table entry is associated may be read from one or more of the storage units, the valid (302) bit may be cleared in order to signify that the table entry (300) may be removed from the PCET or the overflow table.
The logical address (304) corresponds to the logical address in the write request that resulted in the creation of the table entry (see e.g.,
Turning to the example, consider a scenario in which Table Entry A (404) was initially stored in the PCET (400). At a later point in time, Table Entry B (406) is created and stored in the overflow table (402) and the next link pointer in Table Entry A (404) is updated to reference Table Entry B (406). Table Entry C (408) is then created and stored in the overflow table (402). At this time, the next link pointer in Table Entry B is updated to reference Table Entry C (408).
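The linking described in this example can be illustrated with a minimal Python sketch. It is not the FPGA implementation; the class name `TableEntry`, the helper `chain_addresses`, and the logical address values are hypothetical and chosen only for illustration. The fields mirror those discussed in this description (valid bit, logical address, timestamp, and next link pointer).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableEntry:
    """Hypothetical model of a table entry (300)."""
    logical_address: int
    timestamp: float = 0.0
    valid: bool = True
    # None models a null next link pointer.
    next_link: Optional["TableEntry"] = None


# Table Entry A is stored in the PCET; B and C are stored in the overflow table.
entry_a = TableEntry(logical_address=0x10)
entry_b = TableEntry(logical_address=0x20)
entry_c = TableEntry(logical_address=0x30)

# A is updated to reference B when B is created, then B to reference C.
entry_a.next_link = entry_b
entry_b.next_link = entry_c


def chain_addresses(head: Optional[TableEntry]) -> List[int]:
    """Follow next link pointers from head and collect logical addresses."""
    addresses = []
    node = head
    while node is not None:
        addresses.append(node.logical_address)
        node = node.next_link
    return addresses
```

In this sketch, the last entry in a chain is the one whose next link pointer is null, which is where a newly created overflow entry would be attached.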
Turning to the flowcharts, while the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. Further, the processes described in the various flowcharts may be performed serially, concurrently and/or in parallel by the storage modules.
In Step 500, a write request is received by the storage module from a host, where the write request includes a logical address and data.
In Step 502, the FPGA applies a hash function to the logical address in order to generate a hash value. The following is a set of non-limiting hash functions that may be used: SHA-1, MD5, any non-cryptographic hash function, any keyed cryptographic hash function, and/or any non-keyed cryptographic hash function.
In Step 504, a determination is made about whether there is a table entry stored in the PCET at the location associated with the hash value. Said another way, the PCET includes N number of physical locations, where each of the N number of physical locations is associated with a hash value. Accordingly, the determination in step 504 is used to ascertain whether the location in the PCET associated with the hash value is full (i.e., currently storing a valid table entry) or is empty (i.e., no table entry is stored or an invalid table entry is stored at the location). If the location associated with the hash value is empty (or currently storing an invalid table entry), then the process proceeds to step 506; otherwise, the process proceeds to step 508.
In Step 506, a table entry is generated and stored in the physical location in the PCET corresponding to the hash value. At the time the table entry is stored in the PCET, the next link pointer is set to null. The process then proceeds to Step 510.
In scenarios in which the location in the PCET associated with the hash value is full, then in Step 508, a table entry is generated and stored in an available location in the overflow table. The location in the overflow table may be selected randomly or using any other selection method. Continuing with the discussion of step 508, a next link pointer in a table entry in the PCET or a table entry in the overflow table is updated to reference the table entry created in step 508 (see e.g.,
In Step 510, the entry in the bitmap associated with the logical address in the write request is set.
In Step 512, the data associated with the write request is stored in the data cache in the external memory. In one embodiment of the technology, the location in which data is stored is associated with the table entry (i.e., the table entry stored in step 506 or 508). Said another way, the FPGA maintains a mapping between a table entry and the location in the data cache in which the data is stored. This mapping is used to obtain the data from the data cache (see e.g.,
In Step 514, the FPGA transmits the data (i.e., the data received from the host) to at least one of the storage units in the storage module.
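Steps 500 through 514 can be sketched in Python as follows. This is a software illustration under stated assumptions, not the FPGA logic: the class `StorageModuleSketch`, the four-slot PCET, the `cache_index` field used to model the mapping between a table entry and its data cache location, and the dictionary standing in for the storage units are all hypothetical. SHA-1 is used only because it is one of the non-limiting hash functions listed for step 502. The sketch models an invalid entry as an empty slot and does not handle repeated writes to the same logical address.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

PCET_SLOTS = 4  # hypothetical number of physical locations in the PCET


@dataclass
class Entry:
    logical_address: int
    timestamp: float
    cache_index: int  # assumed mapping from table entry to data cache location
    valid: bool = True
    next_link: Optional["Entry"] = None


class StorageModuleSketch:
    def __init__(self) -> None:
        self.pcet: List[Optional[Entry]] = [None] * PCET_SLOTS
        self.overflow: List[Entry] = []
        self.bitmap: Dict[int, bool] = {}
        self.data_cache: List[bytes] = []
        self.storage_units: Dict[int, bytes] = {}  # stand-in for persistent storage

    @staticmethod
    def slot_for(logical_address: int) -> int:
        # Step 502: hash the logical address and map it to a PCET location.
        digest = hashlib.sha1(logical_address.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") % PCET_SLOTS

    def write(self, logical_address: int, data: bytes, now: float) -> None:
        slot = self.slot_for(logical_address)
        entry = Entry(logical_address, now, cache_index=len(self.data_cache))
        head = self.pcet[slot]
        if head is None:
            # Steps 504/506: the location is empty, so store the entry in the PCET.
            self.pcet[slot] = entry
        else:
            # Step 508: the location is full, so store the entry in the overflow
            # table and link it from the last entry in the chain.
            self.overflow.append(entry)
            tail = head
            while tail.next_link is not None:
                tail = tail.next_link
            tail.next_link = entry
        self.bitmap[logical_address] = True        # Step 510
        self.data_cache.append(data)               # Step 512
        self.storage_units[logical_address] = data  # Step 514
```

Writing more distinct logical addresses than there are PCET locations forces at least one entry into the overflow table, which is the situation step 508 addresses.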
In Step 600, a read request is received from a host. The read request includes a logical address.
In Step 602, a determination is made about whether an entry in the bitmap corresponding to the logical address is set. If the entry is not set, then the data may be accessed from one or more of the storage units and, as such, the process proceeds to step 604. However, if the entry is set, then the data cannot be retrieved from the storage unit (e.g., because of the unavailability period) and, as such, the process proceeds to step 606.
In Step 604, the data corresponding to the logical address is obtained from the appropriate storage unit and provided to the host. The FPGA may be configured to retrieve the data from the storage unit and provide the retrieved data to the host.
Continuing with the discussion of
In Step 608, the hash value is used to identify a table entry in the PCET or the overflow table that includes the logical address (i.e., the logical address in the read request). The following is an example of how the table entry may be identified. The following example is described with respect to
Turning to the example, consider a scenario in which the hash value generated in step 606 corresponds to a physical location in the PCET (400) in which Table Entry A (404) is stored. Accordingly, the logical address in Table Entry A is compared to the logical address from the read request. In this example, the logical address in Table Entry A (404) does not match the logical address in the read request. Thus, the next link pointer in Table Entry A is used to identify a next table entry in the overflow table (402). In this example, Table Entry B (406) is the next identified table entry. Similar to the evaluation of Table Entry A, the logical address in Table Entry B is compared to the logical address from the read request. In this example, the logical address in Table Entry B (406) matches the logical address in the read request. Accordingly, the Table Entry B is the table entry identified in step 608. If the logical address in Table Entry B did not match the logical address in the read request, then the next link pointer in Table Entry B would be used to identify Table Entry C. Table Entry C would then be evaluated in the same manner as Table Entries A and B. The aforementioned process would continue until a table entry is identified.
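The lookup in this example, together with the bitmap check of step 602, can be sketched in Python as follows. The names `Entry`, `find_entry`, `read`, and `slot_of` are hypothetical, as is the `cache_index` field that models the mapping between a table entry and its data cache location. The stand-in hash maps every logical address to location 0 so that Table Entries A, B, and C share one chain, as in the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    logical_address: int
    cache_index: int  # assumed mapping from table entry to data cache location
    next_link: Optional["Entry"] = None


def find_entry(head: Optional[Entry], logical_address: int) -> Optional[Entry]:
    """Step 608: compare logical addresses along the chain until one matches."""
    node = head
    while node is not None:
        if node.logical_address == logical_address:
            return node
        node = node.next_link
    return None


def read(logical_address: int,
         bitmap: Dict[int, bool],
         pcet: List[Optional[Entry]],
         slot_of: Callable[[int], int],
         data_cache: List[bytes],
         storage_units: Dict[int, bytes]) -> bytes:
    # Step 602: a clear bitmap entry means the storage unit can serve the read.
    if not bitmap.get(logical_address, False):
        return storage_units[logical_address]  # Step 604
    slot = slot_of(logical_address)            # Step 606
    entry = find_entry(pcet[slot], logical_address)
    if entry is None:
        raise LookupError("no table entry found for logical address")
    return data_cache[entry.cache_index]


# Table Entry A in the PCET links to B and then C in the overflow table.
entry_c = Entry(logical_address=3, cache_index=2)
entry_b = Entry(logical_address=2, cache_index=1, next_link=entry_c)
entry_a = Entry(logical_address=1, cache_index=0, next_link=entry_b)
pcet = [entry_a]
data_cache = [b"A", b"B", b"C"]
bitmap = {1: True, 2: True, 3: True}
storage_units = {9: b"old"}
slot_of = lambda addr: 0  # hypothetical hash that forces every address to collide
```

A read for logical address 2 skips Table Entry A, matches Table Entry B, and returns the cached data, while a read for logical address 9 (bitmap entry not set) goes straight to the storage unit stand-in.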
Continuing with the discussion of
In Step 700, a table entry in the primary cache entry table is selected.
In Step 702, a determination is made about whether the calculated time (i.e., the time value determined by adding the unavailability period to the timestamp in the selected table entry) is less than or equal to the current time. The following is a non-limiting example of determining a calculated time. Turning to the example, consider a scenario in which the timestamp is T1 and the unavailability period is P, then the calculated time is T1+P. Further, if the current time is T2, then the determination made in step 702 is whether (T1+P)≤T2. If the calculated time is less than or equal to the current time, then the unavailability period for the data has elapsed and the data may be obtained from the appropriate storage unit and, as such, the process proceeds to step 704. However, if the calculated time is greater than the current time, then the unavailability period for the data has not elapsed and the data may not be obtained from a storage unit and, as such, the process in
Continuing with the discussion of
In Step 706, the selected table entry is removed (or the valid bit in the table entry is updated to signify that the table entry is invalid). Further, the entry in the bitmap corresponding to the logical address in the selected table entry is updated to indicate that there is no corresponding table entry in the PCET (e.g., the entry in the bitmap is updated from one to zero). The data corresponding to the removed table entry is also removed (or set as invalid) in the data cache. The process then ends.
Continuing with the discussion of
In Step 710, the table entry in the PCET (i.e., the table entry selected in step 700) is replaced by the table entry identified in step 708.
In Step 712, the entry in the bitmap corresponding to the logical address in the selected table entry is updated to indicate that there is no corresponding table entry (e.g., the entry in the bitmap is updated from one to zero). The data corresponding to the removed table entry is also removed (or set as invalid) in the data cache. The process then ends.
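The removal process can be sketched in Python as follows. The sketch rests on two labeled assumptions: the unavailability period is treated as elapsed once the timestamp plus the period is no later than the current time, and the choice between removing the entry (step 706) and replacing it (step 710) is assumed to depend on whether the selected entry's next link pointer is null. The names `Entry`, `unavailability_elapsed`, and `expire_slot`, and the data cache keyed by logical address, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    logical_address: int
    timestamp: float
    next_link: Optional["Entry"] = None


def unavailability_elapsed(timestamp: float, period: float, now: float) -> bool:
    """Step 702: the data is readable from storage once T1 + P <= T2."""
    return timestamp + period <= now


def expire_slot(pcet: List[Optional[Entry]], slot: int,
                bitmap: Dict[int, bool], data_cache: Dict[int, bytes],
                period: float, now: float) -> bool:
    """Try to retire the PCET entry at slot; return True if it was retired."""
    entry = pcet[slot]  # Step 700: select a table entry in the PCET
    if entry is None or not unavailability_elapsed(entry.timestamp, period, now):
        return False
    # Assumption: a null next link pointer leaves the location empty (step 706);
    # otherwise the next entry in the chain takes its place (step 710).
    pcet[slot] = entry.next_link
    # Steps 706/712: clear the bitmap entry and drop the cached data.
    bitmap[entry.logical_address] = False
    data_cache.pop(entry.logical_address, None)
    return True
```

For example, with an unavailability period of 10, an entry written at time 100 is kept at time 105 and retired at time 111, at which point a linked entry written at time 105 moves into the PCET location and remains there until its own period has elapsed.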
The process described in
One or more embodiments of the technology may be implemented using instructions executed by one or more processors in the storage appliance. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.
While the technology has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the technology as disclosed herein. Accordingly, the scope of the technology should be limited only by the attached claims.