This application claims priority to Chinese Patent Application No. CN201811251580.6, on file at the China National Intellectual Property Administration (CNIPA), having a filing date of Oct. 25, 2018, and having “A NEW EFFICIENT METHOD FOR MAPPER AND RAID TO IMPLEMENT PACO FOR TRIDENT” as a title, the contents and teachings of which are herein incorporated by reference in their entirety.
Embodiments of the present disclosure generally relate to a storage system, and more specifically, to a method, a device and a computer readable storage medium for managing a Redundant Array of Independent Disks (RAID).
The RAID is a data storage virtualization technique that combines a plurality of physical storage devices into one or more logical units for purposes of data redundancy, performance improvement, and so on. If a storage device in the RAID is approaching its end of life (EOL), it may be necessary to replace the storage device online with another storage device. However, in some storage systems, replicating data from the storage device to the other storage device may involve an excessive number of write operations, which adversely affects host I/O and increases wear of the storage device. Therefore, it is necessary to provide a solution that at least partly solves the above problem.
The embodiments of the present disclosure provide a method, a device and a computer program product for managing a RAID.
In a first aspect, there is provided a method of managing a RAID. The method includes: in response to receiving information indicative of an end-of-life (EOL) of a first storage device of the RAID, determining a storage extent associated with the first storage device, the storage extent being distributed over a plurality of storage devices of the RAID and including a first group of slices in the first storage device, the storage extent including a plurality of data blocks stored thereon; reading a portion of a data block of the plurality of data blocks from a first slice of the first group of slices, the first slice including the portion of the data block; and writing the portion of the data block into a spare slice.
In a second aspect, there is provided a device for managing a Redundant Array of Independent Disks (RAID), including: a processing unit; and a memory coupled to the processing unit and including instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including: in response to receiving information indicative of an end-of-life (EOL) of a first storage device of the RAID, determining a storage extent associated with the first storage device, the storage extent being distributed over a plurality of storage devices of the RAID and including a first group of slices in the first storage device, the storage extent including a plurality of data blocks stored thereon; reading a portion of a data block of the plurality of data blocks from a first slice of the first group of slices, the first slice including the portion of the data block; and writing the portion of the data block into a spare slice.
In a third aspect, there is provided a computer-readable storage medium including machine-executable instructions stored thereon which, when executed by at least one processor, cause the at least one processor to perform the method according to the first aspect.
In a fourth aspect, there is provided a computer program product stored on a computer-readable medium and including machine-executable instructions which, when executed, cause a machine to perform the method according to the first aspect.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the present disclosure, nor is it intended to be used to limit the scope of the present disclosure.
The above and other objectives, features, and advantages of example embodiments of the present disclosure will become more apparent from the following detailed description with reference to the accompanying drawings, in which the same reference signs refer to the same elements:
The individual features of the various embodiments, examples, and implementations disclosed within this document can be combined in any desired manner that makes technological sense. Furthermore, the individual features are hereby combined in this manner to form all possible combinations, permutations and variants except to the extent that such combinations, permutations and/or variants have been explicitly excluded or are impractical. Support for such combinations, permutations and variants is considered to exist within this document.
It should be understood that the specialized circuitry that performs one or more of the various operations disclosed herein may be formed by one or more processors operating in accordance with specialized instructions persistently stored in memory. Such components may be arranged in a variety of ways such as tightly coupled with each other (e.g., where the components electronically communicate over a computer bus), distributed among different locations (e.g., where the components electronically communicate over a computer network), combinations thereof, and so on.
The preferred embodiments disclosed herein will be described in detail below with reference to the accompanying drawings. Although the drawings illustrate the preferred embodiments of the present disclosure, it should be appreciated that the present disclosure can be implemented in various forms and should not be limited by the embodiments described herein. Rather, these embodiments are provided to disclose the present disclosure more thoroughly and completely, and to convey the scope of the present disclosure fully to those skilled in the art.
As used herein, the term “includes” and its variants are to be read as open-ended terms that mean “includes, but is not limited to.” The term “or” is to be read as “and/or” unless the context clearly indicates otherwise. The term “based on” is to be read as “based at least in part on.” The terms “one example embodiment” and “an embodiment” are to be read as “at least one example embodiment.” The term “another embodiment” is to be read as “at least one other embodiment.” The terms “first,” “second,” and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below.
In some embodiments, the namespace 102 supports the Network File System (NFS) and the Common Internet File System (CIFS), and is implemented on logical storage. The namespace 102 can communicate with the mapper 104 and obtain a physical address corresponding to a logical address using an Application Programming Interface (API) of the mapper 104. The mapper 104 maintains the mapping between logical addresses and physical addresses.
For example, the namespace 102 receives an input/output (I/O) request from a user and sends the I/O request to the mapper 104. The mapper 104 looks up the address of the data and pushes the I/O request to the RAID 106. The RAID 106 uses backend storage devices (for example, drives, hard disks, or Solid State Drives (SSDs)) to execute the I/O.
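To make the layering concrete, the following Python sketch models a read passing from the namespace 102 through the mapper 104 to the RAID 106. The class names, the PhysicalAddress fields (uber and PLB index), and the dict-based mapping table are hypothetical stand-ins chosen for illustration; the mapper API itself is not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalAddress:
    uber_id: int    # hypothetical: storage extent (uber) that holds the data
    plb_index: int  # hypothetical: PLB index within that uber


@dataclass
class Raid:
    """Hypothetical RAID layer that executes I/O at physical addresses."""
    blocks: dict = field(default_factory=dict)

    def read(self, addr: PhysicalAddress) -> bytes:
        return self.blocks[addr]


@dataclass
class Mapper:
    """Hypothetical mapper holding the logical-to-physical mapping."""
    raid: Raid
    table: dict = field(default_factory=dict)

    def lookup(self, logical_addr: int) -> PhysicalAddress:
        # The address translation the namespace obtains through the mapper.
        return self.table[logical_addr]

    def read(self, logical_addr: int) -> bytes:
        # Resolve the address, then push the I/O down to the RAID layer.
        return self.raid.read(self.lookup(logical_addr))


@dataclass
class Namespace:
    """Hypothetical namespace: entry point for user I/O requests."""
    mapper: Mapper

    def handle_read(self, logical_addr: int) -> bytes:
        return self.mapper.read(logical_addr)


raid = Raid()
mapper = Mapper(raid)
addr = PhysicalAddress(uber_id=3, plb_index=7)
raid.blocks[addr] = b"hello"
mapper.table[0x1000] = addr
namespace = Namespace(mapper)
print(namespace.handle_read(0x1000))  # prints b'hello'
```

Keeping the translation table in the mapper means the namespace never handles physical addresses, which mirrors the division of responsibilities described above.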
As shown in
As shown in
As shown in
If a storage device (for example, an SSD) is approaching its End of Life (EOL), the data on the storage device needs to be replicated to other storage devices.
In order to at least partly solve the above problem, the embodiments of the present disclosure provide solutions for managing a RAID. These solutions will be described below in detail with reference to
At block 402, the mapper 104 receives information indicative of an end of life (EOL) of a storage device 510 of the RAID 106.
At block 404, a storage extent associated with the storage device 510 is determined. The storage extent can be an uber as described above, which is distributed over a plurality of storage devices 202 of the RAID 106 and includes a first set of slices in the storage device 510. A set of slices can include one or more slices. The storage extent stores a plurality of data blocks; for example, an uber stores a plurality of PLBs. For example, all storage extents or ubers that include any slice of the storage device 510 can be determined.
At block 602, a first uber is selected. At block 604, it is determined whether the uber includes a slice of the EOL storage device. If so, the method 600 proceeds to block 606, where a pointer to the uber is added to the list uber_paco_list, and then proceeds to block 608, where it is determined whether the uber is the last one. If the uber is not the last one, the method 600 proceeds to block 610, where the mapper 104 moves to the next uber for the next iteration. If it is determined at block 604 that the uber does not include a slice of the EOL storage device, the method 600 proceeds directly to block 608.
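The selection loop of method 600 can be sketched in Python as follows. The Slice and Uber types and the in-memory list of ubers are hypothetical stand-ins for the mapper's metadata; appending the uber objects themselves plays the role of the pointers added at block 606.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    disk_id: int  # storage device that holds the slice
    index: int    # slice position on that device


@dataclass
class Uber:
    uber_id: int
    slices: list  # the slices (one per storage device) forming the uber


def build_uber_paco_list(ubers, eol_disk_id):
    """Return the ubers that include at least one slice of the EOL device."""
    uber_paco_list = []
    # Blocks 602/610: start with the first uber and move to the next one.
    for uber in ubers:
        # Block 604: does this uber include a slice of the EOL storage device?
        if any(s.disk_id == eol_disk_id for s in uber.slices):
            uber_paco_list.append(uber)  # block 606
    # Block 608: the loop ends after the last uber has been checked.
    return uber_paco_list


ubers = [
    Uber(0, [Slice(0, 0), Slice(1, 0), Slice(2, 0)]),
    Uber(1, [Slice(0, 1), Slice(2, 1), Slice(3, 0)]),
    Uber(2, [Slice(1, 1), Slice(3, 1), Slice(4, 0)]),
]
print([u.uber_id for u in build_uber_paco_list(ubers, eol_disk_id=1)])  # [0, 2]
```

Only ubers 0 and 2 touch storage device 1, so only those two would later take part in the proactive copy.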
Now returning to
At block 408, the portion of the data block is written into a spare slice. Once the uber_paco_list is obtained, the mapper 104 can select one uber at a time and cooperate with the RAID 106 to replicate the data from the EOL drive to the spare slice. When copying of an uber is completed, the mapper 104 sends an uber recovery message to the RAID 106 to modify the metadata of the uber, thereby replacing the EOL disk slice with the spare slice. After all ubers in the uber_paco_list are recovered, the mapper 104 sends a proactive copy (PACO) recovery message to the RAID 106, such that the RAID 106 marks the EOL disk offline.
For example, the uber determined at block 404 includes a second set of slices in a second storage device of the RAID 106 or the RRS 1. In some embodiments, the second storage device can also refer to a plurality of storage devices in the RAID 106 or the RRS 1 other than the first storage device, for example, all of the other storage devices. For example, if it is determined that the second set of slices includes a first spare slice, the portion of the PLB is written into the first spare slice. If the second set of slices does not include a spare slice, a second spare slice is selected, as a portion of the storage extent, from slices in the second storage device other than the second set of slices, and the portion of the data block is written into the second spare slice. For example, the second spare slice can be selected from any of the other storage devices.
In some embodiments, if respective portions of the plurality of data blocks or PLBs are all written into respective spare slices, the first slice is replaced with the spare slice. For other data blocks or PLBs, other spare slices can be used to replace respective first slices. In some embodiments, if respective portions of the plurality of data blocks or PLBs in all the storage extents or ubers associated with the EOL storage device are written into respective spare slices, the EOL storage device can be taken offline.
By reducing the number of I/Os, wear of the storage device (for example, an SSD) is reduced. Taking the 4+1 RAID 5 as an example, the method as shown in
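The two-level choice of spare slice described above can be sketched as follows. The spare and in_use flags and the flat lists of slices are hypothetical simplifications, and any additional placement constraints the RAID may enforce are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    disk_id: int
    index: int
    spare: bool = False   # hypothetical flag: slice is reserved as a spare
    in_use: bool = False  # hypothetical flag: slice already belongs to an uber


def choose_spare_slice(second_set, other_slices):
    """Select the spare slice that receives the portion of a PLB.

    second_set: the uber's slices on the second storage device(s).
    other_slices: slices on those device(s) that are not in second_set.
    """
    # Prefer a first spare slice that is already part of the second set.
    for s in second_set:
        if s.spare:
            return s
    # Otherwise take a second spare slice from the remaining slices and
    # add it to the second set so it becomes a portion of the storage extent.
    for s in other_slices:
        if not s.in_use:
            s.in_use = True
            second_set.append(s)
            return s
    raise RuntimeError("no spare slice available")


uber_slices = [Slice(0, 0, in_use=True), Slice(2, 0, in_use=True)]
free = [Slice(3, 5), Slice(4, 2)]
chosen = choose_spare_slice(uber_slices, free)
print(chosen.disk_id, chosen.index)  # 3 5
```

Checking the uber's own slices first avoids allocating a new slice when the extent already carries a spare.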
At block 720, the mapper 104 receives a PACO PLB response and releases the read lock of the PLB, and the method proceeds to block 722. At block 722, the mapper 104 determines whether the PLB is the last one in the uber. If not, the mapper 104 moves to the next PLB at block 724, and the method proceeds to block 706 for the next iteration.
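A minimal Python sketch of the per-PLB copy loop, assuming each PLB records which bytes sit on which storage device. The Plb type, the dict used as the spare slice, and the use of threading.Lock in place of the PLB read lock (the standard library has no reader-writer lock) are assumptions for illustration. Only the portion on the EOL slice is read and written, so each PLB costs one write.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Plb:
    plb_id: int
    portions: dict  # disk_id -> bytes of this PLB stored on that disk's slice
    # Stand-in for the PLB read lock; a plain mutex is used for simplicity.
    lock: threading.Lock = field(default_factory=threading.Lock)


def copy_uber_plbs(plbs, eol_disk_id, spare_slice):
    """Copy, PLB by PLB, only the portion that lives on the EOL slice."""
    for plb in plbs:  # blocks 722/724: continue until the last PLB
        with plb.lock:  # lock held while the portion is copied
            portion = plb.portions[eol_disk_id]  # read from the EOL slice
            spare_slice[plb.plb_id] = portion    # one write per PLB
        # Leaving the with-block releases the lock (block 720).
    return spare_slice


plbs = [Plb(0, {0: b"a0", 1: b"a1"}), Plb(1, {0: b"b0", 1: b"b1"})]
print(copy_uber_plbs(plbs, eol_disk_id=1, spare_slice={}))  # {0: b'a1', 1: b'b1'}
```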
If it is determined at block 722 that the PLB is the last one in the uber, processing of the uber has been completed, and the method 700 proceeds to block 726. At block 726, the mapper 104 marks the uber as being recovered and sends the identifier (ID) of the uber to the RAID 106. At block 728, the RAID 106 replaces the EOL drive slice with the spare slice. At block 730, the RAID 106 releases the busy state of the spare slice, so that access to the spare slice is permitted. For example, the uber information can indicate information about the storage devices and slices forming the uber. At block 734, the RAID 106 marks the EOL disk slice busy to prevent access to it. Moreover, the RAID 106 sends a response indicative of uber recovery completion to the mapper 104. At block 736, the mapper 104 receives the response indicative of uber recovery completion. At block 738, the mapper 104 determines whether any uber in the uber_paco_list has not yet been processed. If so, the method 700 proceeds to block 740, where the mapper 104 moves to the next uber for the next iteration. If not, the method 700 proceeds to block 742, where the mapper 104 requests the RAID 106 to execute PACO recovery and sends the disk ID of the EOL disk to the RAID 106, so that the old disk can be replaced with a new one. At block 744, the RAID 106 updates the RRS information, and at block 746, the EOL disk is marked offline and a response indicative of PACO recovery completion is sent. At block 748, the mapper 104 receives the response indicative of PACO recovery completion and acknowledges that recovery is complete.
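The uber recovery and PACO recovery exchange of blocks 726 to 748 can be sketched in Python as follows. The RaidState class, the (disk_id, slice_index) tuples, the set of busy slices, and the string responses are hypothetical; the PLB copy itself and the RRS update of block 744 are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RaidState:
    """Hypothetical RAID-side metadata touched during proactive copy."""
    uber_slices: dict  # uber_id -> list of (disk_id, slice_index)
    busy: set = field(default_factory=set)  # slices that must not be accessed
    offline_disks: set = field(default_factory=set)

    def recover_uber(self, uber_id, eol_slice, spare_slice):
        slices = self.uber_slices[uber_id]
        slices[slices.index(eol_slice)] = spare_slice  # block 728
        self.busy.discard(spare_slice)  # block 730: spare slice now accessible
        self.busy.add(eol_slice)        # block 734: block access to EOL slice
        return "uber recovery complete"

    def recover_paco(self, eol_disk_id):
        self.offline_disks.add(eol_disk_id)  # block 746
        return "PACO recovery complete"


def run_paco(raid, uber_paco_list, eol_disk_id, spares):
    """Mapper side: recover each uber in turn, then request PACO recovery."""
    responses = []
    for uber_id in uber_paco_list:  # blocks 738/740
        eol_slice = next(s for s in raid.uber_slices[uber_id] if s[0] == eol_disk_id)
        # Assumes the PLB-by-PLB copy into spares[uber_id] already finished.
        responses.append(raid.recover_uber(uber_id, eol_slice, spares[uber_id]))
    responses.append(raid.recover_paco(eol_disk_id))  # block 742
    return responses


raid = RaidState(
    uber_slices={0: [(0, 0), (1, 0), (2, 0)], 1: [(0, 1), (1, 1), (2, 1)]},
    busy={(3, 0), (3, 1)},  # spare slices start out busy
)
print(run_paco(raid, [0, 1], eol_disk_id=1, spares={0: (3, 0), 1: (3, 1)}))
print(raid.uber_slices[0], raid.offline_disks)  # [(0, 0), (3, 0), (2, 0)] {1}
```

The EOL disk is taken offline only after every uber in the list has swapped its EOL slice, which matches the ordering of blocks 738 and 742.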
The following components in the device 800 are connected to the I/O interface 805: an input unit 806, such as a keyboard, a mouse, and the like; an output unit 807, such as various kinds of displays, a loudspeaker, and the like; a storage unit 808, such as a magnetic disk, an optical disk, and the like; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 809 allows the device 800 to exchange information/data with other devices through a computer network such as the Internet and/or various kinds of telecommunications networks.
Various processes and processing described above, e.g., the methods 400, 600 and 700, can be executed by the processing unit 801. For example, in some embodiments, the methods 400, 600 and 700 can be implemented as a computer software program that is tangibly embodied on a machine readable medium, e.g., the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or mounted onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded to the RAM 803 and executed by the CPU 801, one or more steps of the methods 400, 600 and 700 as described above can be executed.
The present disclosure can be a method, a device, a system and/or a computer program product. The computer program product can include a computer readable storage medium carrying computer readable program instructions for performing various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means (or specialized circuitry) for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein includes an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, snippet, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reversed order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
201811251580.6 | Oct 2018 | CN | national |
Number | Date | Country |
---|---|---|
20200133514 A1 | Apr 2020 | US |