A set of block devices is treated as rows and projections in the Mojette Transform, which implies a mapping between block device blocks and Mojette Transform pixels.
Redundant Array of Independent Disks (RAID) is traditionally implemented in businesses and organizations where disk fault tolerance and/or optimized performance are necessary. Servers and Network Attached Storage (NAS) servers in business datacenters (DC) typically have a hardware RAID controller but a lot of today's implementations rely on software solutions for redundancy in both DC and consumer hardware.
Software RAID means that RAID can be set up without a dedicated hardware RAID controller. There are several RAID levels, and the one to choose depends on whether RAID is used for performance, fault tolerance, or both.
RAID level 5 is described herein as an example. RAID 5 is a common RAID configuration for business servers and enterprise NAS devices. This RAID can provide better performance and redundancy than mirroring as well as fault tolerance. With RAID 5, data and parity used for recovery are striped across three or more disks. If a disk gets an error or starts to fail, data is recreated from this distributed data and parity, seamlessly and automatically. The system is still operational even when one disk fails. Another benefit of RAID 5 is that it allows NAS and server drives to be “hot-swappable” meaning that in case of a drive failure, that drive can be swapped with a new drive without shutting down the server or NAS, and without having to interrupt users who may be accessing the server or NAS. RAID for fault tolerance is important because as drives fail, the data can be rebuilt to new disks as failing disks are replaced. The downside to today's standard RAID 5 is the performance hit to servers that perform a lot of write operations. For example, with RAID 5 on a server that has a database that many employees access on a workday, there could be noticeable lag. RAID is not a backup (BU), and does not replace a business strategy for Disaster Recovery (DR) or a data protection policy embracing regular controlled BU and DR tests.
Different algorithms can be used to create RAID functionality, but the preferred algorithm is the Mojette Transform (MT), a discrete and exact version of the Radon Transform. The Mojette Transform is by nature a non-systematic code, and the parity chunks (m) have a larger size (1+ε, where ε>0) than the corresponding systematic chunks (k), so that the parity chunks contain more information than the data chunks. The Mojette Transform is by design highly performant even on CPUs without advanced acceleration features, and it takes full advantage of modern CPU features when they are present. MT is also portable between different hardware platforms, which means that it can be used in all architectural layers such as data centers, client applications and edge devices. MT is a rateless algorithm, meaning that any redundancy level can be set for a specific use case for optimal functionality, and the redundancy level can be increased or reduced without noticeable performance impact when tiering the data from hot to cold storage or vice versa.
There is a need for an improved rateless RAID solution implemented in software or hardware. Redundant Block Devices using Mojette Transform Projections (RBD_MT) for the next generation of disks can communicate more like cloud native solutions and work in distributed frameworks over networks for high performance use-cases.
The invention is to treat a set of block devices as rows and projections in the Mojette Transform (MT). This implies a mapping between block device blocks and Mojette Transform pixels. By running the Mojette Transform in block-less mode, all padding is pushed to the end of the Mojette Transform block; the Mojette Transform near-optimal property can therefore be handled by reserving a fixed number of blocks at the end of each block device. Essentially this works as a rateless RAID-4/5 (standard RAID levels). During block write operations, Partial Update (PU) is used to reduce the number of blocks that need to be touched to complete the operation. This works by computing the difference between the old block and the new block. The new block is written to the data device and the difference is used to patch the parity blocks on the parity devices. Blocks written to the data devices can be striped over all data devices to speed up both read and write operations. Note that "block" is an overloaded term in computer storage systems. In this document it refers to block devices, which read and write blocks of data. Each block has the size of one sector and is independent of the block size used by the file system layered on top of the block devices.
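As an illustration of the striping of blocks over the data devices, the following minimal Python sketch maps a logical block number to a data device and a block position on that device. It assumes plain round-robin striping and a 1-based device:block notation (for example 1:2); both the striping policy and the function name `locate_block` are illustrative assumptions and are not taken from the disclosure.

```python
from typing import Tuple


def locate_block(logical_block: int, data_devices: int) -> Tuple[int, int]:
    """Map a 0-based logical block number to a 1-based (device, block) pair.

    Assumption: round-robin striping, so consecutive logical blocks land on
    consecutive data devices before advancing to the next block position.
    """
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    if data_devices < 1:
        raise ValueError("at least one data device is required")
    device = logical_block % data_devices + 1
    block = logical_block // data_devices + 1
    return device, block


if __name__ == "__main__":
    # Two data devices, as in a 2+2 configuration.
    for n in range(6):
        device, block = locate_block(n, data_devices=2)
        print(f"logical block {n} -> {device}:{block}")
```

With two data devices, consecutive logical blocks alternate between device 1 and device 2, so sequential reads and writes are spread over both devices.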
Exemplary aspects of the disclosure include a method for managing redundant block devices using Mojette Transform, where the method includes creating a mapping between a plurality of blocks in the redundant block devices and Mojette Transform pixels, and writing data into the plurality of blocks in a plurality of data disks. The method also includes calculating parity projections and transferring the parity projections into a plurality of respective blocks in the plurality of blocks using Mojette Transform projections, and updating data in one of the plurality of blocks and the respective parity projection based on an input from a user.
Exemplary aspects of the disclosure also include a non-transitory computer-readable medium encoded with computer-readable instructions that, when executed by processing circuitry, cause the processing circuitry to perform a method including creating a mapping between a plurality of blocks in the redundant block devices and Mojette Transform pixels, and writing data into the plurality of blocks in a plurality of data disks. The method also includes calculating parity projections and transferring the parity projections into a plurality of respective blocks in the plurality of blocks using Mojette Transform projections, and updating data in one of the plurality of blocks and the respective parity projection based on an input from a user.
Exemplary aspects of the disclosure further include a device having circuitry that creates a mapping between a plurality of blocks in the redundant block devices and Mojette Transform pixels, and writes data into the plurality of blocks in a plurality of data disks. The circuitry also calculates parity projections and transfers the parity projections into a plurality of respective blocks in the plurality of blocks using Mojette Transform projections, and updates data in one of the plurality of blocks and the respective parity projection based on an input from a user.
In the examples of the Redundant Block Devices using Mojette Transform Projections (RBD_MT), a 2+2 configuration (2 data devices and 2 parity devices) is used. Shaded cells indicated by, for example 311 and 312 in
$$M\{f(k,l)\} \equiv \mathrm{proj}(p_i, q_i, a) = \sum_{k=0}^{Q-1} \sum_{l=0}^{P-1} f(k,l)\,\delta(a + p_i l - q_i k) \qquad (1)$$
The summation indices P and Q correspond to the size of the data block. That is, the data is provided as a block with dimensions P×Q. The variable a is a value that specifies a line and is given by:
$$a = p_i l - q_i k \qquad (2)$$
The line specified by a is the line over which the elements, or pixels, of the block are centered. Applying the Mojette transform operator to a particular data block leads to a sum over the elements or pixels that are centered about a particular line, where the particular line can be inferred from the Kronecker delta function:
$$a = p_i l - q_i k \qquad (3)$$
The Kronecker delta function equals 1 if its argument is 0, and 0 otherwise. In the following, a can be removed from the argument of:
$$\mathrm{proj}(p_i, q_i, a) \qquad (4)$$
Thus, the projection can simply be denoted by $(p_i, q_i)$. Formula (1) above can then be used to generate a projection with any value of p and q. The number B of line sums, also referred to as the number of bins, per projection is given by:
$$B = (Q-1)|p| + (P-1)|q| + 1 \qquad (5)$$
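To make the projection operator more concrete, the following minimal Python sketch computes one projection of a small block. It is only a sketch under stated assumptions: the block is given as Q rows of P pixels, pixel (row, col) is assigned to the line index p·row − q·col, and the bins are shifted so that the smallest line index becomes bin 0. This indexing was chosen so that the number of bins agrees with formula (5); the function name `mojette_projection` and the `combine` parameter (addition by default, XOR when passed explicitly) are hypothetical and not part of the disclosure.

```python
from operator import add
from typing import Callable, List, Sequence


def mojette_projection(
    block: Sequence[Sequence[int]],
    p: int,
    q: int,
    combine: Callable[[int, int], int] = add,
) -> List[int]:
    """Combine the pixels of a Q x P block along the lines of direction (p, q)."""
    Q = len(block)      # number of rows
    P = len(block[0])   # number of columns (pixels per row)
    # Assumed line index for pixel (row, col): a = p*row - q*col.
    # a_min and a_max bound that index over the whole block.
    a_min = min(0, p * (Q - 1)) - max(0, q * (P - 1))
    a_max = max(0, p * (Q - 1)) - min(0, q * (P - 1))
    bins = [0] * (a_max - a_min + 1)   # 0 is neutral for both add and XOR
    for row in range(Q):
        for col in range(P):
            a = p * row - q * col
            bins[a - a_min] = combine(bins[a - a_min], block[row][col])
    return bins


if __name__ == "__main__":
    ones = [[1] * 6 for _ in range(3)]   # P = 6 columns, Q = 3 rows
    projection = mojette_projection(ones, p=1, q=1)
    print(len(projection), projection)
```

For a 3-row by 6-column block of ones and the projection (1, 1), the sketch yields 8 bins, matching B = (3−1)·1 + (6−1)·1 + 1.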
Examples of how projections are used in the inventive concepts described herein are described below.
Returning to
P=6
Q=3
P(1,1) p=1 and q=1
B=(Q−1)|p|+(P−1)|q|+1=(3−1)*1+(6−1)*1+1=8
Extra blocks for P(1,1), B−P=8−6=2
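The arithmetic above can be reproduced with a short Python sketch. It applies formula (5) directly and then subtracts the row length P to obtain the number of extra blocks a projection needs; the function names `bins_per_projection` and `extra_blocks` are illustrative assumptions.

```python
def bins_per_projection(P: int, Q: int, p: int, q: int) -> int:
    """Number of bins B of projection (p, q) for a P x Q block, formula (5)."""
    return (Q - 1) * abs(p) + (P - 1) * abs(q) + 1


def extra_blocks(P: int, Q: int, p: int, q: int) -> int:
    """Bins a projection needs beyond the P blocks of a data row (B - P)."""
    return bins_per_projection(P, Q, p, q) - P


if __name__ == "__main__":
    P, Q = 6, 3
    print("bins for P(1,1):", bins_per_projection(P, Q, 1, 1))
    print("extra blocks for P(1,1):", extra_blocks(P, Q, 1, 1))
```

For P=6, Q=3 and the projection P(1,1), this gives 8 bins and 2 extra blocks, in line with the values listed above.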
In order to set the device to the correct configuration, the steps described in the table labeled “Configuration of Block Device,” below must be performed as described in greater detail herein.
The basics of the MT include an equation for the configuration of a block to be used with a set of MT projections. In
In
An example is provided for each of the different cases including Block Sequential Writes (BSW) to the RBD_MT, Block Random Write, and Block Update.
Data can be written using BSW until the devices are full using the same block MT configuration.
Random writes to the same configured RBD_MT are performed with reference to
These random writes can then continue to fill the disks/memory until full, which means all blocks contain written information/data.
Block Update is described with reference to
Updating of written blocks can continue until the disks/memory are full.
The table in
For example, Block 1:2 is updated from C to X. The difference is computed as C^X, and parity blocks 3:2 and 4:3 are patched with the computed difference. The results are B^X and X^F, respectively.
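The update in this example can be sketched in Python as follows. The sketch assumes that ^ denotes bitwise XOR and that, before the update, parity blocks 3:2 and 4:3 hold B^C and C^F (which follows from the stated results). The 4-byte block size, the byte values, and the function names `xor_blocks` and `partial_update` are chosen only for illustration.

```python
from typing import List, Sequence


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two blocks of equal size."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def partial_update(
    old_block: bytes, new_block: bytes, parity_blocks: Sequence[bytes]
) -> List[bytes]:
    """Patch every affected parity block with the old/new difference."""
    delta = xor_blocks(old_block, new_block)      # C ^ X in the example
    return [xor_blocks(parity, delta) for parity in parity_blocks]


if __name__ == "__main__":
    # Illustrative 4-byte blocks; a real block has the size of one sector.
    B, C, F, X = b"\x0b" * 4, b"\x0c" * 4, b"\x0f" * 4, b"\x58" * 4
    parity_3_2 = xor_blocks(B, C)                 # assumed content before update
    parity_4_3 = xor_blocks(C, F)                 # assumed content before update
    patched = partial_update(C, X, [parity_3_2, parity_4_3])
    print(patched[0] == xor_blocks(B, X))         # parity 3:2 is now B ^ X
    print(patched[1] == xor_blocks(X, F))         # parity 4:3 is now X ^ F
```

Only the updated data block and the two affected parity blocks are touched; blocks B and F are never read, which is the saving that the partial update provides.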
Partial update (PU) of a block is an important feature for reducing the number of operations necessary when the RBD_MT is used. This example shows that a general partial update of MT coded data reduces the number of operations needed when re-encoding parity chunks after an update to one or more data chunks. If data chunks are transmitted to the machine performing the re-encode, it also reduces the network traffic, since fewer data chunks need to be transmitted over the network. In
Referring to
The memory 820 stores a program 850, and the processor 810 is connected to the memory 820 by using the bus 840. When the computer device 800 is running, the processor 810 executes the program 850 stored in the memory 820, so that the computer device 800 performs the functions described above. The processor 810 is configured to perform the functions described above, with reference to other Figures.
The memory 820 may include a high-speed random access memory (RAM). Optionally, the memory 820 may further include a non-volatile memory. For example, the memory 820 may include a magnetic disk memory.
The processor 810 may be a central processing unit (CPU), or the processor 810 may be an application-specific integrated circuit (ASIC), or the processor 810 may be one or more integrated circuits configured to implement the embodiments of the present disclosure.
A person of ordinary skill in the art may understand that all or some of the steps of the methods in the embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer readable storage medium, such as a non-transitory computer readable storage medium. The storage medium may include a Read Only Memory (ROM), a RAM, a magnetic disk, or an optical disc.
The devices 900, 915, 920, and 925 are connected to a network 930, which can be a wireless network (cellular or Wi-Fi), a wired network, or any combination thereof. The network 930 may also include private networks, public networks such as the Internet, or any combination thereof. As can be appreciated, the make-up of the network 930 does not limit the present disclosure, which is applicable to any network configuration.
A computer device 935 is also connected to the network 930 in order to communicate with the devices 900, 915, 920, and 925 and to carry out the functions described above.
The embodiments described above are merely given as examples, and it is understood that the proposed technology is not limited thereto. It can be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the present scope as defined by the appended claims. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.
This application claims the benefit of priority under 35 U.S.C. 119(e) to U.S. Application No. 62/822,508 entitled “Method and Devices for Creating Redundant Block Devices using Mojette Transform Projections,” and filed on Mar. 22, 2019. The entire contents of U.S. Application No. 62/822,508 are incorporated herein by reference.
Number | Date | Country |
---|---|---|
20200301781 A1 | Sep 2020 | US |
Number | Date | Country |
---|---|---|
62822508 | Mar 2019 | US |