This application is related to U.S. Nonprovisional patent application Ser. No. 13/765,687, filed Feb. 12, 2013, which is herein incorporated by reference in its entirety and for all purposes.
Embodiments of the invention relate generally to software, data storage, and virtualized computing and processing resources. More specifically, techniques for replicating data and/or files constituting a virtual machine, or a portion thereof, using deduplication metadata are described.
Conventional approaches to replicating virtual machine images are typically resource-intensive. Organizations replicate virtual machine images for a variety of reasons, one notable reason being disaster recovery. Virtual machine-based computing systems in one geographic region, such as New York City, can be susceptible to data loss or to an inability to access data due to, for example, a severe hurricane or other types of disasters. On such occasions, transferring data from the affected region to another virtual machine-based computing system in another geographic region enables an organization to keep its internal processes (e.g., of a business) up and running.
However, transferring data to replicate a virtual machine-based computing system can involve transferring gigabytes or terabytes of data via a variety of networks, including the Internet. Creating a replica of a virtual machine requires reading the source virtual machine image block by block and transmitting each block to the replicated virtual machine image. Given the data sizes of virtual machine images, this operation is time-consuming and can take many hours to complete.
Moreover, the rapidly growing demand for virtualized systems and machines means that hundreds of thousands of virtual machines may need to be deployed at different locations. Conventional solutions for replicating hundreds or thousands of virtual machines are cost-prohibitive and time-consuming, and they do not scale effectively with the relatively large number of virtual machines required for deployment, even if the underlying file system of the virtual machines is deduplicated.
For example, synchronous replication techniques require copying data over a variety of networks to maintain up-to-date copies of the data. Generally, synchronous replication requires data to be written to different locations contemporaneously, whereby replicating to a remote location introduces latency. In particular, the latency slows operation of the principal virtual machines as data is written to remote virtual machines and/or storage.
Thus, what is needed is a solution for improving the cost and efficiency of replicating images of virtual machines without the limitations of conventional techniques.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings:
Various embodiments or examples may be implemented in numerous ways, including as a system, a process, an apparatus, a user interface, or a series of program instructions on a computer readable medium such as a computer readable storage medium or a computer network where the program instructions are sent over optical, electronic, or wireless communication links. In general, operations of disclosed processes may be performed in an arbitrary order, unless otherwise provided in the claims.
A detailed description of one or more examples is provided below along with accompanying figures. The detailed description is provided in connection with such examples, but is not limited to any particular example. The scope is limited only by the claims and numerous alternatives, modifications, and equivalents are encompassed. Numerous specific details are set forth in the following description in order to provide a thorough understanding. These details are provided for the purpose of example and the described techniques may be practiced according to the claims without some or all of these specific details. For clarity, technical material that is known in the technical fields related to the examples has not been described in detail to avoid unnecessarily obscuring the description.
In some examples, the described techniques may be implemented as a computer program or application (“application”) or as a plug-in, module, or sub-component of another application. The described techniques may be implemented as software, hardware, firmware, circuitry, or a combination thereof. If implemented as software, the described techniques may be implemented using various types of programming, development, scripting, or formatting languages, frameworks, syntax, applications, protocols, objects, or techniques, including ASP, ASP.net, .Net framework, Ruby, Ruby on Rails, C, Objective C, C++, C#, Adobe® Integrated Runtime™ (Adobe® AIR™), ActionScript™, Flex™, Lingo™, Java™, Javascript™, Ajax, Perl, COBOL, Fortran, ADA, XML, MXML, HTML, DHTML, XHTML, HTTP, XMPP, PHP, and others. The described techniques may be varied and are not limited to the examples or descriptions provided.
As described herein, techniques are provided for efficient replication of virtual machine images by transferring replication data and deduplication metadata. The described techniques may be performed in real time or substantially real time, in which data representing source virtual machines is used to form new virtual machines using a fast-replicate application. The fast-replication techniques described result in the formation of multiple new virtual machines that are replicated instances of a source virtual machine without, for example, requiring the transfer of the underlying data of the source virtual machine. Further, the described techniques can significantly reduce the amount of time required to create new virtual machines by creating the new virtual machines without transferring, for example, common data of the source virtual machine. Still further, the described techniques may reduce the amount of storage required for the new virtual machines, as the replicated virtual machines need not be created by copying the files of the source virtual machine. In some examples, the described virtual machine replicating techniques may also improve scalability of virtualized networks by significantly shortening the time required to establish larger numbers of virtual machines at different locations, such as different geographic regions. Additionally, the described virtual machine replicating techniques can also be used to create new virtual machines in system memory, such as RAM, where data can be accessed quickly.
In this example, system 100 includes a replicator 101, a software layer 160, and any number of nodes 102 to 142 coupled to storage devices 110 to 150, respectively. Each node can include a virtual machine (“VM”) manager, one or more server applications, and system memory, such as RAM.
Replicator 101 is configured to facilitate efficient replication of virtual machine data from one or more nodes 102-142 of the cluster by transferring a subset of the data constituting the virtual machine data via one or more networks 103 to a transferee computing system (not shown), such as remote storage media devices or a remote cluster of nodes similar to nodes 102-142. Replicator 101 can be configured to filter out or otherwise block transfer of non-essential data blocks, according to some embodiments. For example, non-essential data blocks can include redundant or duplicate blocks that may already reside on a replica node in the transferee computing system. Further, replicator 101 is configured to facilitate expeditious replication, which is especially beneficial when a higher replication factor is set to enhance a cluster's fault tolerance. A replication factor is the number of replica copies required in other nodes (e.g., other nodes in the cluster), which are to be transferred to the transferee computing system to maintain replicated virtual machine data.
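The filtering of non-essential blocks can be sketched as follows. This is a minimal illustration, assuming the source node's metadata is available as a mapping from logical block number to hash value and that the replica can report which hashes it already stores; the function name `plan_transfer` and the string hashes are hypothetical, not part of the described embodiments.

```python
from typing import Dict, Iterable, List, Set


def plan_transfer(source_metamap: Dict[int, str], replica_hashes: Iterable[str]) -> List[str]:
    """Return the hashes whose data must be sent, each at most once, in LBN order.

    Hypothetical helper: blocks whose hash already exists on the replica, and
    repeats of a hash seen earlier in the same image, are treated as
    non-essential and filtered out of the transfer.
    """
    already_there: Set[str] = set(replica_hashes)
    to_send: List[str] = []
    for lbn in sorted(source_metamap):
        h = source_metamap[lbn]
        if h not in already_there:
            to_send.append(h)
            already_there.add(h)  # later duplicates in this image need only a link
    return to_send
```

For example, `plan_transfer({0: "hA", 1: "hB", 2: "hA", 3: "hF"}, ["hA"])` returns `["hB", "hF"]`: block data travels only for content the replica lacks, and only once per unique hash.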
In some embodiments, VM managers 104 to 144 include a deduplication application that can be configured to eliminate duplicate copies of repeating data to effect a form of data compression to maximize storage in one or more types of storage media (e.g., storage devices 110 to 150, non-volatile memory, and volatile memory). In a deduplication-based file system, a deduplication application can identify and eliminate duplicate copies of repeating data and implement a reference link to point to the original data, thereby eliminating duplicate data, according to some embodiments. For example, the deduplication application can store data representing a link (e.g., the reference link) associating the eliminated duplicate data and the original data in the form of deduplication metadata, which functions to describe the relationship between the original data and the deduplicated data. Examples of techniques associated with deduplication of virtual machine files are described in co-pending U.S. patent application Ser. No. 13/269,525, filed Oct. 7, 2011, entitled “Deduplication of Virtual Machine Files in a Virtualized Desktop Environment,” which is incorporated herein by reference in its entirety for all purposes.
In some embodiments, a deduplication application can store the deduplication metadata in a metadata file or table used to describe or map the relationships between the deduplicated data and the original data. For example, a metadata file or table can contain data representing a block number that is associated with the physical location or data block of the data in a storage device in a deduplicated file system. Such a data block can contain data representing information such as a block number, data associated with a hash value generated by a hashing function (e.g., SHA-1 or MD5) that uniquely identifies the data in the data block, and data associated with a reference link counter to track the number of times a reference link associated with the data block is implemented.
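A metadata table of this kind can be sketched in a few lines. The sketch below assumes fixed-size blocks and SHA-1 fingerprints, and keeps everything in in-memory dictionaries; the class names, the block size, and the storage layout are illustrative assumptions rather than the implementation of the described deduplication application.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataBlockEntry:
    """One physical block in a hypothetical deduplicated store."""
    block_number: int   # physical block number in the storage device
    hash_value: str     # SHA-1 digest identifying the block contents
    ref_count: int = 0  # reference link counter for this block


@dataclass
class DedupStore:
    block_size: int = 4096
    by_hash: dict = field(default_factory=dict)  # hash -> DataBlockEntry
    data: dict = field(default_factory=dict)     # physical block number -> bytes
    next_block: int = 0

    def write_block(self, payload: bytes) -> DataBlockEntry:
        """Store payload once; identical payloads later only add a reference link."""
        digest = hashlib.sha1(payload).hexdigest()
        entry = self.by_hash.get(digest)
        if entry is None:
            entry = DataBlockEntry(self.next_block, digest)
            self.data[self.next_block] = payload
            self.by_hash[digest] = entry
            self.next_block += 1
        entry.ref_count += 1
        return entry

    def write_file(self, content: bytes) -> list:
        """Split content into fixed-size blocks; return the block numbers (reference links)."""
        links = []
        for offset in range(0, len(content), self.block_size):
            chunk = content[offset:offset + self.block_size]
            links.append(self.write_block(chunk).block_number)
        return links
```

With `DedupStore(block_size=4)`, writing `b"AAAABBBBAAAA"` stores only two physical blocks and returns the links `[0, 1, 0]`; the entry for `"AAAA"` ends with a reference count of 2.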
Replicator 101 can be implemented as a distinct computing device, as shown, or can be disposed or distributed in one or more nodes 102 to 142. Replicator 101 can include structures and/or functions that can be implemented in software, hardware, firmware, circuitry, or any combination thereof. As depicted in
For example, replicator 101 and any of its one or more components can include one or more processors configured to execute one or more algorithms in memory. Thus, at least some of the elements in
As hardware and/or firmware, the above-described structures and techniques can be implemented using various types of programming or integrated circuit design languages, including hardware description languages, such as any register transfer language (“RTL”) configured to design field-programmable gate arrays (“FPGAs”), application-specific integrated circuits (“ASICs”), multi-chip modules, or any other type of integrated circuit. For example, replicator 101 and any of its one or more components can be implemented in one or more computing devices that include one or more circuits. Thus, at least one of the elements in
According to some embodiments, the term “circuit” can refer, for example, to any system including a number of components through which current flows to perform one or more functions, the components including discrete and complex components. Examples of discrete components include transistors, resistors, capacitors, inductors, diodes, and the like, and examples of complex components include memory, processors, analog circuits, digital circuits, and the like, including field-programmable gate arrays (“FPGAs”) and application-specific integrated circuits (“ASICs”). Therefore, a circuit can include a system of electronic components and logic components (e.g., logic configured to execute instructions, such that a group of executable instructions of an algorithm, for example, is a component of a circuit). According to some embodiments, the term “module” can refer, for example, to an algorithm or a portion thereof, and/or logic implemented in either hardware circuitry or software, or a combination thereof (i.e., a module can be implemented as a circuit). In some embodiments, algorithms and/or the memory in which the algorithms are stored are “components” of a circuit. Thus, the term “circuit” can also refer, for example, to a system of components, including algorithms. These can be varied and are not limited to the examples or descriptions provided.
In some embodiments, duplicate instances of an entire virtual machine image can be formed or created by duplicating the deduplication metadata files associated with the virtual machine, without copying any data portions of the virtual machine itself. For example, to create a replicated virtual machine (or a replicated instance of a source virtual machine), deduplication metadata table 256a, which includes links to the data blocks where the source virtual machine data is stored, is duplicated to form deduplication metadata table 258, which includes new links to the data blocks where the source virtual machine data is stored (254). After an instance of the source virtual machine is formed, a reference link counter for each of the data blocks of the data of the source virtual machine is incremented by a replicator 101 of
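The metadata-only cloning step can be sketched as below. This is a simplified illustration, assuming a VM's deduplication metadata table is a mapping from logical block number to physical block number and that reference link counters live in a shared block pool; `BlockStore` and `clone_vm` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class BlockStore:
    """Hypothetical shared block pool: physical block number -> data and link counter."""
    data: dict = field(default_factory=dict)        # physical block number -> bytes
    ref_counts: dict = field(default_factory=dict)  # physical block number -> link count


def clone_vm(store: BlockStore, source_table: dict) -> dict:
    """Create a replicated VM instance by duplicating its deduplication metadata table.

    source_table maps logical block numbers to physical block numbers. No block
    data is copied; each referenced physical block gains one more reference link.
    """
    new_table = dict(source_table)  # new links to the same data blocks
    for physical in new_table.values():
        store.ref_counts[physical] += 1
    return new_table
```

Cloning a table `{0: 0, 1: 1, 2: 1}` over a pool whose counters start at `{0: 1, 1: 2}` leaves the pool with the same two data blocks and counters `{0: 2, 1: 4}`; only the small table is copied.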
Read/write request module 404 is configured to detect a request to access data in a local storage device (e.g., a hypervisor datastore), whereby data blocks are written to, or read from, the local storage device. Thus, read/write request module 404 can be configured to identify data representing a first file on a first storage device during a write operation. The data can include metadata for deduplicated data. Data-read module 406 is configured to detect read operations and associated data from the local storage device. The data block can be retrieved by looking up a local node's MetaMap data in metamap data repository 405a with a key set to a logical block number (“LBN”) of the data block to obtain a corresponding hash value. With the hash value as a key, a look-up operation can be performed in the DataMap data in datamap data repository 405b to retrieve the actual data.
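The two-step read lookup can be sketched as follows, assuming the MetaMap and DataMap repositories are modeled as plain dictionaries; the type aliases and the `read_block` function are hypothetical.

```python
from typing import Dict

MetaMap = Dict[int, str]    # logical block number (LBN) -> hash value
DataMap = Dict[str, bytes]  # hash value -> block data


def read_block(metamap: MetaMap, datamap: DataMap, lbn: int) -> bytes:
    """Resolve an LBN to its hash via the MetaMap, then the hash to data via the DataMap."""
    try:
        hash_value = metamap[lbn]  # key set to the logical block number
    except KeyError:
        raise KeyError(f"LBN {lbn} has not been written on this node") from None
    return datamap[hash_value]     # key set to the hash value
```

Because several LBNs can map to the same hash, deduplicated blocks are read through the same DataMap entry: with `metamap = {7: "h1", 8: "h1"}` and `datamap = {"h1": b"A"}`, both LBN 7 and LBN 8 return `b"A"`.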
Data-write module 408 is configured to determine whether the data representing a first file matches a set of data on a second storage device. For example, data-write module 408 is configured to compute a hash value of a data block to be written to the local storage device. Further, data-write module 408 is configured to transmit the hash value to the replica node along with a local node ID and a logical block number (“LBN”) of the data on disk.
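A minimal sketch of the write path follows, assuming SHA-1 as the hashing function and dictionary-backed MetaMap and DataMap structures; `ReplicationRequest` and `prepare_write` are hypothetical names. The point it illustrates is that the message prepared for the replica carries the hash, node ID, and LBN, but not the block data.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationRequest:
    """Hypothetical message for the replica node: hash and identifiers, no block data."""
    hash_value: str
    node_id: str
    lbn: int


def prepare_write(node_id: str, lbn: int, block: bytes,
                  metamap: dict, datamap: dict) -> ReplicationRequest:
    """Write a block locally and build the hash-only request for the replica node."""
    hash_value = hashlib.sha1(block).hexdigest()
    metamap[lbn] = hash_value              # LBN -> hash
    datamap.setdefault(hash_value, block)  # store the data only if the hash is new
    return ReplicationRequest(hash_value, node_id, lbn)
```

Calling `prepare_write("node-1", 5, b"FFFFFFFF", {}, {})` records the block locally and returns a request whose `lbn` is 5 and whose `hash_value` is the SHA-1 digest of the block.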
Replication acknowledgment module 410 can be configured to check the replica node to determine whether a block with the same hash exists on the remote storage device, which is local to the replica node. Replication acknowledgment module 410 can be configured to form a second file on the second storage device by, for example, linking the second file to the set of data on the second storage device if the data representing the first file matches that set of data, or by copying the data representing the first file to form the second file if it does not match. If the replica node includes the data, then an entry is made in the replica node's MetaMap and replicator 402 is informed of successful replication. As such, there is no need to send the actual data block across “the wire,” or over the networks. The source node's MetaMap is also updated to point to the remote node for that data block. If the replica node does not include the data, however, then the data block is sent to the replica node along with its hash value, LBN, and node ID, and entries are made in the repositories for MetaMap and DataMap data in replicator 402.
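The link-or-copy decision on the replica side can be sketched as below. This assumes the replica's MetaMap is keyed by (source node ID, LBN) and that reference counts are tracked per hash; `ReplicaNode` and `replicate_block` are hypothetical names, and the source-side MetaMap update is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ReplicaNode:
    """Hypothetical replica-side state."""
    metamap: Dict[Tuple[str, int], str] = field(default_factory=dict)  # (node ID, LBN) -> hash
    datamap: Dict[str, bytes] = field(default_factory=dict)            # hash -> data
    ref_counts: Dict[str, int] = field(default_factory=dict)           # hash -> link count

    def offer_hash(self, hash_value: str, node_id: str, lbn: int) -> bool:
        """Link to an existing block if the hash is known; True means no data is needed."""
        if hash_value not in self.datamap:
            return False
        self.metamap[(node_id, lbn)] = hash_value
        self.ref_counts[hash_value] = self.ref_counts.get(hash_value, 0) + 1
        return True

    def receive_block(self, hash_value: str, node_id: str, lbn: int, block: bytes) -> None:
        """Store a block the replica did not already hold and record its MetaMap entry."""
        self.datamap[hash_value] = block
        self.ref_counts[hash_value] = self.ref_counts.get(hash_value, 0) + 1
        self.metamap[(node_id, lbn)] = hash_value


def replicate_block(replica: ReplicaNode, hash_value: str, node_id: str,
                    lbn: int, block: bytes) -> int:
    """Return the number of data bytes sent over the network for this block."""
    if replica.offer_hash(hash_value, node_id, lbn):
        return 0  # acknowledged by linking; the data block stays off the wire
    replica.receive_block(hash_value, node_id, lbn, block)
    return len(block)
```

If the replica already holds a block with hash `"hA"`, replicating that block costs zero data bytes and only increments its link count; a block with an unknown hash is sent in full and stored with a count of 1.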
To illustrate the operation of a replicator of various embodiments, consider that data is written into a local storage device in a first time interval. That is, data 515, 516, 517, and 518 is written into metamap table 510, and data 519 is written into datamap table 520 (e.g., “F” represents a block of data). At this time, metadata 550b for data block “A” in 524, which is associated with key 522, indicates that there are “three” instances of “A” in the source node (e.g., an original data block with “A” and two links to that original data block).
A request to initiate a replication operation occurs in a second time interval in which, prior to replication, the replica node includes data in replica node tables 530a and 540a. In particular, metamap table 530a includes initial data for block numbers 532a and corresponding hash values 534a. Datamap table 540a includes initial hash values 542a and data 544a for blocks of data. At this time, metadata 550c for data block “A” in 544a, which is associated with key 542a, indicates that there is “one” instance of “A” in the replica node (i.e., the original data block with “A”).
Next, in a third time interval, data associated with data block “F” is written to the replica node. That is, data block “F” is written into datamap table 540b as data 544b, which is associated with hash value 542b, after replication. Further, during replication, data 535, 536, 537, and 538 is written into metamap table 530b. Thereafter, metadata for data block “A” has a value of “three” links, as depicted in metadata 550d.
According to some examples, computing platform 800 performs specific operations by processor 804 executing one or more sequences of one or more instructions stored in system memory 806, and computing platform 800 can be implemented in a client-server arrangement, peer-to-peer arrangement, or as any mobile computing device, including smart phones and the like. Such instructions or data may be read into system memory 806 from another computer readable medium, such as storage device 808. In some examples, hard-wired circuitry may be used in place of or in combination with software instructions for implementation. Instructions may be embedded in software or firmware. The term “computer readable medium” refers to any tangible medium that participates in providing instructions to processor 804 for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks and the like. Volatile media includes dynamic memory, such as system memory 806.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read. Instructions may further be transmitted or received using a transmission medium. The term “transmission medium” may include any tangible or intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible medium to facilitate communication of such instructions. Transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 802 for transmitting a computer data signal.
In some examples, execution of the sequences of instructions may be performed by computing platform 800. According to some examples, computing platform 800 can be coupled by communication link 821 (e.g., a wired network, such as LAN, PSTN, or any wireless network) to any other processor to perform the sequence of instructions in coordination with (or asynchronous to) one another. Computing platform 800 may transmit and receive messages, data, and instructions, including program code (e.g., application code) through communication link 821 and communication interface 813. Received program code may be executed by processor 804 as it is received, and/or stored in memory 806 or other non-volatile storage for later execution.
In the example shown, system memory 806 can include various modules that include executable instructions to implement functionalities described herein. In the example shown, system memory 806 includes a mapping module 856, a read/write replicate request module 858, a data-write module 860, a data-read module 862, and a replication acknowledgment module 862, any of which can be configured to provide one or more functions described herein.
Although the foregoing examples have been described in some detail for purposes of clarity of understanding, the above-described inventive techniques are not limited to the details provided. There are many alternative ways of implementing the above-described inventive techniques. The disclosed examples are illustrative and not restrictive.
Number | Date | Country | |
---|---|---|---|
20140229440 A1 | Aug 2014 | US |