Embodiments of the invention relate generally to software, data storage, and virtualized computing and processing resources. More specifically, systems and apparatuses are described for aggregating nodes to form an aggregated virtual storage for a virtualized desktop environment.
Virtualization is a technology that provides a software-based abstraction to a physical, hardware-based computer. In conventional solutions, an abstraction layer decouples physical hardware components (e.g., central processing unit (“CPU”), memory, disk drives, storage) from an operating system and allows numerous instances to be run side-by-side as virtual machines (“VMs”) in isolation of each other. In conventional solutions, an operating system within a virtual machine has visibility into and can perform data transactions with a complete, consistent, and normalized set of hardware regardless of the actual individual physical hardware components underneath the software-based abstraction.
Virtual machines, in conventional solutions, are encapsulated as files (also referred to as images) making it possible to save, replay, edit and copy a virtual machine in a manner similar to that of handling a file on a file-system. This capability provides improved manageability, increased flexibility, and rapid administration relative to using physical machines to replace those that are abstracted.
However, virtual machines and conventional data storage implementations for the virtual machines suffer from significant shortcomings as VM files tend to be large in size and consume large amounts of disk space. Further, traditional data storage implementations typically include Storage Area Networks (“SANs”), Network Attached Storage (“NAS”), and the like. While functional, these storage technologies are typically optimized for read accesses and are ill-suited for write-intensive applications and operations. These traditional data storage implementations also require hardware and computing resources for implementing SAN-based or NAS-based storage, in addition to the computing resources and/or physical hardware components that provide the functionalities of the VMs.
Thus, what is needed is a solution for improving data storage for a virtualized desktop environment without the limitations of conventional techniques.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings:
Various embodiments or examples may be implemented in numerous ways, including as a system, a process, an apparatus, a user interface, or a series of program instructions on a computer readable medium such as a computer readable storage medium or a computer network where the program instructions are sent over optical, electronic, or wireless communication links. In general, operations of disclosed processes may be performed in an arbitrary order, unless otherwise provided in the claims.
A detailed description of one or more examples is provided below along with accompanying figures. The detailed description is provided in connection with such examples, but is not limited to any particular example. The scope is limited only by the claims and numerous alternatives, modifications, and equivalents are encompassed. Numerous specific details are set forth in the following description in order to provide a thorough understanding. These details are provided for the purpose of example and the described techniques may be practiced according to the claims without some or all of these specific details. For clarity, technical material that is known in the technical fields related to the examples has not been described in detail to avoid unnecessarily obscuring the description.
In some examples, the described techniques may be implemented as a computer program or application (“application”) or as a plug-in, module, or sub-component of another application. The described techniques may be implemented as software, hardware, firmware, circuitry, or a combination thereof. If implemented as software, the described techniques may be implemented using various types of programming, development, scripting, or formatting languages, frameworks, syntax, applications, protocols, objects, or techniques, including ASP, ASP.net, .Net framework, Ruby, Ruby on Rails, C, Objective C, C++, C#, Adobe® Integrated Runtime™ (Adobe® AIR™), ActionScript™, Flex™, Lingo™, Java™, Javascript™, Ajax, Perl, COBOL, Fortran, ADA, XML, MXML, HTML, DHTML, XHTML, HTTP, XMPP, PHP, and others. The described techniques may be varied and are not limited to the examples or descriptions provided.
Node storage aggregator 104 is configured to generate one or more aggregated virtual storage repositories. In one example, node storage aggregator 104 is configured to form aggregated virtual storage 121a (e.g., a local aggregated virtual storage) based on memories 123 of subset 120a of servers, whereas node storage aggregator 104 is configured to form aggregated virtual storage 121b and aggregated virtual storage 121n based on memories 123 of subset 120b of servers and subset 120n of servers, respectively. Node storage aggregator 104 is configured to access via paths 106a aggregated virtual storage 121a, 121b, and 121n. In an alternate example, node storage aggregator 104 is configured to further aggregate aggregated virtual storage 121a, 121b, and 121n to form an aggregated virtual storage 110 (e.g., a global aggregated virtual storage). Node storage aggregator 104 is configured to access aggregated virtual storage 110 via path 106c, whereby aggregated virtual storage 110 includes aggregated virtual storage 121a, 121b, and 121n and accesses are via paths 106b to each aggregated virtual storage associated with a subset of servers. Further, node storage aggregator 104, which can include one or more storage aggregator processors coupled to a memory (not shown), can include executable instructions to generate a data structure 123a or 123b for storage in each memory in an associated server 124 in a subset of servers, such as subset 120b of servers. Each of the data structures 123a and 123b is configured to store a reference to duplicative data stored in a first number of servers in subset 120b of servers. As used herein, the term “duplicative data” refers to copies of identical data, including the original data and copies thereof.
In some embodiments, node storage aggregator 104 is configured to populate data structures 123a and 123b with identical metadata (“md”) 125 and to disperse the data referenced by metadata 125 in each server 124 (a subset thereof) in subset 120b of servers.
In view of the foregoing, the structures and/or functionalities of virtualized storage environment 101 can facilitate implementation of aggregated virtual storage based on memories 123 of servers 124a disposed in, for example, server racks, thereby obviating the dependency on specialized storage technologies, such as SAN and NAS, to provide storage to virtual desktops and/or machines. Therefore, specialized hardware and/or software for implementing the specialized storage technologies need not be required. In some cases, the underlying physical hardware for implementing the virtual machines can be used to implement virtual storage in the aggregate. In accordance with various embodiments, the duplicative data provides data redundancy when a second number of servers, or fewer, in subset 120b of servers are inaccessible. Data redundancy is a general property or characteristic of disks or memories that specifies a certain level of fault tolerance should one or more disks or memories fail. Further, the storing of identical metadata 125 preserves references to the duplicated data should servers 124 in subset 120b of servers fail. For example, consider that server 124a is off-line or is otherwise inaccessible (i.e., server 124a is non-responsive). Therefore, metadata 125 in data structure 123b is also inaccessible. In this case, node storage aggregator 104 accesses metadata 125 in other data structures 123a to determine references to the data being accessed (e.g., during a read operation), whereby the duplicative data is dispersed in responsive servers 124.
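By way of illustration only, the read behavior described above, in which identical metadata held by a responsive server is used to locate duplicative data when another server is off-line, may be sketched as follows. The names `Node`, `read_block`, and `build_demo`, and the in-memory dictionaries, are hypothetical and are not part of the described embodiments; the sketch assumes that each node holds an identical metadata copy mapping a data reference to node identifiers.

```python
from typing import Dict, List, Optional


class Node:
    """Hypothetical stand-in for a server: a metadata copy plus locally stored data."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.responsive = True
        self.metadata: Dict[str, List[str]] = {}  # data reference -> node identifiers
        self.blocks: Dict[str, bytes] = {}        # data reference -> stored copy


def read_block(nodes: Dict[str, Node], ref: str) -> Optional[bytes]:
    """Return the data for `ref`, skipping any non-responsive node."""
    # Every metadata copy is identical, so any responsive node can supply it.
    source = next(
        (n for n in nodes.values() if n.responsive and ref in n.metadata), None
    )
    if source is None:
        return None
    # Try each node listed in the metadata until a responsive holder is found.
    for node_id in source.metadata[ref]:
        holder = nodes.get(node_id)
        if holder is not None and holder.responsive and ref in holder.blocks:
            return holder.blocks[ref]
    return None


def build_demo() -> Dict[str, Node]:
    """Three nodes with identical metadata; the data is duplicated on n1 and n2."""
    nodes = {name: Node(name) for name in ("n1", "n2", "n3")}
    for node in nodes.values():
        node.metadata["H1"] = ["n1", "n2"]
    nodes["n1"].blocks["H1"] = b"payload"
    nodes["n2"].blocks["H1"] = b"payload"
    return nodes
```

In this sketch, marking `n1` as non-responsive still allows the read to complete from `n2`, because the metadata consulted on any responsive node lists both holders.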
According to some embodiments, node storage aggregator 104 is configured to translate between disk memory access requests (and formats thereof) and access requests with the aggregated virtual storage (and formats thereof). Examples of disk memory access requests include, but are not limited to, requests to access a file in a root file system (e.g., accessing c:\\root\file.docx), which in turn, are related to access to a specific sector. As is discussed below, examples of access requests with the aggregated virtual storage include, but are not limited to, a sector number, identifiers of one or more nodes, a data representation of the data (e.g., a hash value), or other items of data. In some embodiments, subsets 120a, 120b, and 120n of servers each include a server rack and a number of housings in the server rack, each of which is configured to support one of servers 124. Also, subsets 120a, 120b, and 120n of servers each include a communications bus coupling each of servers 124 in subsets 120a, 120b, and 120n of servers to each other and to node storage aggregator 104. As used herein, the term “node” can refer to a collection of one or more processors and one or more memory devices, such as a server.
Write controller 230 is configured to receive a write request (“wr req”) 201, from which write controller 230 extracts data 207 to be written and write information (“wr info”) 206 including a sector number, “S,” associated with the data. Hash generator 233 is configured to receive data, such as data to be written to aggregate virtual storage. The data is divided into portions of data as fragments 234a of data (e.g., F1, F2, and F3), each of which undergoes a hashing operation to generate hash values 234b (e.g., H1, H2, and H3) as key values. Examples of the hashing operation can include MD-5, SHA, and other like hash functions. In some embodiments, the hash values are “data representations” of the portions of data. But note that in some cases, the data itself can be “data representations” of the portions of data. Disperse controller 250 is configured to provide node identifiers to metadata generator 235, whereby the node identifiers specify the nodes to which duplicative data is to be stored. In some embodiments, the nodes (and the node identifiers) specify the optimal nodes as a function of capacity for specific nodes, access speed with the node, and other node access characteristics. For example, disperse controller 250 can receive node characteristic data (“charz”) 280 specifying the attributes of the various nodes, and disperse controller 250 selects the node identifiers for the optimal nodes and presents those node identifiers to metadata generator 235. In some embodiments, a node identifier can be a MAC address, an IP address, or any other unique identifier. The portions of data (i.e., fragments) and/or the hash values can be sized to a 4 kilobyte (“KB”) block.
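As a non-limiting sketch of the operations described above, data may be divided into 4-kilobyte fragments, each fragment hashed to produce a key value, and candidate nodes ranked by their characteristics. The function names, the choice of SHA-256 as the particular SHA variant, and the ranking rule (more free capacity first, then lower latency) are assumptions made for illustration; the field names `free_bytes` and `latency_ms` are likewise hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

FRAGMENT_SIZE = 4096  # 4-kilobyte portions, per the example above


def fragment_and_hash(data: bytes, size: int = FRAGMENT_SIZE) -> List[Tuple[str, bytes]]:
    """Split data into fixed-size fragments and pair each with its hash value."""
    fragments = [data[i:i + size] for i in range(0, len(data), size)]
    # SHA-256 is an assumed choice; any like hash function could be substituted.
    return [(hashlib.sha256(fragment).hexdigest(), fragment) for fragment in fragments]


def select_nodes(charz: Dict[str, Dict[str, float]], count: int) -> List[str]:
    """Rank nodes by free capacity (higher first), then by latency (lower first)."""
    ranked = sorted(
        charz, key=lambda n: (-charz[n]["free_bytes"], charz[n]["latency_ms"])
    )
    return ranked[:count]
```

Identical fragments yield identical hash values in this sketch, which is what allows a hash value to serve as a reference to duplicative data.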
Metadata generator 235 can use a data representation, such as a hash value, as a reference to duplicative data associated with a sector, along with one or more node identifiers that each identify a server in a first number of servers to which duplicative data is to be written. Metadata generator 235 then can generate metadata including the reference to the duplicative data and the node identifier. In operation, metadata generator 235 receives either portions (i.e., fragments 234a) of data or hash values for the portions of data, or both. Further, metadata generator 235 receives node identifiers to be written with duplicative data referenced by data representations (e.g., either the portion of data itself or the hash value). For each duplicative data referenced by a data representation, there is a corresponding set of one or more node identifiers. Each data representation is associated with the one or more node identifiers in a data structure, which is duplicated at duplication module 256 to generate multiple copies of metadata 243 to be stored in a data structure in each node (e.g., in a server rack).
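For purposes of illustration, the association of each data representation with one or more node identifiers, and the duplication of the resulting metadata for storage in each node, may be sketched as below. The function names and the dictionary layout are hypothetical and are offered only as one possible arrangement.

```python
import copy
from typing import Dict, List


def generate_metadata(refs: List[str], node_ids: List[str]) -> Dict[str, List[str]]:
    """Associate each data representation with the nodes that are to hold its copies."""
    return {ref: list(node_ids) for ref in refs}


def duplicate_metadata(
    metadata: Dict[str, List[str]], all_nodes: List[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Produce one independent, identical copy of the metadata for every node."""
    return {node: copy.deepcopy(metadata) for node in all_nodes}
```

Deep copies are used so that each node's copy is independent; losing or altering one copy leaves the others intact.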
Data disperser 252 is configured to generate duplicative data in cooperation with duplication module 256, and to disperse the duplicative data among a first number of nodes. The first number of nodes can be determined as a quantity, “x,” of minimum number of responsive nodes to ensure data redundancy for a maximum quantity for the non-responsive nodes, “y,” which constitutes the second number of nodes. In particular, “x” is calculated as follows: x = N − y, where N represents the total number of nodes in an aggregated virtual storage space. In the example of 5 total nodes, with no more than 2 nodes that are tolerated to be non-responsive, the duplicative data is written into 3 nodes. For example, data disperser 252 can generate duplicative data 242 that is written to 3 nodes. In a specific embodiment, a striping module 254 is configured to stripe the data representation over the first number of nodes (e.g., 3 nodes, or 2 nodes with 1 node including parity data) to form striped data 240, and disperse parity data 241 over the subset of nodes or a portion thereof.
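The relationship x = N − y and the example of striping over 2 nodes with a third node holding parity data can be illustrated with the following sketch. The function names are hypothetical, and the use of byte-wise XOR to compute parity is an assumption, since the description does not specify how the parity data is computed.

```python
from typing import List


def replica_count(total_nodes: int, max_unresponsive: int) -> int:
    """Return x = N - y, the number of nodes to which duplicative data is written."""
    if not 0 <= max_unresponsive < total_nodes:
        raise ValueError("max_unresponsive must be in the range [0, total_nodes)")
    return total_nodes - max_unresponsive


def xor_parity(stripes: List[bytes]) -> bytes:
    """Compute the byte-wise XOR of equal-length stripes (assumed parity scheme)."""
    if not stripes:
        raise ValueError("at least one stripe is required")
    length = len(stripes[0])
    if any(len(s) != length for s in stripes):
        raise ValueError("stripes must have equal length")
    parity = bytearray(length)
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            parity[i] ^= byte
    return bytes(parity)


def stripe_with_parity(data: bytes, data_nodes: int = 2) -> List[bytes]:
    """Split data into zero-padded, equal stripes and append one parity stripe."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    stripe_len = -(-len(data) // data_nodes)  # ceiling division
    padded = data.ljust(stripe_len * data_nodes, b"\x00")
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(data_nodes)]
    return stripes + [xor_parity(stripes)]
```

Under the XOR assumption, a stripe held by a non-responsive node can be rebuilt by applying `xor_parity` to the surviving stripe and the parity stripe.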
According to some embodiments, node storage aggregator 204 is configured to build and maintain node write-repository 236 that is configured to store associations among data representing a sector number (“Sec #”) 237, a hash value (“HV”), and one or more node identifiers (“N1, N2”). In one example, node write-repository 236 is populated with sector number, S, from write information 206 when data 207 is to be written to an aggregate virtual storage. A hash value (“HV”) 234b is generated from a portion or fragment 234a of data and is stored in column 239 in association with sector number, S. Disperse controller 250 populates column 238 to include the node identifiers (“N1, N2”) 238 in association with sector number, S, and hash value, HV. Thus, a read controller 232, responsive to a read request (“Rd Req”) 203, can match metadata read out of the aggregate virtual storage against node write-repository 236 to identify alternative node identifiers (e.g., N2) if a node identifier (e.g., N1) is associated with a non-responsive node (e.g., the node is off-line). In some embodiments, deduplication application 202 can be implemented to remove duplicate (i.e., redundant) information in VM files in a read or write path between a virtual machine and an aggregated virtual storage. An example of deduplication application 202 is described in U.S. Non-Provisional patent application Ser. No. 13/269,525, filed Oct. 7, 2011, and entitled “Deduplication Of Virtual Machine Files In A Virtualized Desktop Environment.”
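One possible in-memory arrangement of the associations described above (sector number, hash value, and node identifiers), together with the lookup of an alternative node identifier, is sketched below. The class and method names are hypothetical, and keying the repository by sector number is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class RepositoryEntry:
    hash_value: str       # data representation associated with the sector
    node_ids: List[str]   # nodes holding the duplicative data


class NodeWriteRepository:
    """Hypothetical table associating a sector number, a hash value, and node identifiers."""

    def __init__(self) -> None:
        self._entries: Dict[int, RepositoryEntry] = {}

    def record_write(self, sector: int, hash_value: str, node_ids: List[str]) -> None:
        """Populate the repository when data is written."""
        self._entries[sector] = RepositoryEntry(hash_value, list(node_ids))

    def alternative_node(
        self, sector: int, hash_value: str, unresponsive: Set[str]
    ) -> Optional[str]:
        """Return the first recorded node that is not known to be non-responsive."""
        entry = self._entries.get(sector)
        if entry is None or entry.hash_value != hash_value:
            return None
        return next((n for n in entry.node_ids if n not in unresponsive), None)
```

For example, after `record_write(7, "HV", ["N1", "N2"])`, a lookup with `N1` marked non-responsive resolves to `N2`.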
According to some examples, computer system 600 performs specific operations by processor 604 executing one or more sequences of one or more instructions stored in system memory 606. Such instructions may be read into system memory 606 from another computer readable medium, such as static storage device 608 or disk drive 610. In some examples, hard-wired circuitry may be used in place of or in combination with software instructions for implementation.
The term “computer readable medium” refers to any tangible medium that participates in providing instructions to processor 604 for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 610. Volatile media includes dynamic memory, such as system memory 606.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Instructions may further be transmitted or received using a transmission medium. The term “transmission medium” may include any tangible or intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible medium to facilitate communication of such instructions. Transmission media include coaxial cables, copper wire, and fiber optics, including wires that comprise bus 602 for transmitting a computer data signal.
In some examples, execution of the sequences of instructions may be performed by a single computer system 600. According to some examples, two or more computer systems 600, such as two or more nodes in a subset of nodes, can be coupled by communication link 620 (e.g., LAN, PSTN, or wireless network) to perform the sequence of instructions in coordination with one another. Computer system 600 may transmit and receive messages, data, and instructions, including program, i.e., application code, through communication link 620 and communication interface 612. Received program code may be executed by processor 604 as it is received, and/or stored in disk drive 610, or other non-volatile storage for later execution.
Although the foregoing examples have been described in some detail for purposes of clarity of understanding, the above-described inventive techniques are not limited to the details provided. There are many alternative ways of implementing the above-described inventive techniques. The disclosed examples are illustrative and not restrictive.
Entry |
---|
PCT Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration for PCT Application No. PCT/US2013/076683, 8 pgs. (May 23, 2014). |
U.S. Appl. No. 13/269,525, Office Action, Mailed Jul. 26, 2013, 6 pages. |
U.S. Appl. No. 13/269,525, Final Office Action, Mailed Jan. 2, 2014, 9 pages. |
U.S. Appl. No. 13/269,525, Office Action, Mailed May 12, 2014, 8 pages. |
PCT/US2013/076704, International Search Report and Written Opinion, Mailed Aug. 22, 2014. |
U.S. Appl. No. 13/725,942, Office Action, Mailed Oct. 6, 2014, 7 pages. |
U.S. Appl. No. 13/725,942, Notice of Allowance, Mailed Feb. 25, 2015, 8 pages. |
U.S. Appl. No. 13/765,687, Office Action, Mailed Mar. 9, 2015, 21 pages. |
U.S. Appl. No. 13/765,689, Office Action, Mailed Oct. 1, 2014, 9 pages. |
U.S. Appl. No. 13/765,689, Final Office Action, Mailed Apr. 22, 2015, 12 pages. |
Number | Date | Country | |
---|---|---|---|
20140181236 A1 | Jun 2014 | US |