Virtual machines are software emulations of a machine, such as a computer, in which the software implementation is restricted within boundaries of the physical host computer. Conventionally, there are system virtual machines and process virtual machines. A system virtual machine emulates an entire system platform machine that includes an operating system, whereas a process virtual machine emulates a specific process. Regardless of the type of virtual machine, the emulated software is restricted to the resources provided by the virtual machine.
Generally, virtual machines enable a host computer to run multiple application environments (e.g., processes) or operating systems on the same computer simultaneously. The host computer allots a certain amount of the host's resources to each of the virtual machines, and each virtual machine uses such allotted resources to execute applications and processes (including operating systems). Typical virtual machines make use of virtual machine image files (e.g., virtual machine images) to store the desired application environment, operating system, and data related thereto. The virtual machine includes a virtual hard drive (VHD) as a typical virtual machine image. From the host's perspective, the VHD is a large file handled much like other files regardless of being associated with a virtual machine. Yet, from the virtual machine's perspective, the VHD is a full hard drive including data related to an operating system, processes, user information, and the like.
With the increased use and complexity of virtual machines, virtual machine images can become large (e.g., several gigabytes). Moreover, environments and hosts of virtual machines are rarely static with regard to allotted resources and storage location for images. For example, a virtual machine image may be moved from one storage location on a network to another storage location on the network. Relocating the storage location(s) for virtual machine images can therefore be a resource-intensive event based solely on the size of the virtual machine image files. Conventionally, the virtual machine image files are moved with lengthy and repetitive transfers, which tend to be costly in terms of system resources, among others.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject disclosure generally pertains to virtual machine image management. Virtual machine images can be evaluated to create a master image that includes shared data segments found in the virtual machine images. The master image can be generated based upon a peer pressure technique, offline machine learning techniques, or runtime machine learning techniques, among others. For instance, the peer pressure technique can facilitate creating the master image by including common data segments found in a majority of the virtual machine images. In another example, the peer pressure technique enhances the generation of the master image by inclusion of an influential data segment identified within the virtual machine images. Further, a master image server can allow access to master images, templates to create master images, and additional virtual machine images for a larger sample set for peer pressure techniques.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
Details below are generally directed toward managing virtual machine images with a master image (e.g., golden image). Virtual machines often utilize numerous images that tend to require large amounts of storage space, which makes transitioning data from one location to another costly with regard to system resources. Managing these virtual machines and respective images can include migration of images, virtual machine load balancing, and virtual machine scaling. Conventional techniques often include repetitive and lengthy transfers for each virtual machine image based on the large sizes and quantities thereof. The above situation can be addressed by a master image for the virtual machine images. A master image is generated from identified segments of data that are common between the virtual machines. From these identified segments of data, a single instance of each segment of data is used to create the master image for the virtual machines. In one example, the master image includes a majority of data segments common between the virtual machine images, in order to optimize migration of images, virtual machine load balancing, and virtual machine scaling when such operations include a creation of a new virtual machine image and/or virtual machine.
Various aspects of the subject disclosure are now described in more detail with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.
Referring initially to
The virtual machine image system 100 includes a generation component 110 that compares virtual machine images to create a master image. Specifically, the virtual machine image system 100 includes an evaluation component 120 that analyzes virtual machines and, in particular, virtual machine images. The evaluation component 120 can receive or collect virtual machine images from a virtual machine environment (e.g., a machine environment that includes or accesses virtual machines and virtual machine images). For example, a user can select a set or subset of virtual machine images to evaluate or selection can be automated. Upon manual or automatic selection of the virtual machine images, the evaluation component 120 compares data from each of the virtual machine images in order to identify commonalities or shared data segments. Specifically, the evaluation component 120 analyzes virtual machine images to extract common data segments from such virtual machine images.
Additionally, the virtual machine image system 100 includes a master component 130 that creates a master image (also referred to as a “golden image”) based upon the analysis of the evaluation component 120. As utilized herein, the terms “master image” and “golden image” refer to a collection of data including data segments that are common between virtual machine images. Moreover, the master image can include data representative of a software program that can be executed within a virtual machine environment, and in particular, a virtual machine. It is to be appreciated that the master image can be any size (e.g., bytes, megabytes, gigabytes, etc.) and can include any type of data from any suitable source within the virtual machine environment. As stated, the master component 130 generates the master image by including a single instance of common data segments identified by the evaluation component 120. In other words, the master component 130 can monitor the identified common data segments and incorporate a single copy of each data segment into the master image. In particular, the generation component 110 and incorporated components (e.g., the evaluation component 120, the master component 130) can implement a peer pressure technique (discussed in more detail below) in order to identify shared data segments in a majority of the virtual machine images.
As utilized herein, a virtual machine image includes any suitable data related to a virtual machine. By way of example and not limitation, a virtual machine image can include an operating system for a virtual machine, a process associated with a virtual machine, data related to an operating system for a virtual machine, data related to a process associated with a virtual machine, and the like. Moreover, a virtual machine image can include components/data required by all users of clients (e.g., installation files of a guest operating system, a web browser application, an antivirus application, an email application, etc.) and components specific to individual users (e.g., profiles, user specific applications, etc.). In addition, the virtual machine image can encompass data regardless of being stored on a remote virtual machine server, a local virtual hard drive (VHD), a remote VHD, a cloud-based server, a cloud-based virtual machine, a Platform as a service (PaaS) virtual machine, a PaaS VHD, a PaaS server, and the like.
By way of example and not limitation, a virtual machine environment may include a first group of virtual machines and a second group of virtual machines. The first group of virtual machines can be selected in which virtual machine images related thereto are evaluated in order to identify shared data segments existent between the virtual machine images (corresponding to the selected group of virtual machines). In other words, common data segments located on the virtual machine images can be collected and used to create a master image, wherein the master image includes a single instance of each common data segment. Once generated, the master image can be employed for migration of at least one of the virtual machines and/or virtual machine images within the first group (selected group of virtual machines). Moreover, the master image can be employed in the establishment of a new or updated virtual machine and/or virtual machine image.
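By way of illustration and not limitation, the single-instance construction described above can be sketched as follows. The sketch assumes fixed-size data segments identified by a SHA-256 digest and treats a segment as common only when it appears in every selected image. The segment size and the names `split_segments`, `build_master_image`, and `delta` are hypothetical and do not appear in the disclosure.

```python
import hashlib

SEGMENT_SIZE = 4  # hypothetical, unrealistically small segment size for illustration


def split_segments(image: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split raw image bytes into fixed-size segments (the last may be shorter)."""
    return [image[i:i + size] for i in range(0, len(image), size)]


def digest(segment: bytes) -> str:
    """Identify a segment by its SHA-256 digest (an assumption of this sketch)."""
    return hashlib.sha256(segment).hexdigest()


def build_master_image(images: dict[str, bytes]) -> dict[str, bytes]:
    """Return {digest: segment} with one copy of each segment found in every image."""
    per_image = [{digest(s): s for s in split_segments(data)} for data in images.values()]
    if not per_image:
        return {}
    common = set(per_image[0])
    for segments in per_image[1:]:
        common &= set(segments)
    return {d: per_image[0][d] for d in common}


def delta(image: bytes, master: dict[str, bytes]) -> list[bytes]:
    """Segments of an image that are not already covered by the master image."""
    return [s for s in split_segments(image) if digest(s) not in master]


# Three toy "images" of the selected group; only the first segment is shared by all.
images = {
    "vm1": b"OS__APP1USR1",
    "vm2": b"OS__APP1USR2",
    "vm3": b"OS__APP2USR3",
}
master = build_master_image(images)
```

Under these assumptions, migrating `vm3` to a location that already holds the master image would require transferring only the segments returned by `delta(images["vm3"], master)` rather than the full image.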
The virtual machine image system 200 further includes a peer pressure component 210 that incorporates a peer pressure technique to facilitate creating a master image for a set of virtual machine images. As utilized herein, a peer pressure technique relates to any statistical analysis based on calculating a majority from a sample set and converging to a value or data that is identified as the majority. In other words, the peer pressure technique can provide a “power in numbers” analysis to identify shared data segments that exist within a majority or most of the virtual machine images. In another example, the peer pressure technique can relate to any statistical analysis to identify an influential data segment within the set of virtual machine images. In other words, the peer pressure technique can provide a “bully mentality” analysis to identify influential and high priority data segments that exist within the virtual machine images. In general, the system 200 can employ any suitable statistical peer pressure technique with the peer pressure component 210 in which the peer pressure technique enhances the master image by including the common data segments found within a majority of the virtual machine images or found to have an influence within the virtual machine images.
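As one possible reading of the “power in numbers” analysis, and not as the definitive form of the peer pressure technique, the sketch below counts how many images contain each data segment and keeps the segments found in more than a configurable fraction of them. The function name `majority_segments`, the `threshold` parameter, and the segment labels are assumptions made for illustration.

```python
from collections import Counter


def majority_segments(images: dict[str, set[str]], threshold: float = 0.5) -> set[str]:
    """Return segment identifiers present in more than `threshold` of the images."""
    # Each image contributes at most one vote per segment because it is a set.
    counts = Counter(seg for segs in images.values() for seg in segs)
    needed = threshold * len(images)
    return {seg for seg, n in counts.items() if n > needed}


# Hypothetical sample set: segment labels stand in for segment digests.
images = {
    "vm1": {"os", "browser", "mail"},
    "vm2": {"os", "browser", "mail"},
    "vm3": {"os", "browser", "cad"},
    "vm4": {"os"},
}
shared = majority_segments(images)
```

With the default threshold, a segment held by exactly half of the images (here `mail`) is not treated as a majority; lowering `threshold` relaxes the criterion.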
The generation component 110 can further include a trend component 310 that implements machine learning techniques in order to ascertain common data segments to include within a master image. Additionally, the trend component 310 facilitates migrating and creating virtual machines and/or virtual machine images (migration is discussed in more detail in
For instance, the trend component 310 and implemented machine learning techniques (e.g., offline and/or during runtime) can identify capacity or size of a virtual machine and/or virtual machine image. Based on the capacity or size of virtual machines and/or virtual machine images, the trend component 310 can ascertain a data size for a master image. By way of example and not limitation, a master image size can be identified based upon trend component 310 analysis (e.g., offline and/or during runtime). In another example, the trend component 310 can provide coarse-level analysis, Operating System for Monitoring (OSM) details and application level settings (e.g., based upon known application details).
In still another example, the trend component 310 can employ machine learning to extract data from memory to facilitate identifying common data segments amongst virtual machine images, migrating virtual machine images, and creating new or updated virtual machines. From memory, the trend component 310 can analyze memory objects to identify security vulnerabilities. By way of example and not limitation, the identified security vulnerabilities can be a factor for migrating virtual machines and/or virtual machine images. Moreover, such security vulnerabilities and related data segments can be excluded from inclusion in a master image. Additionally, the trend component 310 can further employ time series analysis, model predicting, virtual machine capacity prediction, or the like.
The master image system 400 can include a rank component 420 that allows identified common data segments to be prioritized, wherein a higher priority can translate into a higher probability of inclusion in a master image. Conversely, a lower priority can translate into a higher probability of exclusion from a master image. The rank component 420 can receive priority data related to specific traits, characteristics, and/or metrics in which such data can be prioritized or de-prioritized. By way of example and not limitation, data segments associated with user profiles can be set as a higher priority than application data segments. In such example, user profile data segments that are common between the virtual machine images will be prioritized to be included in a master image over the application common data segments (as well as other data segments ranked lower than the user profile data segments).
The rank component 420 enables any data segment to be prioritized based on various characteristics. The data segments can be prioritized by the rank component 420 based upon characteristics such as, but not limited to, host virtual machine (e.g., which virtual machine is hosting the data segments), size on virtual machine image, size on VHD, percentage of commonality (e.g., how often the data segment occurs within the virtual machine images), data segment type (e.g., operating system data, user profile data, application data, etc.), host virtual machine location (e.g., local, remote, cloud-based, PaaS-based, etc.), process-based (e.g., application A data segments have priority over application B since application A is a security application), operating system association (e.g., prioritize operating system data segments over other data segments), user-preference, and the like. It is to be appreciated that the rank component 420 can be a factor (e.g., not the sole factor) in constructing the master image with data segments. In other words, by way of example and not limitation, the rank component 420 enables a probability to increase for a data segment to be included in a generated master image. Yet, it is to be appreciated that the rank component 420 can be configured to enable a data segment to be prioritized to automatically be included in the master image for a set of virtual machine images.
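The prioritization described above can be sketched, under stated assumptions, as a scoring step followed by a selection step. In this sketch the score is a hypothetical product of a per-type weight and the percentage of commonality, segments flagged as `pinned` are included automatically, and the remaining segments are added in rank order until an assumed size budget is reached. The weights, the budget, and all names are illustrative and are not taken from the disclosure, which treats rank as one factor among several.

```python
from dataclasses import dataclass

# Hypothetical weights: user profile data ranks above application data,
# mirroring the example given in the text.
TYPE_WEIGHT = {"user_profile": 3.0, "operating_system": 2.0, "application": 1.0}


@dataclass(frozen=True)
class Segment:
    name: str
    kind: str
    size_mb: int
    commonality: float  # fraction of images that contain the segment
    pinned: bool = False  # True: always include in the master image


def score(seg: Segment) -> float:
    """Combine data segment type and commonality into a single priority."""
    return TYPE_WEIGHT.get(seg.kind, 0.5) * seg.commonality


def select_segments(segments: list[Segment], budget_mb: int) -> list[str]:
    """Include pinned segments first, then the rest by descending score within budget."""
    chosen = [s for s in segments if s.pinned]
    used = sum(s.size_mb for s in chosen)
    for seg in sorted((s for s in segments if not s.pinned), key=score, reverse=True):
        if used + seg.size_mb <= budget_mb:
            chosen.append(seg)
            used += seg.size_mb
    return [s.name for s in chosen]


candidates = [
    Segment("office", "application", 400, 0.9),
    Segment("kernel", "operating_system", 500, 1.0),
    Segment("profiles", "user_profile", 100, 0.8),
    Segment("antivirus", "application", 200, 0.6, pinned=True),
]
selected = select_segments(candidates, budget_mb=900)
```

Here the security application is included regardless of its score, and the lowest-ranked application segment is left out once the assumed budget is exhausted.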
The system 500 includes the generation component 110 that constructs a master image as discussed above. Moreover, the system 500 includes a master image server 510 (also referred to as MI server 510). The master image server 510 can be a local server or remote server in which clients can access master images 540 and/or virtual machine images. In general, the master image server 510 can be accessed by local clients and/or remote clients in order to upload, download, store, or view master images 540 and/or virtual machine images. By way of example and not limitation, the master image server 510 can be cloud-based and/or PaaS-based. Additionally, the master image server 510 allows access (with expressed permission from an owner) to master images 540 and/or virtual machine images from various users, clients, companies, and the like. Moreover, it is to be appreciated that for the sake of brevity, a single generation component 110 and/or master image is depicted in the system 500 but a plurality of master images, generation components, and/or clients (not shown) can access the master image server 510.
A master image created by the generation component 110 can be uploaded and stored to the master image server 510. It is to be appreciated that the master image server 510 can be an opt-in or opt-out service. Before the master image server 510 can be accessed, an authentication component 520 employs security and authentication techniques. The authentication component 520 can utilize usernames, passwords, security questions, cryptography, Human Interactive Proofs (HIPs), and the like. In general, the authentication component 520 provides a validated and secure connection for data communication. The authentication component 520 can further request permission to distribute and share any uploaded master images and/or virtual machine information.
The master image server 510 further includes a global peer pressure component 530. The global peer pressure component 530 expands the peer pressure technique discussed above in
As discussed briefly above, the master image server 510 can store master images 540 created from numerous virtual machine images and created from numerous virtual machine environments. The master images 540 can be viewed, transferred, downloaded, or the like. By way of example and not limitation, a master image can be downloaded and employed within a virtual machine environment. In particular, the master image can be invoked for a new or updated virtual machine. In another example, company A can create a master image 1 for a first set of virtual machine images and a master image 2 for a second set of virtual machine images, wherein the master image 1 and master image 2 are stored in the master image server 510. Additionally, company B can create a master image 3 for a set of virtual machine images in which the master image 3 is stored in the master image server 510. Following the above example, company B can leverage the master image 1 and/or the master image 2 in order to create a master image 4. Moreover, company B can invoke a global peer pressure technique that includes not only company B local virtual machine images but also company A virtual machine images (e.g., first set of virtual machine images and second set of virtual machine images).
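To illustrate how a larger sample set can change the outcome, the following sketch pools company B local images with images shared through the master image server before taking the majority. It assumes a simple strict-majority rule and represents each image as a set of segment labels; the function names and the sample data are hypothetical.

```python
from collections import Counter


def peer_pressure(images: list[set[str]]) -> set[str]:
    """Segments present in a strict majority of the given images."""
    counts = Counter(seg for segs in images for seg in segs)
    return {seg for seg, n in counts.items() if n > len(images) / 2}


def global_peer_pressure(local: list[set[str]], shared: list[set[str]]) -> set[str]:
    """Apply the same majority rule to local images pooled with shared images."""
    return peer_pressure(local + shared)


# Company B's local images (hypothetical).
company_b = [{"os", "erp"}, {"os", "crm"}]
# Company A's images, shared via the server with the owner's permission (hypothetical).
company_a = [{"os", "erp", "av"}, {"os", "erp", "av"}, {"os", "av"}]

local_result = peer_pressure(company_b)
global_result = global_peer_pressure(company_b, company_a)
```

In this toy data, the local analysis retains only `os`, whereas the pooled analysis also retains `erp` and `av` because they form a majority across both companies' images.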
Furthermore, the master image server 510 facilitates creation of a master image with the employment of a master image template 550 (also referred to as templates 550). The templates 550 can be a framework from which to create a master image for virtual machine images. The templates 550 can be based upon standardized characteristics for a particular virtual machine and/or virtual machine environment. For instance, a template for a master image can be business-based, company-based, or industry-based in which characteristics for the business, company and/or industry are identified and utilized to identify and include particular common data segments stored in a master image. In another example, a template can be based upon a type of operating system and/or process employed by the virtual machines. The templates 550 can be function-based in which a particular function includes characteristics to assist in identifying data segments to include in a master image. For example, a virtual machine environment related to accounting can create a master image for local virtual machine images based upon a template received from the master image server 510, wherein the template is an accounting-based template.
Referring to
The aforementioned systems, architectures, environments, and the like have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component to provide aggregate functionality. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
Furthermore, as will be appreciated, various portions of the disclosed systems above and methods below can include or consist of artificial intelligence, machine learning, or knowledge or rule-based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent. By way of example and not limitation, the generation component 110 or one or more sub-components thereof can employ such mechanisms to efficiently determine or otherwise infer a set of common data segments amongst virtual machine images in order to create a master image.
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
If it is determined to connect to the MI server (e.g., “YES”), the method 900 continues to reference numeral 940. At reference numeral 940, a determination is made whether to employ a template. If a template is not implemented (e.g., “NO”), the method 900 continues to reference numeral 950 in which a master image is created for a plurality of virtual machine images. It is to be appreciated that the master image can be created with a global peer pressure technique (e.g., global peer pressure technique includes leveraging the majority of common data for virtual machine images included within the MI server) or a local peer pressure technique (e.g., local peer pressure technique includes leveraging the majority of common data for virtual machine images included locally, not within the MI server). Continuing with reference numeral 960, the master image is stored on the MI server. By way of example and not limitation, the stored master image can be employed as a potential template, source of a template, re-used by another company/user, and the like.
If the determination is to employ a template (e.g., “YES”), the method 900 continues to reference numeral 970. At reference numeral 970, a template is selected from the MI server based upon a matched environment. For example, a matched environment can be user-selected, machine-matched, industry-based, and/or any combination thereof. The template can provide metrics and characteristics related to potential common data segments to collect in order to generate the master image. At reference numeral 980, a master image is created for virtual machine images based upon the selected template. As discussed above, the master image can be created with a global peer pressure technique or a local peer pressure technique. In another example, a user-defined combination can be implemented between a global peer pressure technique and a local peer pressure technique in which a portion of the global virtual machine images are selected for inclusion in a hybrid peer pressure technique. At reference numeral 990, the master image is stored on the MI server. By way of example and not limitation, the stored master image can be employed as a potential template, source of a template, re-used by another company/user, and the like.
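The branch of the method 900 beginning at reference numeral 940 can be sketched as follows. This is a minimal sketch under several assumptions: the MI server is modeled as an in-memory object, a template is modeled as a set of candidate segment labels keyed by environment, and a local peer pressure technique with a strict-majority rule is used. The class `MIServer` and the function `create_and_store` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MIServer:
    """Hypothetical in-memory stand-in for the MI server."""
    templates: dict[str, set[str]] = field(default_factory=dict)
    master_images: list[set[str]] = field(default_factory=list)


def majority(images: list[set[str]]) -> set[str]:
    """Local peer pressure: segments present in a strict majority of images."""
    counts = Counter(seg for segs in images for seg in segs)
    return {seg for seg, n in counts.items() if n > len(images) / 2}


def create_and_store(server: MIServer, images: list[set[str]],
                     environment: Optional[str] = None) -> set[str]:
    # 940: decide whether to employ a template (here: an environment was given
    # and the server holds a matching template; otherwise take the "NO" branch).
    template = server.templates.get(environment) if environment else None
    # 950 / 980: create the master image from the majority of common segments.
    master = majority(images)
    if template is not None:
        # 970 / 980: restrict candidates to those the selected template names.
        master &= template
    # 960 / 990: store the master image on the MI server.
    server.master_images.append(master)
    return master


server = MIServer(templates={"accounting": {"os", "ledger", "tax"}})
local_images = [{"os", "ledger", "game"}, {"os", "ledger", "game"}, {"os", "tax"}]

without_template = create_and_store(server, local_images)
with_template = create_and_store(server, local_images, environment="accounting")
```

In this example the accounting-based template filters out a segment that is common locally but is not among the template's candidates; both resulting master images remain on the server for later reuse.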
As used herein, the terms “component” and “system,” as well as forms thereof are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an instance, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
The word “exemplary” or various forms thereof are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, examples are provided solely for purposes of clarity and understanding and are not meant to limit or restrict the claimed subject matter or relevant portions of this disclosure in any manner. It is to be appreciated a myriad of additional or alternate examples of varying scope could have been presented, but have been omitted for purposes of brevity.
As used herein, the term “inference” or “infer” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines . . . ) can be employed in connection with performing automatic and/or inferred action in connection with the claimed subject matter.
Furthermore, to the extent that the terms “includes,” “contains,” “has,” “having” or variations in form thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
In order to provide a context for the claimed subject matter,
While the above disclosed system and methods can be described in the general context of computer-executable instructions of a program that runs on one or more computers, those skilled in the art will recognize that aspects can also be implemented in combination with other program modules or the like. Generally, program modules include routines, programs, components, data structures, among other things that perform particular tasks and/or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the above systems and methods can be practiced with various computer system configurations, including single-processor, multi-processor or multi-core processor computer systems, mini-computing devices, mainframe computers, as well as personal computers, hand-held computing devices (e.g., personal digital assistant (PDA), phone, watch . . . ), microprocessor-based or programmable consumer or industrial electronics, and the like. Aspects can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. However, some, if not all aspects of the claimed subject matter can be practiced on stand-alone computers. In a distributed computing environment, program modules may be located in one or both of local and remote memory storage devices.
With reference to
The processor(s) 1020 can be implemented with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine. The processor(s) 1020 may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, multi-core processors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The computer 1010 can include or otherwise interact with a variety of computer-readable media to facilitate control of the computer 1010 to implement one or more aspects of the claimed subject matter. The computer-readable media can be any available media that can be accessed by the computer 1010 and includes volatile and nonvolatile media and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to memory devices (e.g., random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM) . . . ), magnetic storage devices (e.g., hard disk, floppy disk, cassettes, tape . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), and solid state devices (e.g., solid state drive (SSD), flash memory drive (e.g., card, stick, key drive . . . ) . . . ), or any other medium which can be used to store the desired information and which can be accessed by the computer 1010.
Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 1030 and mass storage 1050 are examples of computer-readable storage media. Depending on the exact configuration and type of computing device, memory 1030 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory . . . ) or some combination of the two. By way of example, the basic input/output system (BIOS), including basic routines to transfer information between elements within the computer 1010, such as during start-up, can be stored in nonvolatile memory, while volatile memory can act as external cache memory to facilitate processing by the processor(s) 1020, among other things.
Mass storage 1050 includes removable/non-removable, volatile/non-volatile computer storage media for storage of large amounts of data relative to the memory 1030. For example, mass storage 1050 includes, but is not limited to, one or more devices such as a magnetic or optical disk drive, floppy disk drive, flash memory, solid-state drive, or memory stick.
Memory 1030 and mass storage 1050 can include, or have stored therein, operating system 1060, one or more applications 1062, one or more program modules 1064, and data 1066. The operating system 1060 acts to control and allocate resources of the computer 1010. Applications 1062 include one or both of system and application software and can exploit management of resources by the operating system 1060 through program modules 1064 and data 1066 stored in memory 1030 and/or mass storage 1050 to perform one or more actions. Accordingly, applications 1062 can turn a general-purpose computer 1010 into a specialized machine in accordance with the logic provided thereby.
All or portions of the claimed subject matter can be implemented using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to realize the disclosed functionality. By way of example and not limitation, the generation component 110 can be, or form part of, an application 1062, and include one or more modules 1064 and data 1066 stored in memory and/or mass storage 1050 whose functionality can be realized when executed by one or more processor(s) 1020, as shown.
In accordance with one particular embodiment, the processor(s) 1020 can correspond to a system-on-a-chip (SOC) or like architecture including, or in other words integrating, both hardware and software on a single integrated circuit substrate. Here, the processor(s) 1020 can include one or more processors as well as memory at least similar to processor(s) 1020 and memory 1030, among other things. Conventional processors include a minimal amount of hardware and software and rely extensively on external hardware and software. By contrast, an SOC implementation of a processor is more powerful, as it embeds hardware and software therein that enable particular functionality with minimal or no reliance on external hardware and software. For example, the generation component 110 and/or associated functionality can be embedded within hardware in an SOC architecture.
The computer 1010 also includes one or more interface components 1070 that are communicatively coupled to the system bus 1040 and facilitate interaction with the computer 1010. By way of example, the interface component 1070 can be a port (e.g., serial, parallel, PCMCIA, USB, FireWire . . . ) or an interface card (e.g., sound, video . . . ) or the like. In one example implementation, the interface component 1070 can be embodied as a user input/output interface to enable a user to enter commands and information into the computer 1010 through one or more input devices (e.g., a pointing device such as a mouse, trackball, stylus, or touch pad, as well as a keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, other computer . . . ). In another example implementation, the interface component 1070 can be embodied as an output peripheral interface to supply output to displays (e.g., CRT, LCD, plasma . . . ), speakers, printers, and/or other computers, among other things. Still further, the interface component 1070 can be embodied as a network interface to enable communication with other computing devices (not shown), such as over a wired or wireless communications link.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.