The present disclosure is generally related to storage systems, and more specifically, to systems and methods for placing a volume in a storage system that is closer to the application container (hereafter referred to as “container”).
Container implementations have become widely utilized to facilitate an agile and flexible application execution platform. A container is a type of virtualization technology. Previously, containers were used for stateless applications, which do not store data in a data storage system. In the related art, the use of containers has expanded to stateful applications, which store data to data storage systems. To deploy and manage containerized applications on a server cluster, container orchestration systems such as Kubernetes are used.
The location of the container among the servers in the server cluster is dynamically decided by the orchestration system when the container is launched.
In related art implementations, there are systems that automatically switch the access path from the previous container location to the new container location. This mechanism keeps data accessible from containers even if containers are moved between servers.
In a related art implementation, there is a data migration mechanism between servers. When a virtual machine (VM) using a data volume is moved between servers in the server cluster, this mechanism moves the volume to a storage system that is closer to the destination server of the VM. Such mechanisms prevent performance degradation of data access due to worse communication latency between a server and a storage system.
Related art implementations may cause performance degradation if the container is located in a server that is far from the storage system managing a data volume used by the container. Further, the mechanisms of the related art may cause unnecessary migration of data volumes. Although a VM can run for a long time, the variance of the running time of a container can be large, and some containers run only for a short time. If the running time is short, data migration is unnecessary because the container terminates soon and may be placed on another server at its next execution.
Aspects of the present disclosure can involve a method for storage management in conjunction with computing unit management in a system having a plurality of servers and a plurality of storage systems, the method involving, for a request of creating a new volume or attaching an existing volume for a computing unit to be launched: determining, from configuration information, one or more servers from the plurality of servers to which the computing unit is to be launched; estimating performance for each storage system among at least a subset of the plurality of storage systems connected to the determined one or more servers; selecting a storage system from the at least the subset of the plurality of storage systems connected to the determined one or more servers based on the estimated performance; and creating a new volume or migrating an existing volume to the selected storage system.
Aspects of the present disclosure can involve a non-transitory computer readable medium, storing instructions for storage management in conjunction with computing unit management in a system having a plurality of servers and a plurality of storage systems, the instructions involving, for a request of creating a new volume or attaching an existing volume for a computing unit to be launched: determining, from configuration information, one or more servers from the plurality of servers to which the computing unit is to be launched; estimating performance for each storage system among at least a subset of the plurality of storage systems connected to the determined one or more servers; selecting a storage system from the at least the subset of the plurality of storage systems connected to the determined one or more servers based on the estimated performance; and creating a new volume or migrating an existing volume to the selected storage system.
Aspects of the present disclosure can involve a system having a plurality of servers and a plurality of storage systems, the system involving, for a request of creating a new volume or attaching an existing volume for a computing unit to be launched: means for determining, from configuration information, one or more servers from the plurality of servers to which the computing unit is to be launched; means for estimating performance for each storage system among at least a subset of the plurality of storage systems connected to the determined one or more servers; means for selecting a storage system from the at least the subset of the plurality of storage systems connected to the determined one or more servers based on the estimated performance; and means for creating a new volume or migrating an existing volume to the selected storage system.
Aspects of the present disclosure can involve a system involving a plurality of servers and a plurality of storage systems, the system involving, for a request for volume creation for a container to be launched, means for determining, from configuration information, one or more servers from the plurality of servers to which the container is likely to be launched; means for selecting a storage system from the plurality of storage systems that is closest to the one or more servers to which the container is likely to be launched; means for creating a volume on the selected storage system; and means for launching the container.
Aspects of the present disclosure can involve an apparatus for a system involving a plurality of servers and a plurality of storage systems, the apparatus involving a processor, configured to, for a request for volume creation for a container to be launched, determine, from configuration information, one or more servers from the plurality of servers to which the container is likely to be launched; select a storage system from the plurality of storage systems that is closest to the one or more servers to which the container is likely to be launched; create a volume on the selected storage system; and launch the container.
The following detailed description provides details of the figures and embodiments of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present application. Selection can be conducted by a user through a user interface or other input means, or can be implemented through a desired algorithm. Embodiments as described herein can be utilized either singularly or in combination and the functionality of the embodiments can be implemented through any means according to the desired implementations.
Further, in the following description, the information is expressed in a table format, but the information may be expressed in any data structure. Further, in the following description, the configuration of each information is an example; one table may be divided into two or more tables, or a part or all of two or more tables may be combined into one table.
In a first embodiment, there is a volume creation method that creates the volume in the storage system closest to the servers on which the container is likely to be launched.
Then, the container orchestrator requests the storage orchestrator to create a volume. Upon receiving the request, the storage orchestrator selects the storage system that is closest to the group of servers that are likely to be selected for the container placement. This decision is made based on the affinity setting of the container, the server configuration information, and the storage configuration information. After selecting the storage, the storage orchestrator requests the storage system to create the volume. Upon receiving the request, the storage system creates the volume. After creation of the volume, the container orchestrator launches the container with the created volume attached.
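As a concrete illustration of how the group of likely servers can be derived from the affinity setting, the following is a minimal sketch in Python; the data structures and the nodeSelector-style label matching are assumptions for illustration, not the disclosure's actual interfaces.

```python
# Hypothetical sketch: deriving the group of servers a container is likely
# to be placed on from its affinity setting (nodeSelector-style matching).
def candidate_servers(servers, node_selector):
    """Return the servers whose labels satisfy every key/value in node_selector."""
    return [s for s in servers
            if all(s["labels"].get(k) == v for k, v in node_selector.items())]

servers = [{"name": "srv1", "labels": {"disktype": "ssd", "zone": "z1"}},
           {"name": "srv2", "labels": {"disktype": "hdd", "zone": "z2"}}]
print([s["name"] for s in candidate_servers(servers, {"disktype": "ssd"})])
# -> ['srv1']
```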
According to this embodiment, volumes are created in the storage system that is closest to the group of servers likely to be selected for container placement. That is, volumes are created in the storage system with the highest probability of being closest to the container. Therefore, even if the container is launched on another server at its next launch, the distance between the container and its volume can be expected to remain short.
The container orchestrator and the storage orchestrator may be executed on a specific server of the plurality of servers, or may be executed on multiple servers of the plurality of servers.
For 503 and 504, a loop is initiated to parse through each server in the list of extracted servers. At 503, the storage orchestrator checks for the closest storage from each server in the list and selects the closest storage to the server. In the example of processing based on zone configuration, the storage orchestrator measures distance based on the zones that the servers and storage systems belong to. For example, a storage system located in the same zone as the server can be considered the closest storage. This processing may instead be implemented based on the relationships of the network connections; in that case, the storage orchestrator measures distance as the number of switches traversed when the server communicates with each storage system.
At 504, the counter score for the selected storage is incremented by 1. At 505, the storage having the highest counter score is selected for the volume creation. At 506, the storage orchestrator transmits a request to the selected storage having the highest counter score to create the volume. At 507, the storage creates the volume according to the request.
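A minimal sketch of this zone-based selection (503 to 505) follows; the dictionary-based server and storage records, the zone-distance rule, and the function names are illustrative assumptions rather than the disclosure's actual interfaces.

```python
# Minimal sketch of the zone-based storage selection at 503-505.
from collections import Counter

def select_storage(candidate_servers, storages):
    """Pick the storage system that is closest to the most candidate servers."""
    scores = Counter()
    for server in candidate_servers:
        # 503: zone-based distance; a storage in the same zone as the
        # server is considered closest (distance 0 instead of 1).
        closest = min(storages,
                      key=lambda s: 0 if s["zone"] == server["zone"] else 1)
        scores[closest["name"]] += 1              # 504: increment counter score
    best = max(scores, key=scores.get)            # 505: highest score wins
    return next(s for s in storages if s["name"] == best)

servers = [{"name": "srv1", "zone": "z1"}, {"name": "srv2", "zone": "z1"},
           {"name": "srv3", "zone": "z2"}]
storages = [{"name": "stA", "zone": "z1"}, {"name": "stB", "zone": "z2"}]
print(select_storage(servers, storages)["name"])  # -> stA
```

The switch-count variant of the distance measurement would only change the key function in the sketch; the counting and selection at 504 and 505 remain the same.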
In a second embodiment, there is a method of determining necessity of volume migration based on estimated running time of container. Hereinafter, a difference with the first embodiment will be mainly described, and the description of the points that are in common with the first embodiment will be simplified or omitted.
In the second embodiment, the flow proceeds as follows.
At 1001, a determination is made as to whether the container location has changed. This can be determined by comparing the container location from the last execution, as recorded in the container location information, with the location at the current execution. If so (Yes), then an estimate of the running time of the container is conducted at 1002, and an estimate of the migration time of the volume is conducted at 1003. In an example of the estimation of the running time of the container at 1002, the average running time over the past executions of the container can be used, or other estimations (e.g., preset server functions that measure running time) can be utilized in accordance with the desired embodiment. In an example of the estimation of the migration time, one example formula can be:
Estimated migration time = (volume size) / (migration bandwidth)
If the container uses multiple volumes, one example formula can be:
Estimated migration time = (total size of all volumes) / (migration bandwidth)
However, other estimations of migration time can be utilized (e.g. preset server functions for indicating expected migration time) in accordance with the desired embodiment.
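For concreteness, a minimal sketch of this estimate in Python follows; the function name and the units (bytes and bytes per second) are assumptions for illustration.

```python
# Hypothetical sketch of the migration-time estimate at 1003, following
# the formulas above; units (bytes, bytes/s, seconds) are assumptions.
def estimated_migration_time(volume_sizes_bytes, migration_bandwidth_bps):
    """Total size of all of the container's volumes divided by the migration bandwidth."""
    return sum(volume_sizes_bytes) / migration_bandwidth_bps

# A 100 GiB volume over a 1 GiB/s migration link takes about 100 seconds.
print(estimated_migration_time([100 * 2**30], 2**30))  # -> 100.0
```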
At 1004, based on the estimations, a determination is made as to whether the estimated running time of the container is longer than the estimated migration time of the volume. If not (No), then the process proceeds to 1015 and determines that migration is unnecessary. If so (Yes), then the process proceeds to 1005. For this determination, the estimated running time of the container may be compared with a threshold derived from the migration time of the volume. The weighted estimated migration time of the volume can be used as the threshold, where the weight is a positive number greater than 1. In this example, the storage orchestrator can determine whether the estimated running time of the container is sufficiently longer than the estimated migration time of the volume. As another example of the threshold, a predetermined value plus the estimated migration time can be used.
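A sketch of the weighted-threshold variant of the determination at 1004 follows; the weight value of 2.0 is an arbitrary illustrative choice (any value greater than 1 fits the description above).

```python
# Hypothetical sketch of the determination at 1004 using a weighted
# estimated migration time as the threshold; weight > 1 is required.
def migration_worthwhile(est_running_time_s, est_migration_time_s, weight=2.0):
    """True only if the container is expected to run sufficiently longer
    than migrating its volume would take."""
    return est_running_time_s > weight * est_migration_time_s

print(migration_worthwhile(600.0, 100.0))  # -> True: 600 s > 2 * 100 s
```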
At 1005, the N/W latency between the server and the current owner storage is measured. At 1006, an estimation of the I/O latency when migration is not applied is conducted. In an example of the estimation of the I/O latency when migration is not applied, one example formula can be:
Estimated I/O latency = (average I/O latency) + {(N/W latency A) − (N/W latency B)}
In the example formula, N/W latency A is the round-trip time between the server that is the new location and the current volume owner storage; this latency was measured at 1005. N/W latency B is the round-trip time between the server that was the last location and the current volume owner storage; this latency was measured at the last execution and recorded in the volume information.
At 1007, a determination is made as to whether the estimated I/O latency when migration is not applied is worse than the average I/O latency measured at the last execution. If so (Yes), then it is determined that migration is necessary. For this determination, the estimated I/O latency when migration is not applied is compared with a determined latency threshold (e.g., a weighted average I/O latency, the latency measured at the last execution, etc., in accordance with the desired implementation) determined or measured at the last execution, in order to determine whether the I/O latency is significantly worse. In this case, the weight is a positive number equal to or greater than 1.
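The estimate at 1006 and the significance check at 1007 can be sketched as follows; the weight of 1.2 and the millisecond figures are illustrative assumptions.

```python
# Hypothetical sketch of 1005-1007: estimate the I/O latency if the volume
# stays in place, then decide whether it is significantly worse.
def estimated_io_latency_no_migration(avg_io_latency, nw_latency_a, nw_latency_b):
    # nw_latency_a: round-trip time, new container location <-> current owner storage (1005)
    # nw_latency_b: round-trip time, last container location <-> current owner storage (recorded)
    return avg_io_latency + (nw_latency_a - nw_latency_b)

def significantly_worse(estimated, avg_io_latency, weight=1.2):
    """Weight >= 1; the estimate must exceed the weighted last-measured average."""
    return estimated > weight * avg_io_latency

est = estimated_io_latency_no_migration(2.0, 1.5, 0.5)  # -> 3.0 ms
print(significantly_worse(est, 2.0))                    # -> True: migration necessary
```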
To determine a migration destination, the storage orchestrator executes a loop to parse each storage system managed by the system, measures the N/W latency between the server that is the new location of the container and each storage system at 1008, and then estimates the I/O latency when migration is applied at 1009. An example of an estimation method of the I/O latency can be as follows:
Estimated I/O latency = (average I/O latency) + {(N/W latency C) − (N/W latency B)}
N/W latency C is the round-trip time between the server that is the new location of the container and the migration destination storage. This latency is measured during the current execution of the container.
To select a storage system for the destination, the storage orchestrator conducts the following. First, a determination is made as to whether the estimated I/O latency for the storage is sufficiently shorter than when migration is not applied, and whether the estimated I/O latency is the shortest among all storage systems. That is, at 1010, a determination is made as to whether the estimated I/O latency when migration is applied is less than the average I/O latency as normalized with a weight. In this case, the weight is a positive number equal to or less than 1. This is one example of determining whether the estimated I/O latency for the storage is sufficiently short; other methods can be applied. For example, a determination can be made as to whether the estimated I/O latency when migration is applied improves on the current latency by more than a threshold. If so (Yes), then the flow proceeds to 1011 to determine whether the estimated I/O latency when migration is applied is less than the estimated I/O latency when the volume is migrated to the current candidate. At the first round of this loop processing, the estimated I/O latency when migration is not applied is used as the estimated I/O latency when the volume is migrated to the current candidate; alternatively, 1011 can be skipped (treated as true) at the first round of the loop. If both conditions are met, then the storage system is selected as a candidate destination of the migration at 1012.
At 1013, a determination is made as to whether there is a candidate for the migration. If not (No), the process proceeds to 1015 and determines that migration is unnecessary. If so (Yes), at 1014, the process returns an indication that migration is necessary and returns the candidate destination storage system for the container.
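Putting 1008 through 1015 together, a minimal sketch of the destination-selection loop follows; the per-storage round-trip field, the weight of 0.8, and the numeric values are illustrative assumptions.

```python
# Hypothetical sketch of the destination selection at 1008-1015.
def select_migration_destination(storages, avg_io_latency, nw_latency_b,
                                 est_latency_no_migration, weight=0.8):
    candidate, candidate_latency = None, est_latency_no_migration
    for storage in storages:
        nw_latency_c = storage["rtt_to_new_server"]           # measured at 1008
        est = avg_io_latency + (nw_latency_c - nw_latency_b)  # estimate at 1009
        # 1010: sufficiently short (weight <= 1); 1011: better than current candidate.
        if est < weight * avg_io_latency and est < candidate_latency:
            candidate, candidate_latency = storage, est       # 1012: new candidate
    return candidate  # None means 1015 (migration unnecessary); otherwise 1014

storages = [{"name": "stA", "rtt_to_new_server": 0.05},
            {"name": "stB", "rtt_to_new_server": 1.0}]
dest = select_migration_destination(storages, 2.0, 0.5, 3.0)
print(dest["name"] if dest else "migration unnecessary")      # -> stA
```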
Through the embodiments described herein, the performance of data access in the container-based application platform can be improved.
Computer device 1105 can be communicatively coupled to input/user interface 1135 and output device/interface 1140. Either one or both of input/user interface 1135 and output device/interface 1140 can be a wired or wireless interface and can be detachable. Input/user interface 1135 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, and/or the like). Output device/interface 1140 may include a display, television, monitor, printer, speaker, braille, or the like. In some embodiments, input/user interface 1135 and output device/interface 1140 can be embedded with or physically coupled to the computer device 1105. In other embodiments, other computer devices may function as or provide the functions of input/user interface 1135 and output device/interface 1140 for a computer device 1105.
Examples of computer device 1105 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
Computer device 1105 can be communicatively coupled (e.g., via I/O interface 1125) to external storage 1145 and network 1150 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration. Computer device 1105 or any connected computer device can be functioning as, providing services of, or referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
I/O interface 1125 can include, but is not limited to, wired and/or wireless interfaces using any communication or I/O protocols or standards (e.g., Ethernet, Fibre Channel, SCSI, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and network in computing environment 1100. Network 1150 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
Computer device 1105 can use and/or communicate using computer-usable or computer-readable media, including transitory media and non-transitory media. Transitory media include transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like. Non-transitory media include magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.
Computer device 1105 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments. Computer-executable instructions can be retrieved from transitory media and stored on and retrieved from non-transitory media. The executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Go, Python, Perl, JavaScript, and others).
Memory 1115 can be configured to store programs such as an operating system (OS), a hypervisor, and applications including a container orchestrator, a storage orchestrator, and containers.
Memory 1115 can also be configured to store and manage configuration information such as the server configuration information, the storage configuration information, the container location information, and the volume information described herein.
Processor(s) 1110 can be in the form of physical hardware processors (e.g., central processing units (CPUs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs)) or a combination of software and hardware processors.
Processor(s) 1110 can fetch and execute programs which are stored in memory 1115. When processor(s) 1110 execute programs, processor(s) 1110 fetch instructions in the programs from memory 1115 and execute them, and can load information such as the configuration information described above from memory 1115.
One or more applications executed on processor(s) 1110 can include logic unit 1160, application programming interface (API) unit 1165, input unit 1170, output unit 1175, and inter-unit communication mechanism 1195 for the different units to communicate with each other, with the OS, and with other applications (not shown). The described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided.
Processor(s) 1110 can be configured to select the storage system from the plurality of storage systems by, for each of the one or more servers, selecting the closest storage system to that server, as illustrated at 503 and 504.
Processor(s) 1110 can be configured to, for a detection of a change in location of the launched container as illustrated at 1001, determine whether a migration of the volume is necessary.
Processor(s) 1110 can be configured to determine whether the migration of the volume is necessary by estimating a running time of the launched container as illustrated at 1002, and comparing it against an estimated migration time of the volume as illustrated at 1003 and 1004.
Processor(s) 1110 can be configured to determine whether the migration of the volume is necessary by determining whether the I/O latency significantly worsens when the migration is not applied, based on the configuration information, by determining the latency between the server from the plurality of servers on which the container will be launched and the storage system which currently has the volume, as illustrated at 1005.
Processor(s) 1110 can be configured to select another storage system from the plurality of storage systems for the migration of the volume based on the estimated I/O latency with the migration and the latency between the server from the plurality of servers managing the launched container and the selected storage system, by selecting the another storage system from the plurality of storage systems having the estimated I/O latency with the migration being less than a weighted average latency, derived from the latency between the server from the plurality of servers managing the launched container and the selected storage system, and being less than a current minimum latency, as illustrated at 1010 to 1015.
Although the example implementations described herein are described with respect to container management, other types of computing unit management can also be facilitated by the example implementations described herein. For example, other computing units can include virtual machines (VMs), application programs, programs, processes, and jobs facilitated by the servers and storage systems in accordance with the desired implementation.
Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In embodiments, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
Embodiments may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the embodiments may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some embodiments of the present application may be performed solely in hardware, whereas other embodiments may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
Moreover, other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present application. Various aspects and/or components of the described embodiments may be used singly or in any combination. It is intended that the specification and embodiments be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.