A virtualization software suite for implementing and managing virtual infrastructures in a virtualized computing environment may include (1) a hypervisor that implements virtual machines (VMs) on one or more physical hosts, (2) a virtual storage area network (e.g., vSAN) software that aggregates local storage resources to form a shared datastore for a vSAN cluster of hosts, and (3) a management server software that centrally provisions and manages virtual datacenters, VMs, hosts, clusters, datastores, and virtual networks. For illustration purposes only, one example of the vSAN may be VMware vSAN™. The vSAN software may be implemented as part of the hypervisor software.
The vSAN software uses the concept of a disk group as a container for solid-state drives (SSDs) and non-SSDs, such as hard disk drives (HDDs). On each host (node) in a vSAN cluster, local drives are organized into one or more disk groups. Each disk group includes one SSD that serves as a read cache and write buffer (e.g., a cache tier), and one or more SSDs or non-SSDs that serve as permanent storage (e.g., a capacity tier). The disk groups from all nodes in the vSAN cluster may be aggregated to form a vSAN datastore distributed and shared across the nodes in the vSAN cluster.
The vSAN software stores and manages data in the form of data containers called objects. An object is a logical volume that has its data and metadata distributed across the vSAN cluster. For example, every virtual machine disk (VMDK) is an object, as is every snapshot. For namespace objects, the vSAN software leverages virtual machine file system (VMFS) as the file system to store files within the namespace objects. A virtual machine (VM) is provisioned on a vSAN datastore as a VM home namespace object, which stores metadata files of the VM including descriptor files for the VM's VMDKs.
Storage capacity planning is critical in a hyper-converged infrastructure (HCI) environment. A user generally takes months to complete a procurement process to add new storage resources to, or remove failed storage resources from, a vSAN cluster in the HCI environment. Therefore, without proper storage capacity planning, the vSAN cluster may exceed a storage capacity threshold before the new storage resources have been obtained, which may affect the overall performance of the HCI environment in the form of performance downgrades, upgrade failures or service interruptions. In addition, given complicated storage activities (e.g., storage policies that are applied or about to be applied, workload patterns, etc.) in the HCI environment, storage capacity planning becomes even more challenging.
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the Figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
In some embodiments, on-site system 120 includes one or more virtual storage area network (e.g., vSAN) clusters. On-site system 120 may include any number of vSAN clusters. For illustration purposes only, on-site system 120 includes vSAN cluster 130.
In some embodiments, vSAN cluster 130 includes management entity 131. Management entity 131 is configured to manage vSAN cluster 130. Management entity 131 further includes cluster-specific storage capacity usage data collection module 132, training data preprocessing module 133, cluster-specific model training module 134 and cluster-specific storage capacity planning module 135.
In some embodiments, vSAN cluster 130 further includes one or more hosts 136(1) . . . 136(n). Each host of hosts 136(1) . . . 136(n) includes suitable hardware, such as a processor (e.g., a central processing unit (CPU)), memory (e.g., random access memory), network interface controllers (NICs) to provide network connection, and a storage controller that provides access to the storage resources of that host. The storage resources may represent one or more disk groups. In practice, each disk group represents a management construct that combines one or more physical disks, such as hard disk drives (HDDs), solid-state drives (SSDs), solid-state hybrid drives (SSHDs), peripheral component interconnect (PCI) based flash storage, serial advanced technology attachment (SATA) storage, serial attached small computer system interface (SAS) storage, integrated drive electronics (IDE) disks, universal serial bus (USB) storage, etc.
Through storage virtualization, hosts 136(1) . . . 136(n) aggregate their respective local storage resources to form shared datastore 137 in vSAN cluster 130. Data stored in shared datastore 137 may be placed on, and accessed from, one or more of the storage resources provided by any host of hosts 136(1) . . . 136(n).
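The disk-group construct and the aggregation of capacity-tier devices into a shared datastore described above can be modeled with a minimal sketch. The class and field names below are hypothetical illustrations, not a real vSAN API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DiskGroup:
    # Exactly one SSD serves as the cache tier (read cache / write buffer).
    cache_ssd: str
    # One or more SSDs or non-SSDs serve as the capacity tier (permanent storage).
    capacity_devices: List[str] = field(default_factory=list)

@dataclass
class Host:
    name: str
    disk_groups: List[DiskGroup] = field(default_factory=list)

def shared_datastore_devices(hosts: List[Host]) -> List[str]:
    # The shared datastore aggregates the capacity-tier devices of every
    # disk group on every host in the vSAN cluster.
    return [dev
            for host in hosts
            for dg in host.disk_groups
            for dev in dg.capacity_devices]
```

For example, two hosts each contributing one disk group would pool all of their capacity-tier devices into the single shared datastore.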
In some embodiments, process 200 may begin with step 201.
In some embodiments, step 201 may be followed by step 202.
In some embodiments, step 202 may be followed by step 203.
In some embodiments, step 203 may be followed by step 204.
In some embodiments, step 204 may be followed by step 205.
In some embodiments, step 205 may be followed by step 206.
In some embodiments, step 206 may be followed by step 207.
In some embodiments, step 207 may be followed by step 208.
In some embodiments, step 208 may be followed by step 209.
In some embodiments, step 209 may be followed by step 210.
In some embodiments, step 210 may be followed by step 211. Cluster-specific storage capacity planning module 135 is configured to generate a prediction of storage capacity usage of cluster 130 based on the retrieved storage capacity usage data of cluster 130. In some embodiments, the prediction is an output of the trained machine learning model specific to cluster 130.
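The text does not specify the form of the trained cluster-specific machine learning model. As an illustrative stand-in only, the prediction step might resemble a least-squares linear trend fitted to the retrieved usage history and extrapolated forward (the function name and the choice of a linear model are assumptions):

```python
def predict_usage(history, horizon):
    """Forecast future storage capacity usage with a least-squares linear
    trend fitted to the usage history. This is an illustrative stand-in
    for inference by the trained cluster-specific model, whose actual
    form the description does not specify."""
    n = len(history)
    mean_x = (n - 1) / 2                 # mean of time indices 0..n-1
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history)) \
        / sum((x - mean_x) ** 2 for x in range(n))
    intercept = mean_y - slope * mean_x
    # Extrapolate the fitted line `horizon` steps past the end of the history.
    return [intercept + slope * (n + k) for k in range(horizon)]
```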
In some embodiments, step 211 may be followed by step 212.
Process 300 may begin with block 310 “remove storage capacity usage data of invalid cluster”.
In some embodiments, a cluster that provides its storage capacity usage data to historical storage capacity usage data collection server 111 for fewer than a threshold number of days annually is determined to be an invalid cluster. For example, an invalid cluster can be a cluster that provides its storage capacity usage data for fewer than 180 days annually.
In some other embodiments, a cluster failing to provide any of its storage capacity usage data to historical storage capacity usage data collection server 111 within a threshold time period is determined to be an invalid cluster. For example, an invalid cluster can be a cluster failing to provide any of its storage capacity usage data in the past 30 days.
In yet other embodiments, a cluster failing to provide any of its storage capacity usage data to historical storage capacity usage data collection server 111 for a threshold number of consecutive days is determined to be an invalid cluster. For example, an invalid cluster can be a cluster failing to provide any of its storage capacity usage data for 15 consecutive days.
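The three validity rules above can be sketched as a single check. The threshold values come from the examples in the text; the function name, signature, and the interpretation of "15 consecutive days" as a gap of more than 15 days between successive reports are assumptions:

```python
from datetime import date, timedelta

def is_invalid_cluster(report_dates, today,
                       min_days_per_year=180,
                       recent_window_days=30,
                       max_gap_days=15):
    """Return True if a cluster's reporting history marks it as invalid.

    The three rules and their default thresholds follow the examples
    above; names and the exact gap interpretation are assumptions.
    """
    dates = sorted(set(report_dates))
    # Rule 1: fewer than `min_days_per_year` reporting days in the past year.
    year_ago = today - timedelta(days=365)
    if sum(1 for d in dates if d > year_ago) < min_days_per_year:
        return True
    # Rule 2: no data at all within the recent window.
    if not any(d > today - timedelta(days=recent_window_days) for d in dates):
        return True
    # Rule 3: a run of more than `max_gap_days` consecutive days without data.
    for prev, cur in zip(dates, dates[1:]):
        if (cur - prev).days > max_gap_days:
            return True
    return False
```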
Block 310 may be followed by block 320 “remove spike storage capacity usage data”.
In some embodiments, assume time-series storage capacity usage data of [105, 104, 103, 150, 101, 100], where each value represents the cluster's storage capacity usage in terabytes (TB) on Day 1 through Day 6, respectively. For example, 105 represents 105 TB of storage capacity usage of the cluster on Day 1, and 150 represents 150 TB of storage capacity usage of the cluster on Day 4.
In some embodiments, training data preprocessor 112 is configured to calculate a “total difference” associated with the time-series storage capacity usage data. The “total difference” may be an absolute value of a difference between the last number (i.e., 100) and the first number (i.e., 105) of the time-series storage capacity usage data. Therefore, the “total difference” associated with the time-series storage capacity usage data is |100-105|=5.
In some embodiments, training data preprocessor 112 is configured to calculate a set of “range differences” for each number in the time-series storage capacity usage data according to a “range length.” For example, assuming the “range length” is 3, training data preprocessor 112 is configured to calculate a first set of “range differences” of |104-105|, |103-105| and |150-105| for the first number 105 in the time-series storage capacity usage data. Similarly, training data preprocessor 112 is configured to calculate a second set of “range differences” of |103-104|, |150-104| and |101-104| for the second number 104, and a third set of “range differences” of |150-103|, |101-103| and |100-103| for the third number 103. In some embodiments, training data preprocessor 112 is configured to determine that a spike exists when a “range difference” is greater than the “total difference.” Accordingly, training data preprocessor 112 determines that a first spike exists because |150-105| is greater than the total difference of 5, a second spike exists because |150-104| is greater than the total difference of 5, and a third spike exists because |150-103| is greater than the total difference of 5. In some embodiments, because the number 150 is associated with all of the first, second and third spikes, training data preprocessor 112 is configured to determine that the number 150 is spike data in the time-series storage capacity usage data and to remove the number 150 from the time-series storage capacity usage data for further processing.
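The spike-removal walk-through above can be sketched as follows. The interpretation that a value is a spike only when every "range difference" involving it exceeds the "total difference" is an assumption drawn from the example:

```python
def remove_spikes(data, range_length=3):
    """Remove spike values from a time series of storage capacity usage.

    Assumption: a value is treated as a spike when every "range
    difference" that involves it exceeds the "total difference"
    |last - first| of the series.
    """
    total_diff = abs(data[-1] - data[0])
    n = len(data)
    involved = [0] * n   # how many range differences involve each index
    exceeded = [0] * n   # how many of those exceed the total difference
    for i in range(n):
        # Range differences for data[i], looking ahead `range_length` values.
        for j in range(i + 1, min(i + 1 + range_length, n)):
            involved[j] += 1
            if abs(data[j] - data[i]) > total_diff:
                exceeded[j] += 1
    # Keep a value unless all of its range differences exceed the total difference.
    return [x for k, x in enumerate(data)
            if not (involved[k] and exceeded[k] == involved[k])]
```

Applied to the example series, the value 150 on Day 4 is removed while the remaining values are kept.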
Block 320 may be followed by block 330 “normalize storage capacity usage data”.
Following the example time-series storage capacity usage data above, in some embodiments, at block 330, training data preprocessor 112 is configured to normalize the spike-removed time-series storage capacity usage data of [105, 104, 103, 101, 100].
In some embodiments, training data preprocessor 112 is configured to identify the maximum and the minimum values from the time-series storage capacity usage data of [105, 104, 103, 101, 100]. Therefore, the maximum value is 105 and the minimum value is 100. In some embodiments, training data preprocessor 112 is configured to normalize a value X in the time-series storage capacity usage data based on the following equation: normalized X = (X - minimum)/(maximum - minimum).
Accordingly, the time-series storage capacity usage data is normalized as [(105-100)/5, (104-100)/5, (103-100)/5, (101-100)/5, (100-100)/5], which is [1, 0.8, 0.6, 0.2, 0].
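This min-max normalization step can be expressed as a short sketch (the function name is illustrative):

```python
def min_max_normalize(data):
    """Min-max normalize a time series to the range [0, 1]:
    normalized X = (X - minimum) / (maximum - minimum)."""
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]
```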
The above examples can be implemented by hardware (including hardware logic circuitry), software, firmware, or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computer system may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computer system may include a non-transitory computer-readable medium having stored thereon instructions or program code that, when executed by the processor, cause the processor to perform the process(es) described herein.
In some implementations, signal bearing medium 404 may encompass a non-transitory computer readable medium 408, such as, but not limited to, a solid-state drive, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, memory, etc. In some implementations, signal bearing medium 404 may encompass a recordable medium 410, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc. In some implementations, signal bearing medium 404 may encompass a communications medium 406, such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.). Computer program product 400 may be recorded on non-transitory computer readable medium 408 or another similar recordable medium 410.
The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term ‘processor’ is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array etc.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
Those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of this disclosure.
Software and/or firmware to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “computer-readable storage medium”, as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
The drawings are only illustrations of an example, wherein the units or procedures shown in the drawings are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the examples can be arranged in the device as described in the examples, or can alternatively be located in one or more devices different from those in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2023/123070 | 10/3/2023 | WO |