This U.S. patent application claims priority under 35 U.S.C. § 119 to: Indian Patent Application number 202321076675, filed on Nov. 9, 2023. The entire contents of the aforementioned application are incorporated herein by reference.
The disclosure herein generally relates to cloud services, and, more particularly, to a method and system for carbon-aware storage service selection in a multi-cloud environment.
Cloud computing offers various benefits such as a pay-as-you-go model, flexibility, scalability, resilience, and robust security. Hence, enterprises are switching to the cloud for their various storage and computational requirements and to efficiently serve their globally distributed customers. Further, to avoid vendor lock-in and reduce dependencies, enterprises are choosing multi-cloud environments. However, the decisions of cloud service provider (CSP) selection and data placement in the cloud become complex due to multiple factors such as varied pricing, Quality of Service (QoS), data regulations, carbon footprint, and so on.
These decisions need to be taken by the enterprise both at the cloud adoption phase and after cloud adoption. There could be various challenges post cloud adoption, such as changes in the carbon footprint target set by the organization, emerging data regulations at some geographical locations, and changes in the business requirements. Due to these challenges, the decisions need to be reviewed. Also, both sustainability and data regulation compliance are a shared responsibility in the cloud. To make efficient decisions on global data placement, enterprises need to consider multiple related factors. Existing approaches addressing cloud selection and data placement do not appear to be capable of handling the various requirements/parameters that are to be considered for optimum selection of a cloud. Another challenge is that many of these parameters are subject to change over a period of time. For example, data regulations in different countries may change from time to time. Existing approaches fail to cater to these changes, which in turn affects the cloud selection strategies.
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method is provided. The method includes receiving, via one or more hardware processors, information on a non-compliant initial allocation of a plurality of User Centers (UC), a plurality of enterprise user requirements, and specifications of one or more service providers as input data. Further, the initial allocation is optimized, via the one or more hardware processors, wherein the optimization of the initial allocation includes: identifying one or more non-compliant UC locations from among the plurality of UCs; determining a) a migration cost, b) an operational cost, and c) a carbon footprint involved in migrating each of the one or more non-compliant UC locations to a Data Regulatory (DR) and latency compliant Data centre (DC) location; determining a priority weight for each of the migration cost, the operational cost, and the carbon footprint; and obtaining a plurality of migration solutions, for a plurality of combinations of the priority weights. 
Obtaining the plurality of migration solutions includes performing, for each of the one or more non-compliant UC locations: listing a plurality of migrations to the compliant DC location for the one or more non-compliant UC locations; determining a score for each of the plurality of migrations based on the determined priority weights, by using a Multi Criteria Decision Making (MCDM) technique for the one or more non-compliant UC locations; ranking the plurality of migrations based on the determined score for the one or more non-compliant UC locations; selecting a migration having the highest rank from among the plurality of migrations; migrating each of the one or more non-compliant UC locations to the compliant DC location, using the selected migration as a feasible solution; and generating a list of the plurality of feasible solutions, wherein the plurality of feasible solutions are generated by selectively prioritizing a) the migration cost, b) the operational cost, and c) the carbon footprint for each solution, and wherein the plurality of feasible solutions are used as the plurality of migration solutions to optimize the non-compliant initial allocation of the plurality of User Centers (UC).
In an embodiment of the method, a UC from among a plurality of UCs is identified as the non-compliant UC if the UC is not DR compliant and a measured UC-DC latency exceeds a pre-defined threshold value.
In an embodiment of the method, the priority weight for each of the migration cost, the operational cost, and the carbon footprint is determined such that the sum of the priority weights is 1.
In an embodiment of the method, the initial allocation is generated by iteratively performing the following steps until each of the one or more UCs is allocated to a compliant DC among the one or more DCs: obtaining, via the one or more hardware processors, a) one or more storage and processing requirements of one or more UCs, b) locations of a plurality of data centers (DCs), c) a resource cap, and d) pricing of the plurality of DCs; identifying, via the one or more hardware processors, one or more DR compliant DCs from among the plurality of DCs that are in the same region as the one or more UCs; and allocating, via the one or more hardware processors, the one or more UCs to the identified one or more DR compliant DCs if a total carbon footprint of the one or more DR compliant DCs is less than a predefined threshold value, wherein a total operational cost is updated after the allocation, and wherein, if no DR compliant DC is available in the same region as that of the one or more UCs, the allocation is done in a different region.
In an embodiment of the method, an initial allocation is determined to be the non-compliant allocation by detecting a plurality of non-compliances in the initial allocation over a period of time due to one or more of: (i) a change in DR criteria for the one or more UCs, (ii) a change in invocation frequencies (demands) for one or more UCs resulting in a violation of pre-defined QoS thresholds, and (iii) a change in the carbon footprint threshold for the allocation, where the current total carbon footprint of the allocation violates the new carbon footprint threshold.
In an embodiment of the method, selectively prioritizing the migration cost, the operational cost, and the carbon footprint for each solution is based on i) a priority given to a first objective in a pre-defined range (0, 1), wherein the priority increases in arithmetic progression across successive solutions, ii) a priority given to a second objective for each of the plurality of feasible solutions, wherein the priority of the second objective is selected in a pre-defined range of (0, 1 − priority of the first objective), and iii) a priority given to a third objective for each of the plurality of feasible solutions, wherein the priority given to the third objective is equal to 1 − (priority of the first objective + priority of the second objective).
In an embodiment of the method, the plurality of feasible solutions are ordered to form an ordered solution list in each of a plurality of iterations by constructing an ideal function for each of the plurality of iterations, wherein the ideal function is constructed by taking the minimal values of (i) the operational cost, (ii) the migration cost, and (iii) the total carbon footprint across the plurality of solutions in the solution list, together with one or more enterprise-decided priority values for each of the first objective, the second objective, and the third objective.
In an embodiment of the method, the plurality of feasible solutions are used with one or more other data to obtain a set of compliant solutions having (i) a minimal operational cost, (ii) a minimal migration cost, and (iii) a total carbon footprint value less than a predefined threshold, by iteratively performing the following in each of a plurality of iterations until a pre-defined number of iterations is reached: maintaining a fixed number of ordered solutions in the solution list and ordering the fixed number of ordered solutions based on an ideal function in each of the plurality of iterations; initializing a duplicate solution for each of the plurality of feasible solutions in a solution list, and replacing each UC-DC allocation in the duplicate solution by a DR criteria, latency, and carbon footprint compliant DC among a plurality of DCs satisfying a pre-defined criterion, wherein, by replacing each UC-DC allocation, a new solution is obtained for each solution existing in the solution list; and including each newly generated solution in the solution list if a determined quality of the newly generated solution exceeds a measured quality of at least one existing solution in the solution list, wherein, while including the newly generated solution in the solution list, the solution having the least measured quality among the solutions existing in the solution list (the last solution) is discarded, or else discarding the newly generated solution.
In an embodiment of the method, the pre-defined number of iterations is a value in a predefined range (0, 1), and is dependent on (i) a current iteration count, and (ii) the pre-defined limit on iterations, such that the value of this threshold decreases as the iteration count increases.
In an embodiment of the method, the DC is carbon footprint compliant for a UC if a resulting carbon footprint of the allocation is less than an estimated UC-carbon footprint limit, where the estimated UC-carbon footprint limit for each UC is obtained based on: (i) storage and processing resource requirements of the UC, and (ii) a predefined threshold on the total carbon footprint.
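The per-UC scoring, ranking, and selection steps may be illustrated with a minimal sketch. The disclosure does not name a specific MCDM technique; this sketch assumes simple additive weighting (SAW) with min-max normalization, in which a lower cost or a lower carbon footprint scores higher. The `Migration` structure and the function names are hypothetical and are used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Migration:
    """Hypothetical candidate move of one non-compliant UC to a compliant DC."""
    dc: str
    migration_cost: float
    operational_cost: float
    carbon_footprint: float


def _normalize(values):
    """Min-max normalize so the smallest (best) value maps to 1.0 and the largest to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(hi - v) / (hi - lo) for v in values]


def rank_migrations(candidates, weights):
    """Score candidates by simple additive weighting (assumed MCDM) and return them best-first.

    weights: (w_migration, w_operational, w_carbon), assumed to sum to 1.
    """
    w_m, w_o, w_c = weights
    m = _normalize([c.migration_cost for c in candidates])
    o = _normalize([c.operational_cost for c in candidates])
    f = _normalize([c.carbon_footprint for c in candidates])
    scored = [(w_m * m[i] + w_o * o[i] + w_c * f[i], c) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


def build_feasible_solution(allocation, candidates_per_uc, weights):
    """Move each non-compliant UC to its highest-ranked compliant DC for one weight combination."""
    solution = dict(allocation)
    for uc, candidates in candidates_per_uc.items():
        solution[uc] = rank_migrations(candidates, weights)[0].dc
    return solution
```

Calling `build_feasible_solution` once per priority-weight combination yields one feasible solution per combination, which together form the list of migration solutions.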
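The initial allocation steps may be sketched as a greedy loop under the following assumptions, none of which are stated in the disclosure: each UC has a single scalar resource demand, each DC has a capacity (the resource cap), a unit price, and a unit carbon footprint, and among eligible DCs the cheapest is tried first. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    region: str
    capacity: float  # resource cap (assumed single resource unit)
    price: float     # cost per unit of resource (assumed)
    carbon: float    # carbon footprint per unit of resource (assumed)


@dataclass
class UserCenter:
    name: str
    region: str
    demand: float    # combined storage/processing requirement (assumed scalar)
    dr_allowed: set  # names of DCs that satisfy this UC's data regulations


def initial_allocation(ucs, dcs, carbon_threshold):
    """Greedy sketch: same-region DR-compliant DCs first, then other regions."""
    remaining = {dc.name: dc.capacity for dc in dcs}
    allocation, total_cost, total_carbon = {}, 0.0, 0.0
    for uc in ucs:
        compliant = [dc for dc in dcs
                     if dc.name in uc.dr_allowed and remaining[dc.name] >= uc.demand]
        same_region = [dc for dc in compliant if dc.region == uc.region]
        other_region = [dc for dc in compliant if dc.region != uc.region]
        placed = False
        for pool in (same_region, other_region):
            for dc in sorted(pool, key=lambda d: d.price):
                added_carbon = dc.carbon * uc.demand
                # Allocate only while the total carbon footprint stays below the threshold.
                if total_carbon + added_carbon < carbon_threshold:
                    allocation[uc.name] = dc.name
                    remaining[dc.name] -= uc.demand
                    total_cost += dc.price * uc.demand  # update total operational cost
                    total_carbon += added_carbon
                    placed = True
                    break
            if placed:
                break
        if not placed:
            raise ValueError(f"no compliant DC available for {uc.name}")
    return allocation, total_cost, total_carbon
```

Raising an error when no compliant DC exists is a design choice of this sketch; a deployment could instead report the UC for manual review.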
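Detecting the three kinds of non-compliance may be sketched as a periodic check over the current allocation. The input dictionaries are hypothetical: `dr_allowed` reflects the current DR criteria, `latency_ms` reflects measured latencies under current demand, and `uc_carbon` holds the footprint of each UC's placement. Treating a total footprint equal to the threshold as a violation is an assumption chosen to mirror the "less than a predefined threshold" compliance condition above.

```python
def detect_non_compliance(allocation, dr_allowed, latency_ms, latency_limit_ms,
                          uc_carbon, carbon_threshold):
    """Return ({uc: [reasons]}, carbon_violated) for the current allocation.

    allocation: {uc: dc}
    dr_allowed: {uc: set of DCs allowed under the current DR criteria}
    latency_ms: {(uc, dc): measured latency under current demand}
    uc_carbon: {uc: carbon footprint of that UC's current placement}
    """
    reasons = {}
    for uc, dc in allocation.items():
        found = []
        if dc not in dr_allowed[uc]:
            found.append("dr")   # (i) DR criteria changed
        if latency_ms[(uc, dc)] > latency_limit_ms:
            found.append("qos")  # (ii) demand change violated the QoS threshold
        if found:
            reasons[uc] = found
    # (iii) total footprint no longer below the (possibly new) threshold
    carbon_violated = sum(uc_carbon.values()) >= carbon_threshold
    return reasons, carbon_violated
```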
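The priority-weight scheme may be sketched as a small grid generator. The first objective's priority follows an arithmetic progression inside (0, 1), the second is spread evenly inside (0, 1 − first), and the third takes the remainder, so each triple sums to 1. Evenly spacing the second priority is an assumption (a random draw within the same range would also fit the description), as are the grid sizes and the mapping of the first, second, and third objectives to specific costs.

```python
def weight_combinations(n_first=4, n_second=3):
    """Generate (w1, w2, w3) priority triples that each sum to 1.

    w1 follows an arithmetic progression inside (0, 1); w2 is spread evenly
    inside (0, 1 - w1); w3 takes the remainder. Grid sizes are illustrative.
    """
    combos = []
    for k in range(1, n_first + 1):
        w1 = k / (n_first + 1)
        for j in range(1, n_second + 1):
            w2 = (1 - w1) * j / (n_second + 1)
            w3 = 1 - w1 - w2
            combos.append((w1, w2, w3))
    return combos
```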
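One possible ideal function is a weighted distance from each solution to the ideal point formed by the per-objective minima across the solution list. The normalization by each objective's spread is an assumption of this sketch; the disclosure states only that the minima and the enterprise-decided priorities are used.

```python
def order_by_ideal(solutions, priorities):
    """Sort solutions best-first by weighted, normalized distance to the ideal point.

    solutions: list of dicts with "operational", "migration", and "carbon" values
    priorities: enterprise-decided weights for the same three keys
    """
    keys = ("operational", "migration", "carbon")
    ideal = {k: min(s[k] for s in solutions) for k in keys}
    # Spread per objective; fall back to 1.0 when all solutions tie on that objective.
    spread = {k: (max(s[k] for s in solutions) - ideal[k]) or 1.0 for k in keys}

    def distance(s):
        return sum(priorities[k] * (s[k] - ideal[k]) / spread[k] for k in keys)

    return sorted(solutions, key=distance)
```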
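The fixed-size solution list update may be sketched as follows, assuming a caller-supplied `quality` function in which a higher value is better (for example, the negated ideal-function distance). A new solution exceeds the quality of at least one existing solution exactly when it beats the last (lowest-quality) solution, which is then discarded.

```python
def update_solution_list(solutions, new_solution, quality, max_size):
    """Insert new_solution into a fixed-size list kept in best-first order.

    When the list is full, the new solution is kept only if it beats the
    lowest-quality (last) solution, which is then discarded; otherwise the
    new solution is discarded.
    """
    if len(solutions) < max_size:
        return sorted(solutions + [new_solution], key=quality, reverse=True)
    ordered = sorted(solutions, key=quality, reverse=True)
    if quality(new_solution) > quality(ordered[-1]):
        return sorted(ordered[:-1] + [new_solution], key=quality, reverse=True)
    return ordered
```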
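A value in (0, 1) that decreases with the iteration count may be sketched with a linear schedule. The disclosure does not state how this value is applied; this sketch assumes, hypothetically, that it serves as the probability with which each UC-DC pair in a duplicate solution is reassigned to another compliant DC, so that later iterations make smaller changes. Both the linear form and this use of the threshold are assumptions.

```python
import random


def replacement_threshold(iteration, max_iterations):
    """Linear schedule strictly inside (0, 1) that decreases as iteration grows.

    iteration is expected to run from 0 to max_iterations - 1.
    """
    return 1 - (iteration + 1) / (max_iterations + 1)


def perturb(solution, compliant_dcs, iteration, max_iterations, rng=random):
    """Duplicate a solution and, with probability equal to the current threshold,
    reassign each UC to a different compliant DC (hypothetical use of the threshold)."""
    threshold = replacement_threshold(iteration, max_iterations)
    duplicate = dict(solution)
    for uc, current_dc in solution.items():
        alternatives = [dc for dc in compliant_dcs[uc] if dc != current_dc]
        if alternatives and rng.random() < threshold:
            duplicate[uc] = rng.choice(alternatives)
    return duplicate
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible.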
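The per-UC carbon footprint limit may be sketched by splitting the total threshold in proportion to each UC's share of storage and processing demand. The equal weighting of the two resource types (`storage_weight=0.5`) is an illustrative assumption; the disclosure states only that both requirements and the total threshold are used.

```python
def uc_carbon_limits(requirements, total_threshold, storage_weight=0.5):
    """Split the total carbon threshold across UCs in proportion to resource needs.

    requirements: {uc: (storage, processing)}
    storage_weight: assumed relative importance of storage versus processing.
    """
    total_storage = sum(s for s, _ in requirements.values()) or 1.0
    total_processing = sum(p for _, p in requirements.values()) or 1.0
    limits = {}
    for uc, (storage, processing) in requirements.items():
        share = (storage_weight * storage / total_storage
                 + (1 - storage_weight) * processing / total_processing)
        limits[uc] = share * total_threshold
    return limits


def is_carbon_compliant(uc, allocation_carbon, limits):
    """A DC is carbon-compliant for a UC if the resulting footprint is below the UC's limit."""
    return allocation_carbon < limits[uc]
```

Because the shares sum to 1, the per-UC limits add up to the total threshold.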
In another embodiment, a system is provided. The system includes one or more hardware processors, a communication interface, and a memory storing a plurality of instructions, wherein the plurality of instructions cause the one or more hardware processors to receive information on a non-compliant initial allocation of a plurality of User Centers (UC), a plurality of enterprise user requirements, and specifications of one or more service providers as input data. Further, the initial allocation is optimized, via the one or more hardware processors, wherein the optimization of the initial allocation includes: identifying one or more non-compliant UC locations from among the plurality of UCs; determining a) a migration cost, b) an operational cost, and c) a carbon footprint involved in migrating each of the one or more non-compliant UC locations to a Data Regulatory (DR) and latency compliant Data centre (DC) location; determining a priority weight for each of the migration cost, the operational cost, and the carbon footprint; and obtaining a plurality of migration solutions, for a plurality of combinations of the priority weights. 
Obtaining the plurality of migration solutions includes performing, for each of the one or more non-compliant UC locations: listing a plurality of migrations to the compliant DC location for the one or more non-compliant UC locations; determining a score for each of the plurality of migrations based on the determined priority weights, by using a Multi Criteria Decision Making (MCDM) technique for the one or more non-compliant UC locations; ranking the plurality of migrations based on the determined score for the one or more non-compliant UC locations; selecting a migration having the highest rank from among the plurality of migrations; migrating each of the one or more non-compliant UC locations to the compliant DC location, using the selected migration as a feasible solution; and generating a list of the plurality of feasible solutions, wherein the plurality of feasible solutions are generated by selectively prioritizing a) the migration cost, b) the operational cost, and c) the carbon footprint for each solution, and wherein the plurality of feasible solutions are used as the plurality of migration solutions to optimize the non-compliant initial allocation of the plurality of User Centers (UC).
In another embodiment of the system, the one or more hardware processors are configured to identify a UC from among a plurality of UCs as the non-compliant UC if the UC is not DR compliant and a measured UC-DC latency exceeds a pre-defined threshold value.
In another embodiment of the system, the one or more hardware processors are configured to determine the priority weight for each of the migration cost, the operational cost, and the carbon footprint such that the sum of the priority weights is 1.
In another embodiment of the system, the one or more hardware processors are configured to generate the initial allocation by iteratively performing the following steps until each of the one or more UCs is allocated to a compliant DC among the one or more DCs: obtaining, via the one or more hardware processors, a) one or more storage and processing requirements of one or more UCs, b) locations of a plurality of data centers (DCs), c) a resource cap, and d) pricing of the plurality of DCs; identifying, via the one or more hardware processors, one or more DR compliant DCs from among the plurality of DCs that are in the same region as the one or more UCs; and allocating, via the one or more hardware processors, the one or more UCs to the identified one or more DR compliant DCs if a total carbon footprint of the one or more DR compliant DCs is less than a predefined threshold value, wherein a total operational cost is updated after the allocation, and wherein, if no DR compliant DC is available in the same region as that of the one or more UCs, the allocation is done in a different region.
In another embodiment of the system, the one or more hardware processors are configured to determine an initial allocation to be the non-compliant allocation by detecting a plurality of non-compliances in the initial allocation over a period of time due to one or more of: (i) a change in DR criteria for the one or more UCs, (ii) a change in invocation frequencies (demands) for one or more UCs resulting in a violation of pre-defined QoS thresholds, and (iii) a change in the carbon footprint threshold for the allocation, where the current total carbon footprint of the allocation violates the new carbon footprint threshold.
In another embodiment of the system, the one or more hardware processors are configured to selectively prioritize the migration cost, the operational cost, and the carbon footprint for each solution based on i) a priority given to a first objective in a pre-defined range (0, 1), wherein the priority increases in arithmetic progression across successive solutions, ii) a priority given to a second objective for each of the plurality of feasible solutions, wherein the priority of the second objective is selected in a pre-defined range of (0, 1 − priority of the first objective), and iii) a priority given to a third objective for each of the plurality of feasible solutions, wherein the priority given to the third objective is equal to 1 − (priority of the first objective + priority of the second objective).
In another embodiment of the system, the one or more hardware processors are configured to order the plurality of feasible solutions to form an ordered solution list in each of a plurality of iterations by constructing an ideal function for each of the plurality of iterations, wherein the ideal function is constructed by taking the minimal values of (i) the operational cost, (ii) the migration cost, and (iii) the total carbon footprint across the plurality of solutions in the solution list, together with one or more enterprise-decided priority values for each of the first objective, the second objective, and the third objective.
In another embodiment of the system, the one or more hardware processors are configured to use the plurality of feasible solutions with one or more other data to obtain a set of compliant solutions having (i) a minimal operational cost, (ii) a minimal migration cost, and (iii) a total carbon footprint value less than a predefined threshold, by iteratively performing the following in each of a plurality of iterations until a pre-defined number of iterations is reached: maintaining a fixed number of ordered solutions in the solution list and ordering the fixed number of ordered solutions based on an ideal function in each of the plurality of iterations; initializing a duplicate solution for each of the plurality of feasible solutions in a solution list, and replacing each UC-DC allocation in the duplicate solution by a DR criteria, latency, and carbon footprint compliant DC among a plurality of DCs satisfying a pre-defined criterion, wherein, by replacing each UC-DC allocation, a new solution is obtained for each solution existing in the solution list; and including each newly generated solution in the solution list if a determined quality of the newly generated solution exceeds a measured quality of at least one existing solution in the solution list, wherein, while including the newly generated solution in the solution list, the solution having the least measured quality among the solutions existing in the solution list (the last solution) is discarded, or else discarding the newly generated solution.
In another embodiment of the system, the pre-defined number of iterations is a value in a predefined range (0, 1), and is dependent on (i) a current iteration count, and (ii) the pre-defined limit on iterations, such that the value of this threshold decreases as the iteration count increases.
In another embodiment of the system, the DC is carbon footprint compliant for a UC if a resulting carbon footprint of the allocation is less than an estimated UC-carbon footprint limit, where the estimated UC-carbon footprint limit for each UC is obtained based on: (i) storage and processing resource requirements of the UC, and (ii) a predefined threshold on the total carbon footprint.
In yet another aspect, a non-transitory computer readable medium is provided. The non-transitory computer readable medium includes a plurality of instructions, which when executed, cause the one or more hardware processors to receive information on a non-compliant initial allocation of a plurality of User Centers (UC), a plurality of enterprise user requirements, and specifications of one or more service providers as input data. Further, the initial allocation is optimized, via the one or more hardware processors, wherein the optimization of the initial allocation includes: identifying one or more non-compliant UC locations from among the plurality of UCs; determining a) a migration cost, b) an operational cost, and c) a carbon footprint involved in migrating each of the one or more non-compliant UC locations to a Data Regulatory (DR) and latency compliant Data centre (DC) location; determining a priority weight for each of the migration cost, the operational cost, and the carbon footprint; and obtaining a plurality of migration solutions, for a plurality of combinations of the priority weights. 
Obtaining the plurality of migration solutions includes performing, for each of the one or more non-compliant UC locations: listing a plurality of migrations to the compliant DC location for the one or more non-compliant UC locations; determining a score for each of the plurality of migrations based on the determined priority weights, by using a Multi Criteria Decision Making (MCDM) technique for the one or more non-compliant UC locations; ranking the plurality of migrations based on the determined score for the one or more non-compliant UC locations; selecting a migration having the highest rank from among the plurality of migrations; migrating each of the one or more non-compliant UC locations to the compliant DC location, using the selected migration as a feasible solution; and generating a list of the plurality of feasible solutions, wherein the plurality of feasible solutions are generated by selectively prioritizing a) the migration cost, b) the operational cost, and c) the carbon footprint for each solution, and wherein the plurality of feasible solutions are used as the plurality of migration solutions to optimize the non-compliant initial allocation of the plurality of User Centers (UC).
In yet another embodiment of the non-transitory computer readable medium, a UC from among a plurality of UCs is identified as the non-compliant UC if the UC is not DR compliant and a measured UC-DC latency exceeds a pre-defined threshold value.
In yet another embodiment of the non-transitory computer readable medium, the priority weight for each of the migration cost, the operational cost, and the carbon footprint is determined such that the sum of the priority weights is 1.
In yet another embodiment of the non-transitory computer readable medium, the initial allocation is generated by iteratively performing the following steps until each of the one or more UCs is allocated to a compliant DC among the one or more DCs: obtaining, via the one or more hardware processors, a) one or more storage and processing requirements of one or more UCs, b) locations of a plurality of data centers (DCs), c) a resource cap, and d) pricing of the plurality of DCs; identifying, via the one or more hardware processors, one or more DR compliant DCs from among the plurality of DCs that are in the same region as the one or more UCs; and allocating, via the one or more hardware processors, the one or more UCs to the identified one or more DR compliant DCs if a total carbon footprint of the one or more DR compliant DCs is less than a predefined threshold value, wherein a total operational cost is updated after the allocation, and wherein, if no DR compliant DC is available in the same region as that of the one or more UCs, the allocation is done in a different region.
In yet another embodiment of the non-transitory computer readable medium, an initial allocation is determined to be the non-compliant allocation by detecting a plurality of non-compliances in the initial allocation over a period of time due to one or more of: (i) a change in DR criteria for the one or more UCs, (ii) a change in invocation frequencies (demands) for one or more UCs resulting in a violation of pre-defined QoS thresholds, and (iii) a change in the carbon footprint threshold for the allocation, where the current total carbon footprint of the allocation violates the new carbon footprint threshold.
In yet another embodiment of the non-transitory computer readable medium, selectively prioritizing the migration cost, the operational cost, and the carbon footprint for each solution is based on i) a priority given to a first objective in a pre-defined range (0, 1), wherein the priority increases in arithmetic progression across successive solutions, ii) a priority given to a second objective for each of the plurality of feasible solutions, wherein the priority of the second objective is selected in a pre-defined range of (0, 1 − priority of the first objective), and iii) a priority given to a third objective for each of the plurality of feasible solutions, wherein the priority given to the third objective is equal to 1 − (priority of the first objective + priority of the second objective).
In yet another embodiment of the non-transitory computer readable medium, the plurality of feasible solutions are ordered to form an ordered solution list in each of a plurality of iterations by constructing an ideal function for each of the plurality of iterations, wherein the ideal function is constructed by taking the minimal values of (i) the operational cost, (ii) the migration cost, and (iii) the total carbon footprint across the plurality of solutions in the solution list, together with one or more enterprise-decided priority values for each of the first objective, the second objective, and the third objective.
In yet another embodiment of the non-transitory computer readable medium, the plurality of feasible solutions are used with one or more other data to obtain a set of compliant solutions having (i) a minimal operational cost, (ii) a minimal migration cost, and (iii) a total carbon footprint value less than a predefined threshold, by iteratively performing the following in each of a plurality of iterations until a pre-defined number of iterations is reached: maintaining a fixed number of ordered solutions in the solution list and ordering the fixed number of ordered solutions based on an ideal function in each of the plurality of iterations; initializing a duplicate solution for each of the plurality of feasible solutions in a solution list, and replacing each UC-DC allocation in the duplicate solution by a DR criteria, latency, and carbon footprint compliant DC among a plurality of DCs satisfying a pre-defined criterion, wherein, by replacing each UC-DC allocation, a new solution is obtained for each solution existing in the solution list; and including each newly generated solution in the solution list if a determined quality of the newly generated solution exceeds a measured quality of at least one existing solution in the solution list, wherein, while including the newly generated solution in the solution list, the solution having the least measured quality among the solutions existing in the solution list (the last solution) is discarded, or else discarding the newly generated solution.
In yet another embodiment of the non-transitory computer readable medium, the pre-defined number of iterations is a value in a predefined range (0, 1), and is dependent on (i) a current iteration count, and (ii) the pre-defined limit on iterations, such that the value of this threshold decreases as the iteration count increases.
In yet another embodiment of the non-transitory computer readable medium, the DC is carbon footprint compliant for a UC if a resulting carbon footprint of the allocation is less than an estimated UC-carbon footprint limit, where the estimated UC-carbon footprint limit for each UC is obtained based on: (i) storage and processing resource requirements of the UC, and (ii) a predefined threshold on the total carbon footprint.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Cloud service selection decisions need to be taken by the enterprise both at the cloud adoption phase and after cloud adoption. There could be various challenges post cloud adoption, such as changes in the carbon footprint target set by the organization, emerging data regulations at some geographical locations, and changes in the business requirements. Due to these challenges, the decisions need to be reviewed. Also, both sustainability and data regulation compliance are a shared responsibility in the cloud. To make efficient decisions on global data placement, enterprises need to consider multiple related factors. Existing approaches addressing cloud selection and data placement do not appear to be capable of handling the various requirements/parameters that are to be considered for optimum selection of a cloud. Another challenge is that many of these parameters are subject to change over a period of time. For example, data regulations in different countries may change from time to time. Existing approaches fail to cater to these changes, which in turn affects the cloud selection strategies.
In order to address these challenges, embodiments disclosed herein provide a method and system for carbon-aware storage service selection in the multi-cloud environment. The method includes receiving information on a non-compliant initial allocation of a plurality of User Centers (UC), a plurality of enterprise user requirements, and specifications of one or more service providers as input data. Further, the initial allocation is optimized, wherein the optimization of the initial allocation includes: identifying one or more non-compliant UC locations from among the plurality of UCs; determining a) a migration cost, b) an operational cost, and c) a carbon footprint involved in migrating each of the one or more non-compliant UC locations to a Data Regulatory (DR) and latency compliant Data centre (DC) location; determining a priority weight for each of the migration cost, the operational cost, and the carbon footprint; and obtaining a plurality of migration solutions, for a plurality of combinations of the priority weights.
Obtaining the plurality of migration solutions includes performing, for each of the one or more non-compliant UC locations: listing a plurality of migrations to the compliant DC location for the one or more non-compliant UC locations; determining a score for each of the plurality of migrations based on the determined priority weights, by using a Multi Criteria Decision Making (MCDM) technique for the one or more non-compliant UC locations; ranking the plurality of migrations based on the determined score for the one or more non-compliant UC locations; selecting a migration having the highest rank from among the plurality of migrations; migrating each of the one or more non-compliant UC locations to the compliant DC location, using the selected migration as a feasible solution; and generating a list of the plurality of feasible solutions, wherein the plurality of feasible solutions are generated by selectively prioritizing a) the migration cost, b) the operational cost, and c) the carbon footprint for each solution, and wherein the plurality of feasible solutions are used as the plurality of migration solutions to optimize the non-compliant initial allocation of the plurality of User Centers (UC). By using this approach, the system and method are able to make cloud selection decisions that jointly account for a variety of parameters, such as migration cost, operational cost, carbon footprint, data regulations, and latency.
Referring now to the drawings, and more particularly to
The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, an I/O interface 112. The hardware processors 102, memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 112 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers, and external databases.
The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For the purpose, the I/O interface 112 may include one or more ports for connecting several computing systems with one another or to another server computer. The I/O interface 112 may include one or more ports for connecting several devices to one another or to another server.
The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 are configured to fetch and execute computer-readable instructions stored in the memory 104.
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.
The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing the different steps involved in the carbon-aware storage service selection in the multi-cloud environment. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be used by hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown).
The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.
Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in
In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the processor(s) 102 and is configured to store instructions for execution of steps of the method 300 by the processor(s) or one or more hardware processors 102. The steps of the method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in
At step 302 of the method 300 in
The initial allocation is generated by the system 100 by executing steps involved in method 500 in
After allocating the one or more UCs to the DR compliant DC, the system 100 calculates and updates the total operational cost. In an embodiment, if a DR compliant DC is not available in the same region as the one or more UCs, the system 100 searches for a DR compliant DC in a different region and, upon finding one, allocates the one or more UCs to the identified DC.
The system 100 may determine that an initial allocation is non-compliant based on one or more of (i) a change in DR criteria for the one or more UCs, (ii) a change in invocation frequencies (demands) for the one or more UCs that results in a violation of pre-defined Quality of Service (QOS) thresholds, and (iii) a change in the carbon footprint threshold for the allocation such that the current total carbon footprint of the allocation violates the new threshold, which may be monitored over a period of time. For example, if a UC is not DR compliant and the measured UC-DC latency exceeds a pre-defined threshold value, the system 100 may identify the UC as a non-compliant UC.
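A minimal Python sketch of this compliance check follows, assuming the monitored state of each UC-DC mapping is available as a small record. The names PairState, is_pair_compliant, and is_allocation_compliant are hypothetical, and the sketch treats a failure of any single criterion (DR, latency, or total CFP) as sufficient for non-compliance, following the "one or more of" wording above.

```python
from dataclasses import dataclass


@dataclass
class PairState:
    """Monitored state of one UC-DC mapping (hypothetical fields)."""
    dr_allowed: bool         # True if regulations permit storing the UC's data at the DC
    latency_ms: float        # measured UC-DC latency
    latency_limit_ms: float  # maximum allowable latency for the UC


def is_pair_compliant(p: PairState) -> bool:
    # A mapping must satisfy both the DR rule and the latency threshold.
    return p.dr_allowed and p.latency_ms <= p.latency_limit_ms


def is_allocation_compliant(pairs: list[PairState], total_cfp: float,
                            cfp_threshold: float) -> bool:
    # Every mapping compliant, and the allocation's total CFP within the threshold.
    return all(is_pair_compliant(p) for p in pairs) and total_cfp <= cfp_threshold
```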
Referring back to method 300, at step 304 of the method 300, the system 100 optimizes the initial allocation, via the one or more hardware processors 102. Steps 304a through 304e in
Given a solution which is a mapping between UCs and DCs, the values of metrics (i.e., the migration cost, the operational cost, and the carbon footprint) are obtained as:
Further, at step 304c, the system 100 determines a priority weight for each of the migration cost, the operational cost, and the carbon footprint. The process of determining the priority weights involves: i) a priority given to a first objective in a pre-defined range (0, 1), wherein the priority increases in arithmetic progression across the solutions, ii) a priority given to a second objective for each of the plurality of feasible solutions, wherein the priority of the second objective is selected in a pre-defined range of (0, 1−priority of the first objective), and iii) a priority given to a third objective for each of the plurality of feasible solutions, wherein the priority of the third objective is equal to 1−(priority of the first objective+priority of the second objective). Here, the first objective, the second objective, and the third objective each refer to one of the migration cost, the operational cost, and the carbon footprint, in no particular order.
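The weight-generation rule can be sketched as follows. The step size of the arithmetic progression is a hypothetical parameter (the text does not fix it), and the mapping of the three weights onto migration cost, operational cost, and carbon footprint is left to the caller.

```python
def priority_weight_combinations(step: float = 0.1) -> list[tuple[float, float, float]]:
    """Enumerate (w1, w2, w3) with w1 advancing in arithmetic progression inside
    (0, 1), w2 inside (0, 1 - w1), and w3 = 1 - (w1 + w2)."""
    combos = []
    n = round(1 / step)
    for i in range(1, n):            # w1 = step, 2*step, ..., strictly inside (0, 1)
        w1 = i * step
        for j in range(1, n - i):    # w2 strictly inside (0, 1 - w1)
            w2 = j * step
            w3 = 1.0 - (w1 + w2)     # remainder goes to the third objective
            # Rounding removes floating-point noise such as 0.30000000000000004.
            combos.append((round(w1, 10), round(w2, 10), round(w3, 10)))
    return combos
```

With a step of 0.1, the sketch yields 36 combinations, each with all three weights strictly positive and summing to 1.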
In an embodiment of the method, the priority weight for each of the migration cost, the operational cost, and the carbon footprint is determined such that sum of the priority weights is 1.
Further, at step 304d, the system 100 obtains a plurality of migration solutions, for a plurality of combinations of the priority weights. Various steps involved in the process of obtaining the plurality of migration solutions by the system 100 are depicted in method 400 in
For a non-compliant UC, there are multiple compliant DCs for migration. Each migration is characterized by three values: (i) migration cost, (ii) operational cost, and (iii) total carbon footprint. The steps involved in the MCDM technique are:
Further, at step 406 of the method 400, the system 100 ranks the plurality of migrations based on the determined score for the one or more non-compliant UC locations. In various embodiments, the system 100 may rank the plurality of migrations in ascending or descending order. Further, at step 408 of the method 400, the system 100 selects the migration having the highest rank from among the plurality of migrations. The highest rank (highest score) indicates that the migration is the most suitable in terms of compliance as well as cost factors, whereas a low score, and in turn a low rank, indicates that the migration is less feasible in terms of the considered parameters/aspects. In an embodiment, high/low in the context of score/rank is determined in comparison with a reference threshold, which could be a pre-defined value. Further, at step 410 of the method 400, the system 100 migrates each of the one or more non-compliant UC locations to the compliant DC location, using the selected migration as a feasible solution. Further, at step 412 of the method 400, the system 100 generates a list of the plurality of feasible solutions, wherein the plurality of feasible solutions are generated by selectively prioritizing a) a migration cost, b) an operational cost, and c) a carbon footprint for each solution, and wherein the plurality of feasible solutions are used as the plurality of migration solutions to optimize the non-compliant initial allocation of the plurality of User Centers (UC). The system 100 performs the steps 402 through 412 for each of the one or more non-compliant UC locations.
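The text does not name the specific MCDM technique, so the sketch below uses a simple weighted-sum model with min-max normalization as a stand-in. Each candidate migration for one non-compliant UC is described by (migration cost, operational cost, carbon footprint); since all three are costs, lower values receive higher normalized scores, and the first entry of the ranked list corresponds to the migration selected at step 408. The function name rank_migrations is hypothetical.

```python
def rank_migrations(options: dict[str, tuple[float, float, float]],
                    weights: tuple[float, float, float]) -> list[tuple[str, float]]:
    """Score candidate DCs for one non-compliant UC with a weighted-sum MCDM.

    options maps a DC id to (migration cost, operational cost, carbon footprint).
    """
    dcs = list(options)
    scores = {dc: 0.0 for dc in dcs}
    for c, w in enumerate(weights):
        values = [options[dc][c] for dc in dcs]
        lo, hi = min(values), max(values)
        for dc in dcs:
            # Min-max normalize to [0, 1], inverted because every criterion is a cost.
            norm = 1.0 if hi == lo else (hi - options[dc][c]) / (hi - lo)
            scores[dc] += w * norm
    # Highest score first; the first entry is the migration to select.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```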
In an embodiment, the system 100 may order the plurality of feasible solutions to form an ordered solution list in each of a plurality of iterations by constructing an ideal function for each iteration, wherein the ideal function is constructed by taking the minimal values of (i) operational cost, (ii) migration cost, and (iii) total carbon footprint over the plurality of solutions in the solution list, together with one or more enterprise-decided priority values for each of the first objective, the second objective, and the third objective.
In another embodiment, the plurality of feasible solutions are used with one or more other data to obtain a set of compliant solutions having (i) a minimal operational cost, (ii) a minimal migration cost, and (iii) a total carbon footprint value less than a predefined threshold. Various steps involved in the process of obtaining the set of compliant solutions are depicted in method 600 in
Algorithmic explanation of method 300:
In an embodiment, the system 100 models the storage selection process in the multi-cloud environment as below:
The term ‘repair’ in the context of the embodiments disclosed herein represents the step of making a non-compliant allocation compliant by generating appropriate solution(s). Consider an enterprise with a set of user centers U={u1, u2, . . . , un} belonging to multiple geographic regions {R}. Let P={p1, p2, . . . , pm} be the set of cloud service providers (CSPs). Let K=∪_{p∈P} Kp be the set of all data centers, where Kp={d1, d2, . . . , dp} is the set of data centers (DCs) of the service provider (p). Each data center belongs to a geographical location.
Model decision variables: A decision variable xik=1 if UC (i) is allocated to DC (k), and 0 otherwise. A migration decision variable yijk=1 if UC (i) is migrated from DC (j) to DC (k), and 0 otherwise. Most cloud service providers (CSPs) have volume-based tiered pricing, and the pricing at each DC could be different. The decision variable Ikb∈[0, (qkb−qkb″)] indicates the amount of data allotted in tier (b) of DC (k), where qkb denotes the data break point (TB) of tier (b) for DC (k). The variable skb=1 if the bth tier of DC (k) is used for storing data, and 0 otherwise.
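As an illustration of volume-based tiered pricing, the sketch below fills the tiers of one DC in order, so that the amount placed in tier b is bounded by the tier width qkb−qk(b−1), mirroring the bound on Ikb. The break points and per-TB prices in the example constants are hypothetical values, not provider data.

```python
import math

# Hypothetical example tiers for one DC: cumulative break points q_kb (TB) and prices per TB.
EXAMPLE_BREAKPOINTS_TB = [50.0, 500.0, math.inf]
EXAMPLE_PRICES_PER_TB = [23.0, 22.0, 21.0]


def tiered_storage_cost(data_tb: float, breakpoints_tb: list[float],
                        price_per_tb: list[float]) -> float:
    """Storage cost at one DC under volume-based tiered pricing.

    Tier b covers the volume between the previous break point and q_kb, so the
    amount allotted to it (I_kb) never exceeds the tier width q_kb - q_k(b-1).
    """
    if data_tb > breakpoints_tb[-1]:
        raise ValueError("data exceeds the last tier break point")
    cost, lower = 0.0, 0.0
    for upper, price in zip(breakpoints_tb, price_per_tb):
        allotted = min(max(data_tb - lower, 0.0), upper - lower)  # I_kb
        cost += allotted * price  # the tier is "used" (s_kb = 1) when allotted > 0
        lower = upper
    return cost
```

With the example tiers, 600 TB is split into 50 TB, 450 TB, and 100 TB and costs 13150.0 units.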
Objective: The objective is to minimize the total cost, which comprises the migration cost and the operational cost. Because the operational cost is incurred at pre-defined intervals (for example, on a monthly basis) while the migration cost is a relatively one-time cost, a decision horizon factor (H) is considered. Further, the operational cost includes data transfer, storage, and processing costs.
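The exact form of the objective function is not reproduced in this text; a plausible reading, stated here as an assumption, is that the one-time migration cost is added to the monthly operational cost multiplied by the decision horizon H. The function below is a hypothetical sketch of that reading.

```python
def total_cost(migration_cost: float, monthly_transfer: float, monthly_storage: float,
               monthly_processing: float, horizon_months: int) -> float:
    """Assumed objective: one-time migration cost plus the recurring operational
    cost (data transfer + storage + processing) accumulated over horizon H."""
    operational_per_month = monthly_transfer + monthly_storage + monthly_processing
    return migration_cost + horizon_months * operational_per_month
```

With H=6 months, a migration costing 1000 units and a monthly operational cost of 600 units (100 + 200 + 300) gives a total of 4600 units under this reading; a shorter horizon makes migrations with a high one-time cost relatively less attractive.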
Migration and Allocation constraints:
Constraint (2) ensures that each UC can be re-allocated to at most one DC, i.e., UC (i) data that is currently placed on DC (j) can be migrated to only one DC (k). Constraint (3) states that each UC must be allocated to exactly one DC.
Utilization and Capacity constraints: The storage and processing capacity at each DC for an enterprise is limited. Constraints (6) and (7) ensure that the capacity for storage and processing at each DC is not exceeded, where Cap_Sk and Cap_Pk are the data storage (TB) and processing capacities (cores) at DC (k) respectively. Constraint (4) computes the data stored at each DC (k), represented by integer variable zk∈[0, Caps
Constraint (8) distributes the data among multiple CSPs in proportion δp, where δp∈[0, 1] is the fraction of data allotted to CSP (p).
Constraint (9) captures the data stored at each DC (k). Constraints (10) and (11) distribute data in tiers for each DC (k). Constraints (12) and (13) limit the amount of data to be allotted for each DC (k). Tiers b, b′, and b″ are taken from the set of tiers (B), where b′ denotes the (b+1)th tier and b″ denotes the (b−1)th tier.
Constraint (14) ensures data regulatory compliance while allocating UCs to DCs. The parameter βik∈{0, 1} captures the DR regulation between UC (i) and DC (k), where βik=0 indicates that the regulations of the country of UC (i) do not permit storing its citizens' data at DC location (k).
Constraint (15) ensures that a UC can be migrated only from those DCs where it was previously allocated, where xij* denotes the current configuration of UC (i) at DC (j). Constraint (16) ensures that UCs can be migrated only to those DCs that are selected by the model. For example, if xik=0, DC (k) is not selected for the allocation of UC (i) in the model; UC (i) therefore cannot migrate to DC (k), and yijk must be 0.
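Because the constraint equations themselves are not reproduced here, the checker below paraphrases constraints (2), (3), (14), (15), and (16) from their prose descriptions, using 0/1 nested lists indexed as x[i][k], x_star[i][j], beta[i][k], and y[i][j][k]. It is a hypothetical validation helper, not the solver formulation.

```python
def check_assignment_constraints(x, y, x_star, beta) -> list[str]:
    """Return descriptions of violated constraints (by number) for allocation x
    and migrations y, given the current configuration x_star and DR matrix beta."""
    n_uc, n_dc = len(x), len(x[0])
    violations = []
    for i in range(n_uc):
        if sum(x[i]) != 1:                             # (3): exactly one DC per UC
            violations.append(f"(3) UC {i}")
        for k in range(n_dc):
            if x[i][k] > beta[i][k]:                   # (14): DR compliance
                violations.append(f"(14) UC {i} -> DC {k}")
        for j in range(n_dc):
            if sum(y[i][j]) > 1:                       # (2): at most one target DC
                violations.append(f"(2) UC {i} from DC {j}")
            for k in range(n_dc):
                if y[i][j][k] > x_star[i][j]:          # (15): migrate only from current DC
                    violations.append(f"(15) UC {i} from DC {j}")
                if y[i][j][k] > x[i][k]:               # (16): migrate only to selected DC
                    violations.append(f"(16) UC {i} to DC {k}")
    return violations
```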
Latency is the average response time for UC (i) to access data from DC (k). The enterprise has stringent criteria on the maximum allowable latency (λi) (milliseconds) for serving its users. Constraint (17) prohibits selecting data centers that exceed the latency threshold value, where Fi is the average invocation frequency at UC (i) for accessing data per month, and Lik denotes the latency (ms) between UC (i) and DC (k).
Carbon emission at a data center is directly proportional to the power consumed. Many factors contribute to the energy consumption at a data center, for example, the energy required for cooling and other thermal requirements. The average carbon emission (CO2k) at the kth data center also depends on the proportion of energy supply obtained from renewables and power grids. The procedure for the power consumption and carbon emission calculations at data centers is adopted from state-of-the-art approaches. From the enterprise perspective, the energy required for storage and compute plays a significant role in carbon footprint estimation. The operating power required (Ereqk) at data center (k) is computed in terms of storage and compute requirements as follows:
Where,
Constraint (19) states that the total CFP contributed by all the UCs must not exceed a carbon threshold value (ξ). Parameter ξ is defined as the maximum allowable carbon footprint (tons/month) for the network, which is set by the enterprise.
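The power and carbon formulas are not reproduced in this text, so the sketch below assumes a linear power model, Ereqk = Ekidle + Estore × (stored TB) + Ecore × (cores), with defaults set to the parameter values reported in the experiments, and converts power to monthly emissions using the DC's emission factor CO2k in g/kWh. The 730 hours per month and all function names are assumptions.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def dc_power_watts(data_tb: float, cores: int, e_idle: float = 100.0,
                   e_store: float = 1.2, e_core: float = 10.0) -> float:
    """Assumed linear power model for DC k: idle power plus storage and compute draw
    (defaults: Ekidle in W, Estore in W/TB, Ecore in W per core)."""
    return e_idle + e_store * data_tb + e_core * cores


def dc_cfp_tons_per_month(power_w: float, grid_g_per_kwh: float) -> float:
    """Monthly carbon footprint (tons CO2) from power draw and emission factor CO2k."""
    kwh = power_w * HOURS_PER_MONTH / 1000.0
    return kwh * grid_g_per_kwh / 1e6  # grams to tons


def within_carbon_threshold(cfps_tons: list[float], xi_tons: float) -> bool:
    """Constraint (19): the total CFP of the allocation must not exceed xi."""
    return sum(cfps_tons) <= xi_tons
```

For 100 TB and 50 cores, the assumed model gives 720 W, which at 250 g/kWh corresponds to about 0.13 tons of CO2 per month.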
The optimization model for the repair of data placement is an NP-hard multi-objective optimization model with multiple stringent constraints. For large problem instances, finding a feasible solution in a short time with optimizers becomes computationally expensive. Hence, the system 100 uses a search-based procedure, i.e., the method 300, for generating the compliant solutions, which is explained below:
In the context of the embodiments disclosed herein, a solution is said to be non-compliant if there exists at least one non-compliant (ui, dj) mapping in the solution. A pair (ui, dj) is non-compliant if:
Algorithm 1 below depicts the process of generating compliant initial solutions using a multi-criteria ranking method, and Algorithm 2 depicts a procedure for fine-tuning the initial solutions.
Constructing Compliant Solutions using Multi-criteria Ranking: The procedure takes the non-compliant solution (S0) as an input (refer Algorithm 1). To obtain a compliant solution, the first step is to identify compliant DCs for each non-compliant UC. Next, a migration matrix is constructed, in which the changes in operational cost (ΔOC), migration cost (ΔMC), and carbon footprint (ΔCFP) are computed for each non-compliant UC if it is migrated to a compliant DC (lines 4-9). The next step is to assign weights (priorities) to each objective function and select the mapping with the highest rank (lines 12-15), thereby finding the most appropriate DC to which the non-compliant UC is migrated. After all non-compliant UCs are migrated, a feasible solution is obtained. The procedure is repeated with different combinations of priority weights to obtain several compliant solutions (line 10).
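A compact sketch of the Algorithm 1 loop is shown below under stated simplifications: the migration matrix is given as (ΔMC, ΔOC, ΔCFP) tuples per compliant DC, each delta is scaled by its largest magnitude for that UC, the DC with the lowest weighted increase is treated as the highest-ranked mapping, and capacity interactions between successive migrations are ignored. The function and type names are hypothetical.

```python
Deltas = tuple[float, float, float]  # (dMC, dOC, dCFP) for one candidate migration


def build_compliant_solutions(current: dict[str, str],
                              migration_matrix: dict[str, dict[str, Deltas]],
                              weight_combos: list[tuple[float, float, float]]
                              ) -> list[dict[str, str]]:
    """Greedy multi-criteria repair in the spirit of Algorithm 1 (sketch).

    current: UC -> DC mapping of the non-compliant solution S0.
    migration_matrix: for each non-compliant UC, compliant DC -> deltas.
    Returns the distinct repaired UC -> DC mappings, one per weight combination.
    """
    solutions: list[dict[str, str]] = []
    for weights in weight_combos:
        repaired = dict(current)
        for uc, options in migration_matrix.items():
            # Scale each delta by its largest magnitude so the criteria are comparable.
            scale = [max(abs(d[c]) for d in options.values()) or 1.0 for c in range(3)]

            def weighted(dc: str) -> float:
                return sum(w * options[dc][c] / scale[c] for c, w in enumerate(weights))

            repaired[uc] = min(options, key=weighted)  # lowest weighted increase ranks highest
        if repaired not in solutions:  # keep only distinct compliant solutions
            solutions.append(repaired)
    return solutions
```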
However, using a multi-criteria ranking method for initial solution generation can result in sub-optimal solutions. To improve the solution quality, the generated solutions are fine-tuned using the steps depicted in Algorithm 2.
Creating List of Compliant Data Centers: Algorithm 2 takes the set of compliant solutions (Slist) obtained by Algorithm 1 as input. Slist contains N solutions. For each UC, a list (DCUC*) of compliant DCs is obtained (lines 2-4). The list is created based on data regulation, latency, and the carbon threshold value. Generally, the carbon footprint target for cloud computing is set by the enterprise considering all the user centers and regions. To effectively reduce the CFP and to make carbon-aware data placement decisions, the maximum permissible carbon footprint (carbon threshold) for each UC region/location is estimated considering its storage and compute requirements. The carbon threshold for a UC is given by Equation (21), where datau and coreu represent the data and core resource requirements of user center (u).
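The construction of DCUC* can be sketched as a filter over all DCs. Equation (21) is not reproduced in this text, so the per-UC carbon threshold is passed in as an input, and cfp_if_placed (the estimated CFP of placing a UC at a DC) is a hypothetical precomputed table.

```python
def compliant_dc_lists(ucs, dcs, beta, latency_ms, latency_limit_ms,
                       cfp_if_placed, uc_carbon_threshold):
    """DC_UC*: for each UC, the DCs that satisfy data regulation (beta == 1),
    the UC's latency limit, and the UC's carbon threshold."""
    return {
        u: [d for d in dcs
            if beta[u][d] == 1
            and latency_ms[u][d] <= latency_limit_ms[u]
            and cfp_if_placed[u][d] <= uc_carbon_threshold[u]]
        for u in ucs
    }
```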
Procedure for Ordering Quality Solution: To evaluate the quality of the generated solutions and to decide which solution to accept or reject, a distance function is required. As the problem is multi-objective, there are several solutions. To order the solutions by quality, the distance of each solution from an ideal solution is measured. In this approach, the ideal objective values for each criterion are obtained from the complete solution set (Slist), as given in Equation (22). Then, the Euclidean distance between each new solution and fgen* is computed, and the solutions are ordered accordingly (line 7).
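A sketch of the ordering step follows, assuming each solution is summarized by its (operational cost, migration cost, total CFP) values: the ideal point takes the per-objective minimum over the list, and solutions are sorted by Euclidean distance to it. The optional priority weights reflect the enterprise-decided priorities mentioned earlier; whether and how the objectives are normalized before computing distances is not stated in the text and is left out here.

```python
import math

Objectives = tuple[float, float, float]  # (operational cost, migration cost, total CFP)


def order_by_ideal_distance(solutions: list[Objectives],
                            priorities: Objectives = (1.0, 1.0, 1.0)) -> list[Objectives]:
    """Sort solutions by weighted Euclidean distance to the ideal point f_gen*,
    whose components are the per-objective minima over the current list."""
    ideal = tuple(min(s[c] for s in solutions) for c in range(3))

    def distance(s: Objectives) -> float:
        return math.sqrt(sum(p * (s[c] - ideal[c]) ** 2 for c, p in enumerate(priorities)))

    return sorted(solutions, key=distance)
```

Because the objectives have different units, unnormalized distances are dominated by the objective with the largest magnitude, so scaling each objective (for example, by its range in Slist) may be preferable in practice.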
To improve the initial solution quality, the system 100 explores the neighborhood by randomization. The existing solutions are randomized using a probability value (α) to obtain new solutions. The randomization probability is initially high and decreases as the generations progress (line 9). This helps to generate diverse solutions at the start and to intensify the search later.
For each generation, the randomization probability (α) is obtained (line 9) using parameters β0=0.9, β=0.9, and φ=2. Each initial solution is randomized to obtain a new solution (lines 10-25). Each UC-DC pair in the solution is selected for repair with the probability value (val) (lines 12-14). If selected, the user center's current DC allocation is replaced by a new DC, which is chosen with uniform distribution from the list DCUC* (lines 15-18). The process is repeated for each UC, and a new solution is generated. The new solution is included in Slist if it is better than at least one solution in Slist (lines 20-24); in that case, the last solution in Slist is discarded, so that Slist maintains N solutions in each generation. The process is repeated until (MaxGen) generations/iterations are completed.
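The fine-tuning loop can be sketched as below. The exact formula that derives α from β0, β, and φ is not reproduced in this text, so the sketch takes an alpha_schedule callable instead; it also keeps the distance function fixed, whereas the described procedure recomputes the ideal point from Slist. All names are hypothetical.

```python
import random
from typing import Callable

Solution = dict[str, str]  # hypothetical representation: UC -> DC


def randomize(solution: Solution, dc_lists: dict[str, list[str]], alpha: float,
              rng: random.Random) -> Solution:
    """Re-assign each UC with probability alpha to a DC drawn uniformly
    from its compliant list DC_UC*."""
    new = dict(solution)
    for uc in new:
        if rng.random() < alpha and dc_lists[uc]:
            new[uc] = rng.choice(dc_lists[uc])
    return new


def fine_tune(s_list: list[Solution], dc_lists: dict[str, list[str]],
              distance: Callable[[Solution], float], max_gen: int,
              alpha_schedule: Callable[[int], float], seed: int = 0) -> list[Solution]:
    """Keep N solutions ordered by distance (best first); a randomized solution
    that beats the current worst replaces it."""
    rng = random.Random(seed)
    s_list = sorted(s_list, key=distance)
    for gen in range(max_gen):
        alpha = alpha_schedule(gen)
        for base in list(s_list):  # iterate over a snapshot while s_list changes
            cand = randomize(base, dc_lists, alpha, rng)
            if cand not in s_list and distance(cand) < distance(s_list[-1]):
                s_list[-1] = cand  # discard the last (worst) solution
                s_list.sort(key=distance)
    return s_list
```

A decaying schedule such as lambda g: 0.9 * 0.9 ** g (a hypothetical choice, not the schedule defined by β0, β, and φ) produces many reassignments early and fewer later.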
Due to customer data confidentiality, the results are demonstrated on synthetically generated data set. The data related to Cloud service providers (CSPs) has been obtained from three cloud service providers. Data pertaining to storage and processing cost, as well as carbon emission details are obtained from published data. Names of the providers are anonymized and are represented as {P1, P2, P3}.
Data of 50 user centers is generated synthetically. The UCs are spread across six continents. Table I describes the basic statistics of the different CSPs and user centers. The number of data centers, along with the number of data centers with low carbon emission values, for each service provider across the different geographic regions is also given in Table I. Data centers with emission values lower than 250 g/kWh are considered low carbon emission centers. For example, CSP P3 has the highest number of low carbon data centers (DCs).
The data generation procedure for other attributes is discussed below:
Migration cost among data centers was generated using normal distributions in the following way:
For DR regulation data, strict data localization and GDPR policies were considered. Strict data localization requires that users of a country be assigned to data centers within the same country, while GDPR permits the transfer of data within European Union countries.
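The DR parameter βik used in constraint (14) can be derived from such policies. In the hypothetical sketch below, each UC country is mapped to 'localization' or 'gdpr', the EU set is an illustrative subset of country codes rather than the full list, and countries without a listed policy are assumed to be unrestricted.

```python
# Illustrative subset of EU member-state codes (not the full list).
EU_SUBSET = {"DE", "FR", "IE", "NL", "SE"}


def dr_allowed(uc_country: str, dc_country: str, policy: dict[str, str]) -> int:
    """beta_ik for one UC-DC pair under a hypothetical per-country policy map."""
    rule = policy.get(uc_country)
    if rule == "localization":  # data must stay in the UC's own country
        return int(uc_country == dc_country)
    if rule == "gdpr":          # data may move anywhere within the EU
        return int(dc_country in EU_SUBSET)
    return 1                    # assumption: no listed policy means unrestricted
```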
The values of parameters Estore=1.2 (Watt/TB), Ekidle=100 (Watt), and Ecore=10 (Watt) were considered.
A decision horizon (H) of 6 months was considered.
The experimentation was performed on an AMD RYZEN-E495 15 CPU @1.5 GHz with 8 GB RAM and a 256 GB SSD. In this approach, the mathematical integer optimization model was solved using the open-source solver SCIP, accessed through Google OR-Tools.
This section demonstrates results pertaining to single and multiple CSP selection with CFP minimization, repair with minimal migration cost, data distribution across CSPs, and the repair algorithm.
B. CSP Selection based on minimal CFP
Table II describes the results on minimal CFP and operational cost that can be achieved by selecting single and multiple CSPs. The results were obtained by first solving the optimization problem detailed above with the single objective of minimizing the carbon footprint. The optimization problem was then solved with the objective of minimizing the operational cost, using the CFP obtained in the first phase as a constraint. The other criteria of capacity, latency, and data regulations were kept as stringent constraints that had to be met.
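The two-phase (lexicographic) procedure can be illustrated on a toy instance by exhaustive enumeration instead of an integer solver: phase 1 finds the minimum total CFP, and phase 2 minimizes operational cost among allocations that attain it. The data layout (nested dicts keyed by UC, then DC) and the feasible callable standing in for per-pair constraints are hypothetical.

```python
from itertools import product


def lexicographic_select(ucs, dcs, cost, cfp, feasible):
    """Two-phase selection by exhaustive search (toy instances only).

    cost[u][d] and cfp[u][d] are per-UC operational cost and CFP at DC d;
    feasible(u, d) stands in for per-pair constraints such as latency and DR
    (DC capacity is ignored in this toy).
    """
    def total(alloc, metric):
        return sum(metric[u][d] for u, d in alloc.items())

    allocations = [dict(zip(ucs, combo)) for combo in product(dcs, repeat=len(ucs))]
    allocations = [a for a in allocations if all(feasible(u, d) for u, d in a.items())]
    # Phase 1: the single objective of minimizing total carbon footprint.
    best_cfp = min(total(a, cfp) for a in allocations)
    # Phase 2: minimize operational cost with the phase-1 CFP as a constraint.
    phase2 = [a for a in allocations if total(a, cfp) <= best_cfp + 1e-9]
    return min(phase2, key=lambda a: total(a, cost)), best_cfp
```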
The results in Table II show that, among the three single CSPs, minimal CFP can be achieved by selecting provider P3 for the enterprise compute and storage requirements. However, if the enterprise goal is to operate with minimal operational cost, service provider P2 could be selected, although selecting P2 results in a CFP that is higher by 75.18%. Further, it was observed that the results of service providers P2 and P3 dominated those of CSP P1 for the enterprise requirements. One reason could be the higher number of data centers in {P2, P3} compared to P1.
The best multi-cloud combination for the enterprise requirements was found to be {P2, P3}, with minimal CFP. The last column in Table II displays the data distribution across the selected CSPs. The model achieved minimal carbon footprint and operational cost with an approximate distribution of 70% and 30% of the data across CSPs P2 and P3, respectively. CSP combination {P1, P2, P3} gives the same results as combination {P2, P3}, as no data was distributed to P1.
Table III demonstrates the implications of various trade-offs on CFP and operational cost with CSP combination {P1, P2, P3}. A row or column value of ‘0’ indicates that the corresponding criterion was ignored, and ‘1’ indicates that it was considered while solving the optimization problem. When both the latency and DR criteria were considered, finding an economical solution with lower CFP became difficult, because data regulation restrictions can prohibit the selection of green data centers and, similarly, the latency constraint restricted the selection of some economical data centers.
It was observed that the minimum CFP could be achieved by selecting {P3} as the service provider for enterprise storage and compute requirements. However, there is a trade-off between CFP and operational cost. With a minimum CFP target, an enterprise may need to pay a high operational cost. As the CFP target was relaxed, the operational cost reduced significantly. This trade-off between the operational cost and carbon footprint is demonstrated in sub-figures (a) and (b) in
The results are demonstrated on three test cases. In each test case, the data set was changed for each of the 50 user centers, while maintaining its distribution and parameters. The data centers, pricing, and carbon emission values of service provider P3 were considered. The modelled problem was solved with the objective of minimizing operational cost, subject to the CFP threshold (ξ). The CFP threshold value (ξ) was varied from its minimum value until the point where there was no further change in operational cost. It was observed that for a stiff CFP threshold (ξ), the operational cost is very high, and as the threshold is relaxed, the cost is reduced. Multiple feasible solutions exist in the given range of CFP threshold values. This scenario was demonstrated considering CFP threshold (ξ) values in the range (27, 28) (ton/month) in sub-figure (b) in
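The threshold sweep described above is an epsilon-constraint approach. The sketch below reproduces it on a toy instance by enumeration: for each threshold ξ it reports the minimum operational cost among allocations whose total CFP does not exceed ξ, or None when no allocation qualifies. The data layout and values are hypothetical.

```python
from itertools import product


def cost_vs_cfp_threshold(ucs, dcs, cost, cfp, thresholds):
    """Epsilon-constraint sweep (toy, exhaustive): minimum total operational
    cost for each CFP threshold xi, or None if no allocation meets xi."""
    allocations = [dict(zip(ucs, combo)) for combo in product(dcs, repeat=len(ucs))]
    curve = {}
    for xi in thresholds:
        feasible = [sum(cost[u][d] for u, d in a.items())
                    for a in allocations
                    if sum(cfp[u][d] for u, d in a.items()) <= xi]
        curve[xi] = min(feasible) if feasible else None
    return curve
```

For instance, with cost = {"u1": {"d1": 10, "d2": 5}, "u2": {"d1": 7, "d2": 1}} and cfp = {"u1": {"d1": 1, "d2": 1}, "u2": {"d1": 2, "d2": 3}}, thresholds [2, 3, 4] yield {2: None, 3: 12, 4: 6}, so the cost falls as the threshold is relaxed.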
Solution Repair: Various decisions pertaining to cloud adoption for an enterprise, i.e., service provider selection using the modelled approach, have already been discussed. However, enterprises on cloud are required to deal with post-adoption challenges such as compliance with emerging data regulations, meeting revised carbon (emission) footprint targets, and business pressures to keep operational cost low.
The first step in the solution repair approach adopted by the system 100 is to analyze the current configuration. Table IV displays the initial solution. The data of the 50 user centers of the enterprise was placed on service provider P2. The overall operational cost and total CFP for the enterprise requirements were estimated for the current settings. The compliance checker detected 10 data regulation violations and no violations of the latency criteria. The detailed mapping of each user center to its data center placement (xij*) is known.
The generated solutions are repaired such that they are data regulatory compliant and meet the revised carbon footprint targets. For an enterprise to be compliant and achieve minimal carbon footprint, there are multiple options: (a) re-allocate (repair) within the same cloud service provider, or (b) opt for a multi-cloud environment and check which service providers would be best suited for attaining the enterprise goals on carbon footprint. Results are demonstrated for three CFP threshold values, ξ={85, 80, 75} ton/month.
The experimental results show that migrating the enterprise to multi-cloud (i.e., CSP combinations) is more advantageous than migrating within the same service provider {P2}. The total cost is lower with the multi-cloud combinations {P2, P1, P3} and {P2, P3} than with the single provider. However, the repair solution with multi-cloud combination {P2, P3} has the highest savings in operational cost. Hence, the enterprise can select CSP P3 as an additional service provider for its data placement. Further, the results illustrate that as the CFP threshold is set tighter, the total cost increases while the percentage savings in operational cost decrease.
Note that the savings in total cost depend on the quality of the initial configuration. If the initial configuration, i.e., the data placement, is not optimal and there are violations of data regulations, the repair (brownfield analysis) can result in significant improvements.
D. Repair with Specific Data Distribution at CSPs
Data distribution among CSPs is important to avoid vendor lock-in and dependency issues. Table V illustrates the implications of data distribution among CSPs on costs. Results are demonstrated for distinct data proportions between service providers P2 and P3. The distribution proportions considered are {(30, 70), (50, 50), (70, 30)} (%), with a CFP threshold (ξ) value of 80 ton/month.
It was observed that migrating more data to P3 leads to more user center migrations, along with an increase in migration and total cost.
Storage service selection is a multi-objective problem with multiple Pareto-optimal solutions. Multiple non-dominated solutions were obtained using Algorithm 1. The quality of the generated solutions was evaluated using Algorithm 2 by comparing them against the solutions obtained using an optimization method. For each of the non-dominated solutions, the optimization problem was solved by minimizing the migration cost while setting the carbon footprint and operational cost obtained from the repair algorithm as constraints. Fifty non-dominated solutions generated by the method 300 were compared, and it was observed that, on average, the algorithm is 5% away from the optimal solutions. In a few cases, Pareto-optimal solutions were obtained. These results are depicted in
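The two measurements in this comparison can be sketched as follows: a non-dominated filter over objective vectors (all objectives minimized) and an average relative gap between heuristic and optimizer values. The text does not define how the 5% figure was computed, so the relative-gap formula is an assumption, and both function names are hypothetical.

```python
def non_dominated(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep the points that no other point dominates (all objectives minimized)."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [p for p in points if not any(dominates(q, p) for q in points)]


def mean_relative_gap(heuristic: list[float], optimal: list[float]) -> float:
    """Average relative gap of heuristic values to the matching optimizer values;
    0.05 would correspond to being 5% away on average."""
    return sum((h - o) / o for h, o in zip(heuristic, optimal)) / len(optimal)
```

For example, non_dominated([(1, 5), (2, 2), (3, 3), (5, 1)]) drops (3, 3), which is dominated by (2, 2).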
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
The embodiments of the present disclosure herein address the unresolved problem of optimal storage selection in multi-cloud environments. The embodiments thus provide a mechanism to generate one or more solutions related to storage selection in multi-cloud environments, satisfying a plurality of constraints. Moreover, the embodiments herein further provide a mechanism for repairing the solutions, i.e., fine-tuning the solutions to accommodate changes in one or more of the constraints.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored.
Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202321076675 | Nov 2023 | IN | national |