Method for Analysing the Rules of Changes Between the Levels of Use of Resources of a Computer System

Information

  • Patent Application
  • 20190384688
  • Publication Number
    20190384688
  • Date Filed
    January 11, 2018
  • Date Published
    December 19, 2019
Abstract
A method for evaluating the performance of an application chain within a computer infrastructure comprising a number N of resources denoted Ri (1≤i≤N), where the method comprises the steps of: collecting, over the same time interval and with the same sampling period, a predefined number M of series of measurements Xk (1≤k≤M) relating to the level of use of the resources; for all the possible combinations of two series of measurements (Xk1, Xk2), with k1≠k2: creating a plurality of pairs of subsets (X′k1, X′k2) by selecting a predefined number nv of values from the series Xk1 and Xk2; applying an algorithm for searching for affine correlation relation(s) over each pair of subsets; calculating the percentage differences between the values of X′k2(t) and of aX′k1(t)+b for each index t (between 1 and nv); and calculating the saturation values of the series X′k2.
Description

The present invention relates to the field of monitoring an IT (Information Technology) infrastructure, this expression denoting all the hardware and software elements forming the computer system of a company or organization. The invention relates more particularly to the field of analyzing resources (notably processors, operating systems and memories) of an IT infrastructure on which there is hosted an application link chain, i.e. for a process, a functional chain connecting several applications which operate together to perform the process.


A number of IT infrastructures are poorly dimensioned, and most often under-dimensioned. Poor dimensioning results in inadequate performance, or even stopping of production. Correctly dimensioning an IT infrastructure is a major challenge for companies for which production depends on the performance of their IT systems. The term “dimensioning” denotes the capacities (computational and memory) of the servers, coupled with the availability of resources (hardware and software).


An increase in the load of an IT system can be accompanied by a gradual saturation of the resources of the system within the same functional chain (or application link chain). The saturation of a resource blocks the increase in the load of the system and therefore prevents the observation of possible saturation of other resources in the chain.


The use of a resource can bring about the use of another resource. By way of example, in the case of an application ordering a calculation to be performed on a machine A and its result to be saved on a machine B, the level of use of the processors of machine A depends on the progress of the save operations on machine B.


Each resource is characterized by a maximum level of use for optimal functioning (for example, twenty-four percent for a processor).


The present invention aims to propose a method for defining a correlation of the level of use of a resource A with respect to the level of use of a resource B in order to determine, when resource B is saturated and resource A is not, the dimensioning of resource B required to reach the maximum level of resource A.


The objective is to dimension, coherently and optimally, the resources of an IT system and prevent the resources from saturating and the consequences thereof.


The search for correlations in the changes of the levels of use of the resources of an application chain aims to predict:

    • the change in consumptions and the saturations of the resources when the load is increased,
    • the dimensioning of the resources of an application chain comprising several servers.


Solutions exist for monitoring servers individually, but they do not provide for determining the levels of future use of resources, nor establishing a correlation between the various levels of use of resources of different servers within the same application chain.


An objective of the present invention is to enable an automatic analysis of the consumption of resources of an IT system and derive therefrom correlations between the levels of use of the resources.


To this end, there is proposed a method for evaluating the performance of an application chain within an IT infrastructure, comprising a number N of resources Ri (where i is an integer between 1 and N), comprising the steps of:

    • collection, over the same time interval and with the same sampling period periodech, of a predefined number M of series of measurements Xk (where k is an integer between 1 and M) relating to the levels of use of different resources,
    • for all possible combinations of two series of measurements (Xk1,Xk2), where k1≠k2, among the collected series:
      • creation of several pairs of subsets (X′k1,X′k2) by selecting a predefined number nv of values from the series of measurements Xk1 and Xk2 respectively,
      • application of an affine correlation relationship search algorithm on each pair of subsets (X′k1,X′k2), the affine correlation being modeled by the equation X′k2=aX′k1+b, where a and b are real numbers,
      • calculation, for each pair (X′k1,X′k2), of the percentages P(t) of the difference between the values of X′k2(t) and of aX′k1(t)+b according to the formula

        $$P(t) = 100\,\frac{X'_{k2}(t) - \bigl(aX'_{k1}(t) + b\bigr)}{X'_{k2}(t)},$$

        at each index t (between 1 and nv),

      • calculation, for each pair (X′k1,X′k2), and provided that all the values of P(t) are less than or equal to a predefined value T, of saturation values

        $$X'_{k1s\,\mathrm{min}} = \frac{X'_{k2\,\mathrm{min}} - b}{a} \quad\text{and}\quad X'_{k1s\,\mathrm{max}} = \frac{X'_{k2\,\mathrm{max}} - b}{a},$$

        where X′k2 min and X′k2 max are respectively the minimum and maximum values of the series of measurements X′k2.







According to various characteristics taken alone or in combination:

    • the value of nv is between 3 and 60.
    • each series of measurements is carried out over a time interval greater than or equal to two hours.
    • each series of measurements is carried out with a sampling period periodech of one minute.
    • the value T is 95%.
    • the number of pairs of subsets is between 1 and 100.


The step for selecting the subsets X′k1 and X′k2 includes the operations of:

    • taking into account the following parameters: the minimum values pmin and maximum values pmax of a search period denoted by p, where p is a variable of the method, the increment size ppas of the period p, a sampling period periodech,
    • creation of the nv values of the subset X′k1 by selecting nv values in the series Xk1,
    • creation of the nv values of the subset X′k2 by selecting nv values in the series Xk2.


The algorithm for searching for an affine relationship between two series of measurements X′k2 and X′k1 comprises the operations of:

    • calculation of a as being the ratio between X′k2moy and X′k1moy, i.e.

      $$a = \frac{X'_{k2\,\mathrm{moy}}}{X'_{k1\,\mathrm{moy}}},$$

      where X′k2moy is the average of the differences between the successive values in the list X′k2, i.e.

      $$X'_{k2\,\mathrm{moy}} = \frac{1}{n_v - 1} \sum_{t=2}^{n_v} \bigl(X'_{k2}(t) - X'_{k2}(t-1)\bigr)$$

      and X′k1moy is the average of the differences between successive values in the list X′k1, i.e.

      $$X'_{k1\,\mathrm{moy}} = \frac{1}{n_v - 1} \sum_{t=2}^{n_v} \bigl(X'_{k1}(t) - X'_{k1}(t-1)\bigr),$$
    • calculation of b according to the formula

      $$b = \frac{a \left( \sum_{t=1}^{n_v} \bigl(X'_{k2}(t) - X'_{k1}(t)\bigr) \right)}{n_v},$$

      where X′k2(t) and X′k1(t) are the values in the series X′k2 and X′k1 at the index t.





According to various characteristics taken alone or in combination:

    • the parameter pmin is fixed at a value between 1 and 10.
    • the parameter pmax is fixed at a value between 1 and 100.
    • the parameter ppas is fixed at a value between 1 and 10.





The invention will be better understood and other details, features and advantages of the invention will emerge from reading the following description, given by way of nonlimiting example with reference to the drawings in which:



FIG. 1 is a schematic representation of five resources and possible combinations between the series of measurements carried out on these resources.



FIG. 2 is a functional diagram illustrating various steps of the method for searching for rules of changes between the various resources of an IT system.



FIG. 3 is a pseudocode describing an example embodiment of the method in the case of searching for a rule of change between two series of measurements.





An IT architecture (or system, or infrastructure) conventionally comprises various hardware and/or software resources which, to perform processes, are connected to each other to form one or more functional chains (or application link chains, or application chains).


To optimize the operation of such an application chain, its performance and notably the use of the resources forming it must be evaluated. N (where N is an integer) denotes the number of resources, denoted by Ri (where i is an integer such that 1≤i≤N), of the application chain.


To evaluate the performance of the application chain, the principle is to search for the rules of change between several series of measurements performed on the resources, typically the level of use, load, available memory, occupied disk space or memory. “Rule of change” is understood to mean an affine type correlation relationship between two series of measurements relating to levels of use of resources Ri. FIG. 1 provides an example of five resources Ri (1≤i≤N, N=5), denoted by R1 to R5.


One step in the method involves performing and collecting a plurality of series of measurements denoted by Xk, each measurement supplying a level (or rate) of use of a resource Ri. These series are denoted by X1 to X5 in the example of FIG. 1. The level of use of a resource is a physical quantity, the nature of which can vary according to the type of resource examined. It can be the power consumed in the case of a processor (for example, a central processing unit), a percentage of the maximum transfer rate in the case of a hard disk, or a percentage of the total capacity (or occupation rate) in the case of random access memory.



FIG. 2 illustrates the main steps of the method.


A preliminary step consists in collecting a predefined number M (where M is an integer not necessarily equal to N) of series of measurements Xk(1≤k≤M) carried out over the same time interval and with the same sampling period denoted by periodech.


The measurements are advantageously carried out automatically by a program executed on one or more servers incorporated in the IT infrastructure.


The measurements are preferably performed (and collected) over a time interval of at least two hours, with a sampling period of one minute. By way of example, the measurements are carried out over a period of four hours (typically between 08:00 and 12:00), with a sampling period of one minute (i.e. two successive measurements are spaced out by one minute).


The measurements provide, for example, for determining the level of activity of a central processing unit (CPU) and disks of two servers. In this example, the method proposed by the present invention provides for determining affine type correlations between the activities of processors and disks of two servers, in all possible combinations:

    • correlation between the level of activity of the CPU of the first server and that of its own disk,
    • correlation between the level of activity of the CPU of the first server and that of the disk of the second server,
    • correlation between the level of activity of the CPU of the second server and that of its own disk,
    • correlation between the level of activity of the CPU of the second server and that of the disk of the first server,
    • correlation between the level of activity of the CPU of the first server and that of the CPU of the second server,
    • correlation between the level of activity of the disk of the first server and that of the disk of the second server.


A series of measurements can be the result of one measurement or the combining of the results of several measurements carried out simultaneously. For example, a series of measurements can contain the sum of the data rates of all the disks present on the machine.
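As a minimal sketch of such a combined series, the element-wise sum below merges several simultaneously sampled series into one. The function name `combine_series` and the sample data rates are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def combine_series(*series: Sequence[float]) -> list[float]:
    """Sum several equally long, simultaneously sampled series element-wise."""
    lengths = {len(s) for s in series}
    if len(lengths) != 1:
        raise ValueError("all series must cover the same sampling instants")
    return [sum(values) for values in zip(*series)]


# Hypothetical data rates (MB/s) of two disks of one machine, one value per minute.
disk_a = [10.0, 12.5, 11.0]
disk_b = [4.0, 3.5, 6.0]
total_disk_rate = combine_series(disk_a, disk_b)
```

The resulting list can then be treated as a single series Xk in the steps that follow.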


The correlation search method proposed by the present invention aims, for a set of series of measurements collected, to establish correlation relationships between different pairs of series of measurements denoted by (Xk1,Xk2) (where k1 and k2 are integers between 1 and M and where k1≠k2) from the collected measurements. Each pair of series of measurements corresponds to a particular combination of two series of measurements. In the example of FIG. 1, if a series of measurements Xk is collected for each resource Ri, i.e. each series of measurements Xk corresponds to the level of use of a resource Ri, then there will be 10 possible pairs of series of measurements denoted by 1 to 10. Recall that an objective of the present invention is the determination of correlation relationships for all possible combinations of two series of measurements.
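The enumeration of the pairs can be sketched as follows, assuming that a pair is unordered, which matches the 10 pairs counted for five series (in general M(M−1)/2 pairs for M series). The names `series_names` and `pairs` are illustrative only.

```python
from itertools import combinations

# One series of measurements per resource R1..R5, as in the example of FIG. 1.
series_names = ["X1", "X2", "X3", "X4", "X5"]

# Unordered pairs (k1 < k2); each pair is a candidate for an affine relation.
pairs = list(combinations(series_names, 2))

for number, (k1, k2) in enumerate(pairs, start=1):
    print(number, k1, k2)
```

If the relation also had to be tested in both directions (Xk1 as a function of Xk2 and the reverse), `itertools.permutations` would give the 20 ordered pairs instead; the text does not say which is intended.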


A first step consists in selecting two series of measurements Xk1 and Xk2 from the set of series of measurements collected.


A second step consists in searching for an affine correlation relationship over at least nv values (where nv is an adjustable integer) between the two series of measurements Xk1 and Xk2. This affine correlation relationship is illustrated by equation (1):






$$X_{k2} = aX_{k1} + b \qquad (1)$$

where a and b are real numbers.


Percentages P(t) of the difference between the values Xk2(t) and aXk1(t)+b are calculated, Xk2(t) referring to the value of the measurement of index t in the series Xk2, and Xk1(t) referring to the value of the measurement of index t in the series Xk1. This calculation is illustrated by equation (2), these percentages being defined as follows:










$$P(t) = 100\,\frac{X_{k2}(t) - \bigl(aX_{k1}(t) + b\bigr)}{X_{k2}(t)} \qquad (2)$$

where t is an integer index such that 1≤t≤nv.





If each value of P(t) obtained is less than or equal to a predefined value T, for example fixed by an operator (typically the network administrator), then the affine correlation relationship (1) is validated and saved. T is called the tolerance percentage and is advantageously fixed at 95%. According to a preferred embodiment, nv is advantageously between 3 and 60.
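The validation test can be sketched as below, taking a and b as given. The function names are hypothetical, and taking the absolute value of the difference is an assumption of this sketch: equation (2) as printed shows a plain difference, but a signed value would let large negative deviations pass the test.

```python
from typing import Sequence


def deviation_percentages(x1: Sequence[float], x2: Sequence[float],
                          a: float, b: float) -> list[float]:
    """P(t) of equation (2) for every index t.

    Assumption: the absolute value of the difference is used.
    A zero value in x2 raises ZeroDivisionError, as in the formula itself.
    """
    if len(x1) != len(x2):
        raise ValueError("both series must contain the same number of values")
    return [100.0 * abs(v2 - (a * v1 + b)) / v2 for v1, v2 in zip(x1, x2)]


def is_validated(x1: Sequence[float], x2: Sequence[float],
                 a: float, b: float, tolerance: float = 95.0) -> bool:
    """True when every P(t) is less than or equal to the tolerance T."""
    return all(p <= tolerance for p in deviation_percentages(x1, x2, a, b))


x1 = [1.0, 2.0, 3.0]
exact = [3.0, 5.0, 7.0]      # exactly 2 * x1 + 1
outlier = [3.0, 5.0, 14.0]   # last value deviates from 2 * x1 + 1
print(deviation_percentages(x1, outlier, a=2.0, b=1.0))
```

The default of 95 for `tolerance` follows the value of T given in the text; a stricter value can be passed explicitly.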


In this case, the method comprises a next step for calculating saturation values Xk1s min and Xk1s max for the series of measurements Xk1 using the following formulas (3) and (4):










$$X_{k1s\,\mathrm{min}} = \frac{X_{k2\,\mathrm{min}} - b}{a} \qquad (3)$$

$$X_{k1s\,\mathrm{max}} = \frac{X_{k2\,\mathrm{max}} - b}{a} \qquad (4)$$

where Xk2 min and Xk2 max are the minimum and maximum values, respectively, of the series of measurements Xk2.

If at least one of the values Xk1s min or Xk1s max belongs to the open interval ]Xk1 min,Xk1 max[, where Xk1 min and Xk1 max are the minimum and maximum values of the series Xk1, then the rule of change found is such that the resource associated with the series of measurements Xk2 will saturate before the resource associated with the series of measurements Xk1. More specifically, the resource associated with Xk2 will begin to saturate when the level of use measured by Xk1 comes close to the value of Xk1s min.
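Equations (3) and (4) and the interval test can be sketched as follows, with a and b taken as already validated. The function names and the sample values are hypothetical; the second series is deliberately capped at 65 to mimic a resource that stops growing.

```python
from typing import Sequence


def saturation_values(x2: Sequence[float], a: float, b: float) -> tuple[float, float]:
    """Return (Xk1s min, Xk1s max) following equations (3) and (4)."""
    if a == 0:
        raise ValueError("a must be non-zero to invert the affine relation")
    return (min(x2) - b) / a, (max(x2) - b) / a


def saturates_first(x1: Sequence[float], x2: Sequence[float],
                    a: float, b: float) -> bool:
    """True when a saturation value lies strictly inside ]min(x1), max(x1)[."""
    low, high = min(x1), max(x1)
    s_min, s_max = saturation_values(x2, a, b)
    return low < s_min < high or low < s_max < high


x1 = [10.0, 20.0, 30.0, 40.0]
x2 = [25.0, 45.0, 65.0, 65.0]   # follows 2 * x1 + 5, then stays at 65
s_min, s_max = saturation_values(x2, a=2.0, b=5.0)
print(s_min, s_max, saturates_first(x1, x2, a=2.0, b=5.0))
```

In this example the upper saturation value falls at 30, inside the range of x1, so the resource behind x2 is flagged as saturating before the one behind x1.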





If no correlation relationship has been found, an additional step consists in processing the next combination of series of measurements, this step being repeated until all the possible combinations have been analyzed. One variant consists in carrying out this same process for a multitude of pairs of subsets (X′k1,X′k2) obtained from a pair of series of measurements (Xk1,Xk2). In this case, the series X′k1 is obtained by selecting a predefined number nv of values in the series Xk1. Likewise, X′k2 is obtained from Xk2.



FIG. 1 illustrates an example in which the correlation relationship is calculated directly on the series of measurements Xk1 and Xk2, thereby corresponding to the particular case in which nv is equal to the number of values contained in each series Xk1 or Xk2. A variant of the method consists in calculating correlation relationships on subsets (X′k1,X′k2) obtained from a pair of series of measurements (Xk1,Xk2) as indicated earlier. This possibility is offered to the user by proposing an initial configuration illustrated in FIG. 3. This example is provided for an example pair of series of measurements denoted by (Xk1,Xk2). The same steps are applied on all the possible combinations of series of measurements (Xk1,Xk2) from the collected data.


The parameters that can be adjusted by the user are:

    • [Xk1deb,Xk1fin]: an interval for searching values,
    • [Yk2deb,Yk2fin]: an interval for searching values,
    • pmin: the minimum value of variable p corresponding to a period for selecting subsets X′k1 and X′k2,
    • pmax: the maximum value of the period p,
    • ppas: the increment size for the period p,
    • nv: the number of values in each subset X′k1 and X′k2,
    • T: the tolerance percentage for the validation of a correlation relationship between the series X′k1 and X′k2.


For the particular case in which the values of pmin, pmax and ppas are 1, the search intervals cover all the values of Xk1 and Xk2, and nv is equal to the size of the sequence Xk1 and to the size of Xk2. The value of p will then be 1 and the subsets X′k1 and X′k2 will be the same as the initial series Xk1 and Xk2. The search is then performed directly on the series of measurements Xk1 and Xk2.


To construct a subset X′k1 from Xk1, the operation consists in selecting one value out of every ns values in Xk1 and incorporating it into the subset X′k1. For example, if ns is 2, every second value of Xk1 is selected to construct X′k1.


If, for example, pmin is 2, pmax is 8 and ppas is 2, then the values of the variable ns will successively be 2, 4, 6 and 8. This results in four pairs of subsets (X′k1,X′k2) for which a correlation relationship will be sought. A correlation relationship is found for the series Xk1 and Xk2 if correlation relationships are found for all the pairs of subsets (X′k1,X′k2) generated. If, during the process, a correlation relationship is not found for at least one pair of subsets, then no correlation is generated between the series of measurements. In that case, a new combination of series of measurements (Xk1,Xk2) is selected in the collected data and the process is restarted.


The benefit of working on pairs of subsets (X′k1,X′k2) of pairs of series of initial measurements (Xk1,Xk2), and not directly on the series of initial measurements is to provide an indicator of the relevance of the correlation found. Specifically, for a pair of series of measurements (Xk1,Xk2), and provided that correlations are found for all the subsets (X′k1,X′k2) generated, the greater the number of subsets, the stronger the correlation relationship between the series of measurements Xk1 and Xk2. According to a preferred embodiment, the number of pairs of subsets used is between 1 and 100.


The variation in the sampling period p, between pmin and pmax, provides for taking into account only the extreme values (high or low, for example in the case of a series of measurements representing a sinusoidal curve).


At the end of this step, a pair of two subsets X′k1 and X′k2 is obtained, each containing nv values.
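The construction of one pair of subsets can be sketched as below. The text does not spell out how the search intervals and the period p map onto the stride ns, so this sketch assumes that selection starts at the first value of the series, that the same indices are used for both series so that the values stay aligned in time, and that a series too short to supply nv values is an error. The function name `select_subset` is hypothetical.

```python
from typing import Sequence


def select_subset(series: Sequence[float], ns: int, nv: int, start: int = 0) -> list[float]:
    """Keep one value out of every ns, from index `start`, until nv values are kept."""
    if ns < 1 or nv < 1:
        raise ValueError("ns and nv must be positive integers")
    subset = list(series[start::ns][:nv])
    if len(subset) < nv:
        raise ValueError("series too short to supply nv values at this stride")
    return subset


xk1 = [float(v) for v in range(1, 41)]   # 40 hypothetical measurements
xk2 = [2.0 * v + 5.0 for v in xk1]

# One pair of subsets per stride; 2, 4, 6 and 8 are the four values of ns in the example.
subset_pairs = [
    (select_subset(xk1, ns, nv=5), select_subset(xk2, ns, nv=5))
    for ns in range(2, 8 + 1, 2)
]
```

With ns equal to 1 and nv equal to the length of the series, the subset is the series itself, which corresponds to the particular case described above.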


An affine type correlation equation is sought between these two subsets. It can be expressed as in equation (1): X′k2=aX′k1+b.


The value of a is calculated by calculating the ratio between the average X′k2moy of the differences between the successive values in the list X′k2 and the average X′k1moy of the differences between the successive values in the list X′k1. The calculation of a is illustrated by equation (5):










$$a = \frac{X'_{k2\,\mathrm{moy}}}{X'_{k1\,\mathrm{moy}}} \qquad (5)$$







The calculations of the average values X′k2moy and X′k1moy are illustrated by equations (6) and (7):










$$X'_{k2\,\mathrm{moy}} = \frac{1}{n_v - 1} \sum_{t=2}^{n_v} \bigl(X'_{k2}(t) - X'_{k2}(t-1)\bigr) \qquad (6)$$

$$X'_{k1\,\mathrm{moy}} = \frac{1}{n_v - 1} \sum_{t=2}^{n_v} \bigl(X'_{k1}(t) - X'_{k1}(t-1)\bigr) \qquad (7)$$
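The calculation of a from equations (5) to (7) can be sketched as follows. The function names are hypothetical, and raising an error when the first subset shows no net change is a choice of this sketch, since the ratio is then undefined.

```python
from typing import Sequence


def mean_successive_difference(values: Sequence[float]) -> float:
    """Average of X(t) - X(t-1) for t = 2..nv, as in equations (6) and (7).

    The sum telescopes, so the result equals (last - first) / (nv - 1).
    """
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    diffs = [cur - prev for prev, cur in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def slope(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Coefficient a of equation (5): ratio of the two mean successive differences."""
    d1 = mean_successive_difference(x1)
    if d1 == 0:
        raise ValueError("x1 shows no net change; a is undefined")
    return mean_successive_difference(x2) / d1


a = slope([1.0, 2.0, 4.0], [3.0, 5.0, 9.0])
```

Because of the telescoping sum, a depends only on the first and last values of each subset, which is one reason the same search is repeated over several subsets.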







The calculation of the value of b is illustrated by equation (8):









$$b = \frac{a \left( \sum_{t=1}^{n_v} \bigl(X'_{k2}(t) - X'_{k1}(t)\bigr) \right)}{n_v} \qquad (8)$$







where X′k2(t) and X′k1(t) are the respective values in the series X′k2 and X′k1 at index t.


The next step is the test for the reliability of the correlation relationship thus generated. To that end, an example embodiment consists in generating a list Z from nv values of the list X′k1 in which each value Z(t) is connected to the value X′k1(t) by the affine correlation relationship (1): Z(t)=aX′k1(t)+b. Each percentage P(t) of the difference between the values Z(t) and X′k2(t) is calculated, as illustrated in equation (9):










$$P(t) = 100\,\frac{X'_{k2}(t) - Z(t)}{X'_{k2}(t)} \qquad (9)$$







The step consisting in generating a list Z(t) is an intermediate step which is not indispensable for calculating the percentage P(t), which can be calculated directly as shown by equation (2). This step generates a sequence of percentages, denoted by P, which contains nv values denoted by P(t).


If at least one value of P(t) is strictly greater than the tolerance percentage T, then there is no correlation between the series of measurements X′k1 and X′k2, and the search algorithm processes the next combination of series of measurements. If among a multitude of pairs of subsets (X′k1,X′k2), one among them does not provide a correlation equation, then it is considered that there is no correlation between the series of measurements Xk1 and Xk2 (from which the subsets were generated).


If all the values of P(t) are less than or equal to the tolerance percentage T, then the correlation equation X′k2=aX′k1+b is validated for the pair of subsets X′k1 and X′k2. In that case, the next step is calculating the saturation values X′k1s min and X′k1s max in the same way as in equations (3) and (4), replacing Xk2 min and Xk2 max by X′k2 min and X′k2 max, and as illustrated in FIG. 3.


If a correlation relationship is found for each pair of subsets (X′k1,X′k2), then a correlation exists between the initial series of measurements Xk1 and Xk2. The final values of a, b and the saturation values Xk1s min and Xk1s max are obtained by calculating the average of the values obtained for the subsets exhibiting a correlation.
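The final aggregation can be sketched as below, assuming that each pair of subsets yields either a validated fit or nothing. The `SubsetFit` record and the use of `None` to mark a rejected pair are conventions of this sketch, not of the method itself.

```python
from statistics import mean
from typing import NamedTuple, Optional, Sequence


class SubsetFit(NamedTuple):
    """Values obtained for one validated pair of subsets."""
    a: float
    b: float
    s_min: float
    s_max: float


def aggregate(fits: Sequence[Optional[SubsetFit]]) -> Optional[SubsetFit]:
    """Average the per-subset values, or return None if any pair was rejected."""
    if not fits or any(fit is None for fit in fits):
        return None  # one failed pair of subsets means no correlation at all
    return SubsetFit(
        a=mean(fit.a for fit in fits),
        b=mean(fit.b for fit in fits),
        s_min=mean(fit.s_min for fit in fits),
        s_max=mean(fit.s_max for fit in fits),
    )


fits = [SubsetFit(2.0, 1.0, 10.0, 30.0), SubsetFit(4.0, 3.0, 20.0, 50.0)]
final = aggregate(fits)
```

Returning `None` as soon as one pair is rejected mirrors the rule that a single failed pair of subsets rules out a correlation between Xk1 and Xk2.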


Thus, this method provides for generating correlation relationships between several series of measurements, which may be used to define a better dimensioning of production infrastructures.

Claims
  • 1. A method for evaluating the performance of an application chain within an IT (Information Technology) infrastructure, comprising a number N of resources Ri (where i is an integer between 1 and N), comprising the steps of: collection, over the same time interval and with the same sampling period periodech, of a predefined number M of series of measurements Xk, where k is an integer between 1 and M, relating to the levels of use of different resources, for all possible combinations of two series of measurements (Xk1,Xk2), where k1≠k2, among the collected series: creation of several pairs of subsets (X′k1,X′k2) by selecting a predefined number nv of values from the series of measurements Xk1 and Xk2 respectively, application of an affine correlation relationship search algorithm on each pair of subsets (X′k1,X′k2), this affine correlation being modeled by the equation X′k2=aX′k1+b, where a and b are real numbers, calculation, for each pair (X′k1,X′k2), of the percentages P(t) of the difference between the values of X′k2(t) and of aX′k1(t)+b according to the formula
  • 2. The method as claimed in claim 1, characterized in that the value of nv is between 3 and 60.
  • 3. The method as claimed in claim 1, characterized in that each series of measurements is carried out over a time interval greater than or equal to two hours.
  • 4. The method as claimed in claim 1, characterized in that each series of measurements is carried out with a sampling period periodech of one minute.
  • 5. The method as claimed in claim 1, characterized in that the value T is 95%.
  • 6. The method as claimed in claim 1, characterized in that the number of pairs of subsets is between 1 and 100.
  • 7. The method as claimed in claim 1, characterized in that the selection of the subsets X′k1 and X′k2 includes the operations of: taking into account the following parameters: the minimum values pmin and maximum values pmax of a search period denoted by p, where p is a variable of the method, the increment size ppas of the period p, a sampling period periodech, creation of the nv values of the subset X′k1 by selecting nv values in the series Xk1, creation of the nv values of the subset X′k2 by selecting nv values in the series Xk2.
  • 8. The method as claimed in claim 7, characterized in that the parameter pmin is fixed at a value between 1 and 10.
  • 9. The method as claimed in claim 7, characterized in that the parameter pmax is fixed at a value between 1 and 100.
  • 10. The method as claimed in claim 7, characterized in that the parameter ppas is fixed at a value between 1 and 10.
  • 11. The method as claimed in claim 1, characterized in that the algorithm for searching for an affine relationship between two series of measurements X′k2 and X′k1 comprises the operations of: calculation of a as being the ratio between X′k2moy and X′k1moy, i.e.
Priority Claims (1)
Number Date Country Kind
1750281 Jan 2017 FR national
PCT Information
Filing Document Filing Date Country Kind
PCT/FR2018/000005 1/11/2018 WO 00