Method For Detecting Abnormal Values Of A Biomarker

Information

  • Patent Application
  • 20200388388
  • Publication Number
    20200388388
  • Date Filed
    November 14, 2018
  • Date Published
    December 10, 2020
Abstract
The invention proposes a new method for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, based on appropriate Z-scores.
Description
FIELD OF THE INVENTION

The invention relates to clinical and biological monitoring and follow-up of individuals, such as the biological passport for athletes or medical files for patients, based on the individual and longitudinal monitoring of biomarkers. The invention allows any health event in a mammal to be monitored.


Such monitoring or follow-up is used to identify abnormal behavior by comparing an individual's biological sample to an established baseline. These comparisons can be made in different ways, but each of them requires an appropriate external population to compute the significance levels, which is a non-trivial issue. Moreover, it is not necessarily relevant to compare the measures of a biomarker of a professional athlete to those of a reference population (even restricted to other athletes), and a reasonable alternative is to detect the abnormal values by considering only the other measurements of the same person.


The invention is therefore a tool to sort or identify individuals who should be subjected to further analysis. For instance, in medical studies, upon the result of the monitoring, a more detailed check-up of the patient can be run.


The method of the invention is also of interest for identifying deficient measuring equipment by detecting an abnormal value or series of values.


PRIOR ART

The systematic monitoring of biological variables over time allows for the early detection of physiological or biological changes. Such longitudinal studies are currently used for the identification of atypical biological observations in elite athletes.


This biological monitoring allows for determining the baseline value and its associated variability, provided a sufficient follow-up and a large cohort. Several sport organizations and researchers already developed such approaches in order to establish the biological profile of elite athletes for doping detection or medical concerns.


Several approaches for such a monitoring are known.


A well-known method for identifying an abnormal value in a longitudinal biomarker has been described in “Bayesian detection of abnormal values in longitudinal biomarkers with an application to T/E ratio”, 2006, Sottas et al. in Biostatistics.


A known method, close to Sottas', called method 0, will now be disclosed.


Let X1, X2, . . . , Xn be n independent Gaussian random variables, with Xi ~ N(μi, σ²), where N denotes a normal distribution. They represent values of a biomarker of a person. The method allows testing whether the “new” observation xn is abnormal, given x1, x2, . . . , xn-1.


The usual way is to consider the following indicator:







$$T_n = \frac{X_n - \bar{X}_{n-1}}{\hat{\sigma}_{n-1}\sqrt{1 + \dfrac{1}{n-1}}}$$










where X̄n-1 and σ̂²n-1 are the empirical mean and variance of the sample X1, . . . , Xn-1, that is








$$\bar{X}_{n-1} = \frac{1}{n-1}\sum_{k=1}^{n-1} X_k$$

$$\hat{\sigma}_{n-1}^{2} = \frac{1}{n-2}\sum_{k=1}^{n-1}\left(X_k - \bar{X}_{n-1}\right)^{2}$$







Sottas noted that, for small n, this indicator cannot be used to detect an abnormal value of Xn by comparing the observed value |tn| of |Tn| to the quantile of order 1−α/2 of the N(0,1) distribution.


However, if all the μi are equal, Tn is distributed according to the Student(n−2) distribution, and the quantiles of this distribution can be used.
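As an illustration of Method 0, the indicator can be computed as in the following minimal sketch. The function name `method0_statistic` is hypothetical, and the Student(n−2) quantile is deliberately left to the caller (it could, for instance, be obtained from a statistics library such as SciPy, which is an assumption and not part of the disclosure).

```python
import math
from statistics import mean, stdev


def method0_statistic(values):
    """Return T_n for the last value, given the n-1 earlier values (Method 0)."""
    *past, last = values
    if len(past) < 2:
        raise ValueError("Method 0 needs at least three values in total")
    m = len(past)  # m = n - 1 earlier observations
    # statistics.stdev divides by m - 1 = n - 2, as in the variance formula above
    return (last - mean(past)) / (stdev(past) * math.sqrt(1 + 1 / m))
```

The absolute value of the result would then be compared with the quantile of order 1−α/2 of the Student(n−2) distribution.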


Given the observations x1, x2, . . . , xn-1 from the sample X1, X2, . . . , Xn there seems to be no good reason to consider only the case where the last observation could be abnormal given the previous ones.


A much more natural question is: how to detect an abnormal observation among all the observations? Even if one focuses only on the last observation, this question is of interest, because the presence of one (or more) abnormal observation in the past could lead to a wrong decision about the new observation xn.


The first way is to iterate Method 0 as follows: one checks if x3 is abnormal given x1,x2 then one checks if x4 is abnormal given x1, x2, x3 and so on up to the last step where one checks if xn is abnormal given x1, x2, . . . , xn-1.


This approach is not satisfactory for at least two reasons.


First, the procedure is not invariant under time reversal (the forward and backward procedures can lead to two distinct results). Secondly, there is a multiple-testing problem, because n−2 tests are performed. If each test is performed at level α, then the level of the series of tests will be greater than α. Moreover, correcting the level seems difficult, since these tests are not independent, and the Bonferroni correction is known to be too conservative.


The practical consequences are the following: if no correction is made, the series of tests is not well calibrated, with a type I error rate greater than announced, leading to an over-detection of false-positive values (a specificity problem that becomes more and more important as n increases). If the Bonferroni correction is applied, the series of tests is again not well calibrated, with a type I error rate smaller than announced, leading to an under-detection of true positive values (a sensitivity problem that also becomes more and more important as n increases).
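The order of magnitude of the calibration problem can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the tests are independent, which the text above states is not the case for iterated Method 0; the figures are therefore indicative only, and both function names are hypothetical.

```python
def familywise_error_if_independent(alpha, m):
    """Probability of at least one false positive among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-test level under the Bonferroni correction for m tests."""
    return alpha / m


# A series of n = 10 values gives m = n - 2 = 8 iterated tests.
n = 10
m = n - 2
uncorrected = familywise_error_if_independent(0.05, m)
per_test = bonferroni_level(0.05, m)
```

Under this independence assumption, eight uncorrected tests at α = 5% give an overall level of roughly one in three, while the Bonferroni correction lowers each individual test to 0.625%.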


SUMMARY OF THE INVENTION

In a global aspect, the invention proposes a method for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, said biomarker corresponding to a physiological parameter or to the level of any biological or chemical entity measured from a biological sample of said mammal, the series of values (x1, x2, . . . , xn) related to said at least one biomarker being obtained from independent variables (X1, X2, . . . , Xn) normally distributed (according to N(μ, σ²));


said method comprising the following steps of:


E1: determining from each biological sample collected at different periods of time from said mammal, a value related to said at least one biomarker thereby acquiring a series of values (x1, x2, . . . , xn) related to said biomarker,


E2: storing the series of values (x1, x2, . . . , xn) on a database stored in a memory;


said method comprising the following steps run with a processor that can retrieve data from the database in the memory:


E3: calculating, for the whole series of values of step E2 a single value tn of an indicator (Tn), said indicator (Tn) being based on a studentized form (Rn,i) of the variables (X1, X2, . . . , Xn), said calculation consisting of extracting the maximum observed value (tn) of the studentized form (Rn,i) calculated for each value of the series of values (x1, x2, . . . , xn),


E4: comparing the observed value (tn) of the indicator (Tn) to the quantile (cα,n) of the distribution of (Tn) said quantile being stored in the memory,


E5: if the observed value (tn) of the indicator (Tn) is above the quantile, reporting, on displaying means, a presence of an abnormal value in the series thereby indicating the occurrence of a health event for said mammal.


To summarize, the claimed method allows for a single detection rule, thus avoiding the above mentioned multi-test issues that would occur by testing if each new value is abnormal.


Besides, there is no need for comparison to values of a test population that can be considered as normal. Indeed, at the end of step E5, no decision is made towards a particular clinical picture: the only information that emerges is that the series contains a value that is “abnormal” compared to the other values of the series. This is expected to yield a gain in specificity, intra-individual variability being lower than inter-individual variability, quite apart from the question of whether the use of a standard is relevant at all (e.g. an athlete cannot be compared with a standard population).


The method can comprise the further step of


E6: identifying from the series of values reported in step E2 at least one value being considered as abnormal, and


E7: reporting, on displaying means, said at least one abnormal value.


In a first aspect, the method comprises the following features:

    • a single series x1, x2, . . . , xn of n values is stored on the database,
    • n is the number of variables,
    • X1, X2, . . . , Xn represent the n variables,
    • the studentized form is a studentized residual expressed as:







$$R_{n,i} = \frac{X_i - \bar{X}_{n,-i}}{\hat{\sigma}_{n,-i}\sqrt{1 + \dfrac{1}{n-1}}}$$












    • the indicator Tn is expressed as:










$$T_n = \max_{i \in \{1,\dots,n\}} \left| R_{n,i} \right|$$










where







$$\bar{X}_{n,-i} = \frac{1}{n-1}\sum_{k=1,\,k\neq i}^{n} X_k$$

$$\hat{\sigma}_{n,-i}^{2} = \frac{1}{n-2}\sum_{k=1,\,k\neq i}^{n}\left(X_k - \bar{X}_{n,-i}\right)^{2}$$







For that aspect of the invention, a relevant table of quantiles of the distribution of Tn is table 1.
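The calculation of this first aspect can be sketched as follows. This is a minimal illustration rather than the applicant's implementation; the function names are hypothetical, and the sketch assumes that the values left after removing one observation are not all identical (otherwise the estimated standard deviation is zero).

```python
import math
from statistics import mean, stdev


def studentized_residuals(values):
    """Return R_{n,i} for every i, each computed by leaving value i out."""
    x = list(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are required")
    scale = math.sqrt(1 + 1 / (n - 1))
    residuals = []
    for i in range(n):
        rest = x[:i] + x[i + 1:]  # the n - 1 other values
        # stdev() divides by len(rest) - 1 = n - 2, matching sigma-hat_{n,-i}
        residuals.append((x[i] - mean(rest)) / (stdev(rest) * scale))
    return residuals


def indicator_tn(values):
    """Return the observed value t_n: the largest absolute studentized residual."""
    return max(abs(r) for r in studentized_residuals(values))
```

Because each residual is centred and scaled using the other values of the same series, the result is unchanged when the whole series is shifted, rescaled or read in reverse order, which is consistent with a distribution of Tn that can be tabulated once per value of n.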


This first aspect can also be expressed under a multivariate extension with the following features:

    • a single series x1, x2, . . . , xn of n values is stored on the database,
    • n is the number of variables of dimension d,
    • X1, X2, . . . , Xn represent the variables, where Xi˜N(μi,C) and C, the covariance matrix is assumed to be invertible,
    • the studentized form is expressed as a normalized length of the residual vector:







$$R_{n,i} = \frac{n-1}{n\,d}\left(X_i - \bar{X}_{n,-i}\right)' \, C_{n,-i}^{-1} \left(X_i - \bar{X}_{n,-i}\right)$$







where the notation z′ means the transpose of the vector z,

    • the indicator Tn is expressed as:







$$T_n = \max_{i \in \{1,\dots,n\}} R_{n,i}$$











where

$$\bar{X}_{n,-i} = \frac{1}{n-1}\sum_{k=1,\,k\neq i}^{n} X_k$$

$$C_{n,-i} = \frac{1}{n-1-d}\sum_{k=1,\,k\neq i}^{n}\left(X_k - \bar{X}_{n,-i}\right)\left(X_k - \bar{X}_{n,-i}\right)'$$










For that aspect of the invention, a relevant table of quantiles of the distribution of Tn is table 1b or 1c (for d=2 and d=3).


In a second aspect, the method comprises the following features:

    • a single series x1, x2, . . . , xn of n values is stored on the database,
    • n is the number of variables,
    • X1, X2, . . . , Xn represent the variables,
    • φ is the collection of all possible intervals I of consecutive integers included in {1, . . . , n} with length 1≤|I|<n, Ī denoting the complement of I, so that {1, . . . , n}=I∪Ī and I∩Ī=∅,
    • the studentized form is expressed as:







$$R_{n,I} = \frac{\bar{X}_{I} - \bar{X}_{\bar{I}}}{\hat{\sigma}_{n,I}\sqrt{1 + \dfrac{1}{n-1}}}$$












    • the indicator Tn is expressed as:










$$T_n = \max_{I \in \varphi} \left| R_{n,I} \right|$$










where







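$$\bar{X}_{I} = \frac{1}{|I|}\sum_{k \in I} X_k$$

$$\bar{X}_{\bar{I}} = \frac{1}{n-|I|}\sum_{k \in \bar{I}} X_k$$

$$\hat{\sigma}_{n,I}^{2} = \frac{1}{n-2}\left(\sum_{k \in I}\left(X_k - \bar{X}_{I}\right)^{2} + \sum_{k \in \bar{I}}\left(X_k - \bar{X}_{\bar{I}}\right)^{2}\right)$$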






For that aspect of the invention, a relevant table of quantiles of the distribution of Tn is table 2.
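The interval scan of this second aspect can be sketched as follows, following the formulas as written above. The function name `interval_indicator` is hypothetical, intervals are reported with 1-based inclusive bounds to match the indices of the text, and the sketch assumes that the pooled variance is never exactly zero.

```python
import math
from statistics import mean


def interval_indicator(values):
    """Return (t_n, (first, last)): the largest |R_{n,I}| and the interval I reaching it."""
    x = list(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are required")
    scale = math.sqrt(1 + 1 / (n - 1))
    best_value, best_interval = 0.0, None
    for start in range(n):
        for stop in range(start + 1, n + 1):
            if stop - start == n:
                continue  # the interval must be strictly shorter than the series
            inside = x[start:stop]
            outside = x[:start] + x[stop:]
            m_in, m_out = mean(inside), mean(outside)
            # pooled sum of squares around the two separate means
            ss = sum((v - m_in) ** 2 for v in inside)
            ss += sum((v - m_out) ** 2 for v in outside)
            sigma = math.sqrt(ss / (n - 2))
            r = abs(m_in - m_out) / (sigma * scale)
            if r > best_value:
                best_value, best_interval = r, (start + 1, stop)
    return best_value, best_interval
```

The scan visits every one of the n(n+1)/2 − 1 admissible intervals, so its cost grows quadratically with the length of the series, which remains small for typical follow-ups.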


In a third aspect, the method comprises the following features:

    • two series of n1 and n2 values are stored on the database (DB), x1, x2, . . . , xn1 and y1, y2, . . . , yn2,
    • the studentized forms are studentized residuals expressed as:







$$R_{n_1,i} = \frac{X_i - \bar{X}_{n_1,-i}}{\hat{\sigma}_{X,-i,Y}\sqrt{1 + \dfrac{1}{n_1-1}}}$$

and

$$R_{n_2,j} = \frac{Y_j - \bar{Y}_{n_2,-j}}{\hat{\sigma}_{Y,-j,X}\sqrt{1 + \dfrac{1}{n_2-1}}}$$












    • the indicator Tn is expressed as:










$$T_n = \max\left\{ \max_{i \in \{1,\dots,n_1\}} \left| R_{n_1,i} \right| ,\; \max_{j \in \{1,\dots,n_2\}} \left| R_{n_2,j} \right| \right\}$$







where







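$$\bar{X}_{n_1,-i} = \frac{1}{n_1-1}\sum_{k=1,\,k\neq i}^{n_1} X_k$$

$$\bar{Y}_{n_2,-j} = \frac{1}{n_2-1}\sum_{k=1,\,k\neq j}^{n_2} Y_k$$

$$\hat{\sigma}_{X,-i,Y}^{2} = \frac{1}{n-3}\left(\sum_{k=1,\,k\neq i}^{n_1}\left(X_k - \bar{X}_{n_1,-i}\right)^{2} + \sum_{k=1}^{n_2}\left(Y_k - \bar{Y}_{n_2}\right)^{2}\right)$$

$$\hat{\sigma}_{Y,-j,X}^{2} = \frac{1}{n-3}\left(\sum_{k=1}^{n_1}\left(X_k - \bar{X}_{n_1}\right)^{2} + \sum_{k=1,\,k\neq j}^{n_2}\left(Y_k - \bar{Y}_{n_2,-j}\right)^{2}\right)$$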








For that aspect of the invention, a relevant table of quantiles of the distribution of Tn is table 3.
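A possible reading of this third aspect is sketched below. It assumes that n denotes the total number of values, n = n1 + n2, which is not stated explicitly above but matches the n−3 degrees of freedom of the pooled variance. The function names are hypothetical.

```python
import math
from statistics import mean


def _sum_of_squares(values, centre):
    return sum((v - centre) ** 2 for v in values)


def _residuals(a, b, n):
    """Residuals of series a (one value left out at a time), pooled with the full series b."""
    ss_b = _sum_of_squares(b, mean(b))
    scale = math.sqrt(1 + 1 / (len(a) - 1))
    out = []
    for i in range(len(a)):
        rest = a[:i] + a[i + 1:]
        m = mean(rest)
        sigma = math.sqrt((_sum_of_squares(rest, m) + ss_b) / (n - 3))
        out.append((a[i] - m) / (sigma * scale))
    return out


def two_series_indicator(xs, ys):
    """Return t_n for two series sharing a common variance (assumes n = n1 + n2)."""
    xs, ys = list(xs), list(ys)
    if len(xs) < 2 or len(ys) < 2:
        raise ValueError("each series needs at least two values")
    n = len(xs) + len(ys)  # assumption: n is the total number of values
    rx = _residuals(xs, ys, n)
    ry = _residuals(ys, xs, n)
    return max(max(abs(r) for r in rx), max(abs(r) for r in ry))
```

Pooling the two series lets each one keep its own mean while both contribute to the variance estimate, which is useful when one of the series is short.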


This third aspect can also be expressed under a multivariate extension with the following features:

    • two series of n1 and n2 values are stored on the database (DB), x1, x2, . . . , xn1 and y1, y2, . . . , yn2,
    • n is the number of variables of dimension d,
    • X1, X2, . . . , Xn1 and Y1, Y2, . . . , Yn2 represent the variables, where Xi˜N(μxi,C), Yj˜N(μYj,C) and C, the covariance matrix is assumed to be invertible,
    • the studentized form is expressed as a normalized length of the residual vector:







$$R_{n_1,i} = \frac{n_1-1}{n_1\,d}\left(X_i - \bar{X}_{n_1,-i}\right)' \, C_{X,-i,Y}^{-1} \left(X_i - \bar{X}_{n_1,-i}\right)$$

$$R_{n_2,j} = \frac{n_2-1}{n_2\,d}\left(Y_j - \bar{Y}_{n_2,-j}\right)' \, C_{Y,-j,X}^{-1} \left(Y_j - \bar{Y}_{n_2,-j}\right)$$









    • the indicator Tn is expressed as:










$$T_n = \max\left\{ \max_{i \in \{1,\dots,n_1\}} R_{n_1,i} ,\; \max_{j \in \{1,\dots,n_2\}} R_{n_2,j} \right\}$$







where







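$$\bar{X}_{n_1,-i} = \frac{1}{n_1-1}\sum_{k=1,\,k\neq i}^{n_1} X_k$$

$$\bar{Y}_{n_2,-j} = \frac{1}{n_2-1}\sum_{k=1,\,k\neq j}^{n_2} Y_k$$

$$C_{X,-i,Y} = \frac{1}{n-2-d}\left(\sum_{k=1,\,k\neq i}^{n_1}\left(X_k - \bar{X}_{n_1,-i}\right)\left(X_k - \bar{X}_{n_1,-i}\right)' + \sum_{k=1}^{n_2}\left(Y_k - \bar{Y}_{n_2}\right)\left(Y_k - \bar{Y}_{n_2}\right)'\right)$$

$$C_{Y,-j,X} = \frac{1}{n-2-d}\left(\sum_{k=1}^{n_1}\left(X_k - \bar{X}_{n_1}\right)\left(X_k - \bar{X}_{n_1}\right)' + \sum_{k=1,\,k\neq j}^{n_2}\left(Y_k - \bar{Y}_{n_2,-j}\right)\left(Y_k - \bar{Y}_{n_2,-j}\right)'\right)$$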






In the univariate case, Method 1 of the invention can be applied with at least three observations per individual. This is of course a (rather mild) restriction of the method but, on the other hand, it can be applied to any sequence of independent Gaussian outcomes, without the help of an extra cohort to determine an appropriate a priori distribution.


That is to say, there is no need to have a predetermined known mean value or standard value (or values) of a biomarker within a population to apply the method. Besides the above-mentioned expected gain in specificity in detecting abnormal variation, obtained by overcoming the multiple-testing issues, the method is also of interest in terms of the cost of the study (no need to create or buy a dataset of standard values) and reliability (relevance of comparing individual data with the corresponding data of a cohort).


The method of the invention can be applied to independent and normally distributed random variables, for series with at least three observations. This last point is not a strong limitation, since today's follow-ups often include more than three observations, and the sampling rate is increasing over the years. These methods can be continued with further investigations carried out by a medical staff in order to shed light on detected abnormal follow-ups.


The method can be improved to include additional periodic environmental effects, such as training at altitude. Practical applications include detecting abnormal values in the follow-ups of elite athletes or sport animals for clinical or anti-doping purposes, and in an individual's routine health check-up. It can also be useful in detecting pathological values in clinical follow-ups of subjects based on individual rather than population-based thresholds. However, in the latter case the method is not aimed at identifying any disease for curative purposes, but merely at identifying any potential health issue for the individual that would afterwards need further care or medical analysis to obtain a diagnosis by the physician.


The method may comprise the following steps:

    • a step ES of running a Shapiro test on the stored values (x1, x2, . . . , xn1) to check if the variables are normally distributed,
    • a step ET of applying a function to the values to turn them into normally distributed variables, such as a log transformation.
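Step ET can be illustrated with the log transformation given as an example above. The sketch below is a minimal illustration with a hypothetical function name; it assumes strictly positive values, as is the case for concentrations. The normality check of step ES is not shown and would typically rely on a statistics library.

```python
import math


def log_transform(values):
    """Return the natural logarithm of each value (example of step ET)."""
    values = list(values)
    if any(v <= 0 for v in values):
        raise ValueError("the log transformation requires strictly positive values")
    return [math.log(v) for v in values]
```

The transformed series, rather than the raw one, would then be passed to the calculation of the indicator.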


The invention also relates to a computer program comprising code lines to be executed by a processor, said code lines being configured to operate a method as disclosed previously from its step E2 to E4. The code lines may also be configured to run the different aspects of the method disclosed previously and/or run steps ES, ET.


The invention also relates to a system for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, said biomarker corresponding to a physiological parameter or to the level of any biological or chemical entity measured from a biological sample of said mammal, the system comprising:

    • a database stored on a memory, the database comprising at least a series of values (x1, x2, . . . , xn) related to said at least one biomarker, the values representing the evolution of the biomarker at different periods of time and being obtained from independent variables (X1, X2, . . . , Xn) normally distributed (according to N(μ, σ²)),
    • a processor comprising:
      • calculating means to calculate, for the whole series of values (x1, x2, . . . , xn) stored in the database (DB), a single value (tn) of an indicator (Tn), said indicator (Tn) being based on a studentized form (Rn,i) of the variables (X1, X2, . . . , Xn), said calculation consisting of extracting the maximum value (tn) of the studentized residual form (Rn,i) calculated for each value of the series (x1, x2, . . . , xn),
      • comparing means to compare the observed value (tn) of the indicator (Tn) to the quantile (cα,n) of the distribution of (Tn),
      • instruction means to instruct displaying means to report, or reporting means to report on displaying means, a presence of an abnormal value in the series, if the observed value (tn) of the indicator (Tn) is above the quantile.


The system can also comprise acquisition means configured to acquire a series of values related to at least one biomarker and store them on a database stored in a memory and/or displaying means to receive the instructions of the processor.


The at least one biomarker represented by the series of values may be chosen from the following list: ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels, complete blood count, platelets, reticulocytes, soluble transferrin receptor, vitamin B9 in red blood cells, blood sugar, cholesterol, triglycerides, serum glutamic oxaloacetic transaminase (SGOT), serum glutamate pyruvate transaminase (SGPT), gamma-glutamyltransferase (γ-GT), lactate dehydrogenase (LDH), bilirubin, electrolytes (e.g. Na+, Cl−, K+, HCO3−, Ca2+, Mg2+), alkaline phosphatases, magnesium in red blood cells, creatinine, androstenedione, urea, uric acid, haptoglobin, C-reactive protein (CRP), transthyretin, orosomucoid, creatine phosphokinase (CPK), inorganic phosphate (PO4), thyroid-stimulating hormone (TSH), testosterone, cortisol, erythropoietin (EPO), luteinizing hormone (LH), insulin-like growth factor 1 (IGF-1), osteocalcin, calcifediol (25 OHD3).





DRAWING

The following figures are given as a complement to understand the invention in a non-limitative way:



FIG. 1 illustrates a set-up to run the invention,



FIG. 2 illustrates the main steps of an embodiment of the invention,



FIG. 3 illustrates the frequency of abnormal values detected by different methods according to embodiments of the invention,



FIG. 4 illustrates the distribution of estimated Tn of an embodiment of the invention versus the normal distribution for erythrocyte count and hematocrit,



FIG. 5 illustrates the frequency of abnormal series for each biomarker and method according to different embodiments of the invention (α=5%),



FIGS. 6a and 6b illustrate examples of results for the different embodiments of the invention, for several biomarkers,



FIG. 7 illustrates a flow chart representing some steps of an embodiment according to the invention,

    • Tables 1 to 3 illustrate the value of the quantile for the different disclosed embodiments. Table 1 includes three tables 1a, 1b, 1c.



FIG. 8 illustrates the frequency of abnormal series for each biomarker and method according to different embodiments of the invention for α=1.0, 2.5, 5.0 or 10.0% (number of abnormal series/total of series (percentage of abnormal series)),



FIG. 9 illustrates an example of an abnormal series identified with method 2 for ferritin (increase of ferritin levels over the time),



FIG. 10 illustrates an example of an abnormal series identified with method 2 for ferritin (decrease of ferritin levels over the time),





DETAILED DESCRIPTION

In the biological field, the assumption that the biomarkers can be represented as independent variables is reasonable. A study by Sottas (Pierre-Edouard Sottas, Norbert Baume, Christophe Saudan, Carine Schweizer, Matthias Kamber, Martial Saugy; Bayesian detection of abnormal values in longitudinal biomarkers with an application to T/E ratio, Biostatistics, Volume 8, Issue 2, 1 Apr. 2007, Pages 285-296) showed that after three days, no correlation can be observed between two values of a biomarker.


In the case of blood samples, for instance, the average interval between two samples is of the order of several months.


“Biomarker” corresponds, for the present invention, to the level of any biological or chemical entity measured from a biological sample of a mammal. Complementarily, biomarkers can also represent physiological parameters of the mammal, which have to be first collected and gathered and then extemporaneously applied in the method according to the invention, thereby not requiring any interaction with the body of the subject.


The biomarker(s) is (are) for instance one or more biomarker(s) chosen from the following list: the concentrations of ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels, complete blood count, platelets, reticulocytes, soluble transferrin receptor, vitamin B9 in red blood cells, blood sugar, cholesterol, triglycerides, serum glutamic oxaloacetic transaminase (SGOT), serum glutamate pyruvate transaminase (SGPT), gamma-glutamyltransferase (γ-GT), lactate dehydrogenase (LDH), bilirubin, electrolytes (e.g. Na+, Cl−, K+, HCO3−, Ca2+, Mg2+), alkaline phosphatases, magnesium in red blood cells, creatinine, androstenedione, urea, uric acid, haptoglobin, C-reactive protein (CRP), transthyretin, orosomucoid, creatine phosphokinase (CPK), inorganic phosphate (PO4), thyroid-stimulating hormone (TSH), testosterone, cortisol, erythropoietin (EPO), luteinizing hormone (LH), insulin-like growth factor 1 (IGF-1), osteocalcin, calcifediol (25 OHD3).


In a particular aspect, the biomarker(s) is (are) for instance one or more biomarker(s) chosen from the following list: the concentrations of ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels.


Biomarkers corresponding to the level of any biological or chemical entity are measured from a sample of a subject, for example, a fluid sample as blood, plasma, serum or urine.


Examples of basic physiological parameters known in the art are ECG, heart rate, respiratory rate, respiratory volume, body temperature, blood pressure and electromyogram, measured by any technical means known in the art.


The term “health event” corresponds to any situation related to the state of health of a subject that is suspected when the method of the invention detects an abnormal value in a series for one or more biomarkers.


In the present description, reference will be made to a human subject but the disclosed method applies to any mammal.


The rationale of the invention is based on the fact that the joint distribution of some appropriate residuals is free of the parameters in a Gaussian context.


In a first step E1, several biological samples of the mammal, collected at different periods of time, are analyzed. For each sample, a value related to at least one of the aforementioned biomarkers is acquired. Therefore, for each mammal, a series of values x1, x2, . . . , xn is determined in step E1.


Those values represent the evolution of the biomarker as a function of time. Typically, index 1 refers to the oldest measurement while index n refers to the latest one.


Step E1 can encompass the in vitro analysis run in labs on the blood sample. In a particular aspect step E1 also encompasses the tests run on the individual to obtain a value of a biomarker, more particularly a biomarker related to a physiological parameter. In another particular aspect, step E1 does not encompass the test run on the individual but only the gathering and formatting of the data related to said biomarkers in order to allow their processing in the further steps of the method. In a more particular aspect, step E1, when related to a physiological parameter does not encompass the test run on the individual but only the gathering and the formatting of the data related to said physiological parameter.


In a second step E2, the series of values x1, x2, . . . , xn is stored on a database DB. This database preferably contains several series related to different biomarkers.


As shown in FIG. 1, which illustrates a system according to an embodiment of the invention, the database is stored on a memory 12, which can be ROM or RAM, of a calculation unit 10. The calculation unit 10 also comprises processing means 14, such as a processor, to compute data from the database DB of the memory 12. To enter data into the memory 12, the calculation unit 10 comprises interface means 16. The processing means 14 comprises calculation means, comparing means and instruction means (all of which can be included in the same processor, for instance).


The system can also comprise acquisition means (sensors, laboratory equipment, etc.) which can be used during step E1.


The results obtained through the calculation unit 10 are shown on displaying means 20, for instance a screen. Communication between the calculation unit 10 and the displaying means 20 is achieved via a wired connection, for instance (VGA, HDMI, etc.).


The calculation unit 10 can be a personal computer or a remote server (cloud computing) communicating with a local interface, for instance through a network (Ethernet, Wi-Fi, etc.).


Before being processed by the processor according to the method disclosed below, the series of values x1, x2, . . . , xn can be pre-processed into a suitable form in a step ET. Step ET will be illustrated later.


The database DB is updated with data obtained from different processes.


The method of the invention uses the maximum of a “studentized” form. Several embodiments are disclosed below.


That method allows detecting an abnormal value (or an abnormal subseries of values) in a series of values.


The invention will be disclosed in detail for the first embodiment. The other embodiments will be disclosed more briefly.



FIG. 2 illustrates the main steps of a method according to an embodiment.


Method 1 of the Invention


In a first aspect of the invention, the method proposes a global test, defined by the following indicator:







$$T_n = \max_{i \in \{1,\dots,n\}} \left| R_{n,i} \right|$$

where

$$R_{n,i} = \frac{X_i - \bar{X}_{n,-i}}{\hat{\sigma}_{n,-i}\sqrt{1 + \dfrac{1}{n-1}}}$$










Rn,i being the studentized residual form








$$\bar{X}_{n,-i} = \frac{1}{n-1}\sum_{k=1,\,k\neq i}^{n} X_k$$

$$\hat{\sigma}_{n,-i}^{2} = \frac{1}{n-2}\sum_{k=1,\,k\neq i}^{n}\left(X_k - \bar{X}_{n,-i}\right)^{2}$$







In that embodiment, the studentized residual form Rn,i is actually a studentized residual.


This method is based on the maximum of n (non independent) variables with Student(n−2) distribution.


Under the null hypothesis H0: μ1 = μ2 = . . . = μn = μ, the distribution of Tn does not depend on the unknown parameters μ, σ² and can therefore be tabulated.


Indeed, although an analytical solution is not available, more than sufficient precision can be obtained through tabulation by simulation (for instance, twenty million draws per threshold are easily computable with standard computational means and allow a precision of 10⁻³).
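Such a tabulation can be sketched by Monte Carlo simulation under the null hypothesis. The sketch below is an illustration only: the function names, the default number of draws and the seed are assumptions, and with far fewer draws than the twenty million mentioned above the precision is correspondingly lower. Standard normal draws suffice because the distribution of Tn does not depend on μ or σ².

```python
import math
import random


def indicator_tn(x):
    """Largest absolute leave-one-out studentized residual of the series x."""
    n = len(x)
    scale = math.sqrt(1 + 1 / (n - 1))
    best = 0.0
    for i in range(n):
        rest = x[:i] + x[i + 1:]
        m = sum(rest) / (n - 1)
        s = math.sqrt(sum((v - m) ** 2 for v in rest) / (n - 2))
        best = max(best, abs(x[i] - m) / (s * scale))
    return best


def simulate_quantile(n, alpha, draws=20000, seed=0):
    """Estimate the quantile of order 1 - alpha of T_n for series of length n."""
    rng = random.Random(seed)
    sims = sorted(
        indicator_tn([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(draws)
    )
    k = min(draws - 1, int((1 - alpha) * draws))  # empirical quantile index
    return sims[k]
```

In practice the estimates would be computed once for each pair (α, n) of interest and stored, so that no simulation is needed when a series is screened.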


Once the indicator Tn has been implemented, the processor 14 performs the calculation in a step E3. In practice, at least n operations are carried out: one for each value of the studentized residual form Rn,i, and then at least one operation to extract the maximum value tn of Tn.


According to that method, for one series of samples x1, x2, . . . , xn representing the values of the variables X1, X2, . . . , Xn, the calculation step provides a single output, denoted tn.


The next step E4 is the comparison of the observed value tn of the indicator Tn to a threshold. The threshold is taken from a quantile table, representing the distribution of Tn.


The quantile tables are typically precomputed and stored in the memory.


For a threshold α, the series (i.e. at least one value of the series) can be considered abnormal if tn>cα,n, where cα,n is such that Pn([cα,n, ∞[) = α, Pn denoting the distribution of Tn under the null hypothesis.


Therefore the processor computes that comparison with the results tn obtained at the end of the former step E3.


In a step E5, the result of the comparison is displayed on the displaying means 20 if tn>cα,n, where cα,n is the quantile of order 1−α of the distribution of Tn. In other words, if the observed value tn of the indicator Tn is above the quantile, a presence of an abnormal value in the series is reported on the displaying means, thereby indicating the occurrence of a health event for said mammal.
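Steps E3 to E5 can be chained as in the following minimal sketch. The function name and the returned fields are hypothetical, and the critical value is passed in by the caller, since it would normally be read from a precomputed quantile table. Reporting the position of the most extreme residual is one natural choice assumed here for the optional identification step; the text does not prescribe it.

```python
import math
from statistics import mean, stdev


def screen_series(values, critical_value):
    """Compute t_n (E3), compare it with the quantile (E4) and build a report (E5)."""
    x = list(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are required")
    scale = math.sqrt(1 + 1 / (n - 1))
    residuals = []
    for i in range(n):
        rest = x[:i] + x[i + 1:]
        residuals.append((x[i] - mean(rest)) / (stdev(rest) * scale))
    t_n = max(abs(r) for r in residuals)   # step E3
    abnormal = t_n > critical_value        # step E4
    candidate = None
    if abnormal:
        # assumed choice: 1-based position of the most extreme residual
        candidate = max(range(n), key=lambda i: abs(residuals[i])) + 1
    return {"t_n": t_n, "abnormal": abnormal, "candidate_index": candidate}
```

The thresholds used when trying this sketch should be taken from the quantile tables; any value typed in by hand is only a placeholder.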


Table 1a provides values for the quantile of the distribution of Tn.


The method can be applied as soon as n≥3.


With the information that an abnormal value is present, applications in doping control or in the medical follow-up of individuals are immediate.


In another embodiment, this method can equally be applied to a multivariate extension, that is, when X1, X2, . . . , Xn are n independent ℝ^d-valued random vectors, where Xi˜N(μi,C) and C (the covariance matrix) is assumed to be invertible.


To detect if a vector of the series is abnormal, indicator Tn is implemented in the following way:







$$T_n = \max_{i \in \{1,\dots,n\}} R_{n,i}$$

where

$$R_{n,i} = \frac{n-1}{n\,d}\left(X_i - \bar{X}_{n,-i}\right)' \, C_{n,-i}^{-1} \left(X_i - \bar{X}_{n,-i}\right)$$







Rn,i being a kind of studentized residual form, more precisely a normalized length of the residual vector, z′ being the transpose vector of vector z.








$$\bar{X}_{n,-i} = \frac{1}{n-1}\sum_{k=1,\,k\neq i}^{n} X_k$$

$$C_{n,-i} = \frac{1}{n-1-d}\sum_{k=1,\,k\neq i}^{n}\left(X_k - \bar{X}_{n,-i}\right)\left(X_k - \bar{X}_{n,-i}\right)'$$










Step E2 is identical, except for the indicators to be applied.


As previously explained, under the null hypothesis μ1 = μ2 = . . . = μn = μ, the distribution of Tn does not depend on μ or C and can therefore be tabulated.


The comparison step E4 is the same and the displaying step E5 is also the same.


This method can be applied as soon as n≥d+2.
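
The multivariate indicator described above can be sketched in plain Python. This is an illustrative sketch only: the function names `solve`, `residual_lengths` and `indicator` are hypothetical, the linear system is solved with a small Gaussian elimination rather than a numerical library, and the formulas are those given above for Rn,i, the leave-one-out mean and the leave-one-out covariance estimate.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance estimate is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, d))
        x[r] = (m[r][d] - s) / m[r][r]
    return x


def residual_lengths(vectors):
    """R_{n,i} for every i: normalized length of the leave-one-out residual."""
    n, d = len(vectors), len(vectors[0])
    if n < d + 2:
        raise ValueError("the method requires n >= d + 2 observations")
    out = []
    for i in range(n):
        rest = vectors[:i] + vectors[i + 1:]
        mean = [sum(v[c] for v in rest) / (n - 1) for c in range(d)]
        # Leave-one-out covariance estimate, divided by n - 1 - d.
        cov = [[sum((v[p] - mean[p]) * (v[q] - mean[q]) for v in rest) / (n - 1 - d)
                for q in range(d)] for p in range(d)]
        diff = [vectors[i][c] - mean[c] for c in range(d)]
        quad = sum(diff[c] * y for c, y in enumerate(solve(cov, diff)))
        out.append((n - 1) / (n * d) * quad)
    return out


def indicator(vectors):
    """T_n: the largest R_{n,i} over the series."""
    return max(residual_lengths(vectors))
```

For instance, `indicator([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])` is driven by the last vector, which lies far from the four others.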


Table 1b or 1c provides values for the quantile of the distribution of Tn (for d=2 and d=3).


The univariate case described before is just a specific case of the multivariate case (up to a square in the expression of Tn).
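
The remark above can be checked directly. The short derivation below is a sketch that assumes the univariate studentized residual of Method 1 has the usual leave-one-out form, with $\hat{\sigma}_{n,-i}$ denoting the standard deviation estimated without the i-th observation; this notation is introduced here for illustration only.

```latex
% For d = 1 the matrix C_{n,-i} reduces to a scalar variance estimate:
\hat{\sigma}_{n,-i}^{2} = \frac{1}{n-2} \sum_{k=1,\, k \neq i}^{n} \left( X_k - \bar{X}_{n,-i} \right)^{2}
% so the multivariate residual becomes
R_{n,i} = \frac{n-1}{n} \cdot \frac{\left( X_i - \bar{X}_{n,-i} \right)^{2}}{\hat{\sigma}_{n,-i}^{2}}
        = \left( \frac{X_i - \bar{X}_{n,-i}}{\hat{\sigma}_{n,-i} \sqrt{1 + \frac{1}{n-1}}} \right)^{2},
\qquad \text{since } \frac{n}{n-1} = 1 + \frac{1}{n-1}.
```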


Method 2 of the Invention


A further implementation of the invention would be to detect a series of consecutive observations that are abnormal compared to the rest of the series.


Formally speaking, the question is whether there is a set I={i, . . . , j} of consecutive integers for which the expectation of the Xk, k∈I, differs from the expectation of the other variables, that is, μk = μl = μA if k and l do not belong to I, and μk = μl = μB if k and l belong to I.


The rationale behind method 2 is the same as behind method 1. The maximum of a studentized form is calculated by the processor 14.


More precisely, the proposed indicator is


$$T_n = \max_{I \in \varphi} R_{n,I}$$


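where φ is the collection of all possible intervals I included in {1, . . . , n} with length 1≤|I|<n, {1, . . . , n}=I∪Ī and I∩Ī=∅, and


$$R_{n,I} = \frac{\bar{X}_I - \bar{X}_{\bar{I}}}{\hat{\sigma}_{n,I} \sqrt{1 + \frac{1}{n-1}}}$$


$$\bar{X}_I = \frac{1}{|I|} \sum_{k \in I} X_k$$


$$\bar{X}_{\bar{I}} = \frac{1}{n - |I|} \sum_{k \in \bar{I}} X_k$$


$$\hat{\sigma}_{n,I}^{2} = \frac{1}{n-2} \left( \sum_{k \in I} \left( X_k - \bar{X}_I \right)^{2} + \sum_{k \in \bar{I}} \left( X_k - \bar{X}_{\bar{I}} \right)^{2} \right)$$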


Because the indicator operates on intervals I, the studentized form is adapted from the studentized residual stricto sensu of Method 1.


Under the null hypothesis, again, the distribution of Tn does not depend on the unknown parameters μ and σ² and can be tabulated.


The comparison and displaying steps E4, E5 are similar to the ones previously disclosed.


If the comparison of step E4 is positive, there is a subseries of consecutive values that can be considered abnormal compared to the rest of the series.
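
The scan over intervals used by Method 2 can be sketched as follows. This is a minimal illustration under assumptions: the function name `interval_indicator` is hypothetical, indices are 0-based and inclusive, and the residual is taken signed, exactly as the formula for Rn,I is written above (taking its absolute value would flag shifts in either direction).

```python
import math


def interval_indicator(x):
    """Return (t_n, (i, j)): the largest R_{n,I} and the interval attaining it."""
    n = len(x)
    if n < 3:
        raise ValueError("n - 2 must be positive")
    best, best_interval = -math.inf, None
    for i in range(n):
        for j in range(i, n):
            if j - i + 1 == n:
                continue  # the interval must be a strict subset: |I| < n
            inside = x[i:j + 1]
            outside = x[:i] + x[j + 1:]
            m_in = sum(inside) / len(inside)
            m_out = sum(outside) / len(outside)
            # Pooled sum of squares around each group mean, n - 2 degrees of freedom.
            ss = (sum((v - m_in) ** 2 for v in inside)
                  + sum((v - m_out) ** 2 for v in outside))
            if ss == 0.0:
                continue  # degenerate split, the residual is undefined
            sigma = math.sqrt(ss / (n - 2))
            r = (m_in - m_out) / (sigma * math.sqrt(1.0 + 1.0 / (n - 1)))
            if r > best:
                best, best_interval = r, (i, j)
    return best, best_interval
```

The returned interval is the candidate abnormal subseries; whether it is reported depends on the comparison of the returned value with the tabulated quantile.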


Table 2 provides values for the quantile of the distribution of Tn.


Like Method 1, Method 2 can be extended to the multivariate case.


Method 3 of the Invention


Lastly, the method of the invention can be applied to two series of observations x1, x2, . . . , xn1 and y1, y2, . . . , yn2. In that case, the question is whether one of these observations is abnormal given the other ones in the same subsample.


This method makes it possible to account for a seasonal influence on the expectation of the variables. Indeed, the behavior of the biomarkers may differ according to the period of the year (summer and winter for instance). The sample is thus split into two subsamples x1, x2, . . . , xn1 and y1, y2, . . . , yn2.


The proposed indicator is


$$T_n = \max \left\{ \max_{i \in \{1, \ldots, n_1\}} R_{n_1,i},\ \max_{j \in \{1, \ldots, n_2\}} R_{n_2,j} \right\}$$


where


$$R_{n_1,i} = \frac{X_i - \bar{X}_{n_1,-i}}{\hat{\sigma}_{X,-i,Y} \sqrt{1 + \frac{1}{n_1 - 1}}}$$


$$R_{n_2,j} = \frac{Y_j - \bar{Y}_{n_2,-j}}{\hat{\sigma}_{Y,-j,X} \sqrt{1 + \frac{1}{n_2 - 1}}}$$

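Rn1,i and Rn2,j being studentized residuals, with


$$\bar{X}_{n_1,-i} = \frac{1}{n_1 - 1} \sum_{k=1,\, k \neq i}^{n_1} X_k$$


$$\bar{Y}_{n_2,-j} = \frac{1}{n_2 - 1} \sum_{k=1,\, k \neq j}^{n_2} Y_k$$


$$\hat{\sigma}_{X,-i,Y}^{2} = \frac{1}{n-3} \left( \sum_{k=1,\, k \neq i}^{n_1} \left( X_k - \bar{X}_{n_1,-i} \right)^{2} + \sum_{k=1}^{n_2} \left( Y_k - \bar{Y}_{n_2} \right)^{2} \right)$$


$$\hat{\sigma}_{Y,-j,X}^{2} = \frac{1}{n-3} \left( \sum_{k=1}^{n_1} \left( X_k - \bar{X}_{n_1} \right)^{2} + \sum_{k=1,\, k \neq j}^{n_2} \left( Y_k - \bar{Y}_{n_2,-j} \right)^{2} \right)$$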


Under the null hypothesis again, the distribution of Tn does not depend on the unknown parameters μA, μB and σ² and can therefore be tabulated.


Table 3 provides values for the quantile of the distribution of Tn.


This method can be applied as soon as n1≥2 and n2≥2.
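
The two-subsample indicator of Method 3 can be sketched as follows. This is an illustration under stated assumptions: the names `seasonal_indicator` and `_side_residuals` are hypothetical, n is taken to be n1 + n2 (which makes n − 3 the number of degrees of freedom of the pooled variance), and the residuals are signed, as the formulas above are written.

```python
import math


def _sum_sq(values, mean):
    """Sum of squared deviations of `values` around `mean`."""
    return sum((v - mean) ** 2 for v in values)


def _side_residuals(a, b):
    """Studentized residual of each value of `a`, pooling the spread of `b`."""
    n1, n = len(a), len(a) + len(b)  # assumption: n = n1 + n2
    mean_b = sum(b) / len(b)
    ss_b = _sum_sq(b, mean_b)
    out = []
    for i in range(n1):
        rest = a[:i] + a[i + 1:]
        mean_rest = sum(rest) / (n1 - 1)
        # Pooled variance: leave-one-out part of `a` plus all of `b`.
        sigma = math.sqrt((_sum_sq(rest, mean_rest) + ss_b) / (n - 3))
        out.append((a[i] - mean_rest) / (sigma * math.sqrt(1.0 + 1.0 / (n1 - 1))))
    return out


def seasonal_indicator(x, y):
    """T_n: largest studentized residual over both subsamples."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("the method requires n1 >= 2 and n2 >= 2")
    return max(_side_residuals(x, y) + _side_residuals(y, x))
```

Each value is compared only with the mean of its own subsample, so a constant offset between the two seasons does not by itself produce a large residual.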


The same extension to multivariate cases also applies.


In that case, let us consider X1, X2, . . . , Xn1 and Y1, Y2, . . . , Yn2, two independent series of independent Gaussian R^d-valued random vectors with Xi ~ N(μXi, C) and Yj ~ N(μYj, C), where C is assumed to be invertible.


The purpose is to detect if a vector xk or yl is abnormal given the other ones in the same subsample.


The following indicator is proposed:


$$T_n = \max \left\{ \max_{i \in \{1, \ldots, n_1\}} R_{n_1,i},\ \max_{j \in \{1, \ldots, n_2\}} R_{n_2,j} \right\}$$


where


$$R_{n_1,i} = \frac{n_1 - 1}{n_1 d} \left( X_i - \bar{X}_{n_1,-i} \right)' \, C_{X,-i,Y}^{-1} \left( X_i - \bar{X}_{n_1,-i} \right)$$


$$R_{n_2,j} = \frac{n_2 - 1}{n_2 d} \left( Y_j - \bar{Y}_{n_2,-j} \right)' \, C_{Y,-j,X}^{-1} \left( Y_j - \bar{Y}_{n_2,-j} \right)$$


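Rn1,i and Rn2,j being the normalized lengths of the residual vectors, with


$$\bar{X}_{n_1,-i} = \frac{1}{n_1 - 1} \sum_{k=1,\, k \neq i}^{n_1} X_k$$


$$\bar{Y}_{n_2,-j} = \frac{1}{n_2 - 1} \sum_{k=1,\, k \neq j}^{n_2} Y_k$$


$$C_{X,-i,Y} = \frac{1}{n-2-d} \left( \sum_{k=1,\, k \neq i}^{n_1} \left( X_k - \bar{X}_{n_1,-i} \right) \left( X_k - \bar{X}_{n_1,-i} \right)' + \sum_{k=1}^{n_2} \left( Y_k - \bar{Y}_{n_2} \right) \left( Y_k - \bar{Y}_{n_2} \right)' \right)$$


$$C_{Y,-j,X} = \frac{1}{n-2-d} \left( \sum_{k=1}^{n_1} \left( X_k - \bar{X}_{n_1} \right) \left( X_k - \bar{X}_{n_1} \right)' + \sum_{k=1,\, k \neq j}^{n_2} \left( Y_k - \bar{Y}_{n_2,-j} \right) \left( Y_k - \bar{Y}_{n_2,-j} \right)' \right)$$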


Under the null hypothesis μX1 = . . . = μXn1 = μX and μY1 = . . . = μYn2 = μY, the distribution of Tn does not depend on the unknown parameters μX, μY, C and can be tabulated.


The comparison and displaying steps E4, E5 are the same as previously explained.


The table of the values for the quantile of the distribution of Tn can be computed (not disclosed in the Tables here).


This method can be applied as soon as n1≥2 and n2≥2 and n2≥d+3.


The steps of the different embodiments of the disclosed method are summarized in the flow chart of FIG. 7.


Further Elements


To check that the variables are normally distributed, a Shapiro test can be run on the stored values (step ES). This step ES can be run during step E2 for instance, or between E2 and E3.


To bring some variables closer to a normal distribution, some series are log-transformed in a further step ET. Any other mathematical transformation that yields approximately normal data would be applicable.


This step ET can be run during step E2 for instance, or between steps E2 and E3.
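
Step ET can be illustrated with a natural-logarithm transform. This is a minimal sketch: the function name `log_transform` is hypothetical, and the check on strictly positive values is an added safeguard, not a feature stated in the description.

```python
import math


def log_transform(values):
    """Step ET sketch: natural logarithm of each stored value."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]
```

The transformed series would then replace the raw series in the computation of the indicator in step E3.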


The computation of the quantiles is known to a skilled person and will not be disclosed here.
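
One common way to obtain such quantiles is Monte Carlo simulation under the null hypothesis; the description does not say which technique was used for the Tables, so the sketch below is only an assumption. The names `indicator` and `simulate_quantile` are hypothetical, and the statistic is the d=1 case of the multivariate form of Method 1, i.e. the squared leave-one-out studentized residual. Because the null distribution does not depend on the unknown parameters, standard normal draws are sufficient.

```python
import math
import random


def indicator(x):
    """Max over i of the squared leave-one-out studentized residual (d = 1)."""
    n = len(x)
    best = 0.0
    for i in range(n):
        rest = x[:i] + x[i + 1:]
        mean = sum(rest) / (n - 1)
        var = sum((v - mean) ** 2 for v in rest) / (n - 2)
        best = max(best, (n - 1) / n * (x[i] - mean) ** 2 / var)
    return best


def simulate_quantile(n, alpha, n_sim=20000, seed=0):
    """Empirical quantile of order 1 - alpha of Tn for series of length n."""
    rng = random.Random(seed)  # fixed seed so the table is reproducible
    sims = sorted(
        indicator([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)
    )
    rank = math.ceil((1.0 - alpha) * n_sim)
    return sims[min(rank, n_sim) - 1]
```

A table would be built by calling `simulate_quantile` for each series length n and each level α of interest, and storing the results in memory.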


Once step E5 has been completed and it is known whether or not an abnormal value (or a subseries of values) exists, that abnormal value can be identified within the series and extracted in step E6, then reported on the displaying means 20 in step E7.


The abnormal value is the one at which the maximum defining the indicator Tn is attained.


Steps E5 and E6 are not resource-intensive, since all the values entering the indicator Tn have already been computed.
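
The identification of step E6 amounts to keeping track of where the maximum is attained. The sketch below is illustrative only: `locate_abnormal` is a hypothetical name, and the residual is the d=1 case of the multivariate form of Method 1 (a squared leave-one-out studentized residual).

```python
def locate_abnormal(x):
    """Return (index, residual) of the value with the largest residual."""
    n = len(x)
    residuals = []
    for i in range(n):
        rest = x[:i] + x[i + 1:]
        mean = sum(rest) / (n - 1)
        var = sum((v - mean) ** 2 for v in rest) / (n - 2)
        residuals.append((n - 1) / n * (x[i] - mean) ** 2 / var)
    # The residuals were all needed to compute Tn, so locating the
    # maximum adds no further computation beyond this lookup.
    index = max(range(n), key=residuals.__getitem__)
    return index, residuals[index]
```

The returned index is 0-based; the value to report in step E7 is `x[index]`.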


The present invention comprises a new and simple method, based on maxima of Z-scores, that does not rely on the use of data from an external population (reference population). This method is extended to three Z-score-based analyses for detecting abnormal values in a series of biological measures from an individual's samples with particular specificities. For example, making use of this invention, it is possible to assess the individual baseline while taking into account the seasonal changes that alter the values of biomarkers. The multivariate approach is also developed in order to avoid multiple-testing issues and to take care of the possible correlations between biomarkers.


As disclosed below, the embodiments of the invention (Methods 1 to 3) have been tested on the follow-up of elite athletes, and the results are in accordance with the expected false discovery rate in most cases.


Performance of Method 1


This example shows how Method 1 detects an abnormal value when all the others are identically distributed.


1×10^6 sequences consisting of n independent Gaussian random variables are simulated. The third random variable has distribution N(μ,1), with μ>0, whereas all the other random variables have distribution N(0,1). The simulations are performed for a number of observations n ranging from 4 to 15 and μ ranging from 0 to 10 with a step of 0.1.
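
A reduced version of this simulation can be sketched as follows. It is not the code used for FIG. 3: the names `indicator`, `null_quantile` and `detection_rate` are hypothetical, the statistic is the squared (d=1) form of the Method 1 residual, the quantile is itself estimated by simulation, and far fewer than 10^6 sequences are drawn by default.

```python
import random


def indicator(x):
    """Max over i of the squared leave-one-out studentized residual (d = 1)."""
    n = len(x)
    best = 0.0
    for i in range(n):
        rest = x[:i] + x[i + 1:]
        mean = sum(rest) / (n - 1)
        var = sum((v - mean) ** 2 for v in rest) / (n - 2)
        best = max(best, (n - 1) / n * (x[i] - mean) ** 2 / var)
    return best


def null_quantile(n, alpha, n_sim, rng):
    """Simulated quantile of order 1 - alpha of Tn when all means are equal."""
    sims = sorted(
        indicator([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)
    )
    return sims[min(int((1.0 - alpha) * n_sim), n_sim - 1)]


def detection_rate(n, mu, threshold, n_sim, rng):
    """Fraction of simulated series flagged when the third value is N(mu, 1)."""
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x[2] += mu  # shift the third variable only
        if indicator(x) > threshold:
            hits += 1
    return hits / n_sim
```

Calling `detection_rate` over a grid of μ values for a fixed n gives one empirical power curve; with μ=0 the rate should stay close to the level α.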


The same procedure is used for the multivariate analysis, with d=2. The third random vector has distribution N(μ̃, C), whereas all the other random vectors have distribution N(0, Id). Here μ̃=(μ, μ), with the same possible values of μ as in the univariate case, and the covariance matrix C is given by:


$$C = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$


The results are given in FIG. 3 (Method 1: sketches A and C, and its multivariate extension: sketches B and D, as a function of μ and μd for the levels α=5% (A, B) and α=1% (C, D); the dashed line μ=3 is merely an indicator).


As expected, the power curve increases with n and μ. For n=9, the percentages of abnormal sequences detected for μ equal to 2, 4 and 6 are 10.53%, 51.06% and 90.57%, respectively.


Application to Football Players


After obtaining the approval of the Institutional Ethics Committee, the three methods were applied to a database of elite soccer players.

    • A) Application to blood biomarkers from a sample of 2577 soccer players.


It consists of five typical biological markers from 2577 male soccer players from the French elite leagues 1 & 2. Biomarkers include concentrations of ferritin (μmol/L), serum iron (μmol/L), hemoglobin (g/L), erythrocytes (T/L) and hematocrit levels (%).


The biomarkers were collected every six months, in July/August and in January/March, from 2006 to 2012, for a total of twelve collections. The large interval between two measures (around six months) allowed for independent sampling (Sharpe and others (2006)). Only the series with at least five of the twelve possible measurements were analyzed. Because of the high number of transfers between clubs, injuries and the progressive inclusion of new clubs in the elite league, this amounted to ferritin and serum iron levels from 757 players, erythrocyte and hematocrit levels from 799 players, and hemoglobin levels from 807 players.


Moreover, a technical issue with the sampling instrument resulted in the loss of the data for the markers of the 2009 July/August collection. According to Custer and others (1995) and Tufts and others (1985), the measurements of ferritin and serum iron follow a log-normal distribution, so the logarithm of the observations for these two biomarkers was used (step E1).


The individual series of biomarkers must comply with the conditions related to the normal distribution: for each individual and each biomarker, step ES was conducted, i.e. a Shapiro test was run to confirm that it is not unrealistic to assume that the observations are drawn from a normal sample.


Further analysis of the data set (not detailed here) led to the conclusion that only ferritin and serum iron are eligible for Methods 1 and 2. For the other biomarkers, the third method can be applied, as they were found to be subject to a significant seasonal variability.


For the two markers ferritin and serum iron, the empirical distribution of Tn (Method 1) is close to the theoretical one under H0 (see FIG. 4: distribution of the estimated Tn together with the corresponding simulated distribution (10^6 individuals, gray curve) for erythrocytes (A) and hematocrit (B)). This confirms that the quantiles of the theoretical distribution of Tn can be used to detect abnormal values in this dataset.


The frequencies of abnormal series detected by the procedures are reported in FIG. 5 for a level α=5%.


Most of the results are not too far from the expected false discovery rate: between 5% and 6.7% for Ferritin/Method 1, Serum iron/Method 2 and all three other biomarkers with Method 3; 8.19% for Serum iron/Method 1. However, the percentage of abnormal sub-series for Ferritin/Method 2 is close to 11%, hence significantly different from the expected level α=5%. Although further analyses are required to shed light on these series, and notably on the actual cause of these abnormal values, the method of the invention is found effective in detecting abnormal values within series of data related to biomarkers. Indeed, the same percentage of abnormal values is obtained by applying the multivariate version of Method 1 or by applying the two univariate procedures (6.9% in both cases).


In FIGS. 6a and 6b, some examples are given of abnormal observations detected by the three methods, for different biomarkers: Method 1 (A, B, C), Method 2 (D, E, F) and Method 3 (G, H, I); panels J, K, L show the multivariate version of Method 1. The y-axis represents the biomarker values (units depend on the biomarker), and the x-axis corresponds to time (semesters).

    • B) Application to blood biomarkers from a sample of 3936 soccer players and blind analysis validation study.


As above, the five biological markers were the concentrations of ferritin (μmol/L), serum iron (μmol/L), hemoglobin (g/L) and erythrocytes (T/L) and the hematocrit levels (%), measured in 3936 male soccer players from the French elite leagues 1 & 2 and collected from 21/06/2005 to 05/10/2017.


The frequencies of abnormal series detected by the procedures are reported in FIG. 8 for a level α=1, 2.5, 5 or 10%. The higher α is, the larger the number of series flagged as abnormal.


The results confirm those of the above analysis. A percentage of abnormal series close to the false discovery rate is observed, except for ferritin/Method 2, for which the number of abnormal series is found to be unexpectedly high (15.5% for α=5%). As shown for two typical cases depicted in FIGS. 9 and 10, ferritin levels are found to increase or to decrease as a function of time for several individuals of the tested population. In FIGS. 9 and 10, the x-axis corresponds to the indices of the samples, starting at x=1. The y-axis is the ferritin value in μmol/L.


Validation of Data


Data related to the hemoglobin levels of 60 randomly chosen individuals (each with at least 7 measures) were extracted from the database and independently analyzed by a medical doctor (data not log-transformed) and by the methods of the invention (with a threshold as low as α=2.5%) in search of abnormal profiles. The abnormal profiles identified by the doctor and those identified by Methods 1, 2 or 3 of the invention were then compared.


The doctor identified 13 “abnormal” series among the 60 series (22% of “abnormal” series):

    • Four of the four series (100%) detected by Method 1 are found in the doctor's list.
    • Six of the seven series (85.7%) detected by Method 2 are also found in the doctor's list.
    • Five of the seven series (71.4%) detected by Method 3 are found in the doctor's list.


It can therefore be inferred that Methods 1, 2 and 3 are in line with the diagnosis of the doctor and can be used to detect abnormal series with good accuracy. A lower concordance rate is found between the doctor's analysis and the results of Method 3; this might be explained by the fact that seasonality was not considered by the doctor but is accounted for in Method 3.

Claims
  • 1. A method for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, said biomarker corresponding to a physiological parameter or to the level of any biological or chemical entity measured from a biological sample of said mammal, the series of values (x1, x2, . . . , xn) related to said at least one biomarker being obtained from independent variables (X1, X2, . . . , Xn) normally distributed; said method comprising the following steps of: E1: determining from each biological sample collected at different periods of time from said mammal, a value related to said at least one biomarker thereby acquiring a series of values (x1, x2, . . . , xn) related to said biomarker, E2: storing the series of values (x1, x2, . . . , xn) on a database (DB) stored in a memory (12); said method comprising the following steps run with a processor (14) that can retrieve data from the database (DB) in the memory: E3: calculating, for the whole series of values of step E2 a single value tn of an indicator (Tn), said indicator (Tn) being based on a studentized form (Rn,i) of the variables (X1, X2, . . . , Xn), said calculation consisting of extracting the maximum observed value (tn) of the studentized form (Rn,i) calculated for each value of the series of values (x1, x2, . . . , xn), E4: comparing the observed value (tn) of the indicator (Tn) to the quantile (cα,n) of the distribution of (Tn) said quantile being stored in the memory, E5: if the observed value (tn) of the indicator (Tn) is above the quantile, reporting, on displaying means (20), a presence of an abnormal value in the series thereby indicating the occurrence of a health event for said mammal; wherein, when searching for one series of samples which comprises an abnormal value: a single series x1, x2, . . . , xn of n values is stored on the database (DB), n is the number of variables, X1, X2, . . . , Xn represent the n variables, the studentized form is a studentized residual expressed as:
  • 2. The method according to claim 1, comprising a further step of: E6: identifying from the series of values reported in step E2 at least one value being considered as abnormal, and E7: reporting on displaying means (20) that at least one abnormal value.
  • 3. The method according to any of the preceding claims, further comprising a step ES of running a Shapiro test on the stored values (x1, x2, . . . , xn1) to check if the variables are normally distributed.
  • 4. The method according to any of the preceding claims, further comprising a step ET of applying a function to the values to turn them into normally distributed variables, such as a log transformation.
  • 5. The method according to any of the preceding claims, wherein the at least one biomarker represented by the series of values is chosen in the following list: ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels, complete blood count, platelets, reticulocytes, soluble transferrin receptor, vitamin B9 in red blood cell, blood sugar, cholesterol, triglycerides, serum glutamic oxaloacetic transaminase (SGOT), serum glutamate pyruvate transaminase (SGPT), gamma-glutamyltransferase (γ-GT), lactate dehydrogenase (LDH), bilirubin, electrolytes (e.g. Na+, Cl−, K+, HCO3−, Ca2+, Mg2+), alkaline phosphatases Magnesium in red blood cells, creatinine, androstenedione, urea, uric acid, haptoglobin, C-reactive protein (CRP), transthyretin, orosomucoid, creatine phosphokinase (CPK), inorganic phosphate (PO4), thyroid-stimulating hormone (TSH), testosterone, cortisol, erythropoietin (EPO), ferritin, luteinizing hormone (LH), Insulin-like growth factor 1 (IGF-1), osteocalcin, calcifediol (25 OHD3).
  • 6. A program for computer comprising code lines to be executed by a processor (14), said code lines being configured to operate a method according to any of the preceding claims from its step E2 to E4.
  • 7. A system for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, said biomarker corresponding to a physiological parameter or to the level of any biological or chemical entity measured from a biological sample of said mammal, the system comprising: a database (DB) stored on a memory (12), the database comprising at least a series of values (x1, x2, . . . , xn) related to said at least one biomarker being obtained from independent variables (X1, X2, . . . , Xn) normally distributed according to N(μx,σ2);
  • 8. Use of a method according to any of claims 1 to 5, or a program according to claim 6, or a system according to claim 7, for detecting a potential doping issue.
  • 9. Use of a method according to any of claims 1 to 5, or a program according to claim 6, or a system according to claim 7, wherein monitoring a health event comprises detecting any potential health issue of the individual, in order to launch further investigations.
Priority Claims (1)
Number Date Country Kind
17306583.0 Nov 2017 EP regional
PCT Information
Filing Document Filing Date Country Kind
PCT/EP2018/081271 11/14/2018 WO 00