This application incorporates-by-reference nucleotide and/or amino acid sequences which are present in the file named “210706_91753_SequenceListing_DH.txt”, which is 4 kilobytes in size, and which was created Jul. 5, 2021 in the IBM-PC machine format, having an operating system compatibility with MS-Windows, which is contained in the text file filed Jul. 6, 2021 as part of this application.
The present disclosure relates to methods and systems for assessing the risk of a human subject developing a severe response to a coronavirus infection such as a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral infection.
In December 2019, a series of unexplained cases of pneumonia was reported in Wuhan, China. On 12 Jan. 2020, the World Health Organization (WHO) tentatively named the new virus the 2019 novel coronavirus (2019-nCoV). On 11 Feb. 2020, the WHO formally named the disease triggered by 2019-nCoV as coronavirus disease 2019 (COVID-19). The coronavirus study group of the International Committee on Taxonomy of Viruses named 2019-nCoV as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The WHO declared the virus a Public Health Emergency of International Concern on 30 Jan. 2020. The WHO eventually declared a pandemic on 11 Mar. 2020.
As with many complex diseases, a multitude of host factors influence the severity of disease once a subject is infected with a virus. Viral infections are therefore complex multifactorial diseases, like many cancers, cardiovascular disease and diabetes.
As global health systems try to manage resources and governments attempt to manage their respective economies, there is a need to identify which people are at greatest risk of developing severe symptoms in response to the viral infection. Such a tool would enable earlier hospitalization and targeted treatments, which may save lives. Of great importance to the economy, lower risk individuals could potentially be recommended to continue their normal employment given their lower risk of developing a life-threatening disease should they contract a Coronavirus infection such as a SARS-CoV-2 viral infection.
The present inventors have found that a risk model for a severe response to a Coronavirus infection provides useful risk discrimination for assessing a subject's risk of developing a severe response to a Coronavirus infection such as a SARS-CoV-2 infection.
In an aspect, the present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method comprising performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, the presence of at least two polymorphisms associated with a severe response to a Coronavirus infection.
In an embodiment, the Coronavirus is an Alphacoronavirus, Betacoronavirus, Gammacoronavirus or a Deltacoronavirus.
In an embodiment, the Coronavirus is Alphacoronavirus 1, Human coronavirus 229E, Human coronavirus NL63, Miniopterus bat coronavirus 1, Miniopterus bat coronavirus HKU8, Porcine epidemic diarrhea virus, Rhinolophus bat coronavirus HKU2, Scotophilus bat coronavirus 512, Betacoronavirus 1 (Bovine Coronavirus, Human coronavirus OC43), Hedgehog coronavirus 1, Human coronavirus HKU1, Middle East respiratory syndrome-related coronavirus (MERS), Murine coronavirus, Pipistrellus bat coronavirus HKU5, Rousettus bat coronavirus HKU9, Severe acute respiratory syndrome-related coronavirus (SARS-CoV or SARS-CoV-2), Tylonycteris bat coronavirus HKU4, Avian coronavirus, Beluga whale coronavirus SW1, Bulbul coronavirus HKU11 or Porcine coronavirus HKU15.
In an embodiment, the Coronavirus is Severe acute respiratory syndrome-related coronavirus (SARS-CoV or SARS-CoV-2), Middle East respiratory syndrome-related coronavirus (MERS), Human coronavirus OC43, Human coronavirus HKU1, Human coronavirus 229E or Human coronavirus NL63.
In an embodiment, the Coronavirus is a Betacoronavirus.
In an embodiment, the Betacoronavirus is Severe acute respiratory syndrome-related coronavirus (SARS-CoV or SARS-CoV-2), Middle East respiratory syndrome-related coronavirus (MERS), Human coronavirus OC43 or Human coronavirus HKU1.
In an embodiment, the Coronavirus (Betacoronavirus) is Severe acute respiratory syndrome-related coronavirus (SARS-CoV or SARS-CoV-2), Middle East respiratory syndrome-related coronavirus (MERS), Human coronavirus OC43 or Human coronavirus HKU1.
In an embodiment, the Coronavirus (Betacoronavirus) is Severe acute respiratory syndrome-related coronavirus (SARS-CoV or SARS-CoV-2) or Middle East respiratory syndrome-related coronavirus (MERS).
In a preferred embodiment, the Coronavirus (Betacoronavirus) is Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
In an embodiment, the method comprises detecting the presence of at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 polymorphisms associated with a severe response to a Coronavirus infection.
In an embodiment, the polymorphisms are selected from Tables 1 to 6, 8, 19 or 22 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method at least comprises detecting polymorphisms at one or more or all of rs10755709, rs112317747, rs112641600, rs118072448, rs2034831, rs7027911 and rs71481792, or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method at least comprises detecting polymorphisms at one or more or all of rs10755709, rs112317747, rs112641600, rs115492982, rs118072448, rs1984162, rs2034831, rs7027911 and rs71481792, or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the polymorphisms are selected from Table 1, Table 6a, Table 6b or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the polymorphisms are selected from any one of Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the polymorphisms are selected from Table 3 or a polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, at least three polymorphisms are analysed.
In an embodiment, the method comprises, or consists of, detecting the presence of at least 60, or each, of the polymorphisms provided in Table 4 or a polymorphism in linkage disequilibrium with one or more thereof.
In another embodiment, the polymorphisms are selected from Table 2 or a polymorphism in linkage disequilibrium with one or more thereof.
In a further embodiment, the polymorphisms are selected from Table 3 and/or Table 8 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the polymorphisms are selected from Table 3 or a polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the method comprises, or consists of, detecting the presence of each of the polymorphisms provided in Table 3 or a polymorphism in linkage disequilibrium with one or more thereof.
The genetic risk assessment may be combined with clinical risk factors to further improve the risk analysis. Thus, in an embodiment, the method further comprises
performing a clinical risk assessment of the human subject; and
combining the clinical risk assessment and the genetic risk assessment to obtain the risk of a human subject developing a severe response to a Coronavirus infection.
In an embodiment, the clinical risk assessment includes obtaining information from the subject on, but not necessarily limited to, one or more of the following: age, family history of a severe response to a Coronavirus infection, race/ethnicity, gender, body mass index, total cholesterol level, systolic and/or diastolic blood pressure, smoking status, does the human have diabetes, does the human have a cardiovascular disease, is the subject on hypertension medication, loss of taste, loss of smell and white blood cell count.
In another embodiment, the clinical risk assessment is based only on one or more or all of age, body mass index, loss of taste, loss of smell and smoking status.
In a further embodiment, the clinical risk assessment is based only on one or more or all of age, loss of taste, loss of smell and smoking status.
In an embodiment, the clinical risk assessment includes obtaining information from the subject on one or more or all of: age, gender, race/ethnicity, blood type, does the human have or has had an autoimmune disease, does the human have or has had a haematological cancer, does the human have or has had a non-haematological cancer, does the human have or has had diabetes, does the human have or has had hypertension and does the human have or has had a respiratory disease (other than asthma). In an embodiment, the autoimmune disease is rheumatoid arthritis, lupus or psoriasis.
In an embodiment, the clinical risk assessment includes obtaining information from the subject on one or more or all of: age, gender, race/ethnicity, blood type, height, weight, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had diabetes, does the human have or has had a haematological cancer, does the human have or has had hypertension, does the human have or has had an immunocompromised disease, does the human have or has had liver disease, does the human have or has had a non-haematological cancer, and does the human have or has had a respiratory disease (other than asthma).
The skilled person would appreciate that numerous different procedures can be followed to combine the clinical and genetic risk assessments. In an embodiment, combining the clinical risk assessment and the genetic risk assessment comprises multiplying the risk assessments. In an embodiment, combining the clinical risk assessment and the genetic risk assessment comprises adding the risk assessments.
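The two combination procedures mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the choice of scale (relative risks for multiplication, log-odds for addition) is an assumption, since the text does not fix the scale on which each assessment is expressed.

```python
import math


def combine_by_multiplication(clinical_rr: float, genetic_rr: float) -> float:
    """Combine two assessments expressed as relative risks by multiplying them."""
    return clinical_rr * genetic_rr


def combine_by_addition(clinical_log_odds: float, genetic_log_odds: float) -> float:
    """Combine two assessments expressed on a log scale by adding them."""
    return clinical_log_odds + genetic_log_odds


# Hypothetical example values, not taken from the specification.
combined_rr = combine_by_multiplication(1.5, 2.0)
combined_log = combine_by_addition(math.log(1.5), math.log(2.0))
```

Under these assumptions the two procedures agree: multiplying on the ratio scale gives the same result as adding on the log scale and exponentiating.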
The inventors, for the first time, have identified numerous polymorphisms associated with a subject's risk of developing a severe response to a Coronavirus infection. Thus, in another aspect, the present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus, the method comprising detecting, in a biological sample derived from the human subject, the presence of a polymorphism provided in any one of Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium therewith.
In an embodiment, the polymorphism is provided in Table 19 and/or 22 or is a polymorphism in linkage disequilibrium therewith.
In an embodiment, the polymorphism is provided in Table 1 or Table 6a or is a polymorphism in linkage disequilibrium therewith.
In an embodiment, the polymorphism is provided in Table 3 or Table 6a or is a polymorphism in linkage disequilibrium therewith.
In an embodiment, the polymorphism is provided in Table 3, Table 6, is rs2274122, is rs1868132, is rs11729561, is rs1984162, is rs8105499 or is a polymorphism in linkage disequilibrium therewith.
In an embodiment, the polymorphism is provided in Table 3, is rs2274122, is rs1868132, is rs11729561, is rs1984162, is rs8105499 or is a polymorphism in linkage disequilibrium therewith.
In another aspect, the present invention provides a method of determining the identity of the alleles of fewer than 100,000 polymorphisms in a human subject selected from the group of subjects consisting of humans in need of assessment for the risk of developing a severe response to a Coronavirus infection to produce a polymorphic profile of the subject, comprising
(i) selecting for allelic identity analysis at least two polymorphisms provided in any one of Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium with one or more thereof,
(ii) detecting, in a biological sample derived from the human subject, the polymorphisms, and
(iii) producing the polymorphic profile of the subject based on the identity of the alleles analysed in step (ii), wherein fewer than 100,000 polymorphisms are selected for allelic identity analysis in step (i) and the same fewer than 100,000 polymorphisms are analysed in step (ii).
In an embodiment of the above aspect, fewer than 100,000 polymorphisms, fewer than 50,000 polymorphisms, fewer than 40,000 polymorphisms, fewer than 30,000 polymorphisms, fewer than 20,000 polymorphisms, fewer than 10,000 polymorphisms, fewer than 7,500 polymorphisms, fewer than 5,000 polymorphisms, fewer than 4,000 polymorphisms, fewer than 3,000 polymorphisms, fewer than 2,000 polymorphisms, fewer than 1,000 polymorphisms, fewer than 900 polymorphisms, fewer than 800 polymorphisms, fewer than 700 polymorphisms, fewer than 600 polymorphisms, fewer than 500 polymorphisms, fewer than 400 polymorphisms, fewer than 300 polymorphisms, fewer than 200 polymorphisms, or fewer than 100 polymorphisms, are selected for allelic identity analysis.
In an embodiment of each of the above aspects, the human subject can be Caucasian, African American, Hispanic, Asian, Indian, or Latino. In a preferred embodiment, the human subject is Caucasian.
In an embodiment of each of the above aspects, the method further comprises obtaining the biological sample.
In an embodiment, the polymorphism(s) in linkage disequilibrium has linkage disequilibrium above 0.9. In another embodiment, the polymorphism(s) in linkage disequilibrium has linkage disequilibrium of 1.
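As an illustration of how a linkage disequilibrium cut-off such as 0.9 might be applied, the sketch below computes the squared correlation r² between two biallelic loci from haplotype and allele frequencies. Using r² as the measure is an assumption, since the text does not state which linkage disequilibrium statistic is meant, and the function names are hypothetical.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def passes_ld_threshold(r2: float, threshold: float = 0.9) -> bool:
    """True if the linkage disequilibrium value is strictly above the threshold."""
    return r2 > threshold
```

With these definitions, two loci whose alleles always occur together give a value of 1, and loci that are statistically independent give a value of 0.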
The present inventors have also found that a risk model for a severe response to a Coronavirus infection that relies solely on clinical factors provides useful risk discrimination for assessing a subject's risk of developing a severe response to a Coronavirus infection such as a SARS-CoV-2 infection. Such a test may be particularly useful in circumstances where a rapid decision needs to be made and/or when genetic testing is not readily available. Thus, in another aspect the present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method comprising performing a clinical risk assessment of the human subject, wherein the clinical risk assessment comprises obtaining information from the subject on two, three, four, five or more or all of age, gender, race/ethnicity, height, weight, blood type, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had an autoimmune disease, does the human have or has had a haematological cancer, does the human have or has had an immunocompromised disease, does the human have or has had a non-haematological cancer, does the human have or has had diabetes, does the human have or has had liver disease, does the human have or has had hypertension and does the human have or has had a respiratory disease (other than asthma).
In an embodiment, the method comprises obtaining information from the subject on age and gender.
In an embodiment, the method comprises obtaining information from the subject on age, gender, race/ethnicity, height, weight, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had diabetes, does the human have or has had a haematological cancer, does the human have or has had hypertension, does the human have or has had a non-haematological cancer, and does the human have or has had a respiratory disease (other than asthma).
In an embodiment, the method comprises obtaining information from the subject on age, gender, race/ethnicity, blood type, height, weight, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had diabetes, does the human have or has had a haematological cancer, does the human have or has had hypertension, does the human have or has had an immunocompromised disease, does the human have or has had liver disease, does the human have or has had a non-haematological cancer, and does the human have or has had a respiratory disease (other than asthma).
In an embodiment, the method comprises obtaining information from the subject on one or more or all of age, gender, race/ethnicity, blood type, does the human have or has had an autoimmune disease, does the human have or has had a haematological cancer, does the human have or has had a non-haematological cancer, does the human have or has had diabetes, does the human have or has had hypertension and does the human have or has had a respiratory disease (other than asthma).
In another aspect, the present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method comprising
i) performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, polymorphisms at rs10755709, rs112317747, rs112641600, rs118072448, rs2034831, rs7027911 and rs71481792,
ii) performing a clinical risk assessment of the human subject, wherein the clinical risk assessment comprises obtaining information from the subject on age, gender, race/ethnicity, height, weight, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had diabetes, does the human have or has had a haematological cancer, does the human have or has had hypertension, does the human have or has had a non-haematological cancer, and does the human have or has had a respiratory disease (other than asthma), and
iii) combining the genetic risk assessment with the clinical risk assessment to determine the risk of a human subject developing a severe response to a Coronavirus infection.
In an embodiment,
a) a β coefficient of 0.124239 is assigned for each G allele at rs10755709;
b) a β coefficient of 0.2737487 is assigned for each C allele at rs112317747;
c) a β coefficient of −0.2362513 is assigned for each T allele at rs112641600;
d) a β coefficient of −0.1995879 is assigned for each C allele at rs118072448;
e) a β coefficient of 0.2371955 is assigned for each C allele at rs2034831;
f) a β coefficient of 0.1019074 is assigned for each A allele at rs7027911; and
g) a β coefficient of −0.1058025 is assigned for each T allele at rs71481792.
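The per-allele assignment in items a) to g) amounts to a weighted sum of allele counts. A minimal sketch follows, assuming each genotype has already been reduced to a count (0, 1 or 2) of the allele named for that polymorphism; the dictionary layout and function name are hypothetical, while the coefficients are copied from the list above.

```python
from typing import Mapping

# beta coefficient per copy of the named allele, from items a) to g)
SNP_BETAS = {
    "rs10755709": 0.124239,    # per G allele
    "rs112317747": 0.2737487,  # per C allele
    "rs112641600": -0.2362513, # per T allele
    "rs118072448": -0.1995879, # per C allele
    "rs2034831": 0.2371955,    # per C allele
    "rs7027911": 0.1019074,    # per A allele
    "rs71481792": -0.1058025,  # per T allele
}


def snp_risk_factor(allele_counts: Mapping[str, int]) -> float:
    """Sum (allele count x beta) over the seven polymorphisms.

    Polymorphisms missing from `allele_counts` are treated as a count of 0,
    which is an assumption made for this sketch.
    """
    total = 0.0
    for rsid, beta in SNP_BETAS.items():
        count = allele_counts.get(rsid, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {rsid} must be 0, 1 or 2")
        total += count * beta
    return total
```

For example, a subject with two G alleles at rs10755709 and one T allele at rs112641600, and none of the other named alleles, would receive 2 × 0.124239 − 0.2362513.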
In an embodiment, the subject is between 50 and 84 years of age and
a) a β coefficient of 0.5747727 is assigned if the subject is between 70 and 74 years of age;
b) a β coefficient of 0.8243711 is assigned if the subject is between 75 and 79 years of age;
c) a β coefficient of 1.013973 is assigned if the subject is between 80 and 84 years of age;
d) a β coefficient of 0.2444891 is assigned if the subject is male;
e) a β coefficient of 0.29311 is assigned if the subject is an ethnicity other than Caucasian;
f) the subject's height in metres (m) and weight in kilograms (kg) are applied to the formula (10 × m²) / kg, and the result is multiplied by −1.602056 to provide the β coefficient to be assigned;
g) a β coefficient of 0.4041337 is assigned if the subject has ever been diagnosed as having a cerebrovascular disease;
h) a β coefficient of 0.6938494 is assigned if the subject has ever been diagnosed as having a chronic kidney disease;
i) a β coefficient of 0.4297612 is assigned if the subject has ever been diagnosed as having diabetes;
j) a β coefficient of 1.003877 is assigned if the subject has ever been diagnosed as having haematological cancer;
k) a β coefficient of 0.2922307 is assigned if the subject has ever been diagnosed as having hypertension;
l) a β coefficient of 0.2558464 is assigned if the subject has ever been diagnosed as having a non-haematological cancer; and
m) a β coefficient of 1.173753 is assigned if the subject has ever been diagnosed as having a respiratory disease (other than asthma).
In an embodiment, the subject is between 18 and 49 years of age and
a) a β coefficient of −1.3111 is assigned if the subject is between 18 and 29 years of age;
b) a β coefficient of −0.8348 is assigned if the subject is between 30 and 39 years of age;
c) a β coefficient of −0.4038 is assigned if the subject is between 40 and 49 years of age;
d) a β coefficient of 0.2444891 is assigned if the subject is male;
e) a β coefficient of 0.29311 is assigned if the subject is an ethnicity other than Caucasian;
f) the subject's height in metres (m) and weight in kilograms (kg) are applied to the formula (10 × m²) / kg, and the result is multiplied by −1.602056 to provide the β coefficient to be assigned;
g) a β coefficient of 0.4041337 is assigned if the subject has ever been diagnosed as having a cerebrovascular disease;
h) a β coefficient of 0.6938494 is assigned if the subject has ever been diagnosed as having a chronic kidney disease;
i) a β coefficient of 0.4297612 is assigned if the subject has ever been diagnosed as having diabetes;
j) a β coefficient of 1.003877 is assigned if the subject has ever been diagnosed as having haematological cancer;
k) a β coefficient of 0.2922307 is assigned if the subject has ever been diagnosed as having hypertension;
l) a β coefficient of 0.2558464 is assigned if the subject has ever been diagnosed as having a non-haematological cancer; and
m) a β coefficient of 1.173753 is assigned if the subject has ever been diagnosed as having a respiratory disease (other than asthma).
In an embodiment, the subject is between 18 and 84 years of age and
a) a β coefficient of −1.3111 is assigned if the subject is between 18 and 29 years of age;
b) a β coefficient of −0.8348 is assigned if the subject is between 30 and 39 years of age;
c) a β coefficient of −0.4038 is assigned if the subject is between 40 and 49 years of age;
d) a β coefficient of 0.5747727 is assigned if the subject is between 70 and 74 years of age;
e) a β coefficient of 0.8243711 is assigned if the subject is between 75 and 79 years of age;
f) a β coefficient of 1.013973 is assigned if the subject is between 80 and 84 years of age;
g) a β coefficient of 0.2444891 is assigned if the subject is male;
h) a β coefficient of 0.29311 is assigned if the subject is an ethnicity other than Caucasian;
i) the subject's height in metres (m) and weight in kilograms (kg) are applied to the formula (10 × m²) / kg, and the result is multiplied by −1.602056 to provide the β coefficient to be assigned;
j) a β coefficient of 0.4041337 is assigned if the subject has ever been diagnosed as having a cerebrovascular disease;
k) a β coefficient of 0.6938494 is assigned if the subject has ever been diagnosed as having a chronic kidney disease;
l) a β coefficient of 0.4297612 is assigned if the subject has ever been diagnosed as having diabetes;
m) a β coefficient of 1.003877 is assigned if the subject has ever been diagnosed as having haematological cancer;
n) a β coefficient of 0.2922307 is assigned if the subject has ever been diagnosed as having hypertension;
o) a β coefficient of 0.2558464 is assigned if the subject has ever been diagnosed as having a non-haematological cancer; and
p) a β coefficient of 1.173753 is assigned if the subject has ever been diagnosed as having a respiratory disease (other than asthma).
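The clinical assignments in items a) to p) can be summed as in the sketch below. The function and key names are hypothetical. No age coefficient is listed for subjects aged 50 to 69, so the sketch treats that band as contributing 0, which is an assumption rather than something the text states.

```python
from typing import Iterable

CONDITION_BETAS = {
    "cerebrovascular_disease": 0.4041337,
    "chronic_kidney_disease": 0.6938494,
    "diabetes": 0.4297612,
    "haematological_cancer": 1.003877,
    "hypertension": 0.2922307,
    "non_haematological_cancer": 0.2558464,
    "respiratory_disease_not_asthma": 1.173753,
}


def age_beta(age: int) -> float:
    """Age-band coefficient for a subject between 18 and 84 years of age."""
    if not 18 <= age <= 84:
        raise ValueError("this embodiment covers ages 18 to 84 only")
    if age <= 29:
        return -1.3111
    if age <= 39:
        return -0.8348
    if age <= 49:
        return -0.4038
    if age <= 69:
        return 0.0  # assumption: no coefficient is listed for ages 50 to 69
    if age <= 74:
        return 0.5747727
    if age <= 79:
        return 0.8243711
    return 1.013973


def clinical_beta_sum(age: int, male: bool, non_caucasian: bool,
                      height_m: float, weight_kg: float,
                      conditions: Iterable[str] = ()) -> float:
    """Sum of the clinical beta coefficients for one subject."""
    total = age_beta(age)
    if male:
        total += 0.2444891
    if non_caucasian:
        total += 0.29311
    # (10 x height^2) / weight, multiplied by -1.602056
    total += -1.602056 * (10 * height_m ** 2) / weight_kg
    total += sum(CONDITION_BETAS[c] for c in conditions)
    return total
```

As a usage example with made-up inputs, `clinical_beta_sum(72, True, False, 2.0, 80.0, ["diabetes"])` adds the 70 to 74 age coefficient, the male coefficient, the height and weight term and the diabetes coefficient.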
In an embodiment, in step iii) the genetic risk assessment is combined with the clinical risk assessment using the following formula:
Log Odds (LO) = −1.36523 + SRF + Σ Clinical β coefficients, and wherein SRF is the SNP Risk Factor, which is determined using the following formula:
Σ(No of Risk Alleles×SNP β coefficient).
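The LO formula above can be written out as a short sketch. The intercept −1.36523 comes from the formula; the function name and the input layout (pairs of allele count and SNP β coefficient, plus a list of clinical β coefficients already assigned to the subject) are assumptions made for illustration.

```python
from typing import Iterable, Tuple

INTERCEPT = -1.36523  # constant term of the LO formula


def log_odds(snp_terms: Iterable[Tuple[int, float]],
             clinical_betas: Iterable[float]) -> float:
    """LO = intercept + SRF + sum of clinical beta coefficients.

    snp_terms: (number of risk alleles, SNP beta coefficient) pairs
    clinical_betas: clinical beta coefficients assigned to the subject
    """
    # SRF: sum over polymorphisms of (number of risk alleles x SNP beta)
    srf = sum(count * beta for count, beta in snp_terms)
    return INTERCEPT + srf + sum(clinical_betas)


# Hypothetical subject: two G alleles at rs10755709 (beta 0.124239), male (beta 0.2444891).
example_lo = log_odds([(2, 0.124239)], [0.2444891])
```

With no risk alleles and no clinical coefficients assigned, the function returns the intercept alone.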
In another aspect, the present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method comprising
i) performing a genetic risk assessment of the human subject, wherein the genetic risk assessment involves detecting, in a biological sample derived from the human subject, polymorphisms at rs10755709, rs112317747, rs112641600, rs115492982, rs118072448, rs1984162, rs2034831, rs7027911 and rs71481792,
ii) performing a clinical risk assessment of the human subject, wherein the clinical risk assessment comprises obtaining information from the subject on age, gender, race/ethnicity, blood type, height, weight, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had diabetes, does the human have or has had a haematological cancer, does the human have or has had hypertension, does the human have or has had an immunocompromised disease, does the human have or has had liver disease, does the human have or has had a non-haematological cancer, and does the human have or has had a respiratory disease (other than asthma), and
iii) combining the genetic risk assessment with the clinical risk assessment to determine the risk of a human subject developing a severe response to a Coronavirus infection.
In an embodiment,
a) a β coefficient of 0.1231766 is assigned for each G allele at rs10755709;
b) a β coefficient of 0.2576692 is assigned for each C allele at rs112317747;
c) a β coefficient of −0.2384001 is assigned for each T allele at rs112641600;
d) a β coefficient of −0.1965609 is assigned for each C allele at rs118072448;
e) a β coefficient of 0.2414792 is assigned for each C allele at rs2034831;
f) a β coefficient of 0.0998459 is assigned for each A allele at rs7027911;
g) a β coefficient of −0.1032044 is assigned for each T allele at rs71481792;
h) a β coefficient of 0.4163575 is assigned for each A allele at rs115492982; and
i) a β coefficient of 0.1034362 is assigned for each A allele at rs1984162.
In a further embodiment, the subject is between 50 and 84 years of age and
a) a β coefficient of 0.1677566 is assigned if the subject is between 65 and 69 years of age;
b) a β coefficient of 0.6352682 is assigned if the subject is between 70 and 74 years of age;
c) a β coefficient of 0.8940548 is assigned if the subject is between 75 and 79 years of age;
d) a β coefficient of 1.082477 is assigned if the subject is between 80 and 84 years of age;
e) a β coefficient of 0.2418454 is assigned if the subject is male;
f) a β coefficient of 0.2967777 is assigned if the subject is an ethnicity other than Caucasian;
g) the subject's height in metres (m) and weight in kilograms (kg) are applied to the formula (10 × m²) / kg, and the result is multiplied by −1.560943 to provide the β coefficient to be assigned;
h) a β coefficient of 0.3950113 is assigned if the subject has ever been diagnosed as having a cerebrovascular disease;
i) a β coefficient of 0.6650257 is assigned if the subject has ever been diagnosed as having a chronic kidney disease;
j) a β coefficient of 0.4126633 is assigned if the subject has ever been diagnosed as having diabetes;
k) a β coefficient of 1.001079 is assigned if the subject has ever been diagnosed as having haematological cancer;
l) a β coefficient of 0.2640989 is assigned if the subject has ever been diagnosed as having hypertension;
m) a β coefficient of 0.2381579 is assigned if the subject has ever been diagnosed as having a non-haematological cancer;
n) a β coefficient of 1.148496 is assigned if the subject has ever been diagnosed as having a respiratory disease (other than asthma);
o) a β coefficient of −0.229737 is assigned if the subject has an ABO blood type;
p) a β coefficient of 0.6033541 is assigned if the subject has ever been diagnosed as having an immunocompromised disease; and
q) a β coefficient of 0.2301902 is assigned if the subject has ever been diagnosed as having liver disease.
In an embodiment, in step iii) the genetic risk assessment is combined with the clinical risk assessment using the following formula:
Log Odds (LO) = 1.469939 + SRF + Σ Clinical β coefficients,
and wherein SRF is the SNP Risk Factor, which is determined using the following formula:
Σ(No of Risk Alleles×SNP β coefficient).
In an embodiment, a method of the invention further comprises determining the probability the subject would require hospitalisation if infected with a Coronavirus using the following formula:
e^{LO}/(1+e^{LO}),
which is then multiplied by 100 to obtain a percent chance of hospitalisation being required.
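The conversion from LO to a percent chance of hospitalisation is the logistic function scaled by 100. A minimal sketch, with a hypothetical function name, is given below; the branch on the sign of LO is only a numerical precaution to avoid overflow and does not change the value computed.

```python
import math


def hospitalisation_percent(lo: float) -> float:
    """Return 100 x e^LO / (1 + e^LO)."""
    if lo >= 0:
        # algebraically identical form that avoids overflow for large positive LO
        return 100.0 / (1.0 + math.exp(-lo))
    e = math.exp(lo)
    return 100.0 * e / (1.0 + e)
```

An LO of 0 corresponds to a 50 percent chance, and values of LO with opposite signs give percentages that sum to 100.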
In an embodiment of each of the above aspects, the risk assessment produces a score and the method further comprises comparing the score to a predetermined threshold, wherein if the score is at, or above, the threshold the subject is assessed as being at risk of developing a severe response to a Coronavirus infection.
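The threshold comparison can be sketched as below. The function name is hypothetical and no threshold value is given in the text, so the threshold must be supplied by the caller; the only point illustrated is that a score equal to the threshold counts as at risk.

```python
def is_at_risk(score: float, threshold: float) -> bool:
    """True if the score is at, or above, the predetermined threshold."""
    return score >= threshold
```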
In an embodiment, if it is determined the subject has a risk of developing a severe response to a Coronavirus infection, the subject is more likely than someone assessed as low risk, or when compared to the average risk in the population, to be admitted to hospital for intensive care.
In a further aspect, the present invention provides a method for determining the need for routine diagnostic testing of a human subject for a Coronavirus infection comprising assessing the risk of the subject for developing a severe response to a Coronavirus infection using a method of the invention.
In another aspect, the present invention provides a method of screening for a severe response to a Coronavirus infection in a human subject, the method comprising assessing the risk of the subject for developing a severe response to a Coronavirus infection using a method of the invention, and routinely screening for a Coronavirus infection in the subject if they are assessed as having a risk for developing a severe response to a Coronavirus infection.
In an embodiment of the above two aspects, the screening involves analysing the subject for the virus or a symptom thereof.
In a further aspect, the present invention provides a method for determining the need of a human subject for prophylactic anti-Coronavirus therapy comprising assessing the risk of the subject for developing a severe response to a Coronavirus infection using a method of the invention.
In yet another aspect, the present invention provides a method for preventing or reducing the risk of a severe response to a Coronavirus infection in a human subject, the method comprising assessing the risk of the subject for developing a severe response to a Coronavirus infection using a method of the invention, and if they are assessed as having a risk for developing a severe response to a Coronavirus infection
1) administering an anti-Coronavirus therapy and/or
2) isolating the subject.
In an aspect, the present invention provides an anti-Coronavirus infection therapy for use in preventing a severe response to a Coronavirus infection in a human subject at risk thereof, wherein the subject is assessed as having a risk for developing a severe response to a Coronavirus infection using a method of the invention.
Many anti-Coronavirus therapies, such as anti-SARS-CoV-2 virus therapies, are in development. The skilled person would appreciate that any therapy shown to be successful can be used in the above methods. Possible examples include, but are not limited to, intubation to assist breathing, an anti-Coronavirus—such as anti-SARS-CoV-2 virus—vaccine, convalescent plasma (plasma from people who have been infected, developed antibodies to the virus, and have then recovered), chloroquine, hydroxychloroquine (with or without zinc), Favipiravir, Remdesivir, Ivermectin, Quercetin, Kaletra (lopinavir/ritonavir), Arbidol, Baricitinib, CM4620-IE, an IL-6 inhibitor, Tocilizumab and stem cells such as mesenchymal stem cells. In another embodiment, the therapy is Vitamin D. Other examples of therapy include, Dexamethasone (or other corticosteroids such as prednisone, methylprednisolone, or hydrocortisone), Baricitinib in combination with remdesivir, anticoagulation drugs (“blood thinners”), bamlanivimab and etesevimab, convalescent plasma, tocilizumab with corticosteroids, Casirivimab and Imdevimab, Atorvastatin, GRP78 and siRNA-nanoparticle formulations.
Once a vaccine (or indeed possibly many different anti-Coronavirus therapies) is developed it is highly likely there will be supply issues, and decisions will need to be made about which people receive the vaccine first. The present invention can thus be used to determine who is at most risk, so that the anti-Coronavirus therapy (such as a vaccine) is first administered to people assessed as likely to develop a severe response to a Coronavirus infection.
In an embodiment, the vaccine is an mRNA vaccine. In an embodiment, the vaccine is a protein vaccine. Examples of vaccines that can be administered include, but are not limited to, the Pfizer-BioNTech vaccine, the Moderna vaccine, the Johnson & Johnson vaccine, the Oxford-AstraZeneca vaccine and the Novavax vaccine (see, for example, Katella, 2021).
In another embodiment, the present invention provides a method for stratifying a group of human subjects for a clinical trial of a candidate therapy, the method comprising assessing the individual risk of the subjects for developing a severe response to a Coronavirus infection using a method of the invention, and using the results of the assessment to select subjects more likely to be responsive to the therapy.
Also provided is a kit comprising at least two sets of primers for amplifying two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 6, 8, 19 or 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises sets of primers for amplifying nucleic acids comprising each of the polymorphisms provided in Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In another aspect, the present invention provides a genetic array comprising at least two sets of probes for hybridising to two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 6, 8, 19 or 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the array comprises probes hybridising to nucleic acids comprising each of the polymorphisms provided in Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an aspect, the present invention provides a computer implemented method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method operable in a computing system comprising a processor and a memory, the method comprising:
receiving genetic risk data for the human subject, wherein the genetic risk data was obtained by a method of the invention;
processing the data to obtain the risk of a human subject developing a severe response to a Coronavirus infection; and
outputting the risk of a human subject developing a severe response to a Coronavirus infection.
In an aspect, the present invention provides a computer implemented method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method operable in a computing system comprising a processor and a memory, the method comprising:
receiving clinical risk data and genetic risk data for the human subject, wherein the clinical risk data and genetic risk data were obtained by a method of the invention;
processing the data to combine the clinical risk data with the genetic risk data to obtain the risk of a human subject developing a severe response to a Coronavirus infection; and
outputting the risk of a human subject developing a severe response to a Coronavirus infection.
In a further aspect, the present invention provides a computer-implemented method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method operable in a computing system comprising a processor and a memory, the method comprising:
receiving at least one clinical variable associated with the human subject, wherein at least one clinical variable was obtained by a method of the invention;
processing the data to obtain the risk of a human subject developing a severe response to a Coronavirus infection; and
outputting the risk of a human subject developing a severe response to a Coronavirus infection.
In an embodiment of the three above aspects, processing the data is performed using a risk assessment model, where the risk assessment model has been trained using a training dataset comprising data relating to Coronavirus infection response severity and the genetic data and/or clinical data. In another embodiment, the method further comprises displaying or communicating the risk to a user.
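By way of illustration only, the receiving, processing and outputting steps of the computer implemented methods above might be sketched as follows. The record type `SubjectRiskData`, the functions `process_risk` and `output_risk`, and the choice of summing the inputs on a log-odds scale are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass
import math


@dataclass
class SubjectRiskData:
    """Hypothetical record received for one subject (not defined in the source)."""
    genetic_log_odds: float   # assumption: genetic risk expressed as a log odds
    clinical_log_odds: float  # assumption: clinical risk expressed as a log odds


def process_risk(data: SubjectRiskData, intercept: float = 0.0) -> float:
    """Combine the inputs on the log-odds scale and convert to a probability."""
    logit = intercept + data.genetic_log_odds + data.clinical_log_odds
    return 1.0 / (1.0 + math.exp(-logit))


def output_risk(risk: float) -> str:
    """Format the risk for display or transmission to a user."""
    return f"Estimated risk of severe response: {risk:.1%}"


if __name__ == "__main__":
    # Illustrative values only; they are not results from the disclosure.
    received = SubjectRiskData(genetic_log_odds=0.4, clinical_log_odds=1.1)
    print(output_risk(process_risk(received)))
```

In a trained risk assessment model the intercept and the scaling of each input would come from the training dataset; here they are placeholders.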
In an aspect, the present invention provides a system for assessing the risk of a human subject developing a severe response to a Coronavirus infection comprising:
system instructions for performing a genetic risk assessment of the human subject according to a method of the invention; and
system instructions to obtain the risk of a human subject developing a severe response to a Coronavirus infection.
In an aspect, the present invention provides a system for assessing the risk of a human subject developing a severe response to a Coronavirus infection comprising:
system instructions for performing a clinical risk assessment and a genetic risk assessment of the human subject according to a method of the invention; and
system instructions for combining the clinical risk assessment and the genetic risk assessment to obtain the risk of a human subject developing a severe response to a Coronavirus infection.
In an aspect, the present invention provides a system for assessing the risk of a human subject developing a severe response to a Coronavirus infection comprising:
system instructions for performing a clinical risk assessment of the human subject using the method according to any one of claims 20 to 26 or 36 to 39; and
system instructions to obtain the risk of a human subject developing a severe response to a Coronavirus infection.
In an embodiment, the risk data for the subject is received from a user interface coupled to the computing system. In another embodiment, the risk data for the subject is received from a remote device across a wireless communications network. In another embodiment, the user interface or remote device is a SNP array platform. In another embodiment, outputting comprises outputting information to a user interface coupled to the computing system. In another embodiment, outputting comprises transmitting information to a remote device across a wireless communications network.
Any embodiment herein shall be taken to apply mutatis mutandis to any other embodiment unless specifically stated otherwise.
The present invention is not to be limited in scope by the specific embodiments described herein, which are intended for the purpose of exemplification only.
Functionally-equivalent products, compositions and methods are clearly within the scope of the invention, as described herein.
Throughout this specification, unless specifically stated otherwise or the context requires otherwise, reference to a single step, composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.
The invention is hereinafter described by way of the following non-limiting Examples and with reference to the accompanying figures.
Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art (e.g., epidemiological analysis, molecular genetics, risk assessment and clinical studies).
Unless otherwise indicated, the recombinant protein, cell culture, and immunological techniques utilized in the present invention are standard procedures, well known to those skilled in the art. Such techniques are described and explained throughout the literature in sources such as J. Perbal, A Practical Guide to Molecular Cloning, John Wiley and Sons (1984), J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989), T. A. Brown (editor), Essential Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991), D. M. Glover and B. D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4, IRL Press (1995 and 1996), and F. M. Ausubel et al. (editors), Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including all updates until present), Ed Harlow and David Lane (editors) Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, (1988), and J. E. Coligan et al. (editors) Current Protocols in Immunology, John Wiley & Sons (including all updates until present).
It is to be understood that this disclosure is not limited to particular embodiments, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, terms in the singular and the singular forms “a,” “an” and “the,” for example, optionally include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a probe” optionally includes a plurality of probe molecules; similarly, depending on the context, use of the term “a nucleic acid” optionally includes, as a practical matter, many copies of that nucleic acid molecule.
The term “and/or”, e.g., “X and/or Y” shall be understood to mean either “X and Y” or “X or Y” and shall be taken to provide explicit support for both meanings or for either meaning.
As used herein, the term “about”, unless stated to the contrary, refers to +/−10%, more preferably +/−5%, more preferably +/−1%, of the designated value.
Throughout this specification the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
“Coronavirus” is a group of related RNA viruses that typically cause diseases in mammals and birds, such as respiratory tract infections in humans. Coronaviruses constitute the subfamily Orthocoronavirinae in the family Coronaviridae. Coronaviruses are enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry. Coronaviruses have characteristic club-shaped spikes that project from their surface. Examples of Coronaviruses which cause disease in humans include, but are not necessarily limited to, Severe acute respiratory syndrome-related coronavirus (SARS-CoV or SARS-CoV-2), Middle East respiratory syndrome-related coronavirus (MERS), Human coronavirus OC43, Human coronavirus HKU1, Human coronavirus 229E and Human coronavirus NL63. In some embodiments, the SARS-CoV-2 strain is selected from, but not limited to, the L strain, the S strain, the V strain, the G strain, the GR strain, the GH strain, hCoV-19/Australia/VIC01/2020, BetaCoV/Wuhan/WIV04/2019, B.1.1.7 variant, B.1.351 variant, B.1.427 variant, B.1.429 variant and P.1 variant.
As used herein, “risk assessment” refers to a process by which a subject's risk of developing a severe response to a Coronavirus infection can be assessed. A risk assessment will typically involve obtaining information relevant to the subject's risk of developing a severe response to a Coronavirus infection, assessing that information, and quantifying the subject's risk of developing a severe response to a Coronavirus infection, for example, by producing a risk score.
As used herein, the term “a severe response to a Coronavirus infection” encompasses any factor, or a symptom thereof, considered by a medical practitioner that would warrant the subject being hospitalised, the subject's life being at risk, or the subject requiring assistance to breathe. Examples of symptoms of a severe response to a Coronavirus infection include, but are not limited to, difficulty breathing or shortness of breath, chest pain or pressure, loss of speech or loss of movement. A phenotype that displays a predisposition for a severe response to a Coronavirus infection can, for example, show a higher likelihood that a severe response to a Coronavirus infection will develop in an individual with the phenotype than in members of a relevant general population under a given set of environmental conditions (diet, physical activity regime, geographic location, etc.).
As used herein, “biological sample” refers to any sample comprising nucleic acids, especially DNA, from or derived from a human patient, e.g., bodily fluids (blood, saliva, urine etc.), biopsy, tissue, and/or waste from the patient. Thus, tissue biopsies, stool, sputum, saliva, blood, lymph, or the like can easily be screened for polymorphisms, as can essentially any tissue of interest that contains the appropriate nucleic acids. In one embodiment, the biological sample is a cheek cell sample. These samples are typically taken, following informed consent, from a patient by standard medical laboratory methods. The sample may be in a form taken directly from the patient, or may be at least partially processed (purified) to remove at least some non-nucleic acid material.
As used herein, “gender” and “sex” are used interchangeably and refer to an individual's biological reproductive anatomy. In an embodiment, an individual's gender/sex is self-identified.
As used herein, “human subject”, “human” and “subject” are used interchangeably and refer to the individual being assessed for risk of developing a severe response to a coronavirus infection.
A “polymorphism” is a locus that is variable; that is, within a population, the nucleotide sequence at a polymorphism has more than one version or allele. One example of a polymorphism is a “single nucleotide polymorphism” (SNP), which is a polymorphism at a single nucleotide position in a genome (the nucleotide at the specified position varies between individuals or populations). Other examples include a deletion or insertion of one or more base pairs at the polymorphism locus.
As used herein, the term “SNP” or “single nucleotide polymorphism” refers to a genetic variation between individuals; e.g., a single nitrogenous base position in the DNA of organisms that is variable. As used herein, “SNPs” is the plural of SNP. Of course, when one refers to DNA herein, such reference may include derivatives of the DNA such as amplicons, RNA transcripts thereof, etc.
The term “allele” refers to one of two or more different nucleotide sequences that occur or are encoded at a specific locus, or two or more different polypeptide sequences encoded by such a locus. For example, a first allele can occur on one chromosome, while a second allele occurs on a second homologous chromosome, e.g., as occurs for different chromosomes of a heterozygous individual, or between different homozygous or heterozygous individuals in a population. An allele “positively” correlates with a trait when it is linked to it and when presence of the allele is an indicator that the trait or trait form will occur in an individual comprising the allele. An allele “negatively” correlates with a trait when it is linked to it and when presence of the allele is an indicator that a trait or trait form will not occur in an individual comprising the allele.
A marker polymorphism or allele is “correlated” or “associated” with a specified phenotype (susceptibility to a severe response to a Coronavirus infection, etc.) when it can be statistically linked (positively or negatively) to the phenotype. Methods for determining whether a polymorphism or allele is statistically linked are known to those in the art. That is, the specified polymorphism occurs more commonly in a case population (e.g., patients with a severe response to a Coronavirus infection) than in a control population (e.g., individuals that do not have a severe response to a Coronavirus infection). This correlation is often inferred as being causal in nature, but it need not be; simple genetic linkage to (association with) a locus for a trait that underlies the phenotype is sufficient for correlation/association to occur.
The phrase “linkage disequilibrium” (LD) is used to describe the statistical correlation between two neighbouring polymorphic genotypes. Typically, LD refers to the correlation between the alleles of a random gamete at the two loci, assuming Hardy-Weinberg equilibrium (statistical independence) between gametes. LD is quantified with either Lewontin's parameter of association (D′) or with the Pearson correlation coefficient (r) (Devlin and Risch, 1995). Two loci with a LD value of 1 are said to be in complete LD. At the other extreme, two loci with a LD value of 0 are termed to be in linkage equilibrium. Linkage disequilibrium is calculated following the application of the expectation maximization algorithm (EM) for the estimation of haplotype frequencies (Slatkin and Excoffier, 1996). LD (r²) values according to the present disclosure for neighbouring genotypes/loci are selected above 0.1, preferably above 0.2, more preferably above 0.5, more preferably above 0.6, still more preferably above 0.7, preferably above 0.8, more preferably above 0.9, ideally about 1.0.
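As a minimal sketch of how the LD measures named above can be computed, the following assumes that the haplotype frequency and the two allele frequencies have already been estimated (for example by an EM procedure, which is not reproduced here). The function name `ld_statistics` and its parameters are hypothetical, and D′ is returned as an absolute value.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # Squared Pearson correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

With these definitions, a pair of loci whose alleles always travel together gives a value of 1 on both measures, and independent loci give 0, matching the two extremes described above.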
Another way one of skill in the art can readily identify polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure is determining the LOD score for two loci. LOD stands for “logarithm of the odds”, a statistical estimate of whether two genes, or a gene and a disease gene, are likely to be located near each other on a chromosome and are therefore likely to be inherited together. A LOD score of about 2 to 3 or higher is generally understood to mean that two genes are located close to each other on the chromosome. Various examples of polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure are shown in Tables 1 to 6, 8, 19 or 22. The present inventors have found that many of the polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure have a LOD score of between about 2 and 50. Accordingly, in an embodiment, LOD values according to the present disclosure for neighbouring genotypes/loci are selected at least above 2, at least above 3, at least above 4, at least above 5, at least above 6, at least above 7, at least above 8, at least above 9, at least above 10, at least above 20, at least above 30, at least above 40, or at least above 50.
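For illustration, a classical two-point LOD score can be sketched as below. This assumes phase-known meioses in which recombinants can be counted directly, which is a simplification and not necessarily how the LOD values of the present disclosure were obtained; the function name `lod_score` is hypothetical.

```python
import math


def lod_score(meioses: int, recombinants: int, theta: float) -> float:
    """log10 of the likelihood at recombination fraction theta
    relative to the likelihood under no linkage (theta = 0.5)."""
    if not (0 <= recombinants <= meioses):
        raise ValueError("recombinants must lie between 0 and meioses")
    if not (0.0 <= theta <= 0.5):
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible under theta = 0.
        return float("-inf")
    log_likelihood = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_likelihood += recombinants * math.log10(theta)
    # Subtract the log likelihood under free recombination.
    return log_likelihood - meioses * math.log10(0.5)
```

Under these assumptions, ten informative meioses with no recombinants evaluated at theta = 0 give a LOD of about 3, which is the order of magnitude the text treats as evidence of close linkage.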
In another embodiment, polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure can have a specified genetic recombination distance of about 20 centimorgan (cM) or less. For example, 15 cM or less, 10 cM or less, 9 cM or less, 8 cM or less, 7 cM or less, 6 cM or less, 5 cM or less, 4 cM or less, 3 cM or less, 2 cM or less, 1 cM or less, 0.75 cM or less, 0.5 cM or less, 0.25 cM or less, or 0.1 cM or less. For example, two linked loci within a single chromosome segment can undergo recombination during meiosis with each other at a frequency of less than or equal to about 20%, about 19%, about 18%, about 17%, about 16%, about 15%, about 14%, about 13%, about 12%, about 11%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.75%, about 0.5%, about 0.25%, or about 0.1% or less.
In another embodiment, polymorphisms in linkage disequilibrium with the polymorphisms of the present disclosure are within at least 100 kb (which correlates in humans to about 0.1 cM, depending on local recombination rate), at least 50 kb, at least kb or less of each other.
For example, one approach for the identification of surrogate markers for a particular polymorphism involves a simple strategy that presumes that polymorphisms surrounding the target polymorphism are in linkage disequilibrium and can therefore provide information about disease susceptibility. Thus, as described herein, surrogate markers can therefore be identified from publicly available databases, such as HAPMAP, by searching for polymorphisms fulfilling certain criteria which have been found in the scientific community to be suitable for the selection of surrogate marker candidates (see, for example, Table 6a which provides surrogates of the polymorphisms in Table 3, and Table 6b which provides surrogates of the polymorphisms in Table 4).
“Allele frequency” refers to the frequency (proportion or percentage) at which an allele is present at a locus within an individual, within a line or within a population of lines. For example, for an allele “A,” diploid individuals of genotype “AA,” “Aa,” or “aa” have allele frequencies of 1.0, 0.5, or 0.0, respectively. One can estimate the allele frequency within a line or population (e.g., cases or controls) by averaging the allele frequencies of a sample of individuals from that line or population. Similarly, one can calculate the allele frequency within a population of lines by averaging the allele frequencies of lines that make up the population. In an embodiment, the term “allele frequency” is used to define the minor allele frequency (MAF). MAF refers to the frequency at which the least common allele occurs in a given population.
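The averaging described above can be sketched as follows, assuming a biallelic locus with genotypes written as two-character strings such as "AA", "Aa" or "aa". The function names are hypothetical and the genotypes in the usage lines are illustrative only.

```python
def individual_allele_frequency(genotype: str, allele: str = "A") -> float:
    """Fraction of the individual's allele copies that are `allele`."""
    return genotype.count(allele) / len(genotype)


def population_allele_frequency(genotypes: list[str], allele: str = "A") -> float:
    """Average of the per-individual allele frequencies in a sample."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    total = sum(individual_allele_frequency(g, allele) for g in genotypes)
    return total / len(genotypes)


def minor_allele_frequency(genotypes: list[str], allele: str = "A") -> float:
    """MAF at a biallelic locus: the frequency of the less common allele."""
    p = population_allele_frequency(genotypes, allele)
    return min(p, 1.0 - p)


if __name__ == "__main__":
    sample = ["AA", "Aa", "aa", "aa"]
    print(population_allele_frequency(sample))
    print(minor_allele_frequency(sample))
```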
An individual is “homozygous” if the individual has only one type of allele at a given locus (e.g., a diploid individual has a copy of the same allele at a locus for each of two homologous chromosomes). An individual is “heterozygous” if more than one allele type is present at a given locus (e.g., a diploid individual with one copy each of two different alleles). The term “homogeneity” indicates that members of a group have the same genotype at one or more specific loci. In contrast, the term “heterogeneity” is used to indicate that individuals within the group differ in genotype at one or more specific loci.
A “locus” is a chromosomal position or region. For example, a polymorphic locus is a position or region where a polymorphic nucleic acid, trait determinant, gene or marker is located. In a further example, a “gene locus” is a specific chromosome location (region) in the genome of a species where a specific gene can be found.
A “marker,” “molecular marker” or “marker nucleic acid” refers to a nucleotide sequence or encoded product thereof (e.g., a protein) used as a point of reference when identifying a locus or a linked locus. A marker can be derived from genomic nucleotide sequence or from expressed nucleotide sequences (e.g., from an RNA, nRNA, mRNA, a cDNA, etc.), or from an encoded polypeptide. The term also refers to nucleic acid sequences complementary to or flanking the marker sequences, such as nucleic acids used as probes or primer pairs capable of amplifying the marker sequence. A “marker probe” is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a marker locus sequence. Nucleic acids are “complementary” when they specifically hybridize in solution, e.g., according to Watson-Crick base pairing rules. A “marker locus” is a locus that can be used to track the presence of a second linked locus, e.g., a linked or correlated locus that encodes or contributes to the population variation of a phenotypic trait. For example, a marker locus can be used to monitor segregation of alleles at a locus, such as a quantitative trait locus (QTL), that are genetically or physically linked to the marker locus. Thus, a “marker allele,” alternatively an “allele of a marker locus” is one of a plurality of polymorphic nucleotide sequences found at a marker locus in a population that is polymorphic for the marker locus. Each of the identified markers is expected to be in close physical and genetic proximity (resulting in physical and/or genetic linkage) to a genetic element, e.g., a QTL, that contributes to the relevant phenotype. Markers corresponding to genetic polymorphisms between members of a population can be detected by methods well-established in the art. 
These include, e.g., DNA sequencing, PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of allele specific hybridization (ASH), detection of single nucleotide extension, detection of amplified variable sequences of the genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs).
The term “amplifying” in the context of nucleic acid amplification is any process whereby additional copies of a selected nucleic acid (or a transcribed form thereof) are produced. Typical amplification methods include various polymerase based replication methods, including the polymerase chain reaction (PCR), ligase mediated methods such as the ligase chain reaction (LCR) and RNA polymerase based amplification (e.g., by transcription) methods.
An “amplicon” is an amplified nucleic acid, e.g., a nucleic acid that is produced by amplifying a template nucleic acid by any available amplification method (e.g., PCR, LCR, transcription, or the like).
A “gene” is one or more sequence(s) of nucleotides in a genome that together encode one or more expressed molecules, e.g., an RNA, or polypeptide. The gene can include coding sequences that are transcribed into RNA which may then be translated into a polypeptide sequence, and can include associated structural or regulatory sequences that aid in replication or expression of the gene.
A “genotype” is the genetic constitution of an individual (or group of individuals) at one or more genetic loci. Genotype is defined by the allele(s) of one or more known loci of the individual, typically, the compilation of alleles inherited from its parents.
A “haplotype” is the genotype of an individual at a plurality of genetic loci on a single DNA strand. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome strand.
A “set” of markers (polymorphisms), probes or primers refers to a collection or group of markers, probes, primers, or the data derived therefrom, used for a common purpose, e.g., identifying an individual with a specified genotype (e.g., risk of developing a severe response to a Coronavirus infection). Frequently, data corresponding to the markers, probes or primers, or derived from their use, is stored in an electronic medium. While each of the members of a set possesses utility with respect to the specified purpose, individual markers selected from the set as well as subsets including some, but not all of the markers, are also effective in achieving the specified purpose.
The polymorphisms and genes, and corresponding marker probes, amplicons or primers described above can be embodied in any system herein, either in the form of physical nucleic acids, or in the form of system instructions that include sequence information for the nucleic acids. For example, the system can include primers or amplicons corresponding to (or that amplify a portion of) a gene or polymorphism described herein. As in the methods above, the set of marker probes or primers optionally detects a plurality of polymorphisms in a plurality of said genes or genetic loci. Thus, for example, the set of marker probes or primers detects at least one polymorphism in each of these polymorphisms or genes, or any other polymorphism, gene or locus defined herein. Any such probe or primer can include a nucleotide sequence of any such polymorphism or gene, or a complementary nucleic acid thereof, or a transcribed product thereof (e.g., a nRNA or mRNA form produced from a genomic sequence, e.g., by transcription or splicing).
As used herein, “Receiver operating characteristic curves” (ROC) refer to a graphical plot of the sensitivity vs. (1 − specificity) for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives (TPR=true positive rate) vs. the fraction of false positives (FPR=false positive rate). It is also known as a Relative Operating Characteristic curve, because it is a comparison of two operating characteristics (TPR and FPR) as the criterion changes. ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. Methods of using ROC analysis in the context of the disclosure will be clear to those skilled in the art.
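A minimal sketch of constructing such a curve is given below, assuming a list of risk scores and binary outcome labels (1 for a severe response, 0 otherwise). The function names `roc_points` and `area_under_curve` are hypothetical, and the area is computed with the trapezoidal rule.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) pairs as the threshold is lowered through the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Each distinct score is used as a threshold; a subject is called
    # positive when its score is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 1.0 corresponds to perfect separation of cases from controls and 0.5 to a classifier that performs no better than chance.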
As used herein, the phrase “combining the first clinical risk assessment and the genetic risk assessment” refers to any suitable mathematical analysis relying on the results of the assessments. For example, the results of the first clinical risk assessment and the genetic risk assessment may be added, more preferably multiplied.
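The two combinations mentioned above, addition and multiplication, can be sketched as follows. The function name `combine_risk_assessments` and the example values are hypothetical; multiplication is made the default because the text expresses a preference for it.

```python
def combine_risk_assessments(clinical: float, genetic: float,
                             method: str = "multiply") -> float:
    """Combine a clinical and a genetic risk score into one value."""
    if method == "multiply":
        return clinical * genetic
    if method == "add":
        return clinical + genetic
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    # Illustrative relative risks only.
    print(combine_risk_assessments(1.5, 2.0))          # multiplied
    print(combine_risk_assessments(1.5, 2.0, "add"))   # added
```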
As used herein, the terms “routinely screening for a severe response to a Coronavirus infection” and “more frequent screening” are relative terms, and are based on a comparison to the level of screening recommended to a subject who has no identified risk of developing a severe response to a Coronavirus infection.
In an aspect, a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection of the invention involves detecting the presence of a polymorphism provided in any one of Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium therewith. In another aspect, a method of the invention involves a genetic risk assessment performed by analysing the genotype of the subject at two or more loci for polymorphisms associated with a severe response to a Coronavirus infection. Various exemplary polymorphisms associated with a severe response to a Coronavirus infection are discussed in the present disclosure. These polymorphisms vary in terms of penetrance and many would be understood by those of skill in the art to be low penetrance polymorphisms.
The term “penetrance” is used in the context of the present disclosure to refer to the frequency at which a particular polymorphism manifests itself within human subjects with a severe response to a Coronavirus infection. “High penetrance” polymorphisms will almost always be apparent in a human subject with a severe response to a Coronavirus infection while “low penetrance” polymorphisms will only sometimes be apparent. In an embodiment polymorphisms assessed as part of a genetic risk assessment according to the present disclosure are low penetrance polymorphisms. As the skilled addressee will appreciate, each polymorphism which increases the risk of developing a severe response to a Coronavirus infection has an odds ratio of association with a severe response to a Coronavirus infection of greater than 1.0. In an embodiment, the odds ratio is greater than 1.02. Each polymorphism which decreases the risk of developing a severe response to a Coronavirus infection has an odds ratio of association with a severe response to a Coronavirus infection of less than 1.0. In an embodiment, the odds ratio is less than 0.98. Examples of such polymorphisms include, but are not limited to, those provided in Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium with one or more thereof. In an embodiment, the genetic risk assessment involves assessing polymorphisms associated with increased risk of developing a severe response to a Coronavirus infection. In another embodiment, the genetic risk assessment involves assessing polymorphisms associated with decreased risk of developing a severe response to a Coronavirus infection. In another embodiment, the genetic risk assessment involves assessing polymorphisms associated with an increased risk of developing a severe response to a Coronavirus infection and polymorphisms associated with a decreased risk of developing a severe response to a Coronavirus infection.
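As an illustration of the odds ratio thresholds discussed above, the following sketch computes an odds ratio from case-control carrier counts and labels its direction using the 1.02 and 0.98 cut-offs of the embodiments. The function names and the counts in the usage lines are hypothetical and are not data from the disclosure.

```python
def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    counts = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def risk_direction(or_value: float, upper: float = 1.02, lower: float = 0.98) -> str:
    """Label a polymorphism by the direction of its association."""
    if or_value > upper:
        return "increased risk"
    if or_value < lower:
        return "decreased risk"
    return "no clear effect"


if __name__ == "__main__":
    value = odds_ratio(30, 70, 20, 80)  # illustrative counts only
    print(round(value, 3), risk_direction(value))
```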
In an embodiment, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 polymorphisms associated with a severe response to a Coronavirus infection are analysed.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 polymorphisms associated with a severe response to a Coronavirus infection are selected from the polymorphisms provided in Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 polymorphisms associated with a severe response to a Coronavirus infection are selected from the polymorphisms provided in Table 1 and Table 6a or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 polymorphisms associated with a severe response to a Coronavirus infection are selected from the polymorphisms provided in Table 1 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 polymorphisms associated with a severe response to a Coronavirus infection are selected from the polymorphisms provided in Table 2 and Table 6a or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 polymorphisms associated with a severe response to a Coronavirus infection are selected from polymorphisms provided in Table 2 or Table 6a or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 polymorphisms associated with a severe response to a Coronavirus infection are selected from polymorphisms provided in Table 2 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 polymorphisms associated with a severe response to a Coronavirus infection are selected from polymorphisms provided in Table 3 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50 or at least 60 polymorphisms associated with a severe response to a Coronavirus infection are selected from polymorphisms provided in Table 4 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method of the invention involves detecting the presence of each of the polymorphisms provided in Table 2 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method of the invention involves detecting the presence of each of the polymorphisms provided in Table 3 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method of the invention involves detecting the presence of each of the polymorphisms provided in Table 4 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method of the invention involves detecting the presence of each of the polymorphisms provided in Table 19 or a polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the method of the invention involves detecting the presence of each of the polymorphisms provided in Table 22 or a polymorphism in linkage disequilibrium with one or more thereof.
Polymorphisms in linkage disequilibrium with those specifically mentioned herein are easily identified by those of skill in the art. Table 6a provides examples of linked loci for the polymorphisms listed in Table 3. Table 6b provides examples of linked loci for the polymorphisms listed in Table 4 which are not provided in Table 6a. Linked polymorphisms for the other polymorphisms listed in Table 1 can readily be identified by the skilled person using the HapMap database.
Where relevant in each Table, A1 or Allele 1 is the risk-associated allele (the minor allele). The risk allele may be associated with a decreased or increased risk as described herein. As used herein, the terms “A1” and “Allele 1” are used interchangeably. As used herein, the terms “A2” and “Allele 2” are used interchangeably.
In an embodiment, if the method includes the analysis of rs11385942 and/or rs657152 the method further comprises detecting at least one other polymorphism provided in any one of Tables 1 to 6, 8, 19 or 22, or a polymorphism in linkage disequilibrium therewith.
Calculating Composite Relative Risk “Genetic Risk”
An individual's “genetic risk” can be defined as the product of genotype relative risk values for each polymorphism assessed. A log-additive risk model can then be used to define three genotypes AA, AB and BB for a polymorphism having relative risk values of 1, OR, and $\mathrm{OR}^2$, under a rare disease model, where OR is the previously reported disease odds ratio for the high-risk allele, B, vs the low-risk allele, A. If the B allele has frequency $p$, then these genotypes have population frequencies of $(1-p)^2$, $2p(1-p)$, and $p^2$, assuming Hardy-Weinberg equilibrium. The genotype relative risk values for each polymorphism can then be scaled so that based on these frequencies the average relative risk in the population is 1. Specifically, given the unscaled population average relative risk for each SNP:
$\mu = (1-p)^2 + 2p(1-p)\,\mathrm{OR} + p^2\,\mathrm{OR}^2$
Adjusted risk values $1/\mu$, $\mathrm{OR}/\mu$, and $\mathrm{OR}^2/\mu$ are used for AA, AB, and BB genotypes for each SNP. Missing genotypes are assigned a relative risk of 1. The following formula can be used to define the genetic risk:
SNP1 × SNP2 × SNP3 × SNP4 × SNP5 × SNP6 × SNP7 × SNP8, etc.
Similar calculations can be performed for non-SNP polymorphisms or a combination thereof.
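The scaling described above can be illustrated with a minimal Python sketch. It assumes each polymorphism is supplied as an (odds ratio, risk allele frequency) pair and each genotype as a count of risk (B) alleles; the function names, the panel identifiers and the numeric values are hypothetical and are not taken from the Tables.

```python
from math import prod

def scaled_genotype_risks(odds_ratio: float, risk_allele_freq: float) -> dict:
    """Scaled relative risks keyed by risk-allele count: 0 = AA, 1 = AB, 2 = BB."""
    p = risk_allele_freq
    # Unscaled population average relative risk under Hardy-Weinberg equilibrium.
    mu = (1 - p) ** 2 + 2 * p * (1 - p) * odds_ratio + p ** 2 * odds_ratio ** 2
    # Log-additive model: raw risks are 1, OR and OR^2, each divided by mu.
    return {count: odds_ratio ** count / mu for count in (0, 1, 2)}

def genetic_risk(panel: dict, genotypes: dict) -> float:
    """Product of scaled relative risks; missing genotypes contribute 1."""
    factors = []
    for snp_id, (odds_ratio, freq) in panel.items():
        count = genotypes.get(snp_id)
        if count is None:
            factors.append(1.0)  # missing genotype is assigned a relative risk of 1
        else:
            factors.append(scaled_genotype_risks(odds_ratio, freq)[count])
    return prod(factors)

# Hypothetical two-SNP panel: identifier -> (odds ratio, risk allele frequency).
panel = {"snp_a": (1.20, 0.30), "snp_b": (0.90, 0.45)}
print(genetic_risk(panel, {"snp_a": 2, "snp_b": None}))
```

Because each polymorphism is scaled by its own μ, the frequency-weighted average of its three scaled values is 1, which is the property relied on when the individual values are multiplied together.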
An alternate method for calculating the composite risk is described in Mavaddat et al. (2015). In this example, the following formula is used:
$\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_\kappa x_\kappa + \dots + \beta_n x_n$
where $\beta_\kappa$ is the per-allele log odds ratio (OR) for the minor allele for SNP κ, $x_\kappa$ is the number of copies of that allele carried for the same SNP (0, 1 or 2), n is the total number of SNPs and PRS is the polygenic risk score (which can also be referred to as composite SNP risk). Similar calculations can be performed for non-SNP polymorphisms or a combination thereof.
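As a brief illustration of the polygenic risk score formula, the following Python sketch sums the per-allele log odds ratios weighted by allele counts. The SNP identifiers, odds ratios and genotypes are hypothetical placeholders chosen only to show the arithmetic.

```python
from math import log

def polygenic_risk_score(log_odds_by_snp: dict, allele_counts: dict) -> float:
    """Sum of beta_k * x_k over all SNPs in the panel."""
    score = 0.0
    for snp_id, beta in log_odds_by_snp.items():
        count = allele_counts[snp_id]
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp_id} must be 0, 1 or 2")
        score += beta * count
    return score

# Hypothetical per-allele odds ratios, converted to log odds ratios (beta values).
betas = {"snp_a": log(1.10), "snp_b": log(0.95), "snp_c": log(1.25)}
# Hypothetical subject: number of minor alleles carried at each SNP.
counts = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

prs = polygenic_risk_score(betas, counts)
print(prs)
```

A SNP with an odds ratio below 1 has a negative β, so carrying its minor allele lowers the score.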
In an alternate embodiment, the magnitude of effect of each risk allele is not used when calculating the genetic risk score. More specifically, allele counting as generally described in WO 2005/086770 is used. For example, in one embodiment if the subject was homozygous for the risk allele they were scored as 2, if they were heterozygous for the risk allele they were scored as 1, and if they were homozygous for the non-risk allele they were scored as 0. As the skilled person would appreciate, alternate values such as 1, 0.5 and 0 respectively, could be used.
In an embodiment, the percent of risk alleles present out of the total possible number of loci analysed is used to produce the genetic risk score. For example, in the 64 allele panel described in Example 5 the subject may have at most 128 risk alleles. If a subject had 64 out of these 128 alleles, they would have 50% of the total possible alleles which can be expressed as 0.5.
The genetic risk score can be expressed as:
ln(risk) = −8.4953 + 0.1496 × SNP %, where −8.4953 is the model intercept and SNP % is the percentage of risk alleles. Then, risk = exp(ln(risk)).
In this example, the risk is the relative risk for severe disease (e.g. a person with risk=3.5 is at 3.5 times increased risk compared with a person with the average number of risk alleles). exp(β) is the odds ratio for an increase of 1% in risk alleles. So, exp(0.1496)=1.16, which means that risk increases by 16% for a 1% increase in SNP %. In an embodiment, the model intercept is from −10.06391 to −6.926615, or −9.5 to −7.5, or −9 to −8. In an embodiment of the above formula, the adjustment of the starting ln(risk) for the percentage of risk alleles is 0.1237336 to 0.1755347, or 0.14 to 0.16.
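The percentage-based score can be sketched in Python as follows. The intercept and coefficient are the values quoted above; the sketch assumes that SNP % is entered in percentage points (0 to 100), consistent with the statement that the coefficient applies per 1% increase, and the function names are illustrative only.

```python
from math import exp

INTERCEPT = -8.4953         # model intercept quoted in the text
BETA_SNP_PERCENT = 0.1496   # change in ln(risk) per 1% increase in risk alleles

def risk_allele_percent(risk_allele_counts: list) -> float:
    """Percentage of risk alleles carried, given a 0/1/2 count for each locus."""
    if not risk_allele_counts:
        raise ValueError("at least one locus is required")
    total_possible = 2 * len(risk_allele_counts)  # two alleles per locus
    return 100.0 * sum(risk_allele_counts) / total_possible

def relative_risk_from_percent(snp_percent: float) -> float:
    """risk = exp(intercept + beta * SNP %)."""
    return exp(INTERCEPT + BETA_SNP_PERCENT * snp_percent)

# A subject heterozygous at every locus of a 64-locus panel carries 64 of 128 alleles.
percent = risk_allele_percent([1] * 64)
print(percent, relative_risk_from_percent(percent))
```

Each additional percentage point multiplies the relative risk by exp(0.1496), which is the 16% increase referred to above.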
In an embodiment, the genetic risk is the SNP Risk Factor (SRF). In one embodiment, SRF=Σ(No. of Risk Alleles×SNP β coefficient).
The “risk” of a human subject developing a severe response to a Coronavirus infection can be provided as a relative risk (or risk ratio).
In an embodiment, the genetic risk assessment obtains the “relative risk” of a human subject developing a severe response to a Coronavirus infection. Relative risk (or risk ratio), measured as the incidence of a disease in individuals with a particular characteristic (or exposure) divided by the incidence of the disease in individuals without the characteristic, indicates whether that particular exposure increases or decreases risk. Relative risk is helpful to identify characteristics that are associated with a disease, but by itself is not particularly helpful in guiding screening decisions because the frequency of the risk (incidence) is cancelled out.
In an embodiment, a threshold value(s) is set for determining a particular action such as the need for routine diagnostic testing, the need for prophylactic anti-Coronavirus therapy, selection of a person for a vaccine or the need to administer an anti-Coronavirus therapy. For example, a score determined using a method of the invention is compared to a pre-determined threshold, and if the score is higher than the threshold a recommendation is made to take the pre-determined action. Methods of setting such thresholds have now become widely used in the art and are described in, for example, US 20140018258.
In an embodiment, the method further comprises performing a clinical risk assessment of the human subject; and combining the clinical risk assessment and the genetic risk assessment to obtain the risk of a human subject developing a severe response to a Coronavirus infection. The clinical risk assessment procedure can include obtaining clinical information from a human subject. In other embodiments, these details have already been determined (such as in the subject's medical records).
Examples of factors which can be used to produce the clinical risk assessment include, but are not limited to, information obtained from the human on one or more of the following: age, family history of a severe response to a Coronavirus infection, race/ethnicity, gender, body mass index, total cholesterol level, systolic and/or diastolic blood pressure, smoking status, whether the human has diabetes, whether the human has a cardiovascular disease, whether the human is on hypertension medication, loss of taste, loss of smell and white blood cell count.
In an embodiment, the clinical risk assessment is based only on one or more or all of age, body mass index, loss of taste, loss of smell and smoking status.
In another embodiment, the clinical risk assessment is based only on one or more or all of age, loss of taste, loss of smell and smoking status.
In an embodiment, the clinical risk assessment includes obtaining information from the subject on one or more or all of age, gender, race/ethnicity, blood type, whether the human has or has had an autoimmune disease, whether the human has or has had a haematological cancer, whether the human has or has had a non-haematological cancer, whether the human has or has had diabetes, whether the human has or has had hypertension and whether the human has or has had a respiratory disease (other than asthma).
In an embodiment, the clinical risk assessment at least includes age and gender.
The present inventors have also found that a risk model for a severe response to a Coronavirus infection that relies solely on clinical factors provides useful risk discrimination for assessing a subject's risk of developing a severe response to a Coronavirus infection such as a SARS-CoV-2 infection. Such a test may be particularly useful in circumstances where a rapid decision needs to be made and/or when genetic testing is not readily available. Thus, in another aspect the present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method comprising performing a clinical risk assessment of the human subject, wherein the clinical risk assessment comprises obtaining information from the subject on two, three, four, five or more or all of age, gender, race/ethnicity, height, weight, blood type, whether the human has or has had a cerebrovascular disease, whether the human has or has had a chronic kidney disease, whether the human has or has had an autoimmune disease, whether the human has or has had a haematological cancer, whether the human has or has had an immunocompromised disease, whether the human has or has had a non-haematological cancer, whether the human has or has had diabetes, whether the human has or has had liver disease, whether the human has or has had hypertension and whether the human has or has had a respiratory disease (other than asthma).
In an embodiment, the method comprises obtaining information from the subject on age and gender.
In an embodiment, the method comprises obtaining information from the subject on age, gender, race/ethnicity, height, weight, whether the human has or has had a cerebrovascular disease, whether the human has or has had a chronic kidney disease, whether the human has or has had diabetes, whether the human has or has had a haematological cancer, whether the human has or has had hypertension, whether the human has or has had a non-haematological cancer, and whether the human has or has had a respiratory disease (other than asthma).
In an embodiment, the method comprises obtaining information from the subject on age, gender, race/ethnicity, blood type, height, weight, whether the human has or has had a cerebrovascular disease, whether the human has or has had a chronic kidney disease, whether the human has or has had diabetes, whether the human has or has had a haematological cancer, whether the human has or has had hypertension, whether the human has or has had an immunocompromised disease, whether the human has or has had liver disease, whether the human has or has had a non-haematological cancer, and whether the human has or has had a respiratory disease (other than asthma).
Examples of respiratory diseases which are included in the test are chronic obstructive pulmonary disease, chronic bronchitis and emphysema.
The diabetes can be any type of diabetes.
In an embodiment, the clinical risk assessment is conducted using the following formula:
In an embodiment, the clinical risk assessment is conducted using the following formula:
Using the above formulae the relative risk of a human subject developing a severe response to a Coronavirus infection is: risk=.
In one example, the clinical risk assessment is conducted using the following formula:
In an embodiment of the above formula, the starting ln(risk) (model intercept) is −0.5284 to 1.5509, or −0.36 to −0.16.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 18 to 29 is −1.5 to −1, or −1.4 to −1.2.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 30 to 39 is −1 to −0.7, or −0.9 to −0.8.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 40 to 49 is −0.6 to −0.2, or −0.45 to −0.35.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 60 to 69 is −0.4021263 to 0.2075385, or −0.19 to 0.09.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 70+ is 0.1504677 to 0.73339, or 0.34 to 0.54.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for males is −0.140599 to 0.3115929, or −0.3 to 0.19.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for non-Caucasians is −0.3029713 to 0.3837958, or −0.06 to 0.14.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for A blood type is −0.3018427 to 0.1791056, or −0.16 to 0.04.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for B blood type is −0.1817567 to 0.5895909, or 0.1 to 0.3.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for AB blood type is −1.172319 to 0.0641862, or −0.65 to −0.45.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, rheumatoid arthritis, lupus or psoriasis is −0.0309265 to 1.115784, or 0.44 to 0.64.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, a haematological cancer is 0.1211918 to 1.899663, or 0.9 to 1.1.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, a non-haematological cancer is −0.0625866 to 0.5498824, or 0.14 to 0.34.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, diabetes is 0.0624018 to 0.7101834, or 0.28 to 0.48.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, hypertension is 0.0504567 to 0.5623362, or 0.1 to 0.3.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, a respiratory disease (excluding asthma) is 0.9775684 to 1.550944, or 1.16 to 1.36.
The present invention provides a method for assessing the risk of a human subject developing a severe response to a Coronavirus infection, the method comprising performing a clinical risk assessment of the human subject, wherein the clinical risk assessment involves determining at least the age and sex of the subject and producing a score. In an embodiment, the method further comprises comparing the score to a predetermined threshold, wherein if the score is at, or above, the threshold the subject is assessed as being at risk of developing a severe response to a Coronavirus infection.
In one embodiment, the subject is between 50 and 84 years of age and is asked their age and their sex.
In an embodiment, the method comprises determining the Log odds (LO). For example, the LO can be calculated using the formula:
LO=X+Σ Clinical β coefficients
In an embodiment, X is −2.25 to −1.25, or −2 to −1.5. In an embodiment, X is −1.749562.
In an embodiment, the relative risk is determined. In an embodiment, the relative risk is determined using the formula:
$\text{relative risk} = e^{\mathrm{LO}}$
In an embodiment, the probability is determined. In an embodiment, the probability is determined using the formula:
$\text{probability} = e^{\mathrm{LO}} / (1 + e^{\mathrm{LO}})$
“e” is the mathematical constant that is the base of the natural logarithm.
In an embodiment, the probability obtained by the above formula is multiplied by 100 to obtain a percent chance of a severe response to a Coronavirus infection such as hospitalisation being required.
In an embodiment, if the subject is between 50 and 64 years of age they are assigned a β coefficient of −0.5 to 0.5, or −0.25 to 0.25 or 0.
In an embodiment, if the subject is between 65 and 69 years of age they are assigned a β coefficient of 0 to 1, or 0.25 to 0.75 or 0.4694892.
In an embodiment, if the subject is between 70 and 74 years of age they are assigned a β coefficient of 0.5 to 1.5, or 0.75 to 1.25 or 1.006561.
In an embodiment, if the subject is between 75 and 79 years of age they are assigned a β coefficient of 0.9 to 1.9, or 1.15 to 1.65 or 1.435318.
In an embodiment, if the subject is between 80 and 84 years of age they are assigned a β coefficient of 1.1 to 2.1, or 1.35 to 1.85 or 1.599188.
In an embodiment, if the subject is female they are assigned a β coefficient of −0.5 to 0.5, or −0.25 to 0.25 or 0.
In an embodiment, if the subject is male they are assigned a β coefficient of −0.1 to 0.9, or 0.15 to 0.65 or 0.3911169.
In an embodiment, the last value provided above for each criterion is used.
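The age and sex calculation above can be sketched in Python using the last value given for each criterion. The intercept and β coefficients are those quoted in the preceding paragraphs; the function names and the example subject are illustrative assumptions.

```python
from math import exp

X = -1.749562  # intercept quoted in the text

# (lowest age, highest age) -> beta coefficient, using the last value for each band.
AGE_BETAS = [
    ((50, 64), 0.0),
    ((65, 69), 0.4694892),
    ((70, 74), 1.006561),
    ((75, 79), 1.435318),
    ((80, 84), 1.599188),
]
SEX_BETAS = {"female": 0.0, "male": 0.3911169}

def log_odds(age: int, sex: str) -> float:
    """LO = X + sum of the clinical beta coefficients (age band and sex)."""
    for (low, high), beta in AGE_BETAS:
        if low <= age <= high:
            return X + beta + SEX_BETAS[sex]
    raise ValueError("this sketch only covers subjects aged 50 to 84")

def relative_risk(lo: float) -> float:
    """relative risk = e^LO, as stated in the text."""
    return exp(lo)

def probability(lo: float) -> float:
    """probability = e^LO / (1 + e^LO)."""
    return exp(lo) / (1 + exp(lo))

# Hypothetical subject: a 72-year-old male.
lo = log_odds(72, "male")
print(lo, relative_risk(lo), 100 * probability(lo))
```

Multiplying the probability by 100 gives the percent chance described above; a score threshold, if one is used, would be applied to whichever of these outputs the embodiment specifies.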
In an embodiment, the clinical risk assessment includes obtaining information from the subject on one or more or all of age, gender, race/ethnicity, height, weight, whether the human has or has had a cerebrovascular disease, whether the human has or has had a chronic kidney disease, whether the human has or has had diabetes, whether the human has or has had a haematological cancer, whether the human has or has had hypertension, whether the human has or has had a non-haematological cancer, and whether the human has or has had a respiratory disease (other than asthma).
In an embodiment, each of the above factors are assessed and
In an embodiment, the last value provided above for each criterion is used.
In an embodiment, the clinical risk assessment includes obtaining information from the subject on one or more or all of age, gender, race/ethnicity, blood type, height, weight, whether the human has or has had a cerebrovascular disease, whether the human has or has had a chronic kidney disease, whether the human has or has had diabetes, whether the human has or has had a haematological cancer, whether the human has or has had hypertension, whether the human has or has had an immunocompromised disease, whether the human has or has had liver disease, whether the human has or has had a non-haematological cancer, and whether the human has or has had a respiratory disease (other than asthma).
In an embodiment, each of the above factors are assessed and
In an embodiment, the last value provided above for each criterion is used.
In an embodiment, the subject's body mass index is determined using their height and weight.
In an embodiment, if any of the clinical factors are unknown, or the subject is unwilling to supply the relevant details, that factor(s) is assigned a β coefficient of 0.
In an embodiment, one or more or all of the clinical factors are self-assessed (self-reported). In an embodiment, the race/ethnicity is self-assessed (self-reported). In an embodiment, one or more or all of current or previous disease status, such as an autoimmune disease, a haematological cancer, a non-haematological cancer, diabetes, hypertension or a respiratory disease, is self-assessed (self-reported).
In an embodiment, the clinical assessment comprises determining the blood type of the subject. This will typically comprise obtaining a sample comprising blood from the subject. The detection method used can be any suitable method known in the art. In an embodiment, a genetic test as described in the Examples is used, preferably concurrently with a genetic analysis for assessing the risk of a human subject developing a severe response to a coronavirus infection.
For instance, ABO blood type can be imputed using three SNPs (namely rs505922, rs8176719 and rs8176746) in the ABO gene on chromosome 9q34.2. An rs8176719 deletion (or for those with no result for rs8176719, a T allele at rs505922) indicates haplotype O. At rs8176746, haplotype A is indicated by the presence of the G allele and haplotype B is indicated by the presence of the T allele (see Table 7).
In an embodiment, when determining whether a subject has or has had (also referred to herein as “has ever been diagnosed with”) a particular disease state, the disease is classified using the International Classification of Diseases (ICD) system. Thus,
In an embodiment, to obtain the “risk” of a human subject developing a severe response to a Coronavirus infection, the following formula can be used:
In an embodiment, to obtain the “risk” of a human subject developing a severe response to a Coronavirus infection, the following formula can be used:
Using the above formulae the relative risk of a human subject developing a severe response to a Coronavirus infection is: risk=.
In one example, to obtain the “risk” of a human subject developing a severe response to a Coronavirus infection, the following formula can be used:
Using this formula the relative risk of a human subject developing a severe response to a Coronavirus infection is: risk=.
In an embodiment of the above formula, the starting ln(risk) (model intercept) is −12.5559 to −8.9755, or −12 to −8, or −11 to −10.5.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for the percentage of risk alleles is 0.142 to 0.2006, or 0.16 to 0.18.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 18 to 29 is −1.5 to −1, or −1.4 to −1.2.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 30 to 39 is −1 to −0.7, or −0.9 to −0.8.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 40 to 49 is −0.6 to −0.2, or −0.45 to −0.35.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 60 to 69 is −0.3819 to 0.2619, or −0.1 to 0.1.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for ages 70+ is 0.2213 to 0.8438, or 0.43 to 0.63.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for males is −0.1005 to 0.3779, or 0.03 to 0.23.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for non-Caucasians is −0.0084 to 0.7167, or 0.25 to 0.45.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for A blood type is −0.4726 to 0.0397, or −0.31 to −0.11.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for B blood type is −0.2348 to 0.5773, or 0.07 to 0.27.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for AB blood type is −1.5087 to −0.2404, or −0.97 to −0.77.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, rheumatoid arthritis, lupus or psoriasis is 0.1832 to 1.3920, or 0.68 to 0.88.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, a haematological cancer is 0.0994 to 1.9756, or 0.93 to 1.13.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, a non-haematological cancer is 0.0401 to 0.6933, or 0.26 to 0.46.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, diabetes is 0.1450 to 0.8330, or 0.39 to 0.59.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, hypertension is 0.0313 to 0.5756, or 0.2 to 0.4.
In an embodiment of the above formula, the adjustment of the starting ln(risk) for a human who has, or has had, a respiratory disease (excluding asthma) is 0.9317 to 1.535, or 1.13 to 1.33.
In an alternate embodiment, and as outlined above, the method comprises determining the Log odds (LO). For example, the LO can be calculated using the formula:
LO=X+SRF+Σ Clinical β coefficients
In an embodiment, the SRF is the SNP Risk Factor, which is: Σ(No. of Risk Alleles×SNP β coefficient).
In an embodiment, the relative risk is determined. In an embodiment, the relative risk is determined using the formula:
$\text{relative risk} = e^{\mathrm{LO}}$
In an embodiment, the probability is determined. In an embodiment, the probability is determined using the formula:
$\text{probability} = e^{\mathrm{LO}} / (1 + e^{\mathrm{LO}})$
“e” is the mathematical constant that is the base of the natural logarithm.
In an embodiment, the probability obtained by the above formula is multiplied by 100 to obtain a percent chance of a severe response to a Coronavirus infection such as hospitalisation being required.
In an embodiment, the genetic risk assessment involves the analysis of rs10755709, rs112317747, rs112641600, rs118072448, rs2034831, rs7027911 and rs71481792. In an embodiment, X is −1.8 to −0.8, or −1.6 to −1.15. In an embodiment, X is −1.36523. In an embodiment, the subject is assigned a β coefficient of −0.08 to 0.32, or 0.02 to 0.22 or 0.124239 for each G (risk) allele present at rs10755709. Thus, for example, if the subject is homozygous for the risk allele they can be assigned a β coefficient of 0.248478, if they are heterozygous they can be assigned a β coefficient of 0.124239, and if they are homozygous for the non-risk allele (C at rs10755709) they can be assigned a β coefficient of 0. In an embodiment, the subject is assigned a β coefficient of 0.07 to 0.47, or 0.17 to 0.37 or 0.2737487 for each C (risk) allele present at rs112317747. In an embodiment, the subject is assigned a β coefficient of −0.43 to −0.03, or −0.33 to −0.13 or −0.2362513 for each T (risk) allele present at rs112641600. In an embodiment, the subject is assigned a β coefficient of −0.4 to 0, or −0.3 to −0.1 or −0.1995879 for each C (risk) allele present at rs118072448. In an embodiment, the subject is assigned a β coefficient of 0.04 to 0.44, or 0.14 to 0.34 or 0.2371955 for each C (risk) allele present at rs2034831. In an embodiment, the subject is assigned a β coefficient of −0.1 to 0.3, or 0 to 0.2 or 0.1019074 for each A (risk) allele present at rs7027911. In an embodiment, the subject is assigned a β coefficient of −0.3 to 0.1, or −0.2 to 0 or −0.1058025 for each T (risk) allele present at rs71481792.
In an embodiment, the Clinical β coefficients are determined as above, such as by factoring in β coefficients for each of age, gender, race/ethnicity, height, weight, whether the human has or has had a cerebrovascular disease, whether the human has or has had a chronic kidney disease, whether the human has or has had diabetes, whether the human has or has had a haematological cancer, whether the human has or has had hypertension, whether the human has or has had a non-haematological cancer, and whether the human has or has had a respiratory disease (other than asthma).
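The combined log odds calculation for the seven-polymorphism embodiment can be sketched in Python as follows. The intercept, risk alleles and SNP β coefficients are the single values quoted above. The clinical β coefficients are passed in by the caller because their values are not reproduced in this passage, and the treatment of a missing genotype as contributing 0 is an assumption of the sketch; the function names and the example subject are hypothetical.

```python
from math import exp

X = -1.36523  # intercept quoted in the text for the seven-SNP embodiment

# rsID -> (risk allele, beta coefficient per risk allele), as quoted in the text.
SNP_MODEL = {
    "rs10755709": ("G", 0.124239),
    "rs112317747": ("C", 0.2737487),
    "rs112641600": ("T", -0.2362513),
    "rs118072448": ("C", -0.1995879),
    "rs2034831": ("C", 0.2371955),
    "rs7027911": ("A", 0.1019074),
    "rs71481792": ("T", -0.1058025),
}

def snp_risk_factor(genotypes: dict) -> float:
    """SRF = sum over SNPs of (number of risk alleles x SNP beta coefficient)."""
    srf = 0.0
    for rsid, (risk_allele, beta) in SNP_MODEL.items():
        genotype = genotypes.get(rsid)
        if genotype is None:
            continue  # assumption: a missing genotype contributes 0
        srf += genotype.count(risk_allele) * beta
    return srf

def log_odds(genotypes: dict, clinical_betas: list) -> float:
    """LO = X + SRF + sum of clinical beta coefficients."""
    return X + snp_risk_factor(genotypes) + sum(clinical_betas)

def probability(lo: float) -> float:
    """probability = e^LO / (1 + e^LO)."""
    return exp(lo) / (1 + exp(lo))

# Hypothetical subject: two-letter genotypes and placeholder clinical betas.
subject = {"rs10755709": "GC", "rs2034831": "CC", "rs7027911": "GG"}
lo = log_odds(subject, clinical_betas=[0.4, 0.0])
print(lo, exp(lo), probability(lo))
```

A subject homozygous for the non-risk allele at a polymorphism carries zero risk alleles there, so that polymorphism adds nothing to the SRF.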
In an embodiment, the genetic risk assessment involves the analysis of rs10755709, rs112317747, rs112641600, rs118072448, rs2034831, rs7027911, rs71481792, rs115492982 and rs1984162. In an embodiment, X is −2 to −1.5, or −1.75 to −1.25. In an embodiment, X is −1.469939. In an embodiment, the subject is assigned a β coefficient of −0.08 to 0.32, or 0.02 to 0.22 or 0.1231766 for each G (risk) allele present at rs10755709. Thus, for example, if the subject is homozygous for the risk allele they can be assigned a β coefficient of 0.2463532, if they are heterozygous they can be assigned a β coefficient of 0.1231766, and if they are homozygous for the non-risk allele (C at rs10755709) they can be assigned a β coefficient of 0. In an embodiment, the subject is assigned a β coefficient of 0.06 to 0.46, or 0.16 to 0.36 or 0.2576692 for each C (risk) allele present at rs112317747. In an embodiment, the subject is assigned a β coefficient of −0.43 to −0.03, or −0.33 to −0.13 or −0.2384001 for each T (risk) allele present at rs112641600. In an embodiment, the subject is assigned a β coefficient of −0.4 to 0, or −0.3 to −0.1 or −0.1965609 for each C (risk) allele present at rs118072448. In an embodiment, the subject is assigned a β coefficient of 0.04 to 0.44, or 0.14 to 0.34 or 0.2414792 for each C (risk) allele present at rs2034831. In an embodiment, the subject is assigned a β coefficient of −0.1 to 0.3, or 0 to 0.2 or 0.0998459 for each A (risk) allele present at rs7027911. In an embodiment, the subject is assigned a β coefficient of −0.3 to 0.1, or −0.2 to 0 or −0.1032044 for each T (risk) allele present at rs71481792. In an embodiment, the subject is assigned a β coefficient of 0.21 to 0.61, or 0.31 to 0.51 or 0.4163575 for each A (risk) allele present at rs115492982. In an embodiment, the subject is assigned a β coefficient of −0.1 to 0.3, or 0 to 0.2 or 0.1034362 for each A (risk) allele present at rs1984162.
In an embodiment, the Clinical β coefficient is determined as above such as factoring in β coefficients for each of age, gender, race/ethnicity, blood type, height, weight, does the human have or has had a cerebrovascular disease, does the human have or has had a chronic kidney disease, does the human have or has had diabetes, does the human have or has had a haematological cancer, does the human have or has had hypertension, does the human have or has had an immunocompromised disease, does the human have or has had liver disease, does the human have or has had a non-haematological cancer, and does the human have or has had a respiratory disease (other than asthma).
Any of the above calculations can be performed for non-SNP polymorphisms or a combination thereof.
In another embodiment, when combining the clinical risk assessment with the genetic risk assessment to obtain the “risk” of a human subject developing a severe response to a Coronavirus infection, the following formula can be used:
[Risk (i.e. Clinical Evaluation×SNP risk)]=[Clinical Evaluation risk]×SNP1×SNP2×SNP3×SNP4×SNP5×SNP6×SNP7×SNP8× . . . ×SNPN.
Where Clinical Evaluation is the risk provided by the clinical evaluation, and SNP1 to SNPN are the relative risk for the individual SNPs, each scaled to have a population average of 1 as outlined above. Because the SNP risk values have been “centred” to have a population average risk of 1, if one assumes independence among the SNPs, then the population average risk across all genotypes for the combined value is consistent with the underlying Clinical Evaluation risk estimate.
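As a minimal sketch of the multiplication described above, and assuming the clinical evaluation risk and the centred per-SNP relative risks have already been obtained, the combination could be written in Python as below. The function name and argument layout are hypothetical.

```python
from math import prod


def combined_risk(clinical_risk, snp_relative_risks):
    """Multiply the clinical evaluation risk by each centred SNP relative risk.

    `snp_relative_risks` holds one value per SNP (SNP1 to SNPN), each already
    scaled to a population average of 1.
    """
    return clinical_risk * prod(snp_relative_risks)
```

For example, a subject whose SNP relative risks are all exactly 1 keeps the clinical evaluation risk unchanged.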
In an embodiment, the genetic risk assessment is combined with the clinical risk assessment to obtain the “relative risk” of a human subject developing a severe response to a Coronavirus infection.
A threshold(s) can be set as described above when genetic risk is assessed alone. In one example, the threshold could be set to be at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10, when using the embodiment of the test described in Example 5. If set at 5 in this example, about 10% of the UK Biobank population have a risk score over 5.0, resulting in the following performance characteristics for the test:
Positive predictive value 91.78%
Negative predictive value 45.76%
As the skilled person would understand, various different thresholds could be set, altering performance depending on the level of risk the entity conducting the test is willing to accept.
Depending upon the end-usage of the test, a threshold may be altered to the most appropriate values.
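The effect of a chosen threshold on positive and negative predictive value can be illustrated with the following Python sketch. It assumes a list of risk scores and a matching list of known outcomes (True for a severe response); the function name and the treatment of a score as positive only when it is strictly over the threshold are assumptions for the example.

```python
def predictive_values(scores, outcomes, threshold):
    """Return (PPV, NPV) for a test that calls scores over `threshold` positive.

    `outcomes` holds True where the subject had a severe response.
    A value of None is returned where a denominator would be zero.
    """
    tp = fp = tn = fn = 0
    for score, severe in zip(scores, outcomes):
        if score > threshold:
            if severe:
                tp += 1
            else:
                fp += 1
        else:
            if severe:
                fn += 1
            else:
                tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv
```

Re-running the function over a range of thresholds shows how the two values trade off against each other for a given data set.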
Amplification primers for amplifying markers (e.g., marker loci) and suitable probes to detect such markers or to genotype a sample with respect to multiple marker alleles, can be used in the disclosure. For example, primer selection for long-range PCR is described in U.S. Ser. No. 10/042,406 and U.S. Ser. No. 10/236,480; for short-range PCR, U.S. Ser. No. 10/341,832 provides guidance with respect to primer selection. Also, there are publicly available programs such as “Oligo” available for primer design. With such available primer selection and design software, the publicly available human genome sequence and the polymorphism locations, one of skill can construct primers to amplify the polymorphisms to practice the disclosure. Further, it will be appreciated that the precise probe to be used for detection of a nucleic acid comprising a polymorphism (e.g., an amplicon comprising the polymorphism) can vary, e.g., any probe that can identify the region of a marker amplicon to be detected can be used in conjunction with the present disclosure. Further, the configuration of the detection probes can, of course, vary. Thus, the disclosure is not limited to the sequences recited herein.
Examples of primer pairs for detecting some of the SNPs disclosed herein include: rs11549298 (ACCTGGTATCAGTGAAGAGGATCAG (SEQ ID NO:1) and TCTTGATACAACTGTAAGAAGTGGT (SEQ ID NO:2)), rs112317747 (TATTTCTTTGTTGCCCTCTATCTCT (SEQ ID NO:3) and GAAAGAGATGGGTTGGCATTATTAT (SEQ ID NO:4)), rs2034831 (TAAAATTAGAACTGGAGGGCTGGGT (SEQ ID NO:5) and TGGCATTATAAACACTCACTGAAGT (SEQ ID NO:6)), rs112641600 (AATGCCATCTGATGAGAGAAGTTTT (SEQ ID NO:7) and TACAGTTTTAAAAATGGGCGTTTCT (SEQ ID NO:8)), rs10755709 (TATAATAACACGTGGAAGTGAAAAT (SEQ ID NO:9) and TTGTTTGTATGTGTGAAATGATTCT (SEQ ID NO:10)), rs118072448 (AAGCAAACTATTCTTCAGGAATCCA (SEQ ID NO:11) and ATTTCTGCATTTCACTTTGTGTGGT (SEQ ID NO:12)), rs7027911 (GTAAATGCTGCTAACAGAGCTCTTT (SEQ ID NO:13) and GAAGAGAGTTTATTAGCAAGGCCTC (SEQ ID NO:14)), rs71481792 (CATTTGGGAAAAGCCACTGAATGGA (SEQ ID NO:15) and AGATTGACTAGCCGTTGAGAGTAGA (SEQ ID NO:16)), and rs1984162 (ACTGACTCCTGACACTCTTGAAGCG (SEQ ID NO:17) and GACTCTTCTCTGGCATCTTCTCATG (SEQ ID NO:18)).
Indeed, it will be appreciated that amplification is not a requirement for marker detection, for example one can directly detect unamplified genomic DNA simply by performing a Southern blot on a sample of genomic DNA.
Typically, molecular markers are detected by any established method available in the art, including, without limitation, allele specific hybridization (ASH), detection of extension, array hybridization (optionally including ASH), or other methods for detecting polymorphisms, amplified fragment length polymorphism (AFLP) detection, amplified variable sequence detection, randomly amplified polymorphic DNA (RAPD) detection, restriction fragment length polymorphism (RFLP) detection, self-sustained sequence replication detection, simple sequence repeat (SSR) detection, and single-strand conformation polymorphisms (SSCP) detection.
Some techniques for detecting genetic markers utilize hybridization of a probe nucleic acid to nucleic acids corresponding to the genetic marker (e.g., amplified nucleic acids produced using genomic DNA as a template). Hybridization formats, including, but not limited to: solution phase, solid phase, mixed phase, or in situ hybridization assays are useful for allele detection. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes Elsevier, New York, as well as in Sambrook et al. (supra).
PCR detection using dual-labelled fluorogenic oligonucleotide probes, commonly referred to as “TaqMan™” probes, can also be performed according to the present disclosure. These probes are composed of short (e.g., 20-25 base) oligodeoxynucleotides that are labelled with two different fluorescent dyes. On the 5′ terminus of each probe is a reporter dye, and on the 3′ terminus of each probe a quenching dye is found. The oligonucleotide probe sequence is complementary to an internal target sequence present in a PCR amplicon. When the probe is intact, energy transfer occurs between the two fluorophores and emission from the reporter is quenched by the quencher by FRET. During the extension phase of PCR, the probe is cleaved by 5′ nuclease activity of the polymerase used in the reaction, thereby releasing the reporter from the oligonucleotide-quencher and producing an increase in reporter emission intensity. Accordingly, TaqMan™ probes are oligonucleotides that have a label and a quencher, where the label is released during amplification by the exonuclease action of the polymerase used in amplification. This provides a real time measure of amplification during synthesis. A variety of TaqMan™ reagents are commercially available, e.g., from Applied Biosystems (Division Headquarters in Foster City, Calif.) as well as from a variety of specialty vendors such as Biosearch Technologies (e.g., black hole quencher probes). Further details regarding dual-label probe strategies can be found, e.g., in WO 92/02638.
Other similar methods include e.g. fluorescence resonance energy transfer between two adjacently hybridized probes, e.g., using the “LightCycler®” format described in U.S. Pat. No. 6,174,670.
Array-based detection can be performed using commercially available arrays, e.g., from Affymetrix (Santa Clara, Calif.) or other manufacturers. Reviews regarding the operation of nucleic acid arrays include Sapolsky et al. (1999); Lockhart (1998); Fodor (1997a); Fodor (1997b) and Chee et al. (1996). Array based detection is one preferred method for identification of markers of the disclosure in samples, due to the inherently high-throughput nature of array based detection.
The nucleic acid sample to be analysed is isolated, amplified and, typically, labelled with biotin and/or a fluorescent reporter group. The labelled nucleic acid sample is then incubated with the array using a fluidics station and hybridization oven. The array can be washed and/or stained or counter-stained, as appropriate to the detection method. After hybridization, washing and staining, the array is inserted into a scanner, where patterns of hybridization are detected. The hybridization data are collected as light emitted from the fluorescent reporter groups already incorporated into the labelled nucleic acid, which is now bound to the probe array. Probes that most clearly match the labelled nucleic acid produce stronger signals than those that have mismatches. Since the sequence and position of each probe on the array are known, by complementarity, the identity of the nucleic acid sample applied to the probe array can be identified.
Markers and polymorphisms can also be detected using DNA sequencing. DNA sequencing methods are well known in the art and can be found for example in Ausubel et al, eds., Short Protocols in Molecular Biology, 3rd ed., Wiley, (1995) and Sambrook et al, Molecular Cloning, 2nd ed., Chap. 13, Cold Spring Harbor Laboratory Press, (1989). Sequencing can be carried out by any suitable method, for example, dideoxy sequencing, chemical sequencing, or variations thereof.
Suitable sequencing methods also include Second Generation, Third Generation, or Fourth Generation sequencing technologies, all referred to herein as “next generation sequencing”, including, but not limited to, pyrosequencing, sequencing-by-ligation, single molecule sequencing, sequence-by-synthesis (SBS), massive parallel clonal, massive parallel single molecule SBS, massive parallel single molecule real-time, massive parallel single molecule real-time nanopore technology, etc. A review of some such technologies can be found in (Morozova and Marra, 2008), herein incorporated by reference. Accordingly, in some embodiments, performing a genetic risk assessment as described herein involves detecting the at least two polymorphisms by DNA sequencing. In an embodiment, the at least two polymorphisms are detected by next generation sequencing.
Next generation sequencing (NGS) methods share the common feature of massively parallel, high-throughput strategies, with the goal of lower costs in comparison to older sequencing methods (see, Voelkerding et al., 2009; MacLean et al., 2009).
A number of such DNA sequencing techniques are known in the art, including fluorescence-based sequencing methodologies. In some embodiments, automated sequencing techniques are used. In some embodiments, parallel sequencing of partitioned amplicons is used (WO2006084132). In some embodiments, DNA sequencing is achieved by parallel oligonucleotide extension (See, e.g., U.S. Pat. Nos. 5,750,341 and 6,306,597). Additional examples of sequencing techniques include the Church polony technology (Mitra et al., 2003; Shendure et al., 2005; U.S. Pat. Nos. 6,432,36; 6,485,944; 6,511,803), the 454 picotiter pyrosequencing technology (Margulies et al., 2005; US 20050130173), the Solexa single base addition technology (Bennett et al., 2005; U.S. Pat. Nos. 6,787,308; 6,833,246), the Lynx massively parallel signature sequencing technology (Brenner et al., 2000; U.S. Pat. Nos. 5,695,934; 5,714,330), and the Adessi PCR colony technology (Adessi et al., 2000).
These correlations can be performed by any method that can identify a relationship between an allele and a phenotype, or a combination of alleles and a combination of phenotypes. For example, alleles defined herein can be correlated with a severe response to a Coronavirus infection phenotype. The methods can involve referencing a look up table that comprises correlations between alleles of the polymorphism and the phenotype. The table can include data for multiple allele-phenotype relationships and can take account of additive or other higher order effects of multiple allele-phenotype relationships, e.g., through the use of statistical tools such as principal component analysis, heuristic algorithms, etc.
Correlation of a marker to a phenotype optionally includes performing one or more statistical tests for correlation. Many statistical tests are known, and most are computer-implemented for ease of analysis. A variety of statistical methods of determining associations/correlations between phenotypic traits and biological markers are known and can be applied to the present disclosure (Hartl et al., 1981). A variety of appropriate statistical models are described in Lynch and Walsh (1998). These models can, for example, provide for correlations between genotypic and phenotypic values, characterize the influence of a locus on a phenotype, sort out the relationship between environment and genotype, determine dominance or penetrance of genes, determine maternal and other epigenetic effects, determine principal components in an analysis (via principal component analysis, or “PCA”), and the like. The references cited in these texts provide considerable further detail on statistical models for correlating markers and phenotype.
In addition to standard statistical methods for determining correlation, other methods that determine correlations by pattern recognition and training, such as the use of genetic algorithms, can be used to determine correlations between markers and phenotypes. This is particularly useful when identifying higher order correlations between multiple alleles and multiple phenotypes. To illustrate, neural network approaches can be coupled to genetic algorithm-type programming for heuristic development of a structure-function data space model that determines correlations between genetic information and phenotypic outcomes.
In any case, essentially any statistical test can be applied in a computer implemented model, by standard programming methods, or using any of a variety of “off the shelf” software packages that perform such statistical analyses, including, for example, those noted above and those that are commercially available, e.g., from Partek Incorporated (St. Peters, Mo.; www.partek.com), e.g., that provide software for pattern recognition (e.g., which provide Partek Pro 2000 Pattern Recognition Software).
Systems for performing the above correlations are also a feature of the disclosure. Typically, the system will include system instructions that correlate the presence or absence of an allele (whether detected directly or, e.g., through expression levels) with a predicted phenotype.
Optionally, the system instructions can also include software that accepts diagnostic information associated with any detected allele information, e.g., a diagnosis that a subject with the relevant allele has a particular phenotype. This software can be heuristic in nature, using such inputted associations to improve the accuracy of the look up tables and/or interpretation of the look up tables by the system. A variety of such approaches, including neural networks, Markov modelling, and other statistical analysis are described above.
The disclosure provides methods of determining the polymorphic profile of an individual at the polymorphisms outlined in the present disclosure (e.g. Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22) or polymorphisms in linkage disequilibrium with one or more thereof.
The polymorphic profile constitutes the polymorphic forms occupying the various polymorphic sites in an individual. In a diploid genome, two polymorphic forms, the same or different from each other, usually occupy each polymorphic site. Thus, the polymorphic profile at sites X and Y can be represented in the form X (x1, x1), and Y (y1, y2), wherein x1, x1 represents two copies of allele x1 occupying site X and y1, y2 represent heterozygous alleles occupying site Y.
The polymorphic profile of an individual can be scored by comparison with the polymorphic forms associated with resistance or susceptibility to a severe response to a Coronavirus infection occurring at each site. The comparison can be performed on at least, e.g., 1, 2, 5, 10, 25, 50, or all of the polymorphic sites, and optionally, others in linkage disequilibrium with them. The polymorphic sites can be analysed in combination with other polymorphic sites.
Polymorphic profiling is useful, for example, in selecting agents to effect treatment or prophylaxis of a severe response to a Coronavirus infection in a given individual. Individuals having similar polymorphic profiles are likely to respond to agents in a similar way.
Polymorphic profiling is also useful for stratifying individuals in clinical trials of agents being tested for capacity to treat a severe response to a Coronavirus infection or related conditions. Such trials are performed on treated or control populations having similar or identical polymorphic profiles (see EP 99965095.5), for example, a polymorphic profile indicating an individual has an increased risk of developing a severe response to a Coronavirus infection. Use of genetically matched populations eliminates or reduces variation in treatment outcome due to genetic factors, leading to a more accurate assessment of the efficacy of a potential drug.
Polymorphic profiling is also useful for excluding individuals with no predisposition to a severe response to a Coronavirus infection from clinical trials. Including such individuals in the trial increases the size of the population needed to achieve a statistically significant result. Individuals with no predisposition to a severe response to a Coronavirus infection can be identified by determining the numbers of resistance and susceptibility alleles in a polymorphic profile as described above. For example, if a subject is genotyped at ten sites of the disclosure associated with a severe response to a Coronavirus infection, twenty alleles are determined in total. If over 50%, or alternatively over 60% or 75%, of these are resistance alleles, the individual is unlikely to develop a severe response to a Coronavirus infection and can be excluded from the trial.
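The allele-counting rule in the preceding example can be sketched in Python as follows. The profile layout (a mapping of site name to a pair of alleles), the mapping of each site to its resistance allele and the function names are hypothetical choices made for illustration; the 50% default mirrors the first cut-off mentioned above.

```python
def resistance_fraction(profile, resistance_alleles):
    """Return the fraction of all genotyped alleles that are resistance alleles.

    `profile` maps a site name to a pair of alleles, e.g. {"site1": ("A", "G")}.
    `resistance_alleles` maps the same site names to the resistance allele.
    """
    if not profile:
        raise ValueError("profile must contain at least one genotyped site")
    total = 0
    resistant = 0
    for site, alleles in profile.items():
        for allele in alleles:
            total += 1
            if allele == resistance_alleles[site]:
                resistant += 1
    return resistant / total


def exclude_from_trial(profile, resistance_alleles, cutoff=0.5):
    """Return True when the resistance-allele fraction is over `cutoff`."""
    return resistance_fraction(profile, resistance_alleles) > cutoff
```

With ten genotyped sites, twelve resistance alleles out of twenty give a fraction of 0.6, which is over the 50% cut-off but not over the 75% cut-off.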
The methods of the present disclosure may be implemented by a system such as a computer implemented method. For example, the system may be a computer system comprising one or a plurality of processors which may operate together (referred to for convenience as “processor”) connected to a memory. The memory may be a non-transitory computer readable medium, such as a hard drive, a solid state disk or CD-ROM. Software, that is executable instructions or program code, such as program code grouped into code modules, may be stored on the memory, and may, when executed by the processor, cause the computer system to perform functions such as determining that a task is to be performed to assist a user to determine the risk of a human subject developing a severe response to a Coronavirus infection; receiving data indicating the clinical risk assessment and the genetic risk assessment of the human subject developing a severe response to a Coronavirus infection, wherein the genetic risk was derived by detecting at least two polymorphisms known to be associated with a severe response to a Coronavirus infection; processing the data to combine the clinical risk assessment and the genetic risk assessment to obtain the risk of a human subject developing a severe response to a Coronavirus infection; outputting the risk of a human subject developing a severe response to a Coronavirus infection.
For example, the memory may comprise program code which when executed by the processor causes the system to determine at least two polymorphisms known to be associated with a severe response to a Coronavirus infection; process the data to combine the clinical risk assessment and the genetic risk assessment to obtain the risk of a human subject developing a severe response to a Coronavirus infection; report the risk of a human subject developing a severe response to a Coronavirus infection.
In another embodiment, the system may be coupled to a user interface to enable the system to receive information from a user and/or to output or display information. For example, the user interface may comprise a graphical user interface, a voice user interface or a touchscreen.
In an embodiment, the program code may cause the system to determine the “Polymorphism risk”.
In an embodiment, the program code may cause the system to determine Combined Clinical Risk×Genetic Risk (for example Polymorphism risk).
In an embodiment, the system may be configured to communicate with at least one remote device or server across a communications network such as a wireless communications network. For example, the system may be configured to receive information from the device or server across the communications network and to transmit information to the same or a different device or server across the communications network. In other embodiments, the system may be isolated from direct user interaction.
In another embodiment, performing the methods of the present disclosure to assess the risk of a human subject developing a severe response to a Coronavirus infection, enables establishment of a diagnostic or prognostic rule based on the clinical risk assessment and the genetic risk assessment of the human subject developing a severe response to a Coronavirus infection. For example, the diagnostic or prognostic rule can be based on the Combined Clinical Risk×Genetic Risk score relative to a control, standard or threshold level of risk.
In another embodiment, the diagnostic or prognostic rule is based on the application of a statistical and machine learning algorithm. Such an algorithm uses relationships between a population of polymorphisms and disease status observed in training data (with known disease status) to infer relationships which are then used to determine the risk of a human subject developing a severe response to a Coronavirus infection in subjects with an unknown risk. An algorithm is employed which provides a risk of a human subject developing a severe response to a Coronavirus infection. The algorithm performs a multivariate or univariate analysis function.
In an embodiment, the present disclosure provides a kit comprising at least two sets of primers for amplifying two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 sets of the primers for amplifying nucleic acids comprising a polymorphism selected from any one of Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 sets of the primers for amplifying nucleic acids comprising a polymorphism selected from Table 2 and Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, or at least 60 sets of the primers for amplifying nucleic acids comprising a polymorphism selected from Table 4 or Table 6, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, or at least 60 sets of the primers for amplifying nucleic acids comprising a polymorphism selected from Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 sets of the primers for amplifying nucleic acids comprising a polymorphism selected from Table 3 or Table 6a, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 sets of the primers for amplifying nucleic acids comprising a polymorphism selected from Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises sets of primers for amplifying nucleic acids comprising one or more or all of the polymorphisms provided in Table 19, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises sets of primers for amplifying nucleic acids comprising one or more or all of the polymorphisms provided in Table 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
As would be appreciated by those of skill in the art, once a polymorphism is identified, primers can be designed to amplify the polymorphism as a matter of routine. Various software programs are freely available that can suggest suitable primers for amplifying polymorphisms of interest.
Again, it would be known to those of skill in the art that PCR primers of a PCR primer pair can be designed to specifically amplify a region of interest from human DNA. Each PCR primer of a PCR primer pair can be placed adjacent to a particular single-base variation on opposing sites of the DNA sequence variation. Furthermore, PCR primers can be designed to avoid any known DNA sequence variation and repetitive DNA sequences in their PCR primer binding sites.
The kit may further comprise other reagents required to perform an amplification reaction such as a buffer, nucleotides and/or a polymerase, as well as reagents for extracting nucleic acids from a sample.
Array based detection is one preferred method for assessing the polymorphisms of the disclosure in samples, due to the inherently high-throughput nature of array based detection. A variety of probe arrays have been described in the literature and can be used in the context of the present disclosure for detection of polymorphisms that can be correlated to a severe response to a Coronavirus infection. For example, DNA probe array chips are used in one embodiment of the disclosure. The recognition of sample DNA by the set of DNA probes takes place through DNA hybridization. When a DNA sample hybridizes with an array of DNA probes, the sample binds to those probes that are complementary to the sample DNA sequence. By evaluating to which probes the sample DNA for an individual hybridizes more strongly, it is possible to determine whether a known sequence of nucleic acid is present or not in the sample, thereby determining whether a marker found in the nucleic acid is present.
Thus, in another embodiment, the present disclosure provides a genetic array comprising at least two sets of probes for hybridising to two or more nucleic acids, wherein the two or more nucleic acids comprise a polymorphism selected from any one of Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 sets of probes for hybridising a polymorphism selected from any one of Tables 1 to 3, 5a or 6, or Tables 1 to 6, 8, 19 or 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 250, at least 300 or at least 306 sets of probes for hybridising a polymorphism selected from Table 2 and Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, or at least 60 sets of probes for hybridising a polymorphism selected from Table 4 or Table 5, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40, at least 50, or at least 60 sets of probes for hybridising a polymorphism selected from Table 4, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 sets of probes for hybridising a polymorphism selected from Table 3 or Table 6a, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 20, at least 30, at least 40 or at least 50 sets of probes for hybridising a polymorphism selected from Table 3, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises a probe(s) for hybridising one or more or all of the polymorphisms provided in Table 19, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
In an embodiment, the kit comprises a probe(s) for hybridising one or more or all of the polymorphisms provided in Table 22, or a single nucleotide polymorphism in linkage disequilibrium with one or more thereof.
Primers and probes for other polymorphisms can be included with the above exemplified kits. For example, primers and/or probes may be included for detecting a Coronavirus, such as a SARS-CoV-2 viral, infection.
Approximately 11 million SNP results were analysed. These were sorted by p-value, from lowest to highest and the top one million of these were utilised for further pruning. This equated to all variants p<0.0969. A p-value threshold of p<0.001 was then applied, as was a beta value window between −1 to 1 and an average pooled allele frequency of 0.01-0.99.
These were then further pruned for linkage disequilibrium using the online tool LDLink, snpclip (ldlink.nci.nih.gov), using the EUR populations as reference, with the threshold set at R² <0.5. Non-single nucleotide variants were excluded if no linked surrogate/proxy SNP was available.
Informative polymorphisms derived from publicly available pooled genome-wide association study (GWAS) results from 716 cases (confirmed COVID-19 (severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) diagnosis and hospitalised) and 616 controls (confirmed COVID-19 diagnosis and non-hospitalised) are provided in Table 2.
Informative polymorphisms were derived from 2,863 patients within the UK Biobank Study, of whom 825 were hospitalized for a severe response to the infection. GWAS results were sorted by p-value. A p-value threshold of p<0.00001 was applied, as was an allele frequency threshold set at a minor allele frequency >0.01. The identified polymorphisms are provided in Table 8.
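The thresholding step described above can be sketched as follows. This is a minimal illustration only: the record type `GwasResult`, its field names and the function names are hypothetical, and the sketch assumes that each GWAS result carries a p-value and the frequency of one allele, from which the minor allele frequency is taken as the smaller of that frequency and its complement.

```python
from dataclasses import dataclass


@dataclass
class GwasResult:
    # Hypothetical record layout; the field names are illustrative only.
    snp_id: str
    p_value: float
    allele_freq: float  # frequency of one allele, between 0 and 1


def minor_allele_frequency(allele_freq):
    """Return the frequency of the rarer of the two alleles."""
    return min(allele_freq, 1.0 - allele_freq)


def select_informative(results, p_threshold=1e-5, maf_threshold=0.01):
    """Keep results with p < p_threshold and MAF > maf_threshold, sorted by p-value."""
    kept = [
        r for r in results
        if r.p_value < p_threshold
        and minor_allele_frequency(r.allele_freq) > maf_threshold
    ]
    return sorted(kept, key=lambda r: r.p_value)
```

The default thresholds are the p<0.00001 and minor allele frequency >0.01 values stated in the text; other thresholds can be passed as arguments.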
A SNP-based (relative) risk score was calculated using estimates of the odds ratio (OR) per allele and risk allele frequency (p), assuming independent and additive risks on the log OR scale. For each SNP, the unscaled population average risk was calculated as μ = (1−p)² + 2p(1−p)OR + p²OR². Adjusted risk values (with a population average risk equal to 1) were calculated as 1/μ, OR/μ and OR²/μ for the three genotypes defined by number of risk alleles (0, 1, or 2). The overall SNP-based risk score was then calculated by multiplying the adjusted risk values for each of the 108 SNPs (Tables 9 and 10).
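By way of illustration only, the multiplicative SNP-based risk score can be sketched as below. The unscaled population average risk is taken as the sum of the three genotype risks (1, OR and OR squared) weighted by their Hardy-Weinberg frequencies, so that the adjusted risks average to 1 across the population. The function names and the dictionary layout are assumptions made for the sketch; the odds ratios and allele frequencies themselves would come from the referenced tables.

```python
def adjusted_risks(odds_ratio, risk_allele_freq):
    """Return adjusted risk values for 0, 1 and 2 risk alleles of one SNP."""
    p = risk_allele_freq
    # Unscaled population average risk: genotype risks 1, OR, OR^2 weighted
    # by genotype frequencies (1-p)^2, 2p(1-p), p^2.
    mu = (1 - p) ** 2 + 2 * p * (1 - p) * odds_ratio + p ** 2 * odds_ratio ** 2
    return (1 / mu, odds_ratio / mu, odds_ratio ** 2 / mu)


def snp_risk_score(risk_allele_counts, snp_parameters):
    """Multiply the adjusted risk values across SNPs.

    risk_allele_counts: dict of SNP id -> number of risk alleles (0, 1 or 2)
    snp_parameters:     dict of SNP id -> (odds ratio, risk allele frequency)
    """
    score = 1.0
    for snp_id, (odds_ratio, freq) in snp_parameters.items():
        count = risk_allele_counts[snp_id]
        score *= adjusted_risks(odds_ratio, freq)[count]
    return score
```

A score above 1 indicates a risk higher than the population average and a score below 1 a lower risk, under the stated assumption of independent SNP effects.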
Thus, a polygenic risk score can discriminate between patients with a confirmed Covid-19 infection who developed a severe response to that infection, requiring hospitalization, and those who did not require hospitalization.
The present inventors have found that a polygenic risk score can discriminate between patients with a confirmed Covid-19 infection who developed a severe response to that infection, requiring hospitalization, and those who did not require hospitalization.
The model has been developed using 2,863 patients within the UK Biobank Study, of which 825 were hospitalized for severe response to the infection.
A SNP-based (relative) risk score was calculated using estimates of the odds ratio (OR) per allele and risk allele frequency (p), assuming independent and additive risks on the log OR scale. For each SNP, the unscaled population average risk was calculated as μ = (1−p)² + 2p(1−p)OR + p²OR². Adjusted risk values (with a population average risk equal to 1) were calculated as 1/μ, OR/μ and OR²/μ for the three genotypes defined by number of risk alleles (0, 1, or 2). The overall SNP-based risk score was then calculated by multiplying the adjusted risk values for each of the 58 SNPs (Table 11). The 58 SNPs analysed are provided in Table 3.
Thus, a polygenic risk score can discriminate between patients with a confirmed Covid-19 infection who developed a severe response to that infection, requiring hospitalization, and those who did not require hospitalization. Due to the higher OR, this panel performed better than the 108 SNP panel described in Example 2.
The present specification provides methods for a Covid-19 risk model which combines a clinical risk assessment and a genetic risk assessment and which can be used to discriminate between cases with a severe response to Covid-19 infection and controls without a severe response.
Each clinical risk factor incorporated into a combined model is assigned a relative risk, which indicates the magnitude of its association with the severity of a Covid-19 infection. The clinical factors are combined with the polygenic risk score by multiplication. For example, clinical risk factor A is assigned the relative risk RRa and clinical risk factor B is assigned the relative risk RRb. The full risk score is then calculated as Polygenic Risk Score×RRa×RRb=Combined Risk.
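The multiplication of the polygenic risk score by the clinical relative risks can be sketched as below. The function name is hypothetical and any relative risk values supplied to it are placeholders for illustration; they are not values from the specification.

```python
from math import prod


def combined_risk(polygenic_risk_score, clinical_relative_risks):
    """Multiply the polygenic risk score by each clinical relative risk."""
    # prod() of an empty list is 1, so a subject with no clinical risk
    # factors keeps the polygenic risk score unchanged.
    return polygenic_risk_score * prod(clinical_relative_risks)
```

For instance, with a polygenic risk score of 1.2 and illustrative relative risks RRa = 1.5 and RRb = 2.0, the combined risk is 1.2 × 1.5 × 2.0 = 3.6.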
The inventors extracted COVID-19 testing and hospital records from the UK Biobank COVID-19 data portal on 15 Sep. 2020. At the time of data extraction, primary care data was only available for just over half of the identified participants and was therefore not used in these analyses.
Eligible participants were those who had tested positive for COVID-19 and for whom SNP genotyping data and linked hospital records were available. Of the 18,221 participants with COVID-19 test results, 1,713 had tested positive and 1,582 of those had both SNP and hospital data available.
The inventors used source of test result as a proxy for severity of disease: outpatient representing non-severe disease and inpatient representing severe disease. For participants with multiple test results, the disease was considered to be severe if at least one result came from an inpatient setting.
The inventors identified 62 SNPs from the publicly available (release 2) results of the meta-analysis of non-hospitalised versus hospitalised cases of COVID-19 conducted by the COVID-19 Host Genetics Initiative consortium (COVID-19 Host Genetics Initiative (2020) and COVID-19 Host Genetics Initiative: results. 2020 accessed May 13, 2020, at www.covid19hg.org/results). P<0.0001 was used as the threshold for loci selection, and variants that were associated with hospitalisation in only one of the five studies included in the meta-analysis were removed. Variants that had a minor allele frequency of <0.01 and beta coefficients from −1 to 1 were then discarded (Dayem et al., 2018). Linkage disequilibrium pruning was performed using an r² threshold of 0.5 against the 1000 Genomes European populations (CEU, TSI, FIN, GBR, IBS) representing the ethnicities of the submitted populations (Machiela et al., 2015). Where possible, SNP variants were chosen over insertion-deletion variants to facilitate laboratory validation testing.
The two lead SNPs from the loci found by Ellinghaus et al. (2020) that reached genome-wide significance were also included. Therefore, a panel of 64 SNPs for severe COVID-19 was used.
For the SNPs identified from the COVID-19 Host Genetics Initiative, the odds ratios for severe disease ranged from 1.5 to 2.7 (Table 4). While the inventors would normally construct a SNP relative risk score by using published odds ratios and allele frequencies to calculate adjusted risk values (with a population average of 1) for each SNP and then multiplying the risks for each SNP (Mealiffe et al., 2020), the size of the odds ratios for each SNP meant that this approach could result in relative risk SNP scores of several orders of magnitude. Therefore, to construct the SNP score for this study, the inventors calculated the percentage of risk alleles present in the genotyped SNPs for each participant as generally described in WO 2005/086770. More specifically, for each of the 64 SNPs, if the subject was homozygous for the risk allele they were scored as 2, if they were heterozygous for the risk allele they were scored as 1, and if they were homozygous for the non-risk allele they were scored as 0. The total number was then converted to a percentage for use in determining risk.
Percentage rather than a count was used because some of the eligible participants had missing data for some SNPs (9% had all SNPs genotyped, 82% were missing 1-5 SNPs and 9% were missing 6-15 SNPs).
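A minimal sketch of the percentage-based SNP score follows. It assumes that the denominator is the maximum number of risk alleles possible across the SNPs actually genotyped for the participant (two per genotyped SNP), which is one reading of "percentage of risk alleles present in the genotyped SNPs"; the function name and the use of `None` to mark a missing genotype are illustrative choices.

```python
def risk_allele_percentage(risk_allele_counts):
    """Percentage of risk alleles among the genotyped SNPs.

    risk_allele_counts: one entry per SNP in the panel, each 0, 1 or 2,
    or None where the SNP was not genotyped for this participant.
    """
    genotyped = [c for c in risk_allele_counts if c is not None]
    if not genotyped:
        raise ValueError("no genotyped SNPs available")
    # Each genotyped SNP contributes at most two risk alleles.
    return 100.0 * sum(genotyped) / (2 * len(genotyped))
```

Because missing SNPs are dropped from both the numerator and the denominator, participants with different numbers of genotyped SNPs remain comparable, which a raw count would not allow.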
Blood type was imputed for genotyped UK Biobank participants using three SNPs (rs505922, rs8176719 and rs8176746) in the ABO gene on chromosome 9q34.2. A rs8176719 deletion (or for those with no result for rs8176719, a T allele at rs505922) was considered to indicate haplotype O. At rs8176746, haplotype A was indicated by the presence of the G allele and haplotype B was indicated by the presence of the T allele (Melzer et al., 2008; Wolpin et al., 2010).
Risk factors for severe COVID-19 were identified from large epidemiological studies of electronic health records (Williamson et al., 2020; Petrilli et al., 2020) and advice posted on the Centers for Disease Control and Prevention website. Rare monogenic diseases (thalassemia, cystic fibrosis and sickle cell disease) were not considered in these analyses.
Age was classified as 50-59 years, 60-69 years and 70+ years. This was based on the participants' approximate age at the peak of the first wave of infections (April 2020) and was calculated using the participants' month and year of birth. Self-reported ethnicity was classified as white and other (including unknown). The Townsend deprivation score at baseline was classified into quintiles defined by the distribution of the score in the UK Biobank as a whole. Body mass index and smoking status were also obtained from the baseline assessment data. Body mass index was inverse transformed and then rescaled by multiplying by 10. Smoking status was defined as current versus past, never or unknown. The other clinical risk factors were extracted from hospital records by selecting records with ICD9 or ICD10 codes for the disease of interest.
Logistic regression was used to examine the association of risk factors with severity of COVID-19 disease. To develop the final model, the inventors began with a base model that included SNP score, age group and gender. They then included all of the candidate variables and used step-wise backwards selection to remove variables with p-values of >0.05. The final model was refined by considering the addition of the removed candidate variables one at a time. Model selection was informed by examination of the Akaike information criterion and the Bayesian information criterion, with a decrease of >2 indicating a statistically significant improvement.
Model calibration was assessed using the Pearson-Windmeijer goodness-of-fit test and model discrimination was measured using the area under the receiver operating characteristic curve (AUC). To compare the effect sizes of the variables in the final model, the inventors used the odds per adjusted standard deviation (Hopper, 2015) using dummy variables for age group and ABO blood type. The intercept and beta coefficients from the final model were used to calculate the COVID-19 risk score for all UK Biobank participants.
Stata (version 16.1) (StataCorp LLC: College Station, Tex., USA) was used for analyses; all statistical tests were two-sided, and p-values of less than 0.05 were considered nominally statistically significant.
Of the 1,582 UK Biobank participants with a positive SARS-CoV-2 test result and hospital and SNP data available, 564 (35.7%) were from an outpatient setting and considered not to have severe disease (controls), while 1,018 (64.4%) were from an inpatient setting and considered to have severe disease (cases). Cases ranged in age from 51 to 82 years with a mean of 69.1 (standard deviation [SD]=8.8) years. Controls ranged in age from 50 to 82 years with a mean of 65.0 (SD=9.0) years. Mean body mass index was 29.0 kg/m² (SD=5.4) for cases and 28.5 kg/m² (SD=5.4) for controls. Body mass index was transformed to the inverse multiplied by 10 for all analyses and ranged from 0.2 to 0.6 for both cases and controls. The percentage of risk alleles in the SNP score ranged from 47.6 to 73.8 for cases and from 43.7 to 72.5 for controls. The distributions of the variables of interest for cases and controls and the unadjusted odds ratios and 95% confidence intervals (CI) are shown in Table 12.
The model selected included SNP score, age group, gender, ethnicity, ABO blood type, and a history of autoimmune disease (rheumatoid arthritis, lupus or psoriasis), haematological cancer, non-haematological cancer, diabetes, hypertension or respiratory disease (excluding asthma) and was a good fit to the data (Windmeijer's H=0.02, p=0.9) (Table 13). The SNP score was strongly associated with severity of disease, increasing risk by 19% per percentage increase in risk alleles. A negative impact of age was only evident in the group aged 70 years and over, and while gender was not statistically significant (p=0.3), it was retained because it was one of the three variables included in the base model to which other variables were added. Ethnicity showed a 43% increase in risk for non-whites but was only marginally statistically significant (p=0.06). The AB blood type was protective (p=0.007), but the protective effect of blood type A and the increased risk for blood type B were not statistically significant (p=0.1 and p=0.4, respectively).
The SNP score was, by far, the strongest predictor followed by respiratory disease and age 70 years or older.
The receiver operating characteristic curves for the final model and for alternative models with clinical factors only; SNP score only; and age and gender are shown in
To further improve the method of the invention the inventors downloaded an updated results file on 8 Jan. 2021 from the UK Biobank. Eligible participants were active UK Biobank participants with a positive SARS-CoV-2 test result and who had SNP and hospital data available. Of the 47,990 UK Biobank participants with a SARS-CoV-2 test result available, 8,672 (18.1%) had a positive test result, and of these, 7,621 met the eligibility criteria.
The inventors used source of test result as a proxy for severity of disease, where inpatient results were considered severe disease (cases) and outpatient results were considered non-severe disease (controls). If a participant had more than one test result, they were classified as having severe disease if at least one of their results was from an inpatient setting. Of the 7,621 eligible participants, 2,205 were cases and 5,416 were controls.
The inventors identified a further 40 SNPs from the publicly available (release 4) results of the meta-analysis of non-hospitalised versus hospitalised cases of COVID-19 conducted by the COVID-19 Host Genetics Initiative consortium (COVID-19 Host Genetics Initiative (2020) and COVID-19 Host Genetics Initiative: results. 2020 accessed Jan. 7, 2020, at www.covid19hg.org/results). P<0.0001 was used as the threshold for loci selection, and variants that were associated with hospitalisation in only one of the five studies included in the meta-analysis were removed. Variants that had a minor allele frequency of <0.01 and beta coefficients from −1 to 1 were then discarded (Dayem et al., 2018). Linkage disequilibrium pruning was performed using an r² threshold of 0.5 against the 1000 Genomes European populations (CEU, TSI, FIN, GBR, IBS) representing the ethnicities of the submitted populations (Machiela et al., 2015). Where possible, SNP variants were chosen over insertion-deletion variants to facilitate laboratory validation testing. A further 12 SNPs were identified from a publicly available meta-analysis of Covid-19 data (Pairo-Castineira et al., 2020).
The above identified SNPs were combined with the 64 identified in our original study to provide a test SNP panel of 116 SNPs.
To develop a new model to predict risk of severe COVID-19, the inventors used all of the available data and randomly divided it into a 70% training dataset and a 30% validation dataset (ensuring that it was balanced for origin of test result). Because the missing data is assumed to be missing at random (if not missing completely at random), a multiple imputation with 20 imputations was used to address the missing data for body mass index (linear regression) and the SNP data (predictive mean matching) for the development of the new model in the training dataset. To more closely reflect the availability of data in the real world, the inventors did not use imputed data in the validation dataset.
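One way to obtain a 70%/30% division that is balanced for origin of test result is a stratified random split, sketched below. This is an assumption about how the balancing could be achieved rather than a description of the procedure actually used; the function name, the `key` argument and the fixed seed are illustrative.

```python
import random


def stratified_split(participants, key, train_fraction=0.7, seed=0):
    """Randomly split participants into training and validation sets,
    keeping the same proportion of each stratum (given by key) in both."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = {}
    for participant in participants:
        strata.setdefault(key(participant), []).append(participant)

    training, validation = [], []
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        training.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return training, validation
```

Here `key` would return the origin of the test result (for example "inpatient" or "outpatient") so that cases and controls are divided in the same 70:30 ratio.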
The clinical variables considered for inclusion in the new model were age, sex, BMI, ethnicity, ABO blood type and the following chronic health conditions: asthma, autoimmune disease (rheumatoid arthritis, lupus or psoriasis), haematological cancer, non-haematological cancer, cerebrovascular disease, diabetes, heart disease, hypertension, immunocompromised, kidney disease, liver disease and respiratory disease (excluding asthma). Dummy variables were used for the categorical classifications of age and ABO blood type.
The SNPs selected for the development of the new model came from three sources: (i) from Tables 2 to 4, (ii) the 40 SNPs newly selected from the (release 4) results of the COVID-19 Host Genetics Initiative meta-analysis of non-hospitalised versus hospitalised cases of COVID-19 and (iii) the 12 SNPs from the paper by Pairo-Castineira et al. (2020). The inventors used unadjusted logistic regression in the testing dataset to identify SNPs that were associated with risk of severe COVID-19 with P<0.05 (see Table 14).
Stata (version 16.1) was used for analyses; all statistical tests were two-sided and P<0.05 was considered nominally statistically significant.
The inventors used multivariable logistic regression in the multiple imputation training dataset to develop the new model to predict risk of severe COVID-19. The inventors began with a model that included all the clinical variables and the SNPs with unadjusted associations with severe COVID-19 and used backwards stepwise selection to develop the most parsimonious model. For the removed variables, a final determination was made on their inclusion or exclusion by adding them one at a time to the parsimonious model. To directly compare the effect sizes of the variables in the final model, regardless of the scale on which they were measured, the odds per adjusted standard deviation was used. The intercept and beta coefficients from the new model were used to calculate the COVID-19 risk score for all eligible UK Biobank participants.
The inventors assessed the performance of the new model in the imputed development dataset and in the non-imputed validation dataset. The association between the risk score and severe COVID-19 was assessed using logistic regression to estimate the odds ratio per quintile of risk score. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC). For models that showed good discrimination, calibration was assessed using logistic regression of the log of the risk score to estimate the intercept and the slope (beta coefficient). An intercept close to 0 indicated good calibration, while an intercept less than 0 indicated overall overestimation of risk and an intercept greater than 0 indicated overall underestimation of risk. A slope close to 1 indicated good dispersion, with a slope of less than 1 indicating over-dispersion and a slope of greater than 1 indicating under-dispersion.
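For reference, the AUC used to measure discrimination equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; it is a plain illustration of the statistic and is not the Stata routine used for the reported analyses.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve from pairwise case/control comparisons."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0   # case ranked above control
            elif case == control:
                concordant += 0.5   # tie counts as half
    return concordant / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to no discrimination and an AUC of 1.0 to perfect separation of cases from controls.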
The best performing tests are detailed below.
Three models were developed for assessing the risk of a human subject developing a severe response to a Coronavirus infection. In particular, the methods can be used to determine the probability the subject would require hospitalisation if infected with a Coronavirus. The first model is based solely on sex and age (referred to herein as the “age and sex model”), the second model (referred to herein as the “full model”) includes numerous clinical factors and genetic factors, whereas the third model (referred to herein as the “expanded model”) includes additional clinical factors and genetic factors to those in the full model.
Inputs of the age and sex model are provided in Table 15 and the β-coefficients provided in Table 16.
The log odds is calculated using: Log odds (LO) = −1.749562 + Σ Clinical β coefficients.
The age and sex relative risk = e^LO.
Age and sex probability = e^LO/(1 + e^LO).
If any of the clinical factors are unknown, or the subject is unwilling to supply the relevant details, that factor(s) is assigned a β coefficient of 0.
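The age and sex calculation can be sketched as follows. The intercept is the value given above; the β coefficients themselves are in Table 16 and are not reproduced here, so any values passed to the function in practice would need to be taken from that table. The function names and the use of `None` for an unknown factor are assumptions of the sketch.

```python
import math

AGE_SEX_INTERCEPT = -1.749562  # intercept stated for the age and sex model


def probability_from_log_odds(log_odds):
    """Convert log odds to a probability: e^LO / (1 + e^LO)."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


def age_sex_probability(clinical_betas):
    """Probability from the age and sex model.

    clinical_betas: one β coefficient per clinical factor, or None where
    the factor is unknown or not supplied (treated as a coefficient of 0).
    """
    log_odds = AGE_SEX_INTERCEPT + sum(
        0.0 if beta is None else beta for beta in clinical_betas
    )
    return probability_from_log_odds(log_odds)
```

When every factor is unknown, the result depends on the intercept alone, which gives a probability of roughly 0.15.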
Inputs of the full model are provided in Table 17 and the β-coefficients provided in Tables 18 and 19.
The SNP risk factor (SRF) is determined using: (SRF)=Σ (No of Risk Alleles×SNP β coefficient).
The log odds is calculated using: Log odds (LO) = −1.36523 + SRF + Σ Clinical β coefficients.
The age and sex relative risk = e^LO.
Age and sex probability = e^LO/(1 + e^LO).
If any of the clinical factors are unknown, or the subject is unwilling to supply the relevant details, that factor(s) is assigned a β coefficient of 0.
Inputs of the expanded model are provided in Table 20 and the β-coefficients provided in Tables 21 and 22.
The SNP risk factor (SRF) is determined using: (SRF)=Σ (No of Risk Alleles×SNP β coefficient).
The log odds is calculated using: Log odds (LO) = −1.469939 + SRF + Σ Clinical β coefficients.
The age and sex relative risk = e^LO.
Age and sex probability = e^LO/(1 + e^LO).
If any of the clinical factors are unknown, or the subject is unwilling to supply the relevant details, that factor(s) is assigned a β coefficient of 0.
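The full and expanded models share one form and differ in their intercepts and coefficient tables, so both can be sketched with a single function. The intercepts are those stated above; the SNP and clinical β coefficients are in Tables 18, 19, 21 and 22 and are not reproduced here, so the dictionary and list arguments below are placeholders to be filled from those tables. All function and variable names are hypothetical.

```python
import math

# Intercepts as stated in the text for each model.
INTERCEPTS = {"full": -1.36523, "expanded": -1.469939}


def snp_risk_factor(risk_allele_counts, snp_betas):
    """SRF = sum over SNPs of (number of risk alleles x SNP β coefficient)."""
    return sum(risk_allele_counts[snp] * beta for snp, beta in snp_betas.items())


def severe_response_probability(model, risk_allele_counts, snp_betas, clinical_betas):
    """Probability from the full or expanded model.

    model:              "full" or "expanded"
    risk_allele_counts: dict of SNP id -> 0, 1 or 2
    snp_betas:          dict of SNP id -> β coefficient
    clinical_betas:     list of β coefficients, None where a factor is unknown
    """
    srf = snp_risk_factor(risk_allele_counts, snp_betas)
    clinical = sum(0.0 if beta is None else beta for beta in clinical_betas)
    log_odds = INTERCEPTS[model] + srf + clinical
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))
```

Unknown clinical factors contribute 0 to the log odds, matching the rule stated for each model.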
In terms of discrimination between cases and controls, the age and sex model had an AUC of 0.671 (95% CI=0.646, 0.696), whereas the full model, with an AUC of 0.732 (95% CI=0.708, 0.756), was a substantial improvement (χ²=41.23, df=1, P<0.001). The receiver operating characteristic curves for both models are shown in
The models were well calibrated with no evidence of overall overestimation or underestimation for the age and sex model (α=−0.02; 95% CI=−0.18, 0.13; P=0.7) or the full model (α=−0.08; 95% CI=−0.21, 0.05; P=0.3). There was also no evidence of under or over dispersion for the age and sex model (β=0.96, 95% CI=0.81, 1.10, P=0.6) and for the full model (β=0.90, 95% CI=0.80, 1.00, P=0.06). Calibration plots for both models are shown in
The inventors calculated the probability of severe COVID-19 for all UK Biobank participants who met our eligibility criteria for this study; the distributions are shown in
The expanded model provided a slight improvement in discrimination in this dataset (Table 23).
The algorithm to calculate the risk of developing severe Covid-19 has been modified to enable a risk calculation to be provided for patients aged 18-85 years (previously 50-85 years). More specifically, the look-up tables providing the age-related risk values have been modified to include three additional values for the following age ranges: 18-29, 30-39 and 40-49 (Table 24).
For people aged under 50 years, the probability of severe disease is adjusted using data on risk of hospitalization due to Covid-19 which were obtained from the United States Centers for Disease Control and Prevention (www.cdc.gov).
The SNPs analysed, and the methods used for analysis, are the same as used in Example 6.
The present application claims priority from AU 2020901739 filed 27 May 2020, AU 2020902052 filed 19 Jun. 2020, AU 2020903536 filed 30 Sep. 2020, and AU 2021900392 filed 17 Feb. 2021, the entire contents of each of which are incorporated herein by reference.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
All publications discussed and/or referenced herein are incorporated herein in their entirety.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Number | Date | Country | Kind |
---|---|---|---|
2020901739 | May 2020 | AU | national |
2020902052 | Jun 2020 | AU | national |
2020903536 | Sep 2020 | AU | national |
2021900392 | Feb 2021 | AU | national |
This application is a continuation of PCT International Application No. PCT/AU2021/050507, filed May 26, 2021, which claims the priority of each of Australian Application No. 2020901739, filed May 27, 2020, Australian Application No. 2020902052, filed Jun. 19, 2020, Australian Application No. 2020903536, filed Sep. 30, 2020, and Australian Application No. 2021900392, filed Feb. 17, 2021, the contents of each of which are hereby incorporated by reference in their entirety into this application.
Number | Date | Country | |
---|---|---|---|
Parent | 17368471 | Jul 2021 | US |
Child | 17667282 | US | |
Parent | PCT/AU2021/050507 | May 2021 | US |
Child | 17368471 | US |