The present invention relates to identification of an abnormal splice site. In particular, provided are methods of identifying an abnormal splice site. Methods of classifying the risk of abnormal splicing of a splice site are also provided. Databases for use in the methods provided herein are also disclosed.
Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of common general knowledge in the field.
Splicing of pre-mRNA in eukaryotes involves recognition of exons and introns. During splicing, the borders of introns are recognized, cleaved, and exons are then ligated together. A splicing event requires the assembly of splicing machinery in spliceosome complexes on consensus elements present in the splice site (e.g., the donor splice site, the branch site, the acceptor splice site). Genetic variants affecting a splice site (an abnormal splice site) disrupt splicing processes leading to aberrant splicing and causing diseases, including inherited diseases (genetic disorders) and cancer.
Many abnormal splice sites remain unclassified (variant of unknown significance (VUS)), meaning their clinical significance also remains unclassified. Thus, patients with, for example, an inherited disease (genetic disorder) may not receive a genetic diagnosis. An understanding of the genetic cause of a disease is important to guide clinical management and enable personalised and precision medicine. Accordingly, determining the clinical significance of an abnormal splice site may lead to a genetic diagnosis to direct the clinical care and application and development of therapies.
It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.
The inventors recognized that variants of splice sites, which are not present in any splice site of the human genome, have a high likelihood of exhibiting abnormal splicing (e.g., reduced splicing, non-splicing, exon skipping, or any splicing event associated with a pathogenic phenotype) and are referred to herein as abnormal splice sites. Thus, herein provided are methods of identifying an abnormal splice site based on a determination of the presence or absence of a sample splice site, or a portion thereof, in any splice site in a reference human genome. This determination may be referred to herein as Native Intron Frequency. Thereby a risk of abnormal splicing of a sample splice site may be determined. A sample splice site that is absent from the human genome has a high risk of abnormal splicing. A sample splice site that is infrequently used in the human genome may have a high risk of abnormal splicing. The inventors recognized that the relative shift in frequency of a sample splice site, as determined by a comparison of frequency of a sample splice site with the frequency of the originating splice site (the splice site correlating to the sample splice site in the human genome (referred to herein as a reference splice site sequence)), may be used to determine a risk of abnormal splicing. The relative shift in frequency may be compared to a reference dataset comprising variant splice sites (with their corresponding relative shift in frequency in comparison to a reference human genome) and their classification (abnormal splice site or benign variant splice site). Thereby, a risk of abnormal splicing of a sample splice site may be determined.
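As a minimal sketch of the Native Intron Frequency determination described above, the following Python counts how often a sequence occurs among a collection of reference splice site sequences. The function names (`build_nif_table`, `native_intron_frequency`) and the toy reference list are hypothetical illustrations and are not taken from the specification; an actual reference set would be derived from the annotated splice sites of a reference human genome.

```python
from collections import Counter

def build_nif_table(reference_sequences):
    """Count occurrences of each splice site sequence in the reference set."""
    return Counter(seq.upper() for seq in reference_sequences)

def native_intron_frequency(sequence, nif_table):
    """Return the count for a sequence; sequences absent from the table have NIF 0."""
    return nif_table.get(sequence.upper(), 0)

# Toy reference set (hypothetical 9-nucleotide donor sequences, not real data).
toy_reference = ["CAGGTAAGT", "CAGGTAAGT", "AAGGTGAGT", "CAGGTGAGC"]
table = build_nif_table(toy_reference)

nif_var = native_intron_frequency("CAGCTAAGT", table)  # variant sequence, absent from the toy set
nif_ref = native_intron_frequency("CAGGTAAGT", table)  # originating (reference) sequence
flagged_abnormal = nif_var == 0  # a NIF of zero indicates an abnormal splice site
```

In this toy example the variant sequence does not occur in the reference set, so it is flagged, whereas the originating sequence occurs twice.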
Other factors may be used in conjunction with the measure of frequency of a splice site in the human genome to determine a risk of abnormal splicing of a sample splice site. One additional factor, which may be referred to as a previous classification factor, considers whether the splice site, or a portion thereof, has previously been classified clinically as an abnormal splice site or a benign variant splice site. A previous classification factor may be determined by comparing a sample splice site to a reference dataset of splice sites with a known clinical classification (e.g., abnormal splice site or benign variant splice site). Another additional factor, which may be referred to as a similar splice site frequency shift factor (or similar NIF-shift factor), considers the clinical classification (e.g., abnormal splice site or benign variant splice site) of variant splice sites having similar relative shifts in Native Intron Frequency to a sample splice site.
It will be appreciated that in the method herein described identification of an abnormal splice site in a sample splice site from a subject may comprise or consist of a determination of a risk of abnormal splicing of the sample splice site. Thereby, a risk of abnormal splicing of a sample splice site may be considered as a risk that a sample splice site is an abnormal splice site.
In a first embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In further embodiments related to the first embodiment, the sample splice site may be a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. In certain embodiments, the splice site is a donor splice site, steps (a) and (b) are repeated with a second sample splice site sequence comprised in the same sample splice site, and NIFvar-2 is determined, wherein a NIFvar of 0 (zero) for any sample splice site sequence indicates that the sample splice site is abnormal. In certain embodiments, the sample splice site is a donor splice site, and steps (a) and (b) are repeated with up to five additional sample donor splice site sequences comprised in the same sample splice site, and NIFvar-2, NIFvar-3, NIFvar-4, NIFvar-5, up to NIFvar-6 are determined and correspond to the NIFvar for each of the second, third, fourth, fifth, and up to the sixth sample donor splice site sequence, respectively, wherein a NIFvar of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal. 
In certain embodiments, the sample splice site is a donor splice site, and steps (a) and (b) are repeated with up to five additional sample donor splice site sequences, wherein each sample donor splice site sequence comprises 9 non-identical consecutive nucleotides of the same sample donor splice site, and wherein one or more of the sample donor splice site sequences may comprise overlapping consecutive nucleotides of the donor splice site. In a related embodiment comprising at least six sample splice site sequences from the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, and D+1 to D+9 of a donor splice site. In a related embodiment comprising at least four sample splice site sequences from the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site, wherein the nomenclature E−4 to E−1 corresponds to the last four nucleotides of an exon and D+1 to D+8 correspond to the first eight nucleotides of the intron.
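The overlapping 9-nucleotide windows described above can be sketched as follows. The helper names (`donor_windows`, `_label`) and the example sequence are hypothetical; the labels use an ASCII hyphen ("E-5") in place of the minus sign used in the text.

```python
def _label(index, n_exon):
    """Map a 0-based index in the combined sequence to E-k (exon) or D+k (intron)."""
    if index < n_exon:
        return f"E-{n_exon - index}"
    return f"D+{index - n_exon + 1}"

def donor_windows(exon_tail, intron_head, width=9):
    """Return overlapping windows of `width` nucleotides, keyed by position label."""
    site = (exon_tail + intron_head).upper()
    n_exon = len(exon_tail)
    windows = {}
    for start in range(len(site) - width + 1):
        end = start + width - 1
        key = f"{_label(start, n_exon)} to {_label(end, n_exon)}"
        windows[key] = site[start:start + width]
    return windows

# Six windows: last 5 exon nucleotides plus first 9 intron nucleotides (toy sequence).
six = donor_windows("AACAG", "GTAAGTATC")
# Four windows: last 4 exon nucleotides plus first 8 intron nucleotides.
four = donor_windows("ACAG", "GTAAGTAT")
```

Each window can then be looked up independently to obtain NIFvar-1, NIFvar-2, and so on.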
In further embodiments related to the first embodiment, the sample splice site is a donor splice site. In certain embodiments, the sample splice site sequence comprises 6 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 12 consecutive nucleotides of a donor splice site that is analysed as a collective of multiple, overlapping donor reference splice site sequences, wherein the median of NIFvar-1, NIFvar-2, NIFvar-3, NIFvar-4 and up to NIFvar-6, corresponding to NIFvar for each of the first, second, third, fourth and up to sixth sample donor splice site sequences is determined. In certain embodiments, the sample splice site is a donor splice site of 12 nucleotides divided into four sample splice site sequences comprised of 9 non-identical sequences of consecutive nucleotides corresponding to nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site. The median NIFvar-x is calculated as median (NIFvar-1; NIFvar-2; NIFvar-3; NIFvar-4) wherein a median NIFvar-x of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal.
In further embodiments related to the first embodiment, the sample splice site is a donor splice site. In certain embodiments, the sample splice site sequence comprises 6 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 12 consecutive nucleotides of a donor splice site that is analysed as a collective of multiple, overlapping donor reference splice site sequences, wherein the percentile for each of NIFvar-1, NIFvar-2, NIFvar-3, NIFvar-4 and up to NIFvar-6, corresponding to NIFvar for each of the first, second, third, fourth and up to sixth sample donor splice site sequences is determined. In certain embodiments, the sample splice site is a donor splice site of 12 nucleotides divided into four sample splice site sequences comprised of 9 non-identical sequences of consecutive nucleotides corresponding to nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site. The median percentile NIFvar-x is calculated as median (NIFvar-1 percentile; NIFvar-2 percentile; NIFvar-3 percentile; NIFvar-4 percentile) wherein a median percentile NIFvar-x of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal.
In further embodiments related to the first embodiment, the sample splice site sequence comprises 12 consecutive nucleotides of a donor splice site that is analysed as a collective of multiple, overlapping donor reference splice site sequences, wherein the median NIFvar-x is converted to a percentile value. For example, a sample splice site with a median NIFvar-x of 0 (zero) lies within the zeroth percentile of a frequency distribution of median NIFref-x among all donor splice sites in the reference human genome. A sample donor splice site with median NIFvar-x in the zeroth percentile indicates that the sample donor splice site is abnormal.
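The median NIFvar-x and its conversion to a percentile, as described above, might be sketched as follows. The percentile definition used here (the fraction of reference values strictly below the query value) is an assumption chosen so that a median of zero falls in the zeroth percentile; the specification does not prescribe a particular formula at this point. The reference distribution is a toy list, not real data.

```python
from bisect import bisect_left
from statistics import median

def percentile_of(value, reference_distribution):
    """Fraction of reference values strictly below `value` (assumed definition)."""
    ordered = sorted(reference_distribution)
    return bisect_left(ordered, value) / len(ordered)

# Toy distribution of median NIFref-x over reference donor splice sites (hypothetical).
reference_medians = [1, 3, 3, 8, 20, 50, 120, 400]

# NIFvar-1 to NIFvar-4 for two hypothetical sample donor splice sites.
median_rare = median([0, 0, 0, 2])       # most windows absent from the reference
median_common = median([3, 8, 50, 120])  # all windows present in the reference

pct_rare = percentile_of(median_rare, reference_medians)
pct_common = percentile_of(median_common, reference_medians)
in_zeroth_percentile = pct_rare == 0.0
```

Replacing `median` with `statistics.mean` gives the mean-based variant mentioned in the related embodiments.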
In related embodiments, the use of median NIFvar-x described in Section [0012] may be substituted for mean NIFvar-x calculated as mean (NIFvar-1; NIFvar-2; NIFvar-3; NIFvar-4) and a mean NIFvar-x of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal.
In related embodiments, the use of median NIFvar-x converted to a percentile value described in Section [0013] may be substituted for mean (percentile of NIFvar-1; percentile of NIFvar-2; percentile of NIFvar-3; percentile of NIFvar-4) wherein a mean percentile NIFvar-x of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal.
In a second embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In a further embodiment related to the second embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In embodiments related to the second embodiment, the sample splice site may be a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. In certain embodiments, the method is repeated with one or more sample splice site sequences comprised in the same sample splice site; wherein a risk of abnormal splicing is determined by comparing each NIFvar-x with a corresponding NIFref-x against a CSP reference database. In certain embodiments the sample splice site is a donor splice site, the method is repeated with a second sample donor splice site sequence comprised in the same sample splice site and a corresponding second reference donor splice site sequence, and NIFvar-2 and NIFref-2 are determined. 
In certain embodiments, the sample splice site is a donor splice site, the method is repeated with up to five additional sample donor splice site sequences comprised in the same sample splice site, and five respective donor reference splice site sequences, wherein NIFvar-2, NIFvar-3, NIFvar-4, NIFvar-5, up to NIFvar-6, corresponding to NIFvar for each of the second, third, fourth, fifth, and up to sixth sample donor splice site sequence, and NIFref-2, NIFref-3, NIFref-4, NIFref-5, and up to NIFref-6, corresponding to NIFref for each of the second, third, fourth, fifth, and up to sixth reference donor splice site sequences, are determined. In certain embodiments, the splice site is a donor splice site, and the steps are repeated with up to five additional sample donor splice site sequences comprised in the same sample splice site, wherein each sample donor splice site sequence comprises 9 non-identical consecutive nucleotides of the donor splice site, and wherein the sample donor splice site sequences may comprise overlapping consecutive nucleotides of the sample donor splice site. In a related embodiment comprising at least six sample splice site sequences from a sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, and D+1 to D+9 of a donor splice site. In a related embodiment comprising at least four sample splice site sequences from a sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site.
In embodiments related to the second embodiment, the sample splice site is a donor splice site. In certain embodiments, the sample splice site sequence comprises 6 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 12 consecutive nucleotides of a donor splice site that is analysed as a collective of multiple, overlapping donor reference splice site sequences, wherein the median of NIFvar-1, NIFvar-2, NIFvar-3, NIFvar-4 and up to NIFvar-6, corresponding to NIFvar for each of the first, second, third, fourth and up to sixth sample donor splice site sequences, is compared with the median of NIFref-1, NIFref-2, NIFref-3, NIFref-4 and up to NIFref-6, corresponding to NIFref for each of the first, second, third, fourth and up to sixth reference donor splice site sequences. In certain embodiments, the sample splice site is a donor splice site of 12 nucleotides divided into four sample splice site sequences comprised of 9 non-identical sequences of consecutive nucleotides corresponding to nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site. The median NIFvar-x is calculated as median (NIFvar-1; NIFvar-2; NIFvar-3; NIFvar-4) and the median NIFref-x is calculated as median (NIFref-1; NIFref-2; NIFref-3; NIFref-4), wherein each analogous variant and reference donor splice site sequence NIFvar-1 and NIFref-1, NIFvar-2 and NIFref-2, NIFvar-3 and NIFref-3, NIFvar-4 and NIFref-4 originate from the same corresponding region of a gene and respectively encompass nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8.
In further embodiments related to the second embodiment, the sample splice site is a donor splice site. In certain embodiments, the sample splice site sequence comprises 6 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 12 consecutive nucleotides of a donor splice site that is analysed as a collective of multiple, overlapping donor reference splice site sequences, wherein the median percentile NIFvar-x is calculated as median (NIFvar-1 percentile; NIFvar-2 percentile; NIFvar-3 percentile; NIFvar-4 percentile) wherein a median percentile NIFvar-x of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal. For example, a hypothetical site with percentile NIFvar-1=0.2499, percentile NIFvar-2=0.5904, percentile NIFvar-3=0.7172, percentile NIFvar-4=0.9065 has a median percentile NIFvar-x of 0.6538. For the same hypothetical example, a site with percentile NIFvar-1=0.0077, percentile NIFvar-2=0.0295, percentile NIFvar-3=0.0493, percentile NIFvar-4=0.0635 has a median percentile NIFvar-x of 0.0394. Therefore, the net percentile change in median NIF for the hypothetical sample splice site is 0.0602 (0.0394/0.6538).
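The arithmetic of the hypothetical example above can be reproduced with a few lines of Python. The variable names (`first_site`, `second_site`, `net_change`) are neutral placeholders introduced for illustration only. With an even number of values, the median is the average of the two middle values.

```python
from statistics import median

# Percentile values for the four windows of each hypothetical site, as given in the text.
first_site = [0.2499, 0.5904, 0.7172, 0.9065]
second_site = [0.0077, 0.0295, 0.0493, 0.0635]

median_first = median(first_site)    # (0.5904 + 0.7172) / 2
median_second = median(second_site)  # (0.0295 + 0.0493) / 2

# Net percentile change, expressed as the ratio of the two medians.
net_change = median_second / median_first
```

The two medians evaluate to 0.6538 and 0.0394, and their ratio is about 0.060.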
In embodiments related to the second embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In further embodiments related to the second embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In further embodiments related to the second embodiment, the use of median NIFvar-x described in Section [0019] and Section [0021] may be substituted for mean NIFvar-x calculated as mean (NIFvar-1; NIFvar-2; NIFvar-3; NIFvar-4).
In further embodiments related to the second embodiment, the use of median NIFvar-x converted to a percentile value described in Section [0020] and Section [0022] may be substituted for mean percentile NIFvar-x.
In a third embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(c) determining a clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence; and
(d) determining a risk of abnormal splicing for the sample splice site by assessing the clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence determined in step (b).
In an embodiment related to the third embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) obtaining a first reference splice site sequence; wherein the first reference splice site sequence and the first sample splice site sequence each originate from the same corresponding region of a gene;
(c) determining a clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(d) determining a clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence; and
(e) determining a risk of abnormal splicing for the sample splice site by assessing the clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence determined in step (c) and the clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence determined in step (d).
In further embodiments related to the third embodiment, the sample splice site may be a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. In certain embodiments comprising determining a clinical classification(s) associated with a sample splice site sequence, and (optionally) a reference splice site sequence, the sample splice site is a donor splice site, the steps are repeated with up to five sample splice site sequences comprised in the same sample splice site and (optionally) corresponding respective reference splice site sequences, and determining a risk of abnormal splicing for the sample splice site includes assessing the clinical classification(s) associated with the nucleotide sequence of each sample splice site sequence and (optionally) each corresponding reference splice site sequence. In embodiments related to the third embodiment, a clinical classification(s) as recited may be determined by querying a CSP database for the respective nucleotide sequence of the sample splice site sequence and/or the nucleotide sequence of the corresponding reference splice site sequence.
A risk of abnormal splicing for a sample splice site may be determined by considering the number of times the nucleotide sequence of each sample splice site sequence has been identified as an abnormal splice site.
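Counting how often a sequence has previously been classified, as described above, might look like the following sketch. The record list stands in for a database of clinically classified splice site sequences; its layout, the label strings, and the function name `classification_counts` are assumptions made for illustration and do not reflect the structure of any particular database.

```python
from collections import Counter

# Hypothetical classified records: (splice site sequence, clinical classification).
toy_records = [
    ("CAGGTAAGA", "abnormal"),
    ("CAGGTAAGA", "abnormal"),
    ("CAGGTAAGA", "benign"),
    ("AAGGTGAGT", "benign"),
]

def classification_counts(sequence, records):
    """Tally the clinical classifications recorded for one sequence."""
    query = sequence.upper()
    return Counter(label for seq, label in records if seq == query)

counts = classification_counts("CAGGTAAGA", toy_records)
times_abnormal = counts["abnormal"]  # number of prior abnormal classifications
times_benign = counts["benign"]
```

A sequence with no prior records simply yields zero for every classification, since a `Counter` returns 0 for missing keys.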
In an embodiment related to the third embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In a further embodiment related to the third embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In further embodiments, calculation of the median NIFvar-x described in Section [0028] may be substituted for the mean NIFvar-x.
In further embodiments, calculation of the median percentile NIFvar-x in Section [0029] may be substituted for the mean percentile NIFvar-x.
In a fourth embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a Percentile (NIFvar-1) of the first sample splice site sequence;
(d) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence originate from the same corresponding region of a gene;
(e) determining a Percentile (NIFref-1) of the first reference splice site sequence;
(f) calculating a lower bound and an upper bound for Percentile (NIFvar-1) and calculating a lower bound and an upper bound for Percentile (NIFref-1);
(g) determining a range of NIF-shift by comparing the lower and upper bounds for Percentile (NIFvar-1) with the lower and upper bounds for Percentile (NIFref-1) calculated in (f);
(h) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (g);
(i) determining (a) clinical classification(s) associated with each similar NIF-shift variant identified in step (h); and
(j) determining a risk of abnormal splicing for the sample splice site by assessing the clinical classification(s) determined in step (i) for each similar NIF-shift variant identified in step (h).
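Steps (f) to (j) above can be sketched in Python under several stated assumptions. The bounds are taken as the percentile plus or minus a fixed margin, clamped to the interval 0 to 1, and NIF-shift is taken as the difference Percentile (NIFvar) minus Percentile (NIFref). Neither choice is prescribed by the steps themselves, and the function names and toy variant list are hypothetical.

```python
from collections import Counter

def bounds(percentile, margin=0.05):
    """Lower and upper bound around a percentile, clamped to [0, 1] (assumed rule)."""
    return max(0.0, percentile - margin), min(1.0, percentile + margin)

def nif_shift_range(pct_var, pct_ref, margin=0.05):
    """Smallest and largest shift (var - ref) consistent with both pairs of bounds."""
    var_lo, var_hi = bounds(pct_var, margin)
    ref_lo, ref_hi = bounds(pct_ref, margin)
    return var_lo - ref_hi, var_hi - ref_lo

def similar_variants(shift_range, classified_variants):
    """Select classified variants whose shift lies within the range."""
    lo, hi = shift_range
    return [v for v in classified_variants if lo <= v["shift"] <= hi]

# Toy classified variants with precomputed shifts (hypothetical values).
toy_variants = [
    {"shift": -0.60, "classification": "abnormal"},
    {"shift": -0.55, "classification": "abnormal"},
    {"shift": -0.68, "classification": "benign"},
    {"shift": -0.10, "classification": "benign"},
]

shift_range = nif_shift_range(pct_var=0.04, pct_ref=0.65)
matches = similar_variants(shift_range, toy_variants)
tally = Counter(v["classification"] for v in matches)
```

The tally of classifications among the similar NIF-shift variants is the evidence assessed in step (j); how it is weighed is left open here.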
In an embodiment related to the fourth embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence originate from the same corresponding region of a gene;
(d) calculating a lower bound and an upper bound for NIFvar-1 and calculating a lower bound and an upper bound for NIFref-1;
(e) determining a range of NIF-shift by comparing the lower and upper bounds for NIFvar-1 with the lower and upper bounds for NIFref-1 calculated in (d);
(f) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (e);
(g) determining (a) clinical classification(s) associated with each similar NIF-shift variant identified in step (f); and
(h) determining the risk of abnormal splicing for the sample splice site by assessing the clinical classification(s) determined in step (g) for each similar NIF-shift variant identified in step (f).
In embodiments related to the fourth embodiment, the sample splice site may be a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site is a donor splice site, the steps are repeated with up to five sample splice site sequences comprised in the same sample splice site and corresponding reference splice site sequences, and the method includes assessing the clinical classification(s) associated with each similar NIF-shift variant identified. In certain embodiments, the sample splice site is a donor splice site, and the steps are repeated with up to five additional sample donor splice site sequences, wherein each sample donor splice site sequence comprises 9 non-identical consecutive nucleotides of the same sample donor splice site, and wherein the sample donor splice site sequences may comprise overlapping consecutive nucleotides of the donor splice site.
In a related embodiment comprising at least six sample splice site sequences from the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, and D+1 to D+9 of a donor splice site. In a related embodiment comprising at least four sample splice site sequences from the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site.
In embodiments related to the fourth embodiment, suitable upper and lower bounds of a NIF or Percentile (NIF) may be calculated based on a percentage (e.g., 10%, 5%, 2.5%, 2%) of a logarithmic distribution of NIF or Percentile (NIF), median NIF or Percentile median NIF, mean NIF or Percentile mean NIF, wherein the upper and lower bounds are rounded to the nearest whole number.
In a fifth embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a Percentile (NIFvar-1) of the first sample splice site sequence;
(d) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence each originate from the same corresponding region of a gene;
(e) determining a Percentile (NIFref-1) of the first reference splice site sequence;
(f) determining (a) clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(g) optionally determining (a) clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence;
(h) calculating a lower bound and an upper bound for Percentile (NIFvar-1) and calculating a lower bound and an upper bound for Percentile (NIFref-1);
(i) determining a range of NIF-shift by comparing the lower and upper bounds for Percentile (NIFvar-1) and the lower and upper bounds for Percentile (NIFref-1) calculated in (h);
(j) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (i);
(k) determining (a) clinical classification(s) associated with each similar NIF-shift variant identified in step (j); and
(l) determining a risk of abnormal splicing for the sample splice site by (1) comparing the Percentile (NIFvar-1) with the Percentile (NIFref-1) against a CSP reference database, (2) assessing the clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence determined in step (f); and (3) assessing the clinical classification(s) determined in step (k) for each similar NIF-shift variant identified in step (j).
In a related embodiment, step (g) is carried out; and step (l) may further comprise as part of (2), analysing the clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence determined in step (g).
In an embodiment related to the fifth embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence each originate from the same corresponding region of a gene;
(d) determining (a) clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(e) optionally determining (a) clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence;
(f) calculating a lower bound and an upper bound for NIFvar-1 and calculating a lower bound and an upper bound for NIFref-1;
(g) determining a range of NIF-shift by comparing the lower and upper bounds for NIFvar-1 and the lower and upper bounds for NIFref-1 calculated in (f);
(h) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (g);
(i) determining (a) clinical classification(s) associated with each similar NIF-shift variant identified in step (h); and
(j) determining a risk of abnormal splicing for the sample splice site by (1) comparing the NIFvar-1 with the NIFref-1 against a CSP reference database, (2) assessing the clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence determined in step (d); and (3) assessing the clinical classification determined in step (i) for each similar NIF-shift variant identified in step (h).
In a related embodiment, step (e) is carried out; and step (j) may further comprise as part of (2), analysing the clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence determined in step (e).
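Steps (f) to (h) of the preceding method may be illustrated by the following minimal sketch. The specification does not state how the lower and upper bounds are calculated, so the sketch assumes, purely for illustration, a symmetric relative tolerance around each NIF value and a ratiometric NIF-shift; the function names, the 20% tolerance and the catalogue entries are hypothetical.

```python
def bounds(value, tolerance=0.2):
    """Hypothetical bounds: the value minus/plus a relative tolerance (assumption)."""
    return value * (1 - tolerance), value * (1 + tolerance)


def nif_shift_range(nif_var, nif_ref, tolerance=0.2):
    """Range of ratiometric NIF-shift (NIFvar/NIFref) implied by the bounds."""
    if nif_ref <= 0:
        raise ValueError("NIFref must be positive for a ratiometric NIF-shift")
    var_lo, var_hi = bounds(nif_var, tolerance)
    ref_lo, ref_hi = bounds(nif_ref, tolerance)
    # The smallest ratio pairs the lowest variant value with the highest
    # reference value; the largest ratio pairs the opposite extremes.
    return var_lo / ref_hi, var_hi / ref_lo


def similar_variants(shift_range, catalogue):
    """Return catalogue entries whose NIF-shift falls within the range."""
    lo, hi = shift_range
    return [entry for entry in catalogue if lo <= entry["nif_shift"] <= hi]


# Hypothetical catalogue of previously classified variants.
catalogue = [
    {"name": "variant_A", "nif_shift": 0.10, "classification": "pathogenic"},
    {"name": "variant_B", "nif_shift": 0.50, "classification": "benign"},
    {"name": "variant_C", "nif_shift": 0.07, "classification": "likely pathogenic"},
]
shift_range = nif_shift_range(nif_var=10, nif_ref=100)
matches = similar_variants(shift_range, catalogue)
```

The clinical classifications of the returned entries would then feed into the risk determination of step (j); any other definition of the bounds could be substituted in `bounds` without changing the remainder of the sketch.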
In further embodiments related to the fifth embodiment, the sample splice site may be a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site is a donor splice site, and the method is repeated with up to five sample splice site sequences comprised in the same sample splice site and corresponding respective reference splice site sequences. In certain embodiments, the splice site is a donor splice site, and the steps are repeated with up to five additional sample donor splice site sequences comprised in the same sample splice site, wherein each sample donor splice site sequence comprises 9 non-identical consecutive nucleotides of the donor splice site, and wherein the sample donor splice site sequences may comprise overlapping consecutive nucleotides of the donor splice site. In a related embodiment comprising at least six sample splice site sequences from the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, and D+1 to D+9 of a donor splice site. 
In a related embodiment comprising at least four sample splice site sequences from the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site.
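The overlapping nine-nucleotide sample splice site sequences described above may be illustrated by the following minimal sketch. It assumes that the donor splice site is supplied as a 14-nucleotide string covering positions E−5 to D+9 (five exon nucleotides followed by nine intron nucleotides); the function names, the label format (written with an ASCII hyphen) and the example sequence are hypothetical.

```python
EXON_NT = 5      # positions E-5 .. E-1
INTRON_NT = 9    # positions D+1 .. D+9
WINDOW = 9       # nucleotides per sample splice site sequence


def position_label(index):
    """Map a 0-based index in the 14-nt site to its E-/D+ position label."""
    if index < EXON_NT:
        return f"E-{EXON_NT - index}"
    return f"D+{index - EXON_NT + 1}"


def donor_windows(site):
    """Return {(start_label, end_label): 9-nt sequence} for every window."""
    if len(site) != EXON_NT + INTRON_NT:
        raise ValueError("expected 14 nucleotides spanning E-5 to D+9")
    windows = {}
    for start in range(len(site) - WINDOW + 1):
        end = start + WINDOW - 1
        key = (position_label(start), position_label(end))
        windows[key] = site[start:start + WINDOW]
    return windows


# Hypothetical donor splice site: exon CCAAG followed by intron GTAAGTATC.
example = donor_windows("CCAAGGTAAGTATC")
```

For the four-window embodiment, only the entries keyed E-4 to D+5, E-3 to D+6, E-2 to D+7 and E-1 to D+8 would be retained.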
In an embodiment related to the fifth embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In a sixth embodiment provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In an embodiment related to the sixth embodiment is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
Methods of identifying an abnormal splice site in a sample splice site further relate to combinations of any method or any embodiment herein disclosed, including combinations of embodiments related to the first, second, and third embodiments or embodiments related to the first, second and fourth embodiments. Combinations of embodiments related to the first, second, third, and/or fourth embodiments are also envisioned. Certain embodiments relate to a combination of the second, third, fourth, fifth and sixth embodiments. Certain embodiments relate to a combination of the second and fourth embodiments. It will be appreciated that in relation to combinations of embodiments, there is no requirement to carry out the combination of embodiments and/or steps of an embodiment in any particular order. Methods comprising determining a measure of frequency of a sample splice site in combination with a previous classification factor and/or similar splice site frequency shift factor (similar NIF-shift factor) and/or competitive cryptic splice site factor are envisioned.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise”, “comprising” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”.
As used herein, the term “about” can mean within one or more standard deviations per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, or up to 5%. In certain embodiments, “about” can mean up to 5%.
As used herein and in the appended claims, the singular form of “a”, “an”, and “the” may include the plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element.
As used herein, the term “splice site” refers to a consensus element in an exon and/or an intron of genomic DNA, including, but not limited to, a donor splice site, a branch site, and an acceptor splice site.
As used herein, the term “splice site sequence” refers to a region of nucleotides in a splice site. A splice site sequence may comprise one or more regions of consecutive nucleotides of a sample splice site. In certain embodiments, a splice site sequence may comprise one or more regions of consecutive nucleotides with one or more groups consisting of a single nucleotide. A splice site sequence may comprise nucleotides from an exon, an intron, or both an exon and an intron. In one embodiment, a splice site sequence comprises or consists of nucleotides of an intron. In one embodiment, a splice site sequence is a donor splice site sequence comprising nucleotides of an exon and intron.
As used herein, the term “donor splice site” refers to a consensus element located near the 5′ end of an intron and also referred to as an “exon-intron boundary”. In one embodiment, a donor splice site comprises or consists of nucleotides of an intron. In one embodiment, a donor splice site comprises nucleotides of an exon-intron boundary comprising at least one nucleotide from the 3′ end of an exon and at least 4 nucleotides of the 5′ end of an intron. In one embodiment, a “donor splice site” comprises the five 3′-end nucleotides of the exon (E−5 to E−1) and the eight 5′-end nucleotides of the intron (D+1 to D+8). In one embodiment, a “donor splice site” comprises the five 3′-end nucleotides of the exon (E−5 to E−1) and the nine 5′-end nucleotides of the intron (D+1 to D+9). In certain embodiments, the GT (or GC) nucleotides corresponding to the essential splice site, which encompasses the first two nucleotides of the intron, are denoted as positions D+1 and D+2 of the donor splice site.
As used herein, the term “donor splice site sequence” refers to nucleotides comprised in a donor splice site. In certain embodiments, a donor splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In one embodiment, a donor splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, a donor splice site sequence comprises 9 consecutive nucleotides of a donor splice site. In certain embodiments, a donor splice site sequence comprises or consists of nucleotides of an intron. In certain embodiments, a donor splice site sequence comprises at least one nucleotide of an exon. In certain embodiments, a donor splice site sequence comprises nucleotides of an exon and nucleotides of an intron.
As used herein, the term “essential donor splice site” refers to the first two nucleotides of the intron, denoted as positions D+1 (first nucleotide of the intron) and D+2 (second nucleotide of the intron). The skilled person will be familiar that the essential donor splice site is comprised of GT (guanine, thymine) nucleotides at the first and second position of the intron for ~99% of human introns.
As used herein, the term “branch site” refers to a consensus element located near the 3′ end of an intron and is upstream of the polypyrimidine tract.
As used herein, the term “polypyrimidine tract” refers to a consensus element located near the 3′ end of an intron that is enriched in the pyrimidine nucleotides cytosine (C) and thymine (T).
As used herein, the term “branch site sequence” refers to nucleotides comprised in a branch site. In certain embodiments, a branch site sequence comprises 6 to 9 nucleotides of a branch site that includes the branchpoint A (adenosine or adenine). In certain embodiments, a branch site sequence comprises 6, 7, 8, or 9 consecutive nucleotides of a branch site. In certain embodiments, a branch site sequence comprises 7 consecutive nucleotides of a branch site.
As used herein, the term “acceptor splice site” refers to a consensus element located near the 3′ end of an intron also referred to as the “intron-exon boundary”. In one embodiment, an acceptor splice site comprises nucleotides of an intron-exon boundary comprising at least two nucleotides from the 3′ end of an intron and at least one nucleotide of the 5′ end of an exon.
As used herein, the term “acceptor essential splice site” refers to the last two nucleotides of the intron, denoted as positions A−2 (second to last nucleotide of the intron) and A−1 (last nucleotide of the intron). The skilled person will be familiar that the essential acceptor splice site is comprised of AG (adenine, guanine) nucleotides at the second last and last nucleotides of the intron, respectively, for ~99% of human introns.
As used herein, the term “acceptor splice site sequence” refers to nucleotides comprised in an acceptor splice site. The skilled person will be familiar that the acceptor splice site sequence encompasses the branchpoint, the polypyrimidine tract and the acceptor essential splice site. In certain embodiments, an acceptor splice site sequence comprises 6 to 60 nucleotides of an acceptor splice site. In one embodiment, an acceptor splice site sequence comprises 6, 7, 8, or 9 consecutive nucleotides of an acceptor splice site. In certain embodiments, an acceptor splice site sequence comprises 9 consecutive nucleotides of an acceptor splice site.
As used herein, the term “cryptic donor splice site sequence” refers to nucleotides comprised in a cryptic donor splice site, wherein a cryptic donor splice site is defined by any GT (or GC) that may constitute the consensus nucleotides of a donor essential splice site and is not positioned at the authentic exon-intron junction. The skilled person will be familiar that abnormal splicing due to use of cryptic donor splice sites can occur in subjects with variants affecting the authentic reference donor splice site. The skilled person will also be familiar that abnormal splicing due to use of cryptic donor splice sites can occur in subjects with variants affecting (e.g. strengthening) cryptic donor splice sites. In certain embodiments, a cryptic donor splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12 or up to 15 consecutive nucleotides of a cryptic donor splice site. In certain embodiments, a cryptic donor splice site sequence consists of 12 nucleotides comprised of four overlapping sequences of nine consecutive nucleotides, corresponding to nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8, wherein the GT (or GC) represents the nucleotides comprising the essential splice site at positions D+1 and D+2 of the cryptic donor splice site.
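By way of illustration only, candidate cryptic donor splice sites could be enumerated as in the following minimal sketch. It assumes a forward-strand DNA string as input and treats every GT or GC dinucleotide with sufficient flanking sequence as a candidate; it does not exclude the authentic donor splice site, and the function name and example sequence are hypothetical.

```python
import re


def cryptic_donor_candidates(seq):
    """Return (index of D+1, 12-nt site E-4..D+8, four 9-nt windows) per GT/GC."""
    seq = seq.upper()
    results = []
    # A lookahead is used so that every GT/GC position is reported,
    # including dinucleotides that follow one another immediately.
    for match in re.finditer(r"(?=G[TC])", seq):
        d1 = match.start()                # 0-based index of position D+1
        start, end = d1 - 4, d1 + 8       # slice covering E-4 .. D+8
        if start < 0 or end > len(seq):
            continue                      # insufficient flanking sequence
        site = seq[start:end]
        # Windows E-4..D+5, E-3..D+6, E-2..D+7 and E-1..D+8.
        windows = [site[i:i + 9] for i in range(4)]
        results.append((d1, site, windows))
    return results


# Hypothetical input sequence.
candidates = cryptic_donor_candidates("AACCAAGGTAAGTATCAA")
```

In this example the second GT lies too close to the end of the input to provide positions up to D+8 and is therefore skipped.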
As used herein the term “sample splice site” refers to a sample from the genome of a subject. The skilled person will be familiar with sequencing of the genome of a subject, including but not limited to a human adult, juvenile, infant, foetus, embryo, or gamete. A sample splice site may comprise a splice site comprising a splice site sequence obtained from the genome of a subject. It will be understood that a single gene may comprise multiple splice sites. It will be understood that a sample splice site may be derived from an identified region of an identified gene. In one embodiment, a sample splice site may be obtained from whole genome sequencing. In one embodiment, a sample splice site may be obtained from whole exome sequencing. In one embodiment, a sample splice site may be obtained from sequencing a panel of genes. In one embodiment, a sample splice site may be obtained from sequencing a single gene. Exemplary sample splice sites, include, but are not limited to, a donor splice site, a branch site, and an acceptor splice site.
As used herein, the term “subject”, includes, but is not limited to, a human suspected of suffering from or carrying a genetic disorder (autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive, Y-linked, mitochondrial, or somatic), a human at risk of cancer, or a human suspected of having an abnormal splice site.
As used herein, the term “sample splice site sequence” refers to nucleotides comprised in a sample splice site. A sample splice site sequence may comprise one or more regions of consecutive nucleotides of a sample splice site. In certain embodiments, a sample splice site sequence may comprise one or more regions of consecutive nucleotides with one or more groups consisting of a single nucleotide. In one embodiment, a sample splice site sequence comprises 4 to 12 nucleotides of a sample splice site. In one embodiment, a sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a sample splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, a sample splice site sequence comprises 9 consecutive nucleotides of a sample splice site. In one embodiment, a sample splice site sequence comprises nucleotides comprised in a donor splice site, a branch site, or an acceptor site. In certain embodiments, a sample splice site sequence comprises 4 to 12 nucleotides comprised in a donor splice site. In certain embodiments, a sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments, a sample splice site sequence comprises 8, 9, or 10 consecutive nucleotides of a donor splice site. In certain embodiments, a sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. 
In certain embodiments, more than one sample splice site sequence(s) from a sample splice site are analysed in determining a risk of abnormal splicing of a sample splice site, wherein the sample splice site sequences are each comprised in the same sample splice site. The terms “non-identical” or “not identical” may be used with reference to two or more sample splice site sequences that are obtained from different regions of the same sample splice site and refer to the respective nucleotide positions of the sample splice site. For example, the consecutive nucleotide sequences of E−5 to D+4 and E−4 to D+5 of a sample donor splice site are non-identical or not identical nucleotide positions of a sample donor splice site sequence, the consecutive nucleotide sequences of E−5 to D+4, E−4 to D+5, and E−3 to D+6 of a sample donor splice site are non-identical or not identical nucleotide positions of a sample donor splice site sequence, and so on. In other words, non-identical or not identical refers to the sample splice site sequence as a whole, considering each nucleotide comprised in each sample splice site sequence. The term “overlapping” may be used with reference to two or more sample splice site sequences obtained from different regions of the same sample splice site and refers to sample splice site sequences comprising non-identical or not identical nucleotide positions, wherein at least one nucleotide of each of the two or more sample splice site sequences corresponds to the same nucleotide position from the sample splice site. For example, the consecutive nucleotide sequences of E−5 to D+4 and E−4 to D+5 of a sample donor splice site are non-identical or not identical nucleotide positions of a sample donor splice site sequence and also comprise overlapping nucleotide positions of the sample donor splice site sequence. 
Likewise, each of the consecutive nucleotide sequences of E−5 to D+4, E−4 to D+5, and E−3 to D+6 of a sample donor splice site are non-identical or not identical nucleotide positions of a sample donor splice site sequence and also comprise overlapping nucleotide positions of the sample donor splice site sequence. In certain embodiments, comprising two or more sample splice site sequences from the same sample splice site, each sample splice site sequence may be envisioned as derived from a window sliding along a sample splice site. Various embodiments of sample splice site sequences derived from the same sample splice site considering a sliding window are depicted in Table 1 (below). In certain embodiments comprising two or more sample splice site sequences from the same sample splice site, each sample splice site sequence comprises a different number of nucleotides. In certain embodiments comprising two or more sample splice site sequences from the same sample splice site, each sample splice site sequence comprises the same number of nucleotides. In certain embodiments, a sliding window comprises 9 consecutive nucleotides along a sample splice site. In certain embodiments, the sample splice site sequence corresponds to nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, or D+1 to D+9 of a donor splice site. In certain embodiments, the sample splice site sequence corresponds to nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 or E−1 to D+8 of a donor splice site. In certain embodiments, the method comprises one or more sample splice site sequence(s) from a sample splice site wherein the one or more sample splice site sequence(s) corresponds to one or more of the nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, or D+1 to D+9 of a donor splice site.
In certain embodiments, the method comprises one or more sample splice site sequence(s) from a sample splice site wherein the one or more sample splice site sequence(s) corresponds to one or more of the nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site. Four exemplary embodiments comprising at least six sample donor splice site sequences from a sample donor splice site are depicted below in Table 1, wherein the nucleotides of a sample donor splice site are indicated as nucleotide positions E−5 to D+9, an “x” indicates that the nucleotide is included in a sample donor splice site sequence, and the leftmost column in the table is the arbitrary number assigned to the sample splice site sequence (1 is the first sample splice site sequence, 2 is the second sample splice site sequence, and so on).
As used herein, the term “reference splice site sequence” refers to a splice site sequence from a sequenced human genome, referred to herein as a reference human genome sequence. Exemplary reference human genome sequences include, but are not limited to, the “Genome Reference Consortium Build 37” also referred to as “hg19” (<https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13>), Genome Reference Consortium Human Build 38 patch release 12 (GRCh38.p12) (<https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.38>), or any sequenced human genome from an individual or individuals not exhibiting or carrying a genetic disorder. In one embodiment, a reference human genome is the human genome sequence of the “Genome Reference Consortium Build 37” also referred to as “hg19” (<https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13>). In one embodiment, a reference human genome is the human genome sequence of the Genome Reference Consortium Human Build 38 patch release 12 (GRCh38.p12) (<https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.38>). In one embodiment, a reference human genome is a combination of the human genome sequence of the “Genome Reference Consortium Build 37” also referred to as “hg19” (<https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13>) and the human genome sequence of the Genome Reference Consortium Human Build 38 patch release 12 (GRCh38.p12) (<https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.38>).
As used herein, the term “corresponding” with regard to the terms “corresponding gene”, “same corresponding region of a gene”, “corresponding reference splice site”, and “corresponding reference splice site sequence”, and variations thereof, are used to denote that a sample splice site and a corresponding reference splice site are derived from the same region of the same gene, wherein the sample splice site comprises nucleotide sequences obtained from genomic sequencing of a subject and the corresponding reference splice site comprises nucleotides from a reference human genome sequence. For example, when the sample splice site comprises nucleotides E−5 to D+5 of the exon-intron boundary of exon 5 of gene X from a subject, the reference splice site comprises nucleotides E−5 to D+5 of the exon-intron boundary of exon 5 of gene X from a reference human genome sequence. Likewise, for example, a sample splice site sequence of nucleotides D+1 to D+5 of the exon-intron boundary of exon 5 of gene X from a subject will have a reference splice site of nucleotides D+1 to D+5 of the exon-intron boundary of exon 5 of gene X from a reference human genome sequence.
As used herein, the term “Native Intron Frequency” refers to the frequency with which a particular nucleotide sequence appears in a splice site in a reference human genome sequence. One measure of Native Intron Frequency is the number of times a particular nucleotide sequence appears in a splice site in a reference human genome sequence, which may be represented by NIFvar or NIF (count). In certain embodiments, a measure of Native Intron Frequency of a reference splice site sequence (NIFref) refers to the number of times the nucleotide sequence of the reference splice site sequence appears in splice sites in a reference human genome sequence; a measure of Native Intron Frequency of the sample splice site sequence (NIFvar) refers to the number of times the nucleotide sequence of the sample splice site sequence appears in a splice site in a reference human genome sequence; an NIF equal to 0 (zero) (NIF=0) means that the nucleotide sequence does not appear in any splice site in a reference human genome sequence; an NIF equal to one (NIF=1) means that the nucleotide sequence appears in one splice site in a reference human genome sequence; an NIF equal to two (NIF=2) means that the nucleotide sequence appears in two splice sites in a reference human genome sequence, wherein each of the two splice sites is a unique splice site in the reference human genome; an NIF equal to three (NIF=3) means that the nucleotide sequence appears in three splice sites in a reference human genome sequence, wherein each of the three splice sites is a unique splice site in the reference human genome; and so on. “Unique” as used in this context refers to each splice site sequence appearing in a different splice site, whether within one gene or in different genes.
For example, a sample donor splice site sequence having an NIF=2 means that the nucleotide sequence of the sample donor splice site sequence appears in two different donor splice sites (different exon-intron boundaries), wherein the two different splice sites may be two splice sites within the same gene or two splice sites from two different genes. The symbol NIFvar-x, where “x” is a whole number integer (1, 2, 3, 4, 5, and so on), refers to the measure of Native Intron Frequency determination for a sample splice site where more than one sample splice site sequence from the same sample splice site is analysed. For example, where two sample splice site sequences are analysed from the same splice site, an NIFvar for the first sample splice site sequence may be referred to as NIFvar-1 and an NIFvar for the second sample splice site sequence may be referred to as NIFvar-2; and so on. The corresponding two NIFref for the reference splice site sequences, one for the first splice site sequence and one for the second splice site sequence, may be referred to as NIFref-1 and NIFref-2, respectively; and so on.
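The count-based measure of Native Intron Frequency defined above may be illustrated by the following minimal sketch. It assumes that the splice site sequences of a reference human genome have already been extracted as (splice site identifier, sequence) pairs; the function names, identifiers and sequences shown are hypothetical.

```python
from collections import Counter


def build_nif_table(reference_sites):
    """Count, per nucleotide sequence, the unique splice sites containing it.

    reference_sites: iterable of (site_id, sequence) pairs, where site_id
    identifies one splice site (e.g. one exon-intron boundary).
    """
    seen = set()
    counts = Counter()
    for site_id, sequence in reference_sites:
        if site_id in seen:
            continue                 # each unique splice site counts once
        seen.add(site_id)
        counts[sequence.upper()] += 1
    return counts


def nif(counts, sequence):
    """Return the NIF (count) of a sequence; 0 if absent from all splice sites."""
    return counts[sequence.upper()]


# Hypothetical reference donor splice site sequences (9 nucleotides each).
reference = [
    ("geneX_exon5", "CAGGTAAGT"),
    ("geneX_exon9", "CAGGTAAGT"),
    ("geneY_exon2", "AAGGTGAGT"),
    ("geneY_exon2", "AAGGTGAGT"),    # duplicate record of the same site
]
table = build_nif_table(reference)
nif_ref = nif(table, "CAGGTAAGT")    # sequence present in two unique sites
nif_var = nif(table, "CAGATAAGT")    # sequence absent from every site
```

Here a sample splice site sequence that is absent from every reference splice site is assigned NIF=0.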
As used herein, the term “abnormal splice site” refers to the characterization of a splice site as a genetic variant of the corresponding splice site of a reference human genome sequence, wherein the genetic variant exhibits aberrant splicing. Aberrant splicing includes, but is not limited to, reduced splicing, non-splicing, exon-skipping, intron retention, and the like. Aberrant splicing associated with an abnormal splice site may be causative of a pathogenic phenotype. An abnormal splice site may be further characterized as a pathogenic splice site wherein aberrant splicing associated with an abnormal splice site is causative of a pathogenic phenotype. An abnormal splice site may be characterized with a risk of abnormal splicing. In one embodiment, a risk of abnormal splicing is characterized by a value from 0 to 1, wherein the risk of abnormal splicing increases as the value approaches 1.
As used herein, the term “abnormal splice site sequence” refers to a splice site sequence that comprises a different nucleotide sequence when compared with the splice site sequence in the corresponding region of a gene in a reference human genome sequence. An abnormal splice site sequence may be further characterized as a pathogenic splice site sequence, wherein aberrant splicing associated with the abnormal splice site sequence is causative of a pathogenic phenotype. A genetic variant may comprise an abnormal splice site comprising an abnormal splice site sequence.
As used herein, the term “benign variant splice site” refers to a splice site sequence that comprises a different nucleotide sequence when compared with the splice site sequence in the corresponding region of a gene in a reference human genome sequence, and does not result in aberrant splicing.
As used herein, the term “clinical classification” refers to the classification assigned to a splice site. Clinical classification for a splice site may be determined from any available source wherein a genetic variant is assigned a clinical classification. Exemplary sources of variant splice sites with clinical classifications include, but are not limited to, ClinVar (<https://www.ncbi.nlm.nih.gov/clinvar/>) and the Human Gene Mutation Database (HGMD) (<http://www.hgmd.cf.ac.uk/ac/index.php>). The skilled person will be familiar with clinical classifications assigned to variant genes, variant splice sites, and variant splice site sequences. See, e.g., Richards et al, Genetics in Medicine (2015) 17(5): 405-424. Clinical classifications in ClinVar include pathogenic, likely pathogenic, benign, and likely benign among others. Entries included in the HGMD may be identified as gene lesions responsible for human inherited diseases and as such are classified as pathogenic. A region of a splice site, for example 4, 5, 6, 7, 8, 9, 10, 11, or 12 nucleotides of a splice site sequence, may appear in more than one splice site, wherein each appearance represents a genetic variant and each appearance may be assigned a clinical classification. A region of a splice site, for example 4, 5, 6, 7, 8, 9, 10, 11, 12 or up to 15 nucleotides of a splice site sequence, may appear in more than one splice site, wherein each appearance represents a genetic variant and each appearance may be assigned a clinical classification. A region of a splice site, for example up to 15 nucleotides or more of a splice site sequence, may appear in more than one splice site, wherein each appearance represents a genetic variant and each appearance may be assigned a clinical classification.
A region of a splice site, for example up to 30 nucleotides or more of a splice site sequence, may appear in more than one splice site, wherein each appearance represents a genetic variant and each appearance may be assigned a clinical classification. A clinical classification associated with a nucleotide sequence of a splice site sequence (e.g. a sample splice site sequence or a reference splice site sequence) includes any clinical classification assigned to the nucleotide sequence in any splice site in any gene. A clinical classification of a splice site as pathogenic or likely pathogenic may be interpreted as an abnormal splice site (also referred to herein as a pathogenic splice site). A clinical classification of a splice site as benign or likely benign may be interpreted as a benign variant splice site.
As used herein, the term “Percentile (NIF)” (alternatively herein referred to as “NIF percentile”) refers to the percentile within the percentile distribution of the frequency of a splice site sequence in a reference human genome sequence. A NIFvar of 0 (zero) is assigned a 0th Percentile (NIFvar). For example, a NIFvar within the 2nd Percentile indicates that, for splice site sequences comprised in a reference human genome sequence, <2% of splice site sequences have a NIF falling within this range; an exemplary NIFref of 653 lies within the 85th percentile among a frequency distribution of splice site sequences in a reference human genome; and so on.
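One possible way of computing a Percentile (NIF) is sketched below. The specification does not fix a particular percentile convention, so the sketch assumes, for illustration only, that the percentile of an NIF value is the percentage of reference splice site sequences whose NIF is less than or equal to that value, with an NIF of 0 assigned the 0th percentile as stated above; the function name and the toy distribution are hypothetical.

```python
from bisect import bisect_right


def percentile_nif(nif_value, reference_nifs_sorted):
    """Percentile of an NIF value within a sorted list of reference NIF values.

    Assumed convention: percentage of reference values <= nif_value,
    with NIF = 0 mapped to the 0th percentile.
    """
    if not reference_nifs_sorted:
        raise ValueError("reference distribution is empty")
    if nif_value == 0:
        return 0.0
    rank = bisect_right(reference_nifs_sorted, nif_value)
    return 100.0 * rank / len(reference_nifs_sorted)


# Toy distribution of NIF values, one per reference splice site sequence.
reference_nifs = sorted([1, 1, 2, 5, 5, 5, 10, 40, 100, 653])
p_zero = percentile_nif(0, reference_nifs)
p_five = percentile_nif(5, reference_nifs)
```

With the actual distribution of a reference human genome, `reference_nifs` would hold one NIF value per reference splice site sequence rather than ten toy values.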
As used herein, median percentile NIF is calculated as median (NIFvar-1 percentile; NIFvar-2 percentile; NIFvar-3 percentile; NIFvar-4 percentile). For example, a hypothetical site with percentile NIFvar-1=0.2499, percentile NIFvar-2=0.5904, percentile NIFvar-3=0.7172, percentile NIFvar-4=0.9065 has a median percentile NIFvar-x of 0.6538. This may also be represented generically by median (NIFref-1; NIFref-2; NIFref-3; NIFref-4).
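The arithmetic of the worked example above can be checked with a minimal sketch. The function name `median_percentile_nif` is hypothetical and introduced only for illustration; the four input values are those given in the example.

```python
from statistics import median

# Percentile (NIFvar-1..4) values taken from the worked example in the text.
EXAMPLE_PERCENTILES = [0.2499, 0.5904, 0.7172, 0.9065]


def median_percentile_nif(percentiles):
    """Return the median of the per-sequence Percentile (NIF) values.

    With an even number of values, statistics.median averages the two
    middle values, which is what the worked example does.
    """
    return median(percentiles)


if __name__ == "__main__":
    print(round(median_percentile_nif(EXAMPLE_PERCENTILES), 4))
```

For the four example values, the two middle values are 0.5904 and 0.7172, whose average is 0.6538.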
As used herein, the Percentile value for median NIF is determined through calculation of the cumulative frequency distributions of median NIFref-x for all donor splice sites in the reference human genome (180,000 donor splice sites). For example, a donor splice site of 12 nucleotides with a median NIFref1-4 of 1 lies within the first percentile of a frequency distribution of median NIFref1-4 among all donor splice sites in the reference human genome. In a second example, a donor splice site with a median NIFref1-4 of 327 lies within the fiftieth percentile of a frequency distribution of median NIFref1-4 among all donor splice sites in the reference human genome.
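One way to read a percentile off a cumulative frequency distribution is sketched below. This is an illustrative assumption, not the disclosed implementation: the function name `percentile_of`, the "less than or equal to" tie convention, and the toy list of values are all hypothetical, and a real distribution would be built from all donor splice sites in the reference human genome.

```python
from bisect import bisect_right


def percentile_of(value, reference_values):
    """Empirical cumulative frequency of `value` within `reference_values`.

    Returns the fraction (0.0 to 1.0) of reference values that are less
    than or equal to `value`. The tie convention is an assumption.
    """
    ordered = sorted(reference_values)
    return bisect_right(ordered, value) / len(ordered)


# Toy stand-in for the median NIFref values of donor splice sites.
# These numbers are invented for illustration only.
TOY_MEDIAN_NIFREF = [1, 5, 40, 120, 327, 327, 410, 653, 900, 2000]

if __name__ == "__main__":
    print(percentile_of(327, TOY_MEDIAN_NIFREF))
    print(percentile_of(0, TOY_MEDIAN_NIFREF))
```

Under this convention a value of 0, which is below every entry of the toy list, maps to the 0th percentile.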
As used herein, the term “NIF-shift” refers to a measure of the relative change in NIF for a given splice site sequence with respect to a corresponding reference human genome sequence. In one embodiment, NIF-shift may be determined by comparing a measure of NIF for a given splice site sequence with a measure of NIF for the corresponding reference splice site sequence. In one embodiment, NIF-shift of a sample splice site sequence may be determined by comparing a measure of NIF of a sample splice site sequence (NIFvar-x) with a measure of NIF of the corresponding reference splice site sequence (NIFref-x). In one embodiment, NIF-shift is determined by a comparison of Percentile (NIFvar-x) with the corresponding Percentile (NIFref-x). In a second embodiment, median NIF-shift of a sample splice site sequence may be determined by comparing a measure of median NIF of the sample splice site sequences (median NIFvar-x) with a measure of median NIF of the corresponding reference splice site sequences (median NIFref-x). In a related embodiment, percentile median NIF-shift of a sample splice site sequence may be determined by comparison of Percentile (median NIFvar-x) with the corresponding Percentile (median NIFref-x). In certain embodiments, comparing, e.g. NIFvar-x with corresponding NIFref-x or Percentile (NIFvar-x) with corresponding Percentile (NIFref-x), to determine NIF-shift comprises a ratiometric analysis, e.g. NIFvar-x/NIFref-x, Percentile (NIFvar-x)/Percentile (NIFref-x), median (NIFvar-x)/median (NIFref-x), Percentile (median NIFvar-x)/Percentile (median NIFref-x), mean (NIFvar-x)/mean (NIFref-x), Percentile (mean NIFvar-x)/Percentile (mean NIFref-x). In certain embodiments, comparing, e.g. NIFvar-x with corresponding NIFref-x or Percentile (NIFvar-x) with corresponding Percentile (NIFref-x), to determine NIF-shift comprises subtracting, e.g. subtracting NIFvar-x from NIFref-x or subtracting Percentile (NIFvar-x) from Percentile (NIFref-x).
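The ratiometric and subtractive comparisons described above can be sketched as follows. The function names are hypothetical, and raising an error when the reference measure is 0 is an assumption made only for this illustration. The same functions apply whether the inputs are counts, medians, means or percentiles.

```python
def nif_shift_ratio(nif_var, nif_ref):
    """Ratiometric NIF-shift: NIFvar-x / NIFref-x."""
    if nif_ref == 0:
        # Assumption: a reference measure of 0 leaves the ratio undefined.
        raise ValueError("NIFref is 0, so the ratio is undefined")
    return nif_var / nif_ref


def nif_shift_difference(nif_var, nif_ref):
    """Subtractive NIF-shift: NIFvar-x subtracted from NIFref-x."""
    return nif_ref - nif_var


if __name__ == "__main__":
    # Counts: a variant sequence never seen (0) versus a reference seen 653 times.
    print(nif_shift_ratio(0, 653), nif_shift_difference(0, 653))
    # Percentiles: 0th percentile versus 85th percentile.
    print(nif_shift_difference(0.0, 0.85))
```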
As used herein, the term “same NIF-shift” refers to two or more splice site sequences having about the same “NIF-shift” or the same “NIF-shift”. In certain embodiments, the term “same median NIF-shift” refers to two or more splice site sequences having about the same “median NIF-shift” or the same “median NIF-shift”. In related embodiments, the term “same mean NIF-shift” refers to two or more splice site sequences having about the same “mean NIF-shift” or the same “mean NIF-shift”.
As used herein, the term “similar NIF-shift variant” refers to a splice site sequence having a relative change (or shift) in NIF (or Percentile NIF), median NIF (or Percentile median NIF) or mean NIF (or Percentile mean NIF) with respect to a corresponding reference human genome sequence (referred to herein as a NIF-shift), which is similar to a relative change (or shift) in NIF with respect to a corresponding reference human genome sequence for another splice site sequence. Two or more splice site sequences are considered “similar NIF-shift variants” when they have the same relative change (or shift) in NIF or fall within the same range of values around a NIF-shift of a sample splice site sequence. In certain embodiments, a range of values around a NIF-shift is ±about 2%, ±about 2.5%, ±about 5%, or ±about 10%. For example, for a sample splice site sequence with a median NIFvar-x of 0 and a corresponding median NIFref-x of 653, similar median NIF-shift variants can have a NIFvar of 0 and a corresponding NIFref of from 472 to 903. For a sample splice site sequence and its corresponding reference splice site sequence having Percentile (median NIFvar-x)=0 and Percentile (median NIFref-x)=0.85 (85th percentile), a similar NIF-shift variant(s) would include, but would not be limited to, a splice site sequence and its corresponding reference splice site sequence having Percentile median NIFvar-x=0 and a range of values around Percentile median NIFref=0.85. In certain embodiments, a range of median NIF-shift values may be calculated, wherein a lower bound and an upper bound may be determined for each median NIFvar-x and corresponding median NIFref-x or Percentile (median NIFvar-x) and corresponding Percentile (median NIFref-x), or calculated from a median NIF-shift, eg, ratiometric or subtraction of median NIF-shift, to calculate a range of median NIF-shift.
For example, a ±about 2% NIF-shift range could be calculated considering ±about 2% NIFvar-x and ±about 2% NIFref-x; and a similar NIF-shift variant will have a NIFvar and NIFref within the calculated ranges. In certain embodiments, the range of NIF-shift may be determined by considering exponential upper and lower bounds. For example, a lower bound (e^((log(NIFvar))*(1−NIF_shift_percentage))) and an upper bound (e^((log(NIFvar))*(1+NIF_shift_percentage))) for NIFvar and a lower bound (e^((log(NIFref))*(1−NIF_shift_percentage))) and an upper bound (e^((log(NIFref))*(1+NIF_shift_percentage))) for NIFref may be used to calculate a range of NIF-shift for identifying similar NIF-shift variants. In this context, suitable NIF-shift percentages include about 2%, about 2.5%, about 5%, and about 10%.
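The exponential bounds can be illustrated with the following minimal sketch. The function names are hypothetical. Treating a NIF of 0 as having a range fixed at 0 (because the logarithm of 0 is undefined) is an assumption made here for illustration, as is the use of the natural logarithm.

```python
import math


def exponential_bounds(nif, shift_percentage):
    """Lower and upper bounds e^(log(NIF) * (1 -/+ shift_percentage)).

    `shift_percentage` is a fraction, e.g. 0.05 for about 5%.
    """
    if nif <= 0:
        # Assumption: log(0) is undefined, so a NIF of 0 only matches 0.
        return (0.0, 0.0)
    log_nif = math.log(nif)
    a = math.exp(log_nif * (1 - shift_percentage))
    b = math.exp(log_nif * (1 + shift_percentage))
    # For values below 1 (e.g. percentiles) log is negative and the order flips.
    return (min(a, b), max(a, b))


def is_similar_nif_shift(sample, candidate, shift_percentage):
    """Check whether a candidate (NIFvar, NIFref) pair falls within both ranges."""
    var_lo, var_hi = exponential_bounds(sample[0], shift_percentage)
    ref_lo, ref_hi = exponential_bounds(sample[1], shift_percentage)
    return var_lo <= candidate[0] <= var_hi and ref_lo <= candidate[1] <= ref_hi


if __name__ == "__main__":
    low, high = exponential_bounds(653, 0.05)
    print(round(low), round(high))
```

With a setting of 5%, these bounds for a NIFref of 653 come out at roughly 472 and 903, which appears consistent with the range quoted for the similar median NIF-shift variant example.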
As used herein, the term “Clinical Splice Predictor (CSP) reference database” refers to a database of variant splice sites with clinical classifications, for example abnormal splice site or benign variant splice site. Clinical classification for a splice site may be determined from any available source wherein a genetic variant is assigned a clinical classification. Exemplary sources of variant splice sites with clinical classifications include, but are not limited to, ClinVar (<https://www.ncbi.nlm.nih.gov/clinvar/>) and the Human Gene Mutation Database (HGMD) (<http://www.hgmd.cf.ac.uk/ac/index.php>). The skilled person will be familiar with clinical classifications assigned to variant genes, variant splice sites, and variant splice site sequences. See, eg, Richards et al, Genetics in Medicine (2015) 17(5): 405-424. Clinical classifications in ClinVar include pathogenic, likely pathogenic, benign, and likely benign among others. Entries included in the HGMD may be identified as gene lesions responsible for human inherited diseases and as such are classified as pathogenic. A clinical classification of a variant splice site as pathogenic or likely pathogenic may be interpreted as an abnormal splice site. A clinical classification of a variant splice site as benign or likely benign may be interpreted as a benign variant splice site. In one embodiment, a CSP reference database includes variant splice sites clinically classified as an abnormal splice site or a benign variant splice site. In certain embodiments, a CSP reference database comprises variants, wherein a variant splice site clinically classified as “pathogenic” or “likely pathogenic” is assigned as an “abnormal splice site” and wherein a variant splice site clinically classified as “benign” or “likely benign” is assigned as a “benign variant splice site”.
A CSP reference database may comprise variants affecting only a donor splice site, including exonic variants that are non-code changing variants (synonymous exonic variants).
As used herein, the term “genetic disorder” includes a disorder that reflects inheritance of a single causative gene. Exemplary sources of genes underlying a genetic disorder include, but are not limited to, Online Mendelian Inheritance in Man (OMIM), found at <https://www.omim.org/>. See Appendix A for a list of OMIM genes.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings as follows.
Table 1: (above) Four exemplary embodiments relating to embodiments comprising at least six sample donor splice site sequences from a sample donor splice site are depicted in Table 1 wherein the nucleotides of a sample donor splice site are indicated as nucleotide positions E−5 to D+9 and an “x” indicates that that nucleotide is included in a sample donor splice site sequence.
Table 2: Blinded trial of Clinical Splice Predictor (V3) for BRCA1 or BRCA2 variants identified in individuals with breast cancer, with experimental confirmation of splicing outcomes. Clinical Splice Predictor reports were analysed blinded for thirty putative splice variants identified in cancer oncogenes BRCA1 and BRCA2. Genomic variants were classified according to defined criteria (see Table 4). Unblinding to published experimental outcomes reveals 100% predictive accuracy for BRCA1 and BRCA2 True Positive (abnormal splice sites) variant splice sites and True Negative (benign variant splice sites) variant splice sites.
Overall Predictive accuracy:
Table 3: Blinded trial of Clinical Splice Predictor (V3) for putative splice variants across all fields of genomic medicine, with RNA-sequencing providing confirmation of splicing outcomes. Clinical Splice Predictor reports were analysed blinded for thirty-nine putative splice variants identified in a range of OMIM genes associated with different Mendelian disorders. Genomic variants were classified according to defined criteria (see Table 4). Unblinding to RNA-sequencing experimental outcomes reveals 100% predictive accuracy for True Positive (abnormal splice sites) variant splice sites and True Negative (benign variant splice sites) variant splice sites. See also
Appendix A. A list of Mendelian genes with clinically relevant phenotypes. This list has been filtered to exclude OMIM genes associated with traits and non-clinically relevant phenotypes such as eye colour, curly hair etc.
Appendix B. A compiled list of genes determined to induce developmental lethality with recessive knock-out in a mouse model via Mouse Genome Informatics (http://www.informatics.jax.org/downloads/reports/index.html) and the 8th release of IMPC mouse phenotype data (ftp://ftp.ebi.ac.uk/pub/databases/impc/).
Appendix C. A compiled list of genes determined to induce human prenatal, perinatal or infantile lethality, derived from http://www.omim.org. OMIM phenotypic search terms were used to query text fields for terms associated with lethality before birth or shortly after birth.
In an embodiment related to the first embodiment, disclosed are methods of identifying an abnormal splice site in a sample splice site from a subject. Disclosed are methods relating to comparing a sample splice site from a subject with splice sites from a reference human genome sequence. The comparison comprises determining a measure of Native Intron Frequency of a splice site sequence from a subject relative to a reference human genome sequence, wherein Native Intron Frequency refers to a measure of the frequency of the splice site sequence from a subject in a reference human genome sequence. In certain embodiments, a measure of Native Intron Frequency refers to the number of times a splice site sequence from a subject appears in a reference human genome sequence. In certain embodiments, a measure of Native Intron Frequency refers to Percentile (NIF). In certain embodiments, the sample splice site from the subject is a donor splice site, a branch site, or an acceptor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11 or 12 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site. 
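The count-based measure of Native Intron Frequency described above can be sketched as a simple lookup. This is an illustrative assumption about one possible implementation: the function names are hypothetical, and the three toy sequences stand in for the full set of splice site sequences that would be extracted from a reference human genome sequence.

```python
from collections import Counter


def build_nif_table(reference_splice_site_sequences):
    """Count how often each splice site sequence occurs in the reference set."""
    return Counter(seq.upper() for seq in reference_splice_site_sequences)


def native_intron_frequency(sample_sequence, nif_table):
    """Return the number of times the sample sequence appears (0 if absent)."""
    # A Counter returns 0 for keys it has never seen.
    return nif_table[sample_sequence.upper()]


# Toy reference set, invented for illustration only.
TOY_REFERENCE = ["CAGGTAAGT", "CAGGTAAGT", "AAGGTGAGT"]

if __name__ == "__main__":
    table = build_nif_table(TOY_REFERENCE)
    print(native_intron_frequency("CAGGTAAGT", table))
    print(native_intron_frequency("CAGGCAAGT", table))
```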
In certain embodiments related to the first embodiment, the sample splice site is a donor splice site, and the method comprises more than one sample splice site sequence comprised in the same donor splice site, wherein each sample donor splice site sequence comprises 9 non-identical consecutive nucleotides of the donor splice site, and wherein the sample donor splice site sequences may comprise overlapping consecutive nucleotides of the donor splice site. In a related embodiment comprising at least six sample splice site sequences comprised in the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, and D+1 to D+9 of a donor splice site. In a related embodiment comprising at least four sample splice site sequences comprised in the same sample splice site, the sample splice site sequences correspond to at least nucleotide positions E−4 to D+5, E−3 to D+6, E−2 to D+7 and E−1 to D+8 of a donor splice site.
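Overlapping 9-nucleotide sample sequences can be obtained from a donor splice site with a sliding window, as in the minimal sketch below. It assumes the donor splice site is supplied as a 14-nucleotide string covering positions E−5 to D+9, written with ASCII hyphens in the labels. The function name and the example sequence are hypothetical.

```python
# Position labels for a donor splice site spanning E-5 to D+9 (14 nucleotides).
POSITIONS = [f"E-{i}" for i in range(5, 0, -1)] + [f"D+{i}" for i in range(1, 10)]
WINDOW_LENGTH = 9


def donor_windows(donor_site):
    """Return a dict mapping a window label to its 9-nucleotide sequence."""
    if len(donor_site) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} nucleotides (E-5 to D+9)")
    windows = {}
    for start in range(len(POSITIONS) - WINDOW_LENGTH + 1):
        end = start + WINDOW_LENGTH
        label = f"{POSITIONS[start]} to {POSITIONS[end - 1]}"
        windows[label] = donor_site[start:end]
    return windows


if __name__ == "__main__":
    # Invented example sequence: exon TCCAG, intron GTAAGTATC.
    for label, seq in donor_windows("TCCAGGTAAGTATC").items():
        print(label, seq)
```

A 14-nucleotide site yields six windows, from E-5 to D+4 through to D+1 to D+9.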
In embodiments related to the first embodiment, the method of identifying an abnormal splice site in a sample splice site from a subject comprises (a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject; and (b) determining a Native Intron Frequency of the first sample splice site sequence (NIFvar-1); wherein an NIFvar-1 of 0 indicates that the sample splice site is abnormal. In certain embodiments, the sample splice site from a subject is a donor splice site and the first sample donor splice site sequence comprises 9 consecutive nucleotides of the sample donor splice site. In certain embodiments, the sample splice site from a subject is a donor splice site and the method comprises determining a NIFvar for more than one sample donor splice site sequence comprised in the same sample splice site, and the method comprises (a) obtaining first and second sample donor splice site sequences; first, second, and third sample donor splice site sequences; first, second, third, and fourth sample donor splice site sequences; first, second, third, fourth, and fifth sample donor splice site sequences; or first, second, third, fourth, fifth, and sixth sample donor splice site sequences; wherein each sample donor splice site sequence is comprised in the sample donor splice site from the subject, wherein each sample donor splice site sequence comprises a non-identical set of 9 nucleotide positions of the sample donor splice site; and (b) determining a measure of Native Intron Frequency of each sample donor splice site sequence; wherein a Native Intron Frequency of 0 (zero) for any sample donor splice site sequence indicates that the sample donor splice site is abnormal.
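The decision rule above, in which a Native Intron Frequency of 0 for any sample donor splice site sequence indicates an abnormal site, can be sketched as follows. The function names and the toy counts are hypothetical and serve only to illustrate the rule.

```python
def zero_nif_windows(window_nifs):
    """Return the labels of sample sequences whose NIF is 0."""
    return [label for label, nif in window_nifs.items() if nif == 0]


def is_abnormal(window_nifs):
    """A NIF of 0 for any sample sequence flags the splice site as abnormal."""
    return len(zero_nif_windows(window_nifs)) > 0


if __name__ == "__main__":
    # Invented NIFvar counts for four overlapping sample sequences.
    toy = {"E-4 to D+5": 12, "E-3 to D+6": 0, "E-2 to D+7": 3, "E-1 to D+8": 41}
    print(is_abnormal(toy), zero_nif_windows(toy))
```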
In an embodiment related to the second embodiment, methods of identifying an abnormal splice site in a sample splice site relate to comparing a measure of Native Intron Frequency of a sample splice site sequence with a measure of Native Intron Frequency of a reference splice site sequence, wherein the sample splice site sequence and reference splice site sequence originate from the same corresponding region of a gene. A change (or shift) in a measure of Native Intron Frequency of the sample splice site sequence in comparison to the Native Intron Frequency of a corresponding reference splice site sequence provides a measure of the risk of abnormal splicing for the sample splice site; the change (or shift) may be referred to herein as NIF-shift or shift in NIF for a sample splice site sequence. In certain embodiments, a measure of Native Intron Frequency of the sample splice site sequence and a measure of Native Intron Frequency of a corresponding reference splice site sequence are determined, and a risk of abnormal splicing for the sample splice site is determined by comparing NIF-shift against a CSP reference database. In certain embodiments, a NIF-shift is determined for the sample splice site sequence from the measure of Native Intron Frequency of the sample splice site sequence and a measure of Native Intron Frequency of a corresponding reference splice site sequence. NIF-shift may be determined by a ratiometric analysis of the measure of Native Intron Frequency of the sample splice site sequence and the measure of Native Intron Frequency of a corresponding reference splice site sequence; or subtracting the measure of Native Intron Frequency of the sample splice site sequence from the measure of Native Intron Frequency of a corresponding reference splice site sequence; or the like calculations.
In certain embodiments, NIF-shift for the sample splice site is compared against a CSP reference database, wherein the CSP reference database comprises NIF-shift for variant splice sites clinically classified as abnormal splice sites or benign variant splice sites, and wherein the comparison comprises assessing a clinical classification(s) assigned to (a) variant splice site(s) having about the same NIF-shift as the sample splice site sequence. A risk of abnormal splicing may then be derived from the clinical classification(s) of each variant splice site having about the same NIF-shift as the sample splice site sequence. Given a CSP reference dataset comprising, e.g. NIF-shift with a known classification for each variant splice site, a machine learning or regression algorithm can be applied to calculate the risk of abnormal splicing for a sample splice site sequence. Given the input dataset, various techniques can be used to produce an indicator of the risk of abnormal splicing for the sample site sequence. Whilst a simple method is to apply a regression calculation to the data set to produce a regression equation, other techniques can be used. These can include applying support vector machines to the data set, and in the further alternative applying deep neural network learning techniques to the data set. In one embodiment, the risk of abnormal splicing is a number from 0 to 1, wherein 0 represents no risk of abnormal splicing and 1 represents highest risk of abnormal splicing. Exemplary embodiments related to the second embodiment are depicted in
In an embodiment related to the second embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
In embodiments related to the second embodiment, Percentile (NIFvar-1) and Percentile (NIFref-1) are used in conjunction to infer the risk of abnormal splicing. In certain embodiments, a NIF-shift is determined for the sample splice site sequence from Percentile (NIFvar-1) and Percentile (NIFref-1). NIF-shift may be determined by a ratiometric analysis of Percentile (NIFvar-1) and Percentile (NIFref-1); or subtracting Percentile (NIFvar-1) from Percentile (NIFref-1); or the like calculations. In certain embodiments, NIF-shift for the sample splice site sequence is compared against a CSP reference database, wherein the CSP reference database comprises NIF-shift for variant splice sites clinically classified as abnormal splice sites or benign variant splice sites, and wherein the comparison comprises assessing a clinical classification(s) assigned to (a) variant splice site(s) having about the same NIF-shift as the sample splice site sequence. A risk of abnormal splicing may then be derived from the clinical classification of each variant splice site with a clinical classification having about the same NIF-shift as the sample splice site sequence. Exemplary embodiments related to the second embodiment are depicted in
Given a dataset, e.g. a CSP reference database, comprising, e.g. a Percentile (NIFvar), a Percentile (NIFref), and a known classification for each genetic variant, a machine learning or regression algorithm can be applied to calculate the risk of abnormal splicing for a sample splice site sequence. Given the input dataset, various techniques can be used to produce an indicator of the risk of abnormal splicing for the sample site sequence. Whilst a simple method is to apply a regression calculation to the data set to produce a regression equation, other techniques can be used. These can include applying support vector machines to the data set, and in the further alternative applying deep neural network learning techniques to the data set.
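As an illustration of the simple regression option mentioned above, the following sketch fits a logistic regression by gradient descent in plain Python. It is an assumption made for illustration, not the disclosed implementation: the function names, the learning rate, the number of iterations and in particular the eight training rows are hypothetical and invented. Each row is (Percentile (NIFvar), Percentile (NIFref)), with a label of 1 for an abnormal splice site and 0 for a benign variant splice site.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, features):
    """Risk of abnormal splicing between 0 and 1 for one feature row."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], features))
    return sigmoid(z)


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit bias plus one weight per feature by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict_risk(weights, features) - label
            gradient[0] += error
            for i, value in enumerate(features):
                gradient[i + 1] += error * value
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Invented toy training data: (Percentile NIFvar, Percentile NIFref).
TOY_ROWS = [(0.0, 0.85), (0.02, 0.9), (0.05, 0.7), (0.0, 0.6),
            (0.8, 0.85), (0.6, 0.7), (0.9, 0.9), (0.5, 0.55)]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    model = fit_logistic(TOY_ROWS, TOY_LABELS)
    print(predict_risk(model, (0.0, 0.85)), predict_risk(model, (0.9, 0.9)))
```

A support vector machine or a neural network could take the place of the logistic model without changing the shape of the inputs or the 0 to 1 output.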
It will be understood that in any embodiments comprising Percentile (NIF), a measure of NIF (eg NIF or NIF (count)) may be used instead.
An exemplary machine learning dataset suitable for embodiments related to any embodiment described herein, may comprise one or more datasets related to non-identical nucleotide positions of a sample splice site as shown below. It will be appreciated that the number of sample splice site sequences from the same sample splice site may vary in total nucleotide composition and nucleotide position with respect to the sample splice site.
In the above exemplary table, the first column indicates the nucleotide position of a sample splice site in which a variation from a corresponding reference splice site sequence occurs. For example, for a sample splice site variant that resides in the −1 position of a donor splice site, a NIFvar and corresponding NIFref (and/or a Percentile (NIFvar) and corresponding Percentile (NIFref)) for sample splice site sequences corresponding to nucleotide position E−5˜D+4 through to E−1˜D+5 of the sample donor splice site may be analysed, and so on.
In certain embodiments related to the second embodiment, the sample splice site may be a donor splice site and the donor splice site sequence comprises 4 to 12 nucleotides of the sample donor splice site. In certain embodiments related to the second embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of the sample donor splice site. In certain embodiments related to the second embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments related to the second embodiment, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments related to the second embodiment, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments related to the second embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 9 consecutive nucleotides of the sample donor splice site. 
In further embodiments related to the second embodiment, the sample splice site from a subject is a donor splice site and the method comprises analysing more than one donor splice site sequence comprised in the same sample donor splice site, wherein said method comprises, for example, obtaining first and second sample donor splice site sequences; first, second, and third sample donor splice site sequences; first, second, third, and fourth sample donor splice site sequences; first, second, third, fourth, and fifth sample donor splice site sequences; first, second, third, fourth, fifth, and sixth sample donor splice site sequences, and so on; wherein each sample donor splice site sequence is comprised in the sample donor splice site from the subject. Each Percentile (NIFvar-1) and corresponding Percentile (NIFref-1) are used in conjunction, e.g. by calculating a respective NIF-shift, against a CSP reference database to infer the risk of abnormal splicing. A risk of abnormal splicing may then be derived from the clinical classification of each variant splice site with a clinical classification having about the same NIF-shift as the sample splice site sequences. An increasing number of sample splice site sequences characterised as abnormal increases the risk of abnormal splicing.
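One simple way to derive a risk from the clinical classifications of variants having about the same NIF-shift is to take the proportion classified as abnormal. The sketch below makes that assumption explicitly. The function name, the use of an absolute tolerance on a single NIF-shift value, and the toy reference entries are all hypothetical.

```python
def risk_from_similar_variants(sample_shift, reference_variants, tolerance):
    """Proportion of similar NIF-shift variants classified as abnormal.

    `reference_variants` is a list of (nif_shift, classification) pairs,
    where classification is "abnormal" or "benign". Returns None when no
    reference variant lies within `tolerance` of the sample NIF-shift.
    """
    similar = [
        classification
        for shift, classification in reference_variants
        if abs(shift - sample_shift) <= tolerance
    ]
    if not similar:
        return None
    return sum(1 for c in similar if c == "abnormal") / len(similar)


# Invented toy reference entries: (NIF-shift, classification).
TOY_CSP = [(0.85, "abnormal"), (0.84, "abnormal"), (0.86, "benign"), (0.10, "benign")]

if __name__ == "__main__":
    print(risk_from_similar_variants(0.85, TOY_CSP, 0.02))
```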
In an embodiment related to the third embodiment, provided are methods of identifying an abnormal splice site in a sample splice site from a subject related to comparing the clinical classification(s) of the nucleotide sequence of a sample splice site sequence in relation to any variant splice site comprising the same nucleotide sequence. The method comprises assessing the clinical classification(s), if available, of each appearance of a nucleotide sequence of a sample splice site sequence in any variant splice site in any gene, e.g. a splice site comprised in the same gene as the sample splice site but at another intron/exon location; a splice site comprised in a gene different from the gene comprising the sample splice site, and so on. In certain embodiments, the method further comprises assessing the clinical classification(s), if available, of each appearance of the nucleotide sequence of the reference splice site in any variant splice site in any gene. Collections of variant genes and/or variant splice sites relating to a disorder with an associated clinical classification, including for example, pathogenic, likely pathogenic, benign, likely benign, are available, including for example the collections available as ClinVar, HGMD, etc. A nucleotide sequence comprised in a sample splice site from a subject and/or a nucleotide sequence comprised in a corresponding reference splice site can be searched in such a collection for its appearance and the associated clinical classification of each appearance of the searched nucleotide sequence can be determined. In certain embodiments, a CSP reference database comprises variants, wherein a variant clinically classified as “pathogenic” or “likely pathogenic” is assigned as an “abnormal splice site” and a variant clinically classified as “benign” or “likely benign” is assigned as a “benign variant splice site”.
It will be appreciated that the same nucleotide sequence may be classified as an abnormal splice site in the context of one variant splice site comprised in a CSP database and may be classified as a benign variant splice site in the context of a different variant splice site comprised in the CSP database. A CSP reference database may comprise variants affecting only a donor splice site, including exonic variants that are non-code changing variants (synonymous exonic variants). For example, part ii of each of
In an embodiment related to the third embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(c) determining a risk of abnormal splicing for the sample splice site by assessing the clinical classification(s) of the nucleotide sequence of the first sample splice site sequence determined in step (b).
In a further embodiment related to the third embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) obtaining a first reference splice site sequence; wherein the first reference splice site sequence and the first sample splice site sequence each originate from the same corresponding region of a gene;
(c) determining a clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(d) determining a clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence; and
(e) determining a risk of abnormal splicing for the sample splice site by assessing the clinical classification(s) of the nucleotide sequence of the first sample splice site sequence determined in step (c) and the clinical classification(s) of the nucleotide sequence of the first reference splice site sequence determined in step (d).
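Steps (c) and (d) above amount to looking up every appearance of a nucleotide sequence among classified variants and tallying the classifications. A minimal sketch follows, assuming the reference collection is available as (gene, sequence, classification) records. The record layout, the function name, the placeholder gene names and the toy entries are all hypothetical.

```python
from collections import Counter


def classifications_for(sequence, csp_records):
    """Tally the clinical classifications of every appearance of `sequence`.

    `csp_records` is an iterable of (gene, sequence, classification) tuples.
    The same sequence may appear in several genes with different labels.
    """
    return Counter(
        classification
        for _gene, record_sequence, classification in csp_records
        if record_sequence == sequence
    )


# Invented toy records; gene names are placeholders.
TOY_RECORDS = [
    ("GENE_A", "CAGGCAAGT", "abnormal"),
    ("GENE_B", "CAGGCAAGT", "abnormal"),
    ("GENE_C", "CAGGCAAGT", "benign"),
    ("GENE_A", "CAGGTAAGT", "benign"),
]

if __name__ == "__main__":
    sample_tally = classifications_for("CAGGCAAGT", TOY_RECORDS)     # step (c)
    reference_tally = classifications_for("CAGGTAAGT", TOY_RECORDS)  # step (d)
    print(dict(sample_tally), dict(reference_tally))
```

The two tallies are the inputs assessed together in step (e).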
In embodiments related to the third embodiment, clinical classification(s) of a nucleotide sequence of a splice site sequence (eg, sample splice site sequence, reference splice site sequence) may be determined from a database comprising known genetic variants with an associated clinical classification (eg, abnormal splice site, benign variant splice site). A clinical classification of a nucleotide sequence of a splice site sequence may be determined from a CSP reference database, wherein the CSP reference database comprises nucleotide sequences of variant splice sites with corresponding clinical classifications (eg, abnormal splice site, benign variant splice site).
In certain embodiments related to the third embodiment, the sample splice site may be a donor splice site and the donor splice site sequence may comprise 4 to 12 nucleotides of the sample donor splice site. In certain embodiments related to the third embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of the sample donor splice site. In certain embodiments related to the third embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments related to the third embodiment, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments related to the third embodiment, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments related to the third embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 9 consecutive nucleotides of the sample donor splice site. 
In further embodiments related to the third embodiment, the sample splice site from a subject is a donor splice site and the method comprises analysing more than one donor splice site sequences comprised in the same sample donor splice site, wherein said method comprises, for example, obtaining first and second sample donor splice site sequences; first, second, and third sample donor splice site sequences; first, second, third, and fourth sample donor splice site sequences; first, second, third, fourth, and fifth sample donor splice site sequences; first, second, third, fourth, fifth, and sixth sample donor splice site sequences, and so on; wherein each sample donor splice site sequence is comprised in the sample donor splice site from the subject. A clinical classification(s) associated with the nucleotide sequence of each sample splice site sequence is determined and, optionally, a clinical classification(s) associated with the nucleotide sequence of each corresponding reference splice site sequence is determined.
In embodiments related to the third embodiment, a risk of abnormal splicing for a sample splice site may be determined by assessing the clinical classifications associated with the nucleotide sequence(s) of one or more sample splice site sequences comprised in a sample splice site. The risk of abnormal splicing increases with increasing instances of abnormal splice sites comprising the nucleotide sequence of a sample splice site sequence, e.g. the number of variant splice sites comprised in a CSP reference database, wherein the variant splice site comprises the nucleotide sequence of the sample splice site sequence, and wherein the variant splice site is clinically classified as an abnormal splice site. A risk of abnormal splicing may be assigned a value from 0 to 1, wherein 0 represents no risk of abnormal splicing and 1 represents highest risk of abnormal splicing. In embodiments comprising more than one sample splice site sequence, determining a risk of abnormal splicing comprises analysing the clinical classification(s) of the nucleotide sequences corresponding to each sample splice site sequence.
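By way of non-limiting illustration, the counting described above may be sketched in Python as follows. The sketch assumes that the risk value is taken as the fraction of matching variant splice sites that are classified as abnormal; that scoring rule, the record layout, the example sequences, and the names CSP_RECORDS and classification_risk are assumptions made for illustration only and are not taken from any CSP reference database.

```python
from collections import Counter

# Hypothetical records: (variant splice site sequence, clinical classification).
# The sequences and labels are invented for illustration.
CSP_RECORDS = [
    ("CCAAGGTAAATATC", "abnormal"),
    ("TCAAGGTAAATGTC", "abnormal"),
    ("GGAAGGTAAATCCA", "benign"),
    ("CCAAGGTAAGTATC", "benign"),
]


def classification_risk(sample_sequence, records):
    """Return a value from 0 to 1, or None when no record matches.

    A record matches when its variant splice site contains the sample
    splice site sequence (a plain substring match, which ignores position
    and is a simplifying assumption). The value is the fraction of
    matching records classified as abnormal.
    """
    counts = Counter(
        label for sequence, label in records if sample_sequence in sequence
    )
    total = counts["abnormal"] + counts["benign"]
    if total == 0:
        return None  # no classified variant carries this sequence
    return counts["abnormal"] / total


# A 9-nucleotide sample splice site sequence (hypothetical).
risk = classification_risk("AAGGTAAAT", CSP_RECORDS)
```

Where more than one sample splice site sequence is analysed, the same function may be called once per sequence and the resulting values considered together.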
For example, in a method of the third embodiment, wherein the sample splice site is a donor splice site, the sample donor splice site sequence comprises 9 consecutive nucleotides of the donor splice site, and the method is repeated with six non-identical donor splice site sequences comprised in the same sample splice site (E−5 to D+4, E−4 to D+5, E−3 to D+6, E−2 to D+7, E−1 to D+8, and D+1 to D+9), it is possible to create a series of 11 data sets, as follows:
A machine learning set is thus comprised of 11 data sets. Each dataset is specialised at summarizing the patterns of abnormal splice sites/benign variant splice sites that occur within that window. The numbers of abnormal splice sites/benign variant splice sites are used to infer the risk of abnormal splicing of a splice site. The dataset is then used as the foundation for regression or machine learning to calculate the risk of abnormal splicing for a sample splice site from a subject. Given the input dataset, various techniques can be used to produce an indicator of the risk of abnormal splicing for the sample site sequence. Whilst a simple method is to apply a regression calculation to the data set to produce a regression equation, other techniques can be used. These can include applying support vector machines to the data set, and in the further alternative applying deep neural network learning techniques to the data set.
It will be understood that in a method related to the third embodiment, alternative compilations of data may be used to create a machine learning dataset. For example, an alternative approach with regard to the E−5 to D+9 donor sample site and having six unique donor sample site sequences, each with 9 consecutive nucleotides of the donor sample site, can be applied as follows:
Again, the data set can be utilised as an input to standard machine learning techniques to provide for a descriptive output of a subsequent test subject.
In an embodiment related to the fourth embodiment, methods of identifying an abnormal splice site in a sample splice site from a subject relate to assessing the clinical classification of a splice site determined to be similar to a sample splice site from the subject. In one embodiment, a splice site is determined to be similar to a sample splice site from the subject by determining a relative shift in NIF (NIF-shift) of a sample splice site sequence, calculating a range of values around the NIF-shift of the sample splice site sequence, and querying a database comprising NIF-shift for variant splice sites and corresponding clinical classifications (eg abnormal splice site or benign variant splice site) for variant splice sites having a NIF-shift within the calculated range of NIF-shift for the sample splice site sequence. Variant splice sites identified as having NIF-shift within the calculated range of NIF-shift for the sample splice site sequence may be referred to as “similar NIF-shift variants”. A risk of abnormal splicing may be determined by analysing the clinical classification of similar NIF-shift variants. The risk of abnormal splicing increases with increasing instances of similar NIF-shift variants that are clinically classified as abnormal splice sites, e.g. the number of variant splice sites comprised in a CSP reference database, wherein the variant splice site has an NIF-shift within the range of NIF-shift for the sample splice site, and wherein the variant splice site is clinically classified as an abnormal splice site. A risk of abnormal splicing may be assigned a value from 0 to 1, wherein 0 represents no risk of abnormal splicing and 1 represents highest risk of abnormal splicing.
It will be appreciated that for embodiments comprising more than one sample splice site sequence from the same sample splice site, a risk of abnormal splicing is considered from all similar NIF-shift variants with respect to each range of NIF-shift for each sample splice site sequence.
An embodiment related to the fourth embodiment is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a Percentile (NIFvar-1) of the first sample splice site sequence;
(d) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence originate from the same corresponding region of a gene;
(e) determining a Percentile (NIFref-1) of the first reference splice site sequence;
(f) calculating a lower and an upper bound for Percentile (NIFvar-1) and calculating a lower and an upper bound for Percentile (NIFref-1);
(g) determining a range of NIF-shift by comparing the lower and upper bounds for Percentile (NIFvar-1) with the lower and upper bounds for Percentile (NIFref-1) calculated in (f);
(h) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (g);
(i) determining (a) clinical classification(s) associated with each similar NIF-shift variant identified in step (h); and
(j) determining a risk of abnormal splicing for the sample splice site by assessing the clinical classification determined in step (i) for each similar NIF-shift variant identified in step (h).
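For illustration only, the Percentile (NIF) values of steps (c) and (e) may be sketched in Python as a percentile rank against the NIF values of all splice site sequences in a reference database. This particular definition of percentile, the function name nif_percentile, and the example values are assumptions; other percentile definitions may equally be used.

```python
from bisect import bisect_right


def nif_percentile(nif, reference_nifs):
    """Percentage of reference NIF values that are less than or equal to nif."""
    ordered = sorted(reference_nifs)
    if not ordered:
        raise ValueError("reference_nifs must not be empty")
    # bisect_right gives the number of values <= nif in the sorted list.
    return 100.0 * bisect_right(ordered, nif) / len(ordered)


# Hypothetical NIF values for the splice site sequences of a reference genome.
REFERENCE_NIFS = [1, 5, 5, 20, 100, 400, 1200, 3000]

percentile_ref = nif_percentile(1200, REFERENCE_NIFS)  # reference sequence
percentile_var = nif_percentile(5, REFERENCE_NIFS)     # variant sequence
```

A large drop from percentile_ref to percentile_var corresponds to a variant sequence that is used far less often in the reference genome than the sequence it replaces.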
In embodiments related to the fourth embodiment, the sample splice site is a donor splice site, steps (a) to (i) are repeated with up to five sample splice site sequences and corresponding respective reference splice site sequences, and step (j) includes assessing the clinical classification associated with each similar NIF-shift variant identified in each step (h).
In embodiments related to the fourth embodiment, Percentile (NIFvar-x) and Percentile (NIFref-x) may be used in combination to determine a measure of NIF-shift and a range of NIF-shift may be calculated. In one embodiment, a range of NIF-shift of the sample splice site sequence is compared to a dataset comprising variant splice sites with known clinical classification (eg, abnormal splice site or benign variant splice site) and a corresponding NIF-shift is determined from a combination of Percentile (NIFvar) and a corresponding Percentile (NIFref) for each variant splice site included in the dataset. In embodiments related to the fourth embodiment, NIFvar-x and NIFref-x may be used in combination to determine a measure of NIF-shift and a range of NIF-shift may be calculated. In one embodiment, a range of NIF-shift of the sample splice site sequence is compared to a dataset comprising genetic variants of splice sites with known clinical classification (eg, abnormal splice site or benign variant splice site) and a corresponding NIF-shift is determined from a combination of NIFvar and a corresponding NIFref for each genetic variant included in the dataset. Given a dataset comprising NIF-shift and a known classification for each variant splice site included in the dataset, a machine learning or regression algorithm can be applied to identify genetic variants comprised in the dataset that are similar to the sample splice site of the subject.
An embodiment related to the fourth embodiment is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising:
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence originate from the same corresponding region of a gene;
(d) calculating a lower and an upper bound for NIFvar-1 and calculating a lower and an upper bound for NIFref-1;
(e) determining a range of NIF-shift by comparing the lower and upper bounds for NIFvar-1 with the lower and upper bounds for NIFref-1 calculated in (d);
(f) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (e);
(g) determining a clinical classification associated with each similar NIF-shift variant identified in step (f); and
(h) determining the risk of abnormal splicing for the sample splice site by assessing the clinical classification determined in step (g) for each similar NIF-shift variant identified in step (f).
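Steps (f) to (h) may be sketched in Python as follows. The sketch assumes one possible reading of "within the range of NIF-shift": a classified variant is treated as similar when its NIFvar lies between the sample bounds for NIFvar and its NIFref lies between the sample bounds for NIFref. That reading, the use of the abnormal fraction as the risk value, the class and function names, and all example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedVariant:
    nif_ref: float  # NIF of the reference splice site sequence
    nif_var: float  # NIF of the variant splice site sequence
    label: str      # "abnormal" or "benign"


# Hypothetical pre-classified variants.
DATABASE = [
    ClassifiedVariant(1000, 10, "abnormal"),
    ClassifiedVariant(1100, 12, "abnormal"),
    ClassifiedVariant(950, 9, "benign"),
    ClassifiedVariant(1000, 900, "benign"),
    ClassifiedVariant(50, 10, "abnormal"),
]


def similar_nif_shift_variants(var_bounds, ref_bounds, database):
    """Step (f): keep variants whose NIFvar and NIFref both fall in range."""
    var_lo, var_hi = var_bounds
    ref_lo, ref_hi = ref_bounds
    return [
        v for v in database
        if var_lo <= v.nif_var <= var_hi and ref_lo <= v.nif_ref <= ref_hi
    ]


def risk_from_similar(similar):
    """Steps (g) and (h): fraction of similar variants classified abnormal."""
    if not similar:
        return None  # nothing comparable in the database
    return sum(v.label == "abnormal" for v in similar) / len(similar)


similar = similar_nif_shift_variants((8, 13), (900, 1200), DATABASE)
risk = risk_from_similar(similar)
```

Consistent with the description, similarity in this sketch depends only on NIF values and not on nucleotide sequence.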
In embodiments related to the fourth embodiment, identification of similarity is based on a comparison of relative shift in NIF, which is a measure of the shift in NIF of a reference splice site sequence in comparison to NIF of a variant splice site sequence. The determination of similarity is independent of nucleotide sequence. A variant splice site sequence comprised in a dataset with a clinical classification (eg, abnormal splice site or benign variant splice site) and a corresponding NIF-shift may be identified as similar to a sample splice site sequence when the NIF-shift of the variant splice site sequence falls within a range of NIF-shift values centred about a NIF-shift of the sample splice site sequence.
A range of NIF-shift for a sample splice site sequence may be calculated by
(a) determining a measure of Native Intron Frequency of a sample splice site sequence, eg, NIFvar-x or Percentile (NIFvar-x), and determining a measure of Native Intron Frequency of a corresponding reference splice site sequence, e.g. NIFref-x or Percentile (NIFref-x); wherein the reference splice site sequence and the sample splice site sequence each originate from the same corresponding region of a gene;
(b) determining an upper and a lower bound for each measure recited in step (a), e.g. NIFvar-x and NIFref-x, wherein NIFvar-x lower bound is $e^{\log(\text{NIFvar}) \times (1 - \text{NIF\_shift percentage})}$, NIFvar-x upper bound is $e^{\log(\text{NIFvar}) \times (1 + \text{NIF\_shift percentage})}$, NIFref-x lower bound is $e^{\log(\text{NIFref}) \times (1 - \text{NIF\_shift percentage})}$, and NIFref-x upper bound is $e^{\log(\text{NIFref}) \times (1 + \text{NIF\_shift percentage})}$;
wherein the respective upper and lower bounds provide a range of NIF-shift for a sample splice site sequence. NIF-shift percentage may be about 2%, about 2.5%, about 5%, or about 10%. A machine learning dataset may be created comprising a NIF-shift for each variant splice site with a clinical classification (eg, abnormal splice site or benign variant splice site). This dataset may be used for regression or machine learning to calculate the risk of abnormal splicing for a sample splice site on the basis of a range of NIF-shift of a sample splice site sequence.
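The bound calculation of step (b) may be sketched in Python as follows. The sketch assumes that log denotes the natural logarithm (so that it pairs with the base e) and that the NIF-shift percentage is supplied as a fraction, e.g. 0.05 for 5%; the function name nif_bounds is hypothetical.

```python
import math


def nif_bounds(nif, shift_percentage):
    """Return (lower, upper) bounds for a NIF value.

    Implements e^(log(NIF) * (1 - p)) and e^(log(NIF) * (1 + p)).
    The logarithm is undefined for a NIF of 0, so that case is rejected.
    """
    if nif <= 0:
        raise ValueError("NIF must be greater than 0 to take its logarithm")
    log_nif = math.log(nif)
    first = math.exp(log_nif * (1 - shift_percentage))
    second = math.exp(log_nif * (1 + shift_percentage))
    # For NIF values below 1 the logarithm is negative and the two
    # expressions swap order, so the pair is sorted before returning.
    return (min(first, second), max(first, second))


# Example with a hypothetical NIFref of 1000 and a NIF-shift percentage of 5%.
ref_lower, ref_upper = nif_bounds(1000, 0.05)
```

Because the percentage is applied to the logarithm, the bounds are equivalent to raising the NIF value to the powers (1 − p) and (1 + p), which gives a range that widens with the magnitude of the NIF value.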
In further embodiments related to the fourth embodiment, the sample splice site may be a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments related to the fourth embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments related to the fourth embodiment, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments related to the fourth embodiment, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site.
Methods of identifying an abnormal splice site in a sample splice site further relate to combinations of any method or any embodiment herein disclosed, including combinations of embodiments related to the first, second, and third embodiments or embodiments related to the first, second and fourth embodiments. Combinations of embodiments related to the first, second, third, and/or fourth embodiments are envisioned. Combinations of embodiments related to the second, third, and fourth embodiments are envisioned. Combinations of embodiments related to the second and fourth embodiments are envisioned.
In an embodiment related to the fifth embodiment, provided is a method of identifying an abnormal splice site in a sample splice site from a subject, said method comprising
(a) obtaining a first sample splice site sequence comprised in the sample splice site from the subject;
(b) determining a measure of Native Intron Frequency of the first sample splice site sequence (NIFvar-1);
(c) determining a Percentile (NIFvar-1) of the first sample splice site sequence;
(d) determining a measure of Native Intron Frequency of a first reference splice site sequence (NIFref-1); wherein the first reference splice site sequence and the first sample splice site sequence each originate from the same corresponding region of a gene;
(e) determining a Percentile (NIFref-1) of the first reference splice site sequence;
(f) determining a clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence;
(g) optionally determining a clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence;
(h) calculating a lower and an upper bound for Percentile (NIFvar-1) and calculating a lower and an upper bound for Percentile (NIFref-1);
(i) determining a range of NIF-shift by comparing the lower and upper bounds for Percentile (NIFvar-1) with the lower and upper bounds for Percentile (NIFref-1) calculated in (h);
(j) identifying (a) similar NIF-shift variant(s), wherein a similar NIF-shift variant refers to a splice site sequence with a NIF-shift within the range of NIF-shift determined in (i);
(k) determining (a) clinical classification(s) associated with each similar NIF-shift variant identified in step (j); and
(l) determining the risk of abnormal splicing for the sample splice site by (1) comparing the Percentile (NIFvar-1) with the Percentile (NIFref-1) against a CSP reference database, (2) assessing the clinical classification(s) associated with the nucleotide sequence of the first sample splice site sequence determined in step (f); and (3) assessing the clinical classification determined in step (k) for each similar NIF-shift variant identified in step (j).
In certain embodiments, the sample splice site is a donor splice site, steps (a) to (l) are repeated with up to five sample splice site sequences and corresponding respective reference splice site sequences, and step (l) includes assessing (1) for all sample splice site sequences, (2) for all sample splice site sequences, and (3) for all sample splice site sequences.
Machine learning and dataset analysis of step (l) may be performed in accordance with the second, third, and fourth embodiments.
In a related embodiment, step (g) is carried out; and step (l) may further comprise as part of (2), analysing the clinical classification(s) associated with the nucleotide sequence of the first reference splice site sequence determined in step (g). Embodiments may comprise determining a risk of abnormal splicing expressed as a number from 0 to 1 for each of (1), (2), and (3) comprised in step (l), wherein 0 represents no risk of abnormal splicing and 1 represents highest risk of abnormal splicing.
In further embodiments related to the fifth embodiment, the sample splice site is a donor splice site. In certain embodiments, the sample splice site sequence comprises 4 to 12 nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, or 12 consecutive nucleotides of a donor splice site. In certain embodiments related to the fifth embodiment, the sample splice site is a donor splice site and the donor splice site sequence comprises 4 to 15 nucleotides of a donor splice site. In certain embodiments related to the fifth embodiment, the sample splice site sequence comprises 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or up to 15 consecutive nucleotides of a donor splice site. In certain embodiments related to the fifth embodiment, the sample splice site sequence comprises 30 or more nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 30 or more consecutive nucleotides of a donor splice site. In certain embodiments, the sample splice site sequence comprises 9 consecutive nucleotides of a donor splice site.
Also provided in further embodiments of any of the embodiments provided herein are methods of diagnosing a subject with a known genetic disorder or cancer wherein the sample splice site originates from a gene associated with a known Mendelian disorder or cancer. In the methods herein disclosed, a sample splice site obtained from the subject may be a splice site from a predetermined gene associated with a known genetic disorder or cancer. Thereby identification of an abnormal splice site in a sample splice site from a subject indicates a diagnosis of a genetic disease or cancer in the subject.
Also provided in further embodiments of any of the embodiments provided herein are methods relating to providing genetic testing services, including providing a risk of abnormal splicing of a sample splice site, to an individual. In one embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of a sample splice site sequence from a subject by
In the method, step (b) may be repeated for one or more sample splice site sequence(s) comprised in the sample splice site from the subject, wherein each sample splice site sequence comprises a non-identical set of nucleotides of the sample splice site, and wherein a NIFvar of 0 (zero) for any sample splice site sequence indicates that the sample splice site is abnormal.
In a further embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
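The zero-NIF rule stated above may be sketched in Python as follows. The sketch assumes that Native Intron Frequency is counted per window position, i.e. a sample splice site sequence is compared only with the same positions of the reference splice sites, and that 9-nucleotide windows are used; the function names and the short list of reference sequences are hypothetical.

```python
from collections import Counter

WINDOW = 9  # nucleotides per sample splice site sequence (assumed)


def build_nif_table(reference_sites, window=WINDOW):
    """Count every (offset, window sequence) pair across reference splice sites."""
    table = Counter()
    for site in reference_sites:
        for offset in range(len(site) - window + 1):
            table[(offset, site[offset:offset + window])] += 1
    return table


def has_zero_nif(sample_site, table, window=WINDOW):
    """True when any window of the sample splice site is absent from the table."""
    return any(
        table[(offset, sample_site[offset:offset + window])] == 0
        for offset in range(len(sample_site) - window + 1)
    )


# Hypothetical reference donor splice sites, 14 nucleotides each.
REFERENCE_SITES = ["CCAAGGTAAGTATC", "TCAAGGTAAGTGTC", "CCAAGGTAAGTATC"]
NIF_TABLE = build_nif_table(REFERENCE_SITES)

flag_reference = has_zero_nif("CCAAGGTAAGTATC", NIF_TABLE)  # seen in reference
flag_variant = has_zero_nif("CCAAGGTAAATATC", NIF_TABLE)    # single-base change
```

A Counter returns 0 for a key it has never seen, which is what allows an absent window to be detected without a separate membership test.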
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
In the method, step (b) may be repeated for one or more sample splice site sequence(s) comprised in the sample splice site from the subject, wherein each sample splice site sequence comprises a non-identical set of nucleotides of the sample splice site, and wherein the risk of abnormal splicing for the sample splice site is determined by considering step (vi) for each sample splice site sequence together.
In a further embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
In the method, step (b) may be repeated for one or more sample splice site sequence(s) comprised in the sample splice site from the subject, wherein each sample splice site sequence comprises a non-identical set of nucleotides of the sample splice site, and wherein the risk of abnormal splicing for the sample splice site is determined by considering step (iv) for each sample splice site sequence together.
In a further embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
In the method, step (b) may be repeated for one or more sample splice site sequence(s) comprised in the sample splice site from the subject, wherein each sample splice site sequence comprises a non-identical set of nucleotides of the sample splice site, and wherein the risk of abnormal splicing for the sample splice site is determined by considering step (iii) for each sample splice site sequence together.
In a further embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
In one embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
In one embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site from a subject, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
In a further embodiment, provided is a method of providing to an individual a risk of abnormal splicing of a sample splice site, which is directly accessible by said individual through a computer interface, said method comprising
(a) providing a mechanism for said individual to input at least one sample splice site sequence from a subject;
(b) determining a risk of abnormal splicing of the sample splice site sequence by
Mechanisms to input sequence data through a computer interface are well known in the art and include, but are not limited to, keyboard, disk drive, internet connection, etc.
Methods of treatment are also further embodiments of the methods herein described. Identification of a sample splice site associated with a gene known to be associated with an inherited disease (Mendelian disorder) or cancer provides a genetic diagnosis. The genetic diagnosis will direct applicable treatments for the particular disease or cancer. For example, cancer patients with a pathogenic splice site may be resistant to certain cancer treatments. In one embodiment provided is a method of treating a Mendelian disorder, said method comprising (a) determining a risk of abnormal splicing for a sample splice site; (b) diagnosing a Mendelian disorder or risk of a Mendelian disorder in view of the risk; and (c) administering a treatment for the diagnosed Mendelian disorder. In one embodiment, provided is a method of treating cancer, said method comprising (a) determining a risk of abnormal splicing for a sample splice site from a subject suffering from cancer; and (b) administering a cancer treatment that is amenable to cancers associated with an abnormal splice site. In one embodiment, provided is a method of treating a cancer in a subject suffering from cancer or at risk of suffering from cancer, said method comprising (a) determining a risk of abnormal splicing for a sample splice site from the subject; and (b) administering a splice-related cancer therapy. In one embodiment, provided is a method of treating and/or preventing cancer or a Mendelian disorder in a subject suffering from cancer or a Mendelian disorder or at risk of suffering from cancer or a Mendelian disorder comprising (a) determining a risk of abnormal splicing for a sample splice site from the subject; and (b) treating the subject by genetically editing the splice site determined to have an abnormal splice site.
In a further embodiment, a method 200, illustrated schematically in
A second abnormal splicing factor is generated at step 206 by comparing a sample splice site sequence to pre-classified data. The pre-classified data includes variant splice sites which have been pre-classified as being either an abnormal splice site variant or benign variant splice site and is described in greater detail below with reference to
At step 208 a third abnormal splicing factor is determined based on similar NIF-shift variants. The similar NIF-shift variants are based on pre-classified splice sites having a NIF-shift within a range of NIF-shift calculated from the NIF-shift of a sample splice site sequence and are described in detail with reference to
It will be appreciated that there is no requirement to determine the abnormal splice site factors in the order described above and that reference to the terms “first”, “second” and “third” is not a reference to required order of determination. It will be appreciated that a method 200 may comprise determining the first and second abnormal splicing factors only or, alternatively, the first and third abnormal splicing factors only.
A risk of abnormal splicing for a sample splice site may be determined by comparing the abnormal risk factors to pre-classified data. In some embodiments, the pre-classified data is generated using a method as exemplified in
Pre-classified sample splice sites are taken from a database comprising pre-classified data and compared to corresponding splice sites from a reference human genome sequence as exemplified in FIG. B.
Pre-classified abnormal splicing factors 204, 206 and 208 are then individually analysed 210 to produce a predictive algorithm as exemplified in
In some embodiments, exemplified in
In embodiments making use of a plurality of subsets 508, window 504 may be a sliding window 510, selecting a first subset 504 of nucleotides before sliding one nucleotide position along to generate the next subset 512 until the entire splice sample 500 is represented in subsets 508.
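The sliding window described above may be sketched in Python as follows, using a donor splice site spanning positions E−5 to D+9 and a window of 9 nucleotides as in the donor splice site examples herein. The function names, the ASCII position labels, and the example sequence are assumptions made for illustration.

```python
def donor_position_labels(exon_len=5, intron_len=9):
    """Labels E-5 ... E-1 followed by D+1 ... D+9 for a donor splice site."""
    exon = [f"E-{i}" for i in range(exon_len, 0, -1)]
    intron = [f"D+{i}" for i in range(1, intron_len + 1)]
    return exon + intron


def sliding_windows(site, labels, size=9):
    """Slide a window one nucleotide at a time along the splice site.

    Returns a list of (first label, last label, subset sequence) tuples,
    one per window position, until the end of the splice site is reached.
    """
    if len(site) != len(labels):
        raise ValueError("site and labels must have the same length")
    return [
        (labels[start], labels[start + size - 1], site[start:start + size])
        for start in range(len(site) - size + 1)
    ]


# Hypothetical 14-nucleotide donor splice site (E-5 to D+9).
windows = sliding_windows("CCAAGGTAAGTATC", donor_position_labels())
```

With these settings the sketch yields six subsets, from E-5 to D+4 through D+1 to D+9, so that every nucleotide of the splice site is represented in at least one subset.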
In a further embodiment, provided is a reference database comprising splice sites from a sequenced human genome. In certain embodiments, provided is a reference database comprising splice sites from a sequenced human genome, wherein each splice site sequence comprised in the reference database corresponds to a donor splice site. In certain embodiments, provided is a reference database comprising splice sites from a sequenced human genome, wherein each splice site sequence comprised in the reference database comprises at least nucleotide positions E−5 to D+9 of a donor splice site or at least nucleotide positions E−5 to D+8 of a donor splice site.
In a further embodiment, provided is a Clinical Splice Predictor (CSP) reference database comprising variant splice sites with clinical classifications. In certain embodiments, provided is a CSP reference database comprising variant splice sites with clinical classifications, wherein each variant splice site comprised in the CSP reference database is classified as an abnormal splice site or as a benign variant splice site. In related embodiments, provided is a CSP reference database comprising variant splice sites with clinical classifications, wherein each variant splice site comprised in the CSP reference database is classified as an abnormal splice site or as a benign variant splice site and wherein a variant splice site classified as an abnormal splice site is also classified as a pathogenic splice site. In certain embodiments, provided is a CSP reference database comprising variant splice sites with clinical classifications, wherein each splice site sequence comprised in the CSP reference database corresponds to a donor splice site. In certain embodiments, provided is a CSP reference database comprising variant splice sites with clinical classifications, wherein each splice site sequence comprised in the CSP reference database comprises at least nucleotide positions E−5 to D+9 of a donor splice site or at least nucleotide positions E−5 to D+8 of a donor splice site.
All references cited herein, including patents, patent applications, publications, and databases, are hereby incorporated by reference in their entireties, whether previously specifically incorporated or not.
The following are anonymised patient reports, which were generated subject to a confidentiality agreement. In each report, the risk of abnormal splicing of a sample splice site from a patient was assessed and the risk provided. The abnormal splicing of the splice site was confirmed by mRNA studies. In one report, information under “Notes and Interpretation” was provided. In other reports, this information was not completed and while text is provided in the section, it is not associated with any information content.
Splicing Studies on mRNA
Neuronal Ceroid Lipofuscinosis (NCL)
Genetic testing of DNA extracted from blood of the affected individual identified a homozygous likely pathogenic variant in CLN5, c.320+5G>A.
cDNA Studies Performed to Assess the Intronic Variant:
RT-PCR was performed on mRNA extracted from blood from the family trio (unaffected parents and affected individual). An abnormal pattern was observed for amplified cDNA products encompassing exons 1-2 and 1-3 of CLN5 in the proband (P) compared to controls (C1, C2) and the parental samples (F, M) (see
These data are suggestive of abnormal splicing of exon 1 in most CLN5 transcripts for the proband.
Possible consequences of the c.320+5G>A variant:
1) Omission of exon 1, with the mRNA beginning within exon 2
2) Abnormal extension of exon 1 with inclusion of intron-1 sequences, and splicing from the cryptic intron-1 donor.
3) Omission of most/all of exon 1, with the mRNA beginning within the intron-1 pseudo-exon
4) Omission of part of exon 1, with inclusion of intronic sequences
No normally spliced exon 1-exon 2-exon 3 products were detected in the proband.
Inclusion of intronic sequences will induce a damaging effect for the encoded CLN5 protein.
mRNA studies confirm the homozygous CLN5 c.320+5G>A variant induces abnormal splicing of CLN5 transcripts.
All detected abnormal splicing events are likely to render the encoded CLN5 protein dysfunctional/non-functional.
No normally spliced exon 1-exon 2-exon 3 products were detected in the proband.
Collective data are consistent with likely pathogenicity of the CLN5 c.320+5G>A variant.
Homozygous variants in CLN5 are consistent with the phenotype of neuronal ceroid lipofuscinosis in the affected individual.
Congenital hypotonia.
Homozygous class 4 variant in RYR1
Homozygous variant of uncertain significance in CC2D2A:
NM_001080522.2:c.438+1G>T
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
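The variant descriptions in these reports use HGVS coding-DNA notation, in which c.438+1G>T denotes a G-to-T substitution one nucleotide into the intron that follows coding position 438. The following is a minimal Python sketch of reading that notation. The function name, the returned field names and the regular expression are illustrative assumptions and are not part of the reports; the pattern handles only simple intronic substitutions.

```python
import re

# Hypothetical pattern (an assumption for illustration): an optional reference
# sequence, then c.<exon position><+/- intron offset><ref>><alt>.
_INTRONIC_SUB = re.compile(
    r"(?:(?P<ref_seq>[^:]+):)?c\.(?P<exon_pos>\d+)(?P<offset>[+-]\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


def parse_intronic_substitution(hgvs_c: str) -> dict:
    """Split an intronic substitution such as 'c.438+1G>T' into its parts."""
    match = _INTRONIC_SUB.fullmatch(hgvs_c.strip())
    if match is None:
        raise ValueError(f"not a simple intronic substitution: {hgvs_c!r}")
    offset = int(match["offset"])
    return {
        "reference": match["ref_seq"],          # None when no accession is given
        "exon_position": int(match["exon_pos"]),
        "offset": offset,
        "ref": match["ref"],
        "alt": match["alt"],
        # A positive offset counts into the intron from the 5' (donor) end;
        # a negative offset counts back from the 3' (acceptor) end.
        "site": "donor" if offset > 0 else "acceptor",
    }


print(parse_intronic_substitution("NM_001080522.2:c.438+1G>T"))
```

Exonic variants such as c.238G>C carry no intron offset and are deliberately rejected by this sketch.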
Conclusions
RT-PCR was performed on mRNA extracted from the whole blood taken from the unaffected parent carriers of the c.438+1G>T variant.
We detected one abnormal splicing event resulting from the c.438+1G>T variant:
1. Exon-7 skipping (
We also detected normal splicing of CC2D2A transcripts in all samples (
RT-PCR of CC2D2A mRNA Isolated from Blood (
A) Using two sets of primers flanking the c.438+1G>T variant we detect one abnormally sized band in the maternal and paternal samples (Band #2). Sanger sequencing confirmed this band corresponds to exon-7 skipping. We also detect normal exon-6-7-8 splicing in all samples (Band #1), consistent with both parents being heterozygous carriers of the c.438+1G>T variant.
B) Using a forward primer in intron-7 and a reverse primer in exon-9 we were unable to detect intron retention or use of a cryptic 5′-splice site.
C) Amplification of GAPDH demonstrates cDNA loading. Replicate samples were subject to PCR for 25 or 30 cycles in order to confirm the PCR cycling conditions were sub-saturating and able to detect lower levels or quality of a specimen. Lanes: Mother (M), Father (F), Control 1 (C1) (female, 24 years), Control 2 (C2) (male, 31 years).
Sanger sequencing of RT-PCR amplicons showed the abnormally sized Band #2 in the maternal and paternal samples was due to exon-7 skipping (
Schematic of the splicing abnormality induced by the c.438+1G>T variant. (
The c.438+1G>T variant results in exon-7 skipping, an in-frame event. Exon-7 skipping removes 34 amino acids p.(Ser113_Glu146del) from the CC2D2A protein, of which 24 residues are conserved in mammals as shown in
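The residue count quoted for a protein-level deletion follows directly from the first and last deleted positions (last minus first, plus one). The short Python sketch below checks this for the deletion reported above; the helper name and the regular expression are illustrative assumptions, not part of the report.

```python
import re

# Hypothetical helper (an assumption for illustration): matches range deletions
# written like p.(Ser113_Glu146del), with or without the parentheses.
_RANGE_DEL = re.compile(
    r"p\.\(?([A-Z][a-z]{2})(\d+)_([A-Z][a-z]{2})(\d+)del\)?"
)


def deleted_residues(hgvs_p: str) -> int:
    """Return how many residues a range deletion removes (inclusive count)."""
    match = _RANGE_DEL.fullmatch(hgvs_p.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a range deletion: {hgvs_p!r}")
    first, last = int(match.group(2)), int(match.group(4))
    if last < first:
        raise ValueError("last deleted residue precedes the first")
    return last - first + 1


print(deleted_residues("p.(Ser113_Glu146del)"))  # 34
```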
Intellectual disability, epilepsy and cardiac arrhythmia.
Exome sequencing identified a heterozygous variant in CACNA1E gene:
Chr1(GRCh37):g.181547008G>A
p.?
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
No evidence for abnormal splicing induced by the CACNA1E c.616+3G>A variant was found.
CACNA1E exon-4 is a canonical exon included in all RefSeq CACNA1E isoforms. Therefore splicing outcomes observed in blood RNA hold relevance to the predominant CACNA1E isoform expressed in brain.
mRNA Studies Performed to Assess the Extended Splice Site Variant:
RT-PCR was performed on mRNA extracted from the whole blood of the affected individual. We found no evidence for abnormal splicing
Sanger sequencing of RT-PCR amplicons confirmed intron-4 retention in the patient and controls. Levels of intron-4 retention from the c.616+3G>A variant containing allele may be reduced due to the predicted strengthening of the exon-4 5′ splice site. No common SNPs were amplified by our RT-PCRs to investigate allele imbalance.
Microcephaly and pontocerebellar hypoplasia.
Previous genetic testing identified a homozygous essential splice site variant in ASNS:
Chr7(GRCh37):g.97482371C>T
p.?
Conclusions
mRNA Studies to Assess the ASNS Essential Splice-Site Variant and Consequences for the Encoded Asparagine Synthetase Protein
Summary of results in blood mRNA
RT-PCR was performed on mRNA extracted from the whole blood of the proband and his unaffected parents.
RNA studies of ASNS cDNA derived from whole blood gave robust PCR results. We found no evidence of normal splicing in the patient sample using six different primer combinations. We detect four predominant abnormal splicing events (
Ectopic inclusion of at least 57 nucleotides of intron 12 including a premature termination codon (
Exon-12 skipping abnormally removes 156 nucleotides from the ASNS mRNA, deleting 52 amino acids p. (Asn441_Gln492del) from the encoded asparagine synthetase protein.
Use of the exon 12 cryptic 5′ splice-site abnormally removes 48 nucleotides from exon 12, deleting 16 amino acids p.(Lys478_Val493del) from the encoded asparagine synthetase protein.
Retention of intron 11, or intron 12, or both intron 11 and 12, results in inclusion of intronic sequence in the ASNS mRNA transcript. In all cases (retention of intron 11, intron 12 or both intron 11 and 12) the resultant abnormal mRNA encodes a premature termination codon, and thus may be targeted by nonsense-mediated decay. Any ASNS transcripts escaping nonsense-mediated decay encode asparagine synthetase proteins lacking a complete asparagine synthetase enzymatic domain, and are therefore likely to be dysfunctional/non-functional.
All splicing outcomes impact the asparagine synthetase domain (p. 213-536) and are consistent with a damaging effect on the asparagine synthetase protein.
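The relationship between the number of nucleotides removed and the number of amino acids deleted in the events above is simple codon arithmetic: a length divisible by three preserves the reading frame and removes one residue per codon. A minimal Python sketch of that check follows; the function name is an illustrative assumption and is not taken from the report.

```python
def frame_consequence(nucleotides: int) -> str:
    """Classify a deletion or insertion of the given length by reading frame."""
    if nucleotides <= 0:
        raise ValueError("nucleotide count must be positive")
    codons, remainder = divmod(nucleotides, 3)
    if remainder == 0:
        return f"in-frame ({codons} codons)"
    return f"frameshift ({remainder} nt beyond {codons} whole codons)"


# Lengths reported above for the two in-frame ASNS events.
for label, length in [("exon-12 skipping", 156),
                      ("exon 12 cryptic 5' splice-site", 48)]:
    print(label, "->", frame_consequence(length))
```

A length divisible by three does not by itself exclude a damaging outcome: inserted intronic sequence can still contain a premature termination codon, as reported above for the ectopic intron 12 inclusion.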
Primary ciliary dyskinesia.
Previous genetic testing identified two compound heterozygous variants in ARMC4:
Variant of Uncertain Significance
p.?
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
Nonsense Variant
This variant has previously been reported in ClinVar. This variant is present in the Genome Aggregation Database (gnomAD) at an allele frequency of 0.000007969 (1/125486).
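The quoted allele frequency can be reproduced from the bracketed counts, assuming (as is conventional for gnomAD, but not stated in the report) that 1/125486 means an allele count of 1 over an allele number of 125486. A minimal Python sketch, with a hypothetical function name:

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Fraction of sampled alleles that carry the variant."""
    if allele_number <= 0:
        raise ValueError("allele number must be positive")
    if not 0 <= allele_count <= allele_number:
        raise ValueError("allele count must lie between 0 and allele number")
    return allele_count / allele_number


# Counts as quoted in the report for the ARMC4 nonsense variant.
print(f"{allele_frequency(1, 125486):.9f}")  # 0.000007969
```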
Conclusions
mRNA Studies Performed to Assess the c.1743+5G>C Variant:
Summary of Results in mRNA Derived from Skin
RT-PCR was performed on mRNA extracted from the skin of the unaffected father.
In the paternal and control samples we detect:
In control samples we also detect:
Intron-12 retention (
RT-PCR of ARMC4 mRNA isolated from skin.
A) Using two sets of primers flanking the c.1743+5G>C variant we detect three amplicons:
Band #1: Normal exon-11-12-13 splicing (paternal and control samples).
Band #2: Heteroduplex (controls only).
Band #3: Exon-12 skipping (paternal and control samples).
B) Using a reverse primer in intron-12 we detect intron-12 retention in control samples (Band #4)*. Intron-12 retention was not detected in the paternal sample.
C) Amplification of GAPDH demonstrates cDNA loading. Replicate samples were subject to PCR for 25 or 30 cycles in order to confirm the PCR cycling conditions were sub-saturating and able to detect lower levels or quality of a specimen. Lanes: Father (F), Control 1 (C1) (male, 48 years), Control 2 (C2) (male, 52 years)
A) In the paternal sample:
Band #1 corresponds to normal splicing
Band #3 corresponds to exon-12 skipping
B) and C) In control samples:
Band #1 corresponds to normal splicing
Band #2 is a heteroduplex of DNA consisting of normal splicing and exon-12 skipping
Band #3 corresponds to exon-12 skipping
Band #4 corresponds to intron-12 retention
We detect increased levels of ARMC4 exon-12 skipping, relative to normal splicing of exons 11-12-13, in the parental carrier of the c.1743+5G>C variant compared with controls. Exon-12 skipping removes 70 amino acids p.(Ile512_Leu581del) from the Armadillo domain of the ARMC4 protein, of which 30 residues are highly conserved between mammals, birds, fish, amphibians and insects. Evolutionary conservation of the deleted residues within the Armadillo domain throughout vertebrate evolution strongly implies functional importance.
Joubert syndrome.
Previous genetic testing identified a nonsense variant in the AHI1 gene:
This variant has previously been reported in ClinVar (RCV000002087.3) as pathogenic.
Previous genetic testing identified an extended splice site variant in the AHI1 gene:
p.?
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
mRNA Studies Performed to Assess the Extended Splice Site Variant:
RT-PCR was performed on mRNA extracted from the family trio (unaffected parents and affected individual). Several abnormally spliced products were observed in the patient (P) and paternal (F) samples (the father carries the c.2492+5G>A variant) using primers in exon 16 and exon 19. A band approximately 40 bp larger than expected, and another approximately 120 bp smaller than expected, were observed in the patient and paternal samples.
No splicing defects were detected in the maternal sample (carrying the nonsense variant) using any primer combination.
Sanger sequencing revealed the c.2492+5G>A variant results in:
Abnormal splicing events were confirmed in two separate experiments using two different primer pairs.
RT-PCR of AHI1 mRNA Isolated from Blood.
RT-PCR using primers in exons 16 and 19 of AHI1.
The c.2492+5G>A variant induces exon 18 skipping (yellow arrow) and use of a cryptic donor (red arrow).
Lanes: Patient (P), mother (M), father (F), control 1 (C1), control 2 (C2).
Both the c.2492+5G>A and c.1051C>T variants induce premature termination codons with a clear, damaging effect for the encoded AHI1 protein. Both premature termination codons are predicted to target AHI1 transcripts for nonsense-mediated decay. Any AHI1 transcripts escaping nonsense-mediated decay encode AHI1 proteins lacking key functional domain(s) (WD domain(s) and SH3 domain) and are therefore likely to be dysfunctional or non-functional.
mRNA studies confirm the heterozygous c.2492+5G>A variant induces abnormal splicing of AHI1 transcripts. All splicing outcomes induce a premature termination codon and are unlikely to be translated into functional protein.
The heterozygous c.1051C>T nonsense variant has been previously reported as pathogenic in ClinVar.
Collective data from RT-PCR are consistent with likely pathogenicity of the AHI1 c.2492+5G>A variant.
Compound heterozygous variants in AHI1 are consistent with autosomal recessive Joubert syndrome.
Neonate in intensive care with cardiac complications. Suspected Barth syndrome.
Conclusions
1. mRNA studies confirm the hemizygous TAZ c.238G>C variant induces abnormal splicing of TAZ transcripts in blood and myocardial mRNA.
2. TAZ exon-2 is a canonical exon included in all predominant TAZ isoforms expressed in heart.
3. All detected abnormal splicing events are in-frame, though insert (use of intron-2 cryptic 5′ splice-site) or delete (exon-2 skipping) numerous amino acids within an evolutionarily conserved region of the tafazzin protein.
4. Abnormal splicing outcomes detected are consistent with a damaging effect for the encoded tafazzin protein.
cDNA Studies to Assess the Missense/5′ Splice-Site Variant (Last Base of Exon):
RT-PCR was performed on mRNA extracted from the affected individual.
Summary of Results in Blood cDNA:
Summary of Results in Myocardial cDNA:
RT-PCR was performed on mRNA extracted from the myocardium of the affected individual and two disease controls (C5, C6).
Use of Intron-2 cryptic 5′ splice-site abnormally includes 36 nt of intron-2 into the TAZ pre-mRNA, encoding 12 ectopic amino acids into the tafazzin protein.
Exon-2 skipping abnormally removes 129 nucleotides from the TAZ pre-mRNA. This event is in frame, deleting 43 (highly conserved) amino acids from the encoded tafazzin protein.
The RT-PCR results indicate splicing outcomes consistent with a damaging effect for the encoded tafazzin protein.
Severe concentric hypertrophic cardiomyopathy. Proximal muscle weakness with a raised CK level.
Previous genetic testing identified a hemizygous variant of uncertain significance in LAMP2:
ChrX(GRCh37):g.119576451T>A
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
Conclusions
The most likely outcome for the encoded LAMP2 protein is protein deficiency, due to nonsense-mediated decay of mis-spliced transcripts that will preclude translation of LAMP2 protein. A possible outcome is expression of a truncated, dysfunctional LAMP2 protein (which lacks a transmembrane anchor) through translation of mis-spliced LAMP2 transcripts that escape nonsense-mediated decay.
mRNA Studies Performed to Assess the Extended Splice Site Variants:
Summary of Results in mRNA Derived from Whole Blood
RT-PCR was performed on mRNA extracted from the whole blood of the proband and affected male sibling.
We detect one abnormal splicing event resulting from the c.928+3A>T variant (
1. Exon-7 skipping (
We did not detect normal splicing of LAMP2 transcripts in the proband and affected sibling (
A) Using two sets of primers flanking the c.928+3A>T variant we detect a single band corresponding to exon-7 skipping in the proband and affected sibling mRNA (Band #1). In two controls we detect a single band corresponding to normal exon-6-7-8-splicing (Band #2).
B) Using a forward primer in exon-4 and a reverse primer in exon-7 we are unable to detect any transcripts containing exon-7 in the proband or affected sibling.
C) Using a reverse primer in intron-7, designed to detect use of a potential cryptic 5′ splice site upstream of the native exon-7 5′ splice site, we found no evidence of abnormal splicing.
D) Amplification of GAPDH demonstrates cDNA loading. Lanes: Proband (P), Sibling (S) (male, 3 years), Control 1 (C1) (male, 7 months), Control 2 (C2) (male, 5 years). Replicate samples were subject to PCR for 25 or 30 cycles in order to confirm the PCR cycling conditions were sub-saturating and able to detect lower levels or quality of a specimen.
The c.928+3A>T variant induces exon-7 skipping (p.Lys289Phefs*36), causing a frameshift and encoding a premature termination codon. These mis-spliced transcripts are predicted to be targeted for nonsense-mediated decay. Any LAMP2 transcripts escaping nonsense-mediated decay encode LAMP2 proteins lacking the C-terminal transmembrane domain and are likely to be dysfunctional/non-functional.
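The frameshift description p.Lys289Phefs*36 can be read as follows, on the understanding that HGVS counts the number after fs from the first changed residue (which is position 1): Lys289 is replaced by Phe, and the new reading frame reaches a stop codon 36 positions later, counting Phe as the first. A minimal Python sketch under that assumption is given here; the helper name and pattern are illustrative and not part of the report.

```python
import re

# Hypothetical pattern (an assumption for illustration): matches frameshifts
# written like p.Lys289Phefs*36 or p.(Ala565Glufs*64); "Ter" is accepted for "*".
_FRAMESHIFT = re.compile(
    r"p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fs(?:\*|Ter)(\d+)\)?"
)


def frameshift_stop_position(hgvs_p: str) -> int:
    """Return the residue number at which the shifted frame terminates."""
    match = _FRAMESHIFT.fullmatch(hgvs_p.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a frameshift description: {hgvs_p!r}")
    first_changed = int(match.group(2))
    stop_offset = int(match.group(4))
    # The first changed residue counts as position 1 of the new frame.
    return first_changed + stop_offset - 1


print(frameshift_stop_position("p.Lys289Phefs*36"))  # 324
```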
Mental Retardation, ataxia, distinct facial features.
Previous genetic testing identified a variant of uncertain significance in the OPHN1 gene:
p.?
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
mRNA Studies Performed to Assess the Extended Splice Site Variant:
RT-PCR was performed on mRNA extracted from the whole blood of the affected individual and his unaffected mother.
No evidence for normal splicing in the patient sample was identified (
Exon-8 skipping abnormally removes 105 nucleotides from the OPHN1 pre-mRNA. This event is in frame, deleting 35 amino acids p. (Val200_Asn234del) from the encoded OPHN1 protein.
Our RT-PCR results indicate splicing outcomes consistent with a damaging effect for the encoded Oligophrenin-1 protein.
Conclusions:
Hemizygous variants in OPHN1 are consistent with X-linked recessive mental retardation MIM #300486.
Perrault syndrome.
A clinical exome analysis identified two heterozygous variants in HSD17B4:
Pathogenic Missense Variant
Previously reported as likely pathogenic/pathogenic in ClinVar (RCV000415821.5, RCV000008094.5, RCV000688945.1). This variant is present in the Genome Aggregation Database (gnomAD) at an allele frequency of 0.0002025 (57/281472).
Variant of Uncertain Significance
p.?
This variant has no previous reports in ClinVar. This variant is absent from the Genome Aggregation Database (gnomAD).
Conclusions
RT-PCR was performed on mRNA extracted from a transformed lymphoblast cell line derived from the affected individual.
In the absence of appropriate lymphoblast cell control RNA samples, we used mRNA from peripheral blood mononuclear cells (PBMCs) and primary human fibroblasts (PHF) as controls. It must be noted that HSD17B4 transcripts may be spliced differently between these tissues and consequently mRNA studies from PBMCs and fibroblasts may not accurately reflect splicing in the transformed lymphoblast cell line from the proband.
The c.1333+1G>C variant induces exon-15 skipping in HSD17B4 transcripts. This is an in-frame event which removes 24 amino acids p.(Gly421_Asp444del) from the Enoyl-CoA hydratase 2 region of the Hydroxysteroid (17-beta) dehydrogenase 4 protein.
In-utero death and post mortem revealed renal tubular dysgenesis.
Sequencing of ACE identified a homozygous variant of uncertain significance:
Chr17:g.61561337G>C
This variant has not previously been reported in ClinVar. This variant is not present in the Genome Aggregation Database (gnomAD).
Conclusions
RT-PCR was performed on mRNA extracted from the whole blood of the unaffected parent carriers.
We detect one abnormal splicing event resulting from the c.1709+5G>C variant (
1. Exon 11 skipping (Bands #2, #4).
A) Using primers flanking the c.1709+5G>C variant we detected 2 bands:
Band #1 and Band #3: normally spliced ACE transcripts
Band #2 and Band #4: exon 11 skipping (only detected in the maternal and paternal samples).
B) We used a forward primer designed to anneal with the exon 10-exon 12 junction to specifically amplify ACE transcripts with exon 11 skipping. Exon 11 skipping was only observed in the maternal and paternal mRNA samples (Band #5), and was not detected in two controls.
C) Amplification of GAPDH demonstrates cDNA loading. Lanes: Mother (M), Father (F), Control 1 (C1) (Female, 36 years), Control 2 (C2) (Male, 39 years).
We also detect normal splicing of ACE transcripts in the maternal and paternal samples.
We used a reverse primer in intron 11 to specifically amplify ACE transcripts with intron 11 retention. Intron 11 retention was not detectable in any sample (data not shown, available on request).
Summary of Results in mRNA Derived from Fibroblasts and Renal Epithelial Cells
RT-PCR was performed on mRNA extracted from the skin fibroblasts and renal epithelia of the unaffected father.
The fibroblasts and renal epithelial cells were cultured in the presence of cycloheximide (CHX), a nonsense-mediated mRNA decay (NMD) inhibitor, or DMSO (control), in order to detect splicing outcomes targeted by NMD.
We detect three different splicing events in both cell types:
In-frame exon 11 skipping (Band #3, #4)
RT-PCR of ACE mRNA isolated from fibroblasts (i) and renal epithelia (ii).
A) Using primers flanking the c.1709+5G>C variant we detected three bands:
Band #1: normally spliced ACE transcripts (paternal sample and controls)
Band #2: heteroduplex amplicon (paternal sample only)
DMSO: contains a mix of normally spliced transcripts and exon 11 skipping
CHX: contains normally spliced transcripts, exon 11 skipping and use of a cryptic 5′-splice site
Band #3: exon 11 skipping (only detected in the paternal sample).
B) We used a forward primer designed to anneal with the exon 10-exon 12 junction to specifically amplify ACE transcripts with exon 11 skipping. Exon 11 skipping was only observed in the paternal mRNA samples (Band #4), and was not detected in two controls.
C) Amplification of GAPDH demonstrates cDNA loading. Lanes:
i) Father (F), Control 1 (C1) (Male, 52 years), Control 2 (C2) (Male, 49 years).
ii) Father (F), Control 1 (C1) (Male, 30 years).
Band #1 contains normally spliced exon 10-11-12 transcripts (DMSO and CHX).
Band #2 DMSO: heteroduplex containing both normally spliced transcripts and exon 11 skipping.
CHX: heteroduplex containing normally spliced transcripts, exon 11 skipping and use of a cryptic ‘GC’ 5′-splice site.
Band #3 contains transcripts with exon 11 skipping (DMSO and CHX).
The c.1709+5G>C variant results in:
1. Exon 11 skipping, an in-frame event
2. Use of a cryptic 5′-splice site, out-of-frame
Exon 11 skipping removes 41 amino acids p. (Tyr530_Arg570del) from the peptidase M2 domain of ACE, of which 26 residues are highly conserved between mammals, birds, amphibians and fish (
Use of the cryptic ‘GC’ 5′-splice site induces a frameshift and encodes a premature termination codon p.(Ala565Glufs*64). These transcripts are predicted to be degraded by NMD, consistent with rescue of these transcripts upon CHX treatment. Any transcripts escaping NMD will result in the loss of the 741 C-terminal residues of ACE, with likely/clear damaging consequences.
Number | Date | Country | Kind |
---|---|---|---|
2018904348 | Nov 2018 | AU | national |
This application is a continuation of International Patent Application No. PCT/AU2019/000141, filed Nov. 15, 2019, entitled “Methods of Identifying Genetic Variants”. Foreign priority benefits are claimed under 35 U.S.C. § 119(a)-(d) or 35 U.S.C. § 365(b) of Australian Application No. 2018904348, filed Nov. 15, 2018. The contents of each of these applications are incorporated herein by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/AU2019/000141 | Nov 2019 | US |
Child | 17319986 | US |