The disclosure relates generally to improved sex chromosome analysis, such as for noninvasive prenatal screening.
Circulating throughout the bloodstream of a pregnant woman and separate from cellular tissue are small pieces of DNA, often referred to as cell-free DNA (cfDNA). The cfDNA in the maternal bloodstream includes cfDNA from both the mother (i.e., maternal cfDNA) and the fetus (i.e., fetal cfDNA). The fetal cfDNA originates from the placental cells undergoing apoptosis and constitutes up to 25% of the total circulating cfDNA, with the balance originating from the maternal genome.
Recent technological developments have allowed for noninvasive prenatal screening of chromosomal aneuploidy in the fetus by exploiting the presence of fetal cfDNA circulating in the maternal bloodstream. Noninvasive methods relying on cfDNA sampled from the pregnant woman's blood serum are particularly advantageous over chorionic villi sampling or amniocentesis, both of which risk substantial injury and possible pregnancy loss.
Determination of the fraction of fetal cfDNA taken from a maternal test sample allows for screening of fetal aneuploidy. The fetal fraction for male pregnancies (i.e., a male fetus) can be determined by comparing the amount of Y chromosome from the cfDNA, which can be presumed to originate from the fetus, to the amount of one or more genomic regions that are present in both maternal and fetal cfDNA. Determination of the fetal fraction for female pregnancies (i.e., a female fetus) is more complex, as both the fetus and the pregnant mother have similar sex-chromosome dosage and there are few features to distinguish between maternal and fetal DNA. Methylation differences between the fetal and maternal DNA can be used to estimate the fetal fraction of cfDNA. See, for example, Chim et al., PNAS USA, 102:14753-58 (2005). In another method, the fraction of fetal cfDNA can be determined by sequencing polymorphic loci to search for allelic differences between the maternal and fetal cfDNA. See, for example, U.S. Pat. No. 8,700,338. However, as explained in U.S. Pat. No. 8,700,338 (col. 18, lines 28-36), use of polymorphic loci to determine fetal fraction can become unreliable when the fetal fraction drops below 3%. See also Ryan et al., Fetal Diag. & Ther., vol. 40, pp. 219-223 (Mar. 31, 2016), which describes setting a threshold for “no call” when the fetal fraction is below 2.8%. See also United States Patent Publication No. 2018/0089364, entitled “Noninvasive Prenatal Screening Using Dynamic Iterative Depth Optimization.”
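By way of illustration only, the Y-chromosome comparison described above for male pregnancies can be sketched as follows. The sketch assumes idealized, per-copy normalized depths (two autosomal copies in both mother and fetus, one chrY copy in the fetus and none in the mother); the function name and both parameters are hypothetical and are not taken from any referenced method.

```python
def fetal_fraction_from_chr_y(chr_y_depth: float, autosome_depth: float) -> float:
    """Estimate fetal fraction for a male pregnancy from chrY signal.

    Assumption (illustrative): depths are normalized so that each chromosome
    copy contributes equally. Autosomes carry 2 copies in mother and fetus;
    chrY carries 1 copy in the male fetus and 0 copies in the mother.
    """
    if autosome_depth <= 0:
        raise ValueError("autosome_depth must be positive")
    # In the mixture, chrY copies = ff * 1 and autosomal copies = 2,
    # so the observed ratio is ff / 2 and ff = 2 * ratio.
    ratio = chr_y_depth / autosome_depth
    # Clamp to the valid range [0, 1] to absorb measurement noise.
    return min(max(2.0 * ratio, 0.0), 1.0)
```

For example, under these assumptions a chrY depth of 5 against an autosomal depth of 100 corresponds to a fetal fraction of about 10%.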
The disclosures of all publications referred to herein are each hereby incorporated herein by reference in their entireties. To the extent that any reference incorporated by reference conflicts with the instant disclosure, the instant disclosure shall control.
Sex-chromosome aneuploidies (SCA) analysis in a Prenatal Screen serves two purposes: 1) predicting the sex of a fetus (“sex calling”) and 2) screening for sex-chromosome (chromosomes X and/or Y) aneuploidies. We have updated the underlying sex-calling algorithm in order to 1) predict the sex of each fetus individually in a twin pregnancy (“twin sex calling”) and 2) incorporate two additional variables to identify complex cases, including those likely involving a vanishing twin or maternal mosaicism. Because the updated model rests on principled Bayesian theory, it is easy to extend and more robust, providing improved performance and accuracy while maintaining current production performance.
Systems and methods for analyzing sex chromosomes are provided. In various implementations, for example, sex-chromosome aneuploidies (SCA) analysis in a prenatal screen is provided to perform at least one of the following: 1) sex calling, 2) screening for sex-chromosome (chromosomes X and/or Y) aneuploidies, 3) twin sex calling, and 4) incorporating two or more additional variables to identify complex cases, including those that may involve a vanishing twin or maternal mosaicism. The systems and methods utilize a Bayesian network that is trained on information related to at least one sex chromosome and is trained and calibrated on a cohort of historical samples to establish statistical parameters and thresholds of confidence.
Fetal maternal samples taken from pregnant women include both maternal cell-free DNA and fetal cell-free DNA. Described herein are methods for determining a chromosomal abnormality of a test chromosome or a portion thereof in a fetus by analyzing a test maternal sample of a woman carrying said fetus, wherein the test maternal sample comprises fetal cell-free DNA and maternal cell-free DNA. The chromosomal abnormality can be, for example, aneuploidy or the presence of a microdeletion. In some embodiments, the chromosomal abnormality is determined by measuring a dosage of the test chromosome or portion thereof in the test maternal sample, measuring a fetal fraction of cell-free DNA in the test maternal sample, and determining an initial value of likelihood that the test chromosome or the portion thereof in the fetal cell-free DNA is abnormal based on the measured dosage, an expected dosage of the test chromosome or portion thereof, and the measured fetal fraction.
In one implementation, for example, a system and method adapted to analyze sex-chromosome aneuploidies of an individual is provided. The aneuploidies may include, by way of example, the following types: XXY, XYY, X, or XXX (referring to the number of X and Y chromosomes in the fetus), each of which differs in chromosome copy number from the typical female XX and male XY complements. In this implementation, a Bayesian network is adapted to be trained based on predetermined information related to at least one sex chromosome. A machine learning module is used to determine a sex-chromosome status based on a normalized read depth of the individual for the gene. The machine learning module is configured to receive inputs, such as the normalized read depth per chromosome, fetal fraction, and total number of sequencing reads, and to output the respective sex-chromosome status of the individual.
The foregoing and other aspects, features, details, utilities, and advantages of the present invention will be apparent from reading the following description and claims, and from reviewing the accompanying drawings.
The variables in Table 1 include the fetal fraction as provided from normalized map reads on chrX versus chrY versus a whole genome inference.
In Table 1, FFt is the true, unobserved fetal fraction, FFchrX and FFchrY are the deviations from the expected normalized read depth for chromosomes X and Y, respectively, and SCA is a sex call. After the priors P(FFt) and P(SCA) are selected, other useful probabilities can also be derived. In one example, it can be assumed that all four parameters have Gaussian error with means and variances. FFt can be assumed to follow a beta distribution, with its parameters fit using a maximum likelihood model on previously observed data with known fetal fraction. Elements in the sample space are the following:
The relationships between the observed variables in Table 1 and the unobserved variables (SCA, and FFt) are shown in the graphical model of
In the Bayesian network shown in
p_sex call ~ Dirichlet(w), where w=(w1, . . . , wk), k=6
sex call ~ Categorical(p_sex call)
FFt˜Beta(αFF, βFF)
FFinferred˜(μFF
FCchrX˜(μFC
FCchrY˜(μFC
in which there is a systematic, depth-dependent bias in the fetal fraction predictions, FFinferred.
Here, αFFi and βFFi are fit by downsampling data. Depth scaling corrections to the variances in the Gaussian probabilities are performed by calculating the variances as follows, where d is the total number of sequencing reads:
σFC
σFC
σFF
Fold changes and fetal fractions are converted according to a sex call,
where RXY=CNchrY/(2−CNchrX) and CN is the copy number in placental cells. The relationship between FFchrX and FFchrY can be assumed to not be one-to-one. In one embodiment, the expected variance used in the Bayesian graphical model is scaled by depth, and the depth can be the total sequencing read count. The parameters are given flat, uniform priors.
w=(wXY, wXX, wXXY, wXYY, wX, wXXX)
αFF, βFF˜Unif
σS
σS
σS
αXY, βXY˜Unif
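As a hedged illustration of how the distributions listed above fit together, the following minimal Python sketch draws one simulated sample from a simplified version of the generative model. Several things here are assumptions rather than part of the disclosure: the expected chrX and chrY signals are taken to be FFt·(2−CNchrX) and FFt·CNchrY (chosen only to be consistent with the ratio RXY), fixed standard deviations stand in for the depth-scaled variances, the depth-dependent bias of FFinferred is ignored, and all function and variable names are hypothetical.

```python
import random

# Fetal copy numbers (CN_chrX, CN_chrY) for the six canonical sex classes.
SEX_CLASSES = {
    "XY": (1, 1), "XX": (2, 0), "XXY": (2, 1),
    "XYY": (1, 2), "X": (1, 0), "XXX": (3, 0),
}


def sample_dirichlet(w, rng):
    """Draw class probabilities p ~ Dirichlet(w) via normalized gamma draws."""
    draws = [rng.gammavariate(w_i, 1.0) for w_i in w]
    total = sum(draws)
    return [d / total for d in draws]


def expected_signals(ff_t, cn_x, cn_y):
    """Assumed expected (FF_chrX, FF_chrY) deviations for a given class."""
    return ff_t * (2 - cn_x), ff_t * cn_y


def sample_observation(w, alpha_ff, beta_ff, sigma_x, sigma_y, sigma_ff, rng):
    """Draw one simulated sample from the simplified generative model."""
    p = sample_dirichlet(w, rng)                      # p ~ Dirichlet(w)
    sex_call = rng.choices(list(SEX_CLASSES), weights=p, k=1)[0]
    ff_t = rng.betavariate(alpha_ff, beta_ff)         # FFt ~ Beta(alpha, beta)
    mu_x, mu_y = expected_signals(ff_t, *SEX_CLASSES[sex_call])
    return {
        "sex_call": sex_call,
        "ff_t": ff_t,
        "ff_chr_x": rng.gauss(mu_x, sigma_x),         # Gaussian measurement error
        "ff_chr_y": rng.gauss(mu_y, sigma_y),
        "ff_inferred": rng.gauss(ff_t, sigma_ff),
    }
```

A call such as `sample_observation([1.0] * 6, 2.0, 18.0, 0.01, 0.01, 0.01, random.Random(0))` returns one simulated record; the hyperparameter values shown are placeholders, not fitted values.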
Since the different sex classes exhibit unique signatures in allosomes (FF_chrX and FF_chrY), these signatures can be used to make a sex prediction. Table 2 shows six canonical sex classes and the expected values for FF_chrX and FF_chrY for each class.
The prior prevalence of the sex classes can be combined with the likelihood of the data for a given sex-calling hypothesis to construct a posterior probability of a sex call (see Equation 1). In doing so, a generative model of fetal fraction measurements can be constructed from a true sex call and a latent true fetal fraction (FFt), under which each FF measurement is conditionally independent of the others. Using Bayes' theorem, the posterior probability of each sex call given the data can then be computed for each sample.
P(SCAj|FFchrX, FFchrY, FFinferred, depth) ∝ P(SCAj) P(FFchrX, FFchrY, FFinferred, depth|SCAj) (1)
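Equation 1 can be illustrated with a short numerical sketch in which the latent FFt is integrated out on a grid. This is a simplified example under stated assumptions, not the disclosed implementation: the expected signals FFt·(2−CNchrX) and FFt·CNchrY, the treatment of the standard deviations as already depth-scaled, the midpoint-rule integration, and all names and numeric values are hypothetical.

```python
import math

# Fetal copy numbers (CN_chrX, CN_chrY) for the six canonical sex classes.
CLASSES = {
    "XY": (1, 1), "XX": (2, 0), "XXY": (2, 1),
    "XYY": (1, 2), "X": (1, 0), "XXX": (3, 0),
}


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def beta_pdf(x, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def posterior_sex_call(ff_chr_x, ff_chr_y, ff_inferred, priors,
                       alpha_ff, beta_ff, sigmas, grid=400):
    """Return P(class | data), marginalizing FFt with a midpoint rule."""
    sigma_x, sigma_y, sigma_ff = sigmas  # assumed already scaled for depth
    unnormalized = {}
    for name, (cn_x, cn_y) in CLASSES.items():
        marginal = 0.0
        for i in range(grid):
            ff_t = (i + 0.5) / grid  # midpoint of the i-th cell on (0, 1)
            # Measurements are conditionally independent given FFt and class.
            likelihood = (normal_pdf(ff_chr_x, ff_t * (2 - cn_x), sigma_x)
                          * normal_pdf(ff_chr_y, ff_t * cn_y, sigma_y)
                          * normal_pdf(ff_inferred, ff_t, sigma_ff))
            marginal += beta_pdf(ff_t, alpha_ff, beta_ff) * likelihood / grid
        unnormalized[name] = priors[name] * marginal  # prior x likelihood
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("all hypotheses have zero likelihood for these data")
    return {name: value / total for name, value in unnormalized.items()}
```

In this sketch, a sample with comparable chrX and chrY deviations and a matching inferred fetal fraction is assigned most of its posterior mass to XY, whereas a sample with no chrX or chrY deviation is assigned to XX.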
Since the Bayesian sex caller (BSC) uses FFinferred in this example implementation of a model, it is capable of making sex hypotheses for vanishing twins (XXVT) or maternal mosaic monosomy X (X_MOS) (see Table 3). Vanishing twin syndrome occurs when a twin or multiple disappears in the uterus during pregnancy as a result of a miscarriage of one twin or multiple. The fetal tissue is absorbed by the other twin, multiple, placenta, or the mother. This gives the appearance of a “vanishing twin.” Maternal mosaicism occurs when a subset of the mother's own cells have a deletion of a portion or all of chromosome X.
XXVT and X_MOS can be converted to report out as XX since that is the true sex chromosome status of the fetus in these particular scenarios.
For twin sex calling, the pregnancy can be assumed to be a twin pregnancy and a sex prediction made according to the likelihood specified in Table 4. XX|XX means both twins are female, XX|XY means one fetus is male and the other female, and XY|XY means both twins are male.
In summary, the four variables can be used for each sample to make a sex prediction as described herein.
A model can consume these data and provide a set of posterior probabilities. The model then chooses the sex class with the highest posterior probability for each singleton and twin prediction. An example outcome for a sample is shown in Table 5. The singleton or twin status is provided at the time of ordering, and thus the appropriate sex prediction is reported.
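The selection and reporting step described above can be sketched as follows. This is a minimal illustration, assuming the posteriors are available as dictionaries keyed by sex class; the function names are hypothetical, and the mapping of XXVT and X_MOS to XX follows the reporting convention described above for those hypotheses.

```python
# Complex-case hypotheses are reported as XX, the fetal sex-chromosome status.
REPORT_AS = {"XXVT": "XX", "X_MOS": "XX"}


def call_sex(posteriors):
    """Pick the class with the highest posterior and map it to its report label."""
    best = max(posteriors, key=posteriors.get)
    return REPORT_AS.get(best, best), posteriors[best]


def report(singleton_posteriors, twin_posteriors, is_twin):
    """Report the singleton or twin call based on the status given at ordering."""
    return call_sex(twin_posteriors if is_twin else singleton_posteriors)
```

For instance, a singleton whose most probable hypothesis is XXVT would be reported as XX together with the posterior probability of the winning hypothesis.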
SCA sensitivity, SCA specificity, and sex-calling accuracy were evaluated for singletons by using the clinical outcome data. For twins, the sex-calling accuracy was evaluated by using clinical outcome data on twins. Table 6 shows the number of SCAs in the pre-processed clinical outcome data that have been used in the validation.
In this example, 57 twin samples met all the criteria. Table 7 shows the distribution of twin types (XX and XX pregnancy, one XX and one XY pregnancy, or XY and XY pregnancy) samples in the dataset.
The singleton data and the twin data were analyzed and compared to known sex aneuploidy and sex calls. Each of the calls was labeled according to Table 2, and the relevant metrics specified in Equation 2, Equation 3, Equation 4, and Equation 5 were generated.
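For orientation, the kinds of metrics named above (SCA sensitivity, SCA specificity, and sex-calling accuracy) can be computed as in the sketch below. It uses the standard textbook definitions, which is an assumption about the form of the referenced equations; treating XX and XY as the non-aneuploid classes and scoring sex by the presence of a Y chromosome are likewise illustrative choices, and the function name is hypothetical.

```python
def sca_metrics(truth, calls, euploid=("XX", "XY")):
    """Compute SCA sensitivity, SCA specificity, and sex-calling accuracy."""
    tp = fn = tn = fp = sex_correct = 0
    for true_class, called_class in zip(truth, calls):
        true_sca = true_class not in euploid
        called_sca = called_class not in euploid
        if true_sca and called_sca:
            tp += 1
        elif true_sca:
            fn += 1
        elif called_sca:
            fp += 1
        else:
            tn += 1
        # Sex is scored as male when a Y chromosome is present, else female.
        if ("Y" in true_class) == ("Y" in called_class):
            sex_correct += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),   # TN / (TN + FP)
        "sex_accuracy": ratio(sex_correct, len(truth)),
    }
```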
System 600 may be, for example, in the form of a client-server computer capable of connecting to and/or facilitating the operation of a plurality of workstations or similar computer systems over a network. In another embodiment, system 600 may connect to one or more workstations over an intranet or internet network, and thus facilitate communication with a larger number of workstations or similar computer systems. Even further, system 600 may include, for example, a main workstation or main general-purpose computer to permit a user to interact directly with a central server. Alternatively, the user may interact with system 600 via one or more remote or local workstations 613. As will be appreciated by one of ordinary skill in the art, there may be any practical number of remote workstations for communicating with system 600.
CPU 601 may include one or more processors, for example Intel® Core™ G7 processors, AMD FX™ Series processors, or other processors as will be understood by those skilled in the art (e.g., including graphical processing unit (GPU)-style specialized computing hardware used for, among other things, machine learning applications, such as training and/or running the machine learning algorithms of the disclosure; such GPUs may include, e.g., NVIDIA Tesla™ K80 processors). CPU 601 may further communicate with an operating system, such as Windows NT® operating system by Microsoft Corporation, Linux operating system, or a Unix-like operating system. However, one of ordinary skill in the art will appreciate that similar operating systems may also be utilized. Storage 602 (e.g., non-transitory computer readable medium) may include one or more types of storage, as is known to one of ordinary skill in the art, such as a hard disk drive (HDD), solid state drive (SSD), hybrid drives, and the like. In one example, storage 602 is utilized to persistently retain data for long-term storage. Memory 603 (e.g., non-transitory computer readable medium) may include one or more types of memory as is known to one of ordinary skill in the art, such as random access memory (RAM), read-only memory (ROM), hard disk or tape, optical memory, or removable hard disk drive. Memory 603 may be utilized for short-term memory access, such as, for example, loading software applications or handling temporary system processes.
As will be appreciated by one of ordinary skill in the art, storage 602 and/or memory 603 may store one or more computer software programs. Such computer software programs may include logic, code, and/or other instructions to enable processor 601 to perform the tasks, operations, and other functions as described herein (e.g., the Monte Carlo sampling of a posterior distribution from a Bayesian graphical model described herein), and additional tasks and functions as would be appreciated by one of ordinary skill in the art. The operating system may further function in cooperation with firmware, as is well known in the art, to enable processor 601 to coordinate and execute various functions and computer software programs as described herein. Such firmware may reside within storage 602 and/or memory 603.
Moreover, I/O controllers 606 may include one or more devices for receiving, transmitting, processing, and/or interpreting information from an external source, as is known by one of ordinary skill in the art. In one embodiment, I/O controllers 606 may include functionality to facilitate connection to one or more user devices 609, such as one or more keyboards, mice, microphones, trackpads, touchpads, or the like. For example, I/O controllers 606 may include a serial bus controller, universal serial bus (USB) controller, FireWire controller, and the like, for connection to any appropriate user device. I/O controllers 606 may also permit communication with one or more wireless devices via technology such as, for example, near-field communication (NFC) or Bluetooth™. In one embodiment, I/O controllers 606 may include circuitry or other functionality for connection to other external devices 610 such as modem cards, network interface cards, sound cards, printing devices, external display devices, or the like. Furthermore, I/O controllers 606 may include controllers for a variety of display devices 608 known to those of ordinary skill in the art. Such display devices may convey information visually to a user or users in the form of pixels, and such pixels may be logically arranged on a display device in order to permit a user to perceive information rendered on the display device. Such display devices may be in the form of a touch screen device, traditional non-touch screen display device, or any other form of display device as will be appreciated by one of ordinary skill in the art.
Furthermore, CPU 601 may communicate with I/O controllers 606 for rendering a graphical user interface (GUI) on, for example, one or more display devices 608. In one example, CPU 601 may access storage 602 and/or memory 603 to execute one or more software programs and/or components to allow a user to interact with the system as described herein. In one embodiment, a GUI as described herein includes one or more icons or other graphical elements with which a user may interact and perform various functions. For example, GUI 607 may be displayed on a touch screen display device 608, whereby the user interacts with the GUI via the touch screen by physically contacting the screen with, for example, the user's fingers. As another example, the GUI may be displayed on a traditional non-touch display, whereby the user interacts with the GUI via keyboard, mouse, and other conventional I/O components 609. The GUI may reside in storage 602 and/or memory 603, at least in part as a set of software instructions, as will be appreciated by one of ordinary skill in the art. Moreover, the GUI is not limited to the methods of interaction as described above, as one of ordinary skill in the art may appreciate any variety of means for interacting with a GUI, such as voice-based or other disability-based methods of interaction with a computing system.
Moreover, network adapter 604 may permit system 600 to communicate with network 611. Network adapter 604 may be a network interface controller, such as a network interface card, LAN adapter, or the like. As will be appreciated by one of ordinary skill in the art, network adapter 604 may permit communication with one or more networks 611, such as, for example, a local area network (LAN), metropolitan area network (MAN), wide area network (WAN), cloud network (IAN), or the Internet.
One or more workstations 613 may include, for example, known components such as a CPU, storage, memory, network adapter, power supply, I/O controllers, electrical bus, one or more displays, one or more user input devices, and other external devices. Such components may be the same, similar, or comparable to those described with respect to system 600 above. It will be understood by those skilled in the art that one or more workstations 613 may contain other well-known components, including but not limited to hardware redundancy components, cooling components, additional memory/processing hardware, and the like.
Although implementations have been described above with a certain degree of particularity, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention. All directional references (e.g., upper, lower, upward, downward, left, right, leftward, rightward, top, bottom, above, below, vertical, horizontal, clockwise, and counterclockwise) are only used for identification purposes to aid the reader's understanding of the present invention, and do not create limitations, particularly as to the position, orientation, or use of the invention. Joinder references (e.g., attached, coupled, connected, and the like) are to be construed broadly and may include intermediate members between a connection of elements and relative movement between elements. As such, joinder references do not necessarily imply that two elements are directly connected and in fixed relation to each other. It is intended that all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative only and not limiting. Changes in detail or structure may be made without departing from the spirit of the invention as defined in the appended claims.
This application claims the benefit of U.S. provisional application No. 63/063,401, filed 9 Aug. 2020, and U.S. provisional application No. 63/151,451 filed 19 Feb. 2021, each application of which is hereby incorporated by reference as though fully set forth herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/044644 | 8/5/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63151451 | Feb 2021 | US | |
63063401 | Aug 2020 | US |