The present invention relates generally to the analysis of double-stranded nucleic acids and, more particularly, to the high resolution melt analysis of double-stranded nucleic acids.
High resolution melt (HRM) analysis allows for the detection of mutations, polymorphisms, and epigenetic differences in double-stranded nucleic acids in a sample without sequencing the nucleic acids. Typically, for HRM analysis, a target nucleic acid sequence is amplified using the polymerase chain reaction (PCR) technique in the presence of a reporter molecule, such as a fluorescent dye, that selectively fluoresces when associated with a double-stranded nucleic acid. It has been observed that the signal produced from monitoring the slow melt of a double-stranded nucleic acid, such as an amplified DNA sequence, follows a generally sigmoidal pattern in which the signal level decreases as a function of temperature. The shape of the HRM curve and the melt temperature, i.e., the temperature at which the signal exhibits the greatest amount of change, are determined by the specific sequence of nucleotides composing the double-stranded nucleic acid. Samples having mixtures of double-stranded nucleic acids, such as occurs with a heterogeneous sample, or samples having one or more mutations, will exhibit changes in the shape of the HRM curve and/or a shift in the melting temperature.
Existing HRM analysis is done by use of difference plot visualization, in which the changes in signal level of a data set from a first sample are used as a baseline, and the difference between the data sets from other samples and the baseline is plotted across the entire temperature range. While this method can allow for the grouping of similar double-stranded nucleic acids, the results can be difficult to analyze, the view of the results is dependent upon the data chosen as the baseline, and automated grouping algorithms can be difficult to construct or can be confused by the baseline choice, thus leading to different results. A need for a better method of analyzing HRM data was thus identified.
Before HRM analysis was available, the temperature resolution of instruments was not able to distinguish small changes in double-stranded nucleic acid melt temperature such as those caused by single nucleotide changes. Even after HRM capable instruments became available, melt temperature information has not been considered informative enough, as the melt temperature of a heterozygote sample may be very close to a homozygote melt temperature. However, when melt temperature data is combined with other information such as at least one of a peak height value, a peak width value, or an area under the curve value, discrimination between the melt temperatures from double-stranded nucleic acids is greatly enhanced.
Described herein are methods and systems for analyzing and visualizing HRM data from a double-stranded nucleic acid utilizing data points associated with the HRM data that include a combination of the melt temperature and at least one of a peak height value, a peak width value and/or an area under the curve value. The HRM data is generally characterized by a plurality of data points each including a signal value associated with the concentration of a double-stranded nucleic acid in a sample and a temperature value associated with the temperature of the sample. Embodiments of the invention analyze the HRM curves of samples using the first negative derivative of the HRM curve or by difference plot visualization of the HRM data using a virtual standard.
In one embodiment, the method includes generating a HRM curve from the HRM data for each sample and plotting the first negative derivative of the HRM curves. The melt peak for each sample is identified from the first negative derivative plot for each sample and analyzed. In an embodiment, the melt peak, which represents the melt temperature for the sample, is identified as the data point along the first negative derivative plot having the greatest distance from the x-axis.
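By way of illustration only, the melt peak identification described above may be sketched as follows. The sketch assumes that the first negative derivative is estimated by finite differences between consecutive data points; the function names `negative_derivative` and `melt_peak` and the synthetic data are hypothetical and are not taken from the specification.

```python
import math

def negative_derivative(temps, signals):
    """Finite-difference estimate of -dF/dT at the midpoint of each temperature step."""
    mids, slopes = [], []
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2.0)
        slopes.append(-(signals[i + 1] - signals[i]) / dt)
    return mids, slopes

def melt_peak(temps, signals):
    """Return (melt temperature, peak height) for the point farthest from the x-axis."""
    mids, slopes = negative_derivative(temps, signals)
    idx = max(range(len(slopes)), key=lambda i: abs(slopes[i]))
    return mids[idx], slopes[idx]

# Synthetic sigmoidal melt curve centered on 80 degrees Celsius (illustrative data only).
temps = [70.0 + 0.1 * i for i in range(201)]
signals = [100.0 / (1.0 + math.exp(t - 80.0)) for t in temps]
tm, height = melt_peak(temps, signals)
```

In practice the HRM data would typically be smoothed before differentiation, since finite differences amplify measurement noise.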
In an alternative embodiment, a Gaussian probability function is fit to the first negative derivative plot and the melt peak is the data point along the Gaussian probability function having the greatest distance from the x-axis.
In another alternative embodiment, the Gaussian probability function is subtracted from the first negative derivative curve and a second melt peak from the subtracted data set is identified. Additional melt peaks can be identified with additional Gaussian probability subtraction steps.
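As a non-limiting sketch of the subtraction steps above, the following code identifies successive melt peaks from first negative derivative data. It makes a simplifying assumption that is not part of the specification: instead of a least-squares fit, each Gaussian is estimated directly from the tallest remaining peak (its height, its position, and its half-width at half maximum). All function names are hypothetical.

```python
import math

def gaussian(t, amp, mean, sigma):
    """Value at temperature t of a Gaussian with the given height, center and spread."""
    return amp * math.exp(-((t - mean) ** 2) / (2.0 * sigma ** 2))

def estimate_gaussian(temps, deriv):
    """Crude Gaussian estimate from the tallest peak (an assumption, not a true fit)."""
    i_pk = max(range(len(deriv)), key=lambda i: deriv[i])
    amp, mean = deriv[i_pk], temps[i_pk]
    half = amp / 2.0
    lo = i_pk
    while lo > 0 and deriv[lo] > half:
        lo -= 1
    hi = i_pk
    while hi < len(deriv) - 1 and deriv[hi] > half:
        hi += 1
    # Use the narrower side so a neighboring peak's shoulder does not inflate the width.
    half_width = min(mean - temps[lo], temps[hi] - mean)
    if half_width <= 0:
        raise ValueError("peak lies on the edge of the data; width cannot be estimated")
    # For a Gaussian, half-width at half maximum = sigma * sqrt(2 ln 2).
    sigma = half_width / math.sqrt(2.0 * math.log(2.0))
    return amp, mean, sigma

def find_peaks_by_subtraction(temps, deriv, n_peaks=2):
    """Identify a peak, subtract its Gaussian, and repeat on the residual."""
    residual = list(deriv)
    peaks = []
    for _ in range(n_peaks):
        amp, mean, sigma = estimate_gaussian(temps, residual)
        peaks.append((mean, amp))
        residual = [r - gaussian(t, amp, mean, sigma) for t, r in zip(temps, residual)]
    return peaks
```

A production implementation would more likely use a nonlinear least-squares fit; the estimate here is kept deliberately simple so that the subtraction loop itself is easy to follow.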
In another alternative embodiment, HRM curves from at least two samples are normalized relative to one another. The first negative derivative is then plotted for the normalized HRM curves for each sample and the melt peaks for each sample are identified and analyzed.
In an alternative embodiment, the melt peak is identified for a first negative derivative plot and the width of the plot is calculated at a fraction of the melt peak height. A data point having a temperature value and at least one of a width value or a peak height value is then analyzed, such as by plotting on a scatter plot.
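For illustration, the width of the first negative derivative plot at a fraction of the melt peak height may be computed along the following lines. The sketch assumes a single dominant peak and uses linear interpolation between neighboring data points to locate the crossings; the function name `peak_width_at_fraction` is hypothetical.

```python
import math

def peak_width_at_fraction(temps, deriv, fraction=0.5):
    """Return (peak temperature, peak height, width at fraction * peak height)."""
    i_pk = max(range(len(deriv)), key=lambda i: deriv[i])
    level = deriv[i_pk] * fraction

    def crossing(i_in, i_out):
        # Linear interpolation between a point at/above the level and one below it.
        t_in, t_out = temps[i_in], temps[i_out]
        d_in, d_out = deriv[i_in], deriv[i_out]
        return t_in + (level - d_in) * (t_out - t_in) / (d_out - d_in)

    lo = i_pk
    while lo > 0 and deriv[lo - 1] >= level:
        lo -= 1
    hi = i_pk
    while hi < len(deriv) - 1 and deriv[hi + 1] >= level:
        hi += 1
    # If the curve never drops below the level, fall back to the edge of the data.
    left = crossing(lo, lo - 1) if lo > 0 else temps[0]
    right = crossing(hi, hi + 1) if hi < len(deriv) - 1 else temps[-1]
    return temps[i_pk], deriv[i_pk], right - left

# Illustrative Gaussian-shaped derivative peak (sigma = 1, so full width at half max is about 2.355).
temps = [75.0 + 0.1 * i for i in range(101)]
deriv = [10.0 * math.exp(-((t - 80.0) ** 2) / 2.0) for t in temps]
tm, height, width = peak_width_at_fraction(temps, deriv, fraction=0.5)
```

The returned triple corresponds to one data point (temperature value, peak height value, width value) of the kind that may then be placed on a scatter plot.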
Another aspect of the invention is directed to improved methods of visualizing HRM data for one or more samples by generating a HRM curve for each sample, providing a virtual standard, and plotting the differences between the HRM curve for each sample and the virtual curve. An alternative embodiment further includes plotting the first negative derivative of the HRM curve and the virtual standard and then plotting the differences between the first negative derivative plot for each sample and the virtual curve.
The virtual curve can be provided using a number of techniques. In one embodiment, the signal values for the virtual curve are derived from the averages of the signal values for the HRM curves from the samples. In another embodiment, the virtual curve is derived from the theoretical melting profile of a target double-stranded nucleic acid. In yet another embodiment, the virtual curve is calculated from a formula with variables that may be adjusted by the user. In a further embodiment, the virtual standard includes a spline curve in a sigmoidal shape that may be altered by the user using a computer interface.
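As a minimal sketch of the averaging embodiment, a virtual curve may be formed as the point-wise mean of the sample HRM curves, and a difference plot obtained by subtracting it from each sample. The sketch assumes that all curves are sampled at the same temperature values; the function names are hypothetical.

```python
def average_virtual_standard(curves):
    """Point-wise mean of several HRM curves sampled on the same temperature grid."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]

def difference_from_standard(curve, standard):
    """Signal difference between one HRM curve and the virtual standard."""
    return [c - s for c, s in zip(curve, standard)]

# Two short illustrative curves (signal values only).
curves = [[100.0, 60.0, 10.0], [100.0, 40.0, 0.0]]
standard = average_virtual_standard(curves)
differences = [difference_from_standard(c, standard) for c in curves]
```

Because the standard is derived from all samples, no single sample is privileged as the baseline.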
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate various embodiments of the invention and, together with a general description of the invention given above and the detailed description of the embodiments given below, serve to explain the embodiments of the invention.
Complementary strands of nucleic acids form relatively stable double strands of nucleic acids at lower temperatures. As the temperature of a sample containing a double-stranded nucleic acid is increased, the double-stranded nucleic acid melts into two single strands. Similarly, as the temperature of a sample containing complementary single strands of nucleic acid is decreased, the complementary nucleic acids will reassociate into double-stranded nucleic acids. The melt temperature of a double-stranded nucleic acid, i.e., the temperature at which a nucleic acid transitions between a double-stranded nucleic acid and a pair of single strands, is determined by the length and sequence of the nucleic acid strands. Differences between two or more double-stranded nucleic acids, such as double-stranded nucleic acids amplified using polymerase chain reaction (PCR), may be inferred by observing and analyzing the high resolution melting or high resolution reassociation of the double-stranded nucleic acids over a range of temperatures. As used herein, the terms “high resolution melting” or “high resolution melt” are understood to include both high resolution melt and high resolution reassociation.
With reference to
In an exemplary embodiment, the signal value for a sample is obtained with a reporter molecule that selectively fluoresces when associated with a double-stranded nucleic acid. Thus, the signal value, i.e., the level of fluorescence observed in a sample, is indicative of the concentration of double-stranded nucleic acid in the sample. Reporter molecules useful with embodiments of the invention described herein are those that selectively provide a signal, such as a fluorescent signal, when associated with a double-stranded nucleic acid. For example, fluorescent double-stranded nucleic acid dyes used in real time PCR reactions may be used. Exemplary reporter molecules include SYBR® Green I, SYBR® Gold, PicoGreen® (each available from Invitrogen), and LC Green®, Eva Green, Melt Doctor, SYTO®-9, SYTO®-13, SYTO®-16, SYTO®-60, SYTO®-62, SYTO®-64, SYTO®-82, POPO-3, TOTO-3, PO-PRO-3, TO-PRO-3, YO-PRO®-1, SYTOX® Orange, BEBO, BOXTO, Chromofy, as well as other reporter molecules that selectively fluoresce when associated with double-stranded nucleic acids. In addition to the use of reporter molecules that selectively associate with double-stranded nucleic acids, the reporter molecules may also be associated with fluorescent probes or primer based systems. As used herein, the term reporter molecule is understood to include any system, molecule, probe, dye, or combination thereof that is capable of generating a signal that corresponds to the concentration of double-stranded nucleic acid in a sample at a particular temperature.
For high resolution melting, the signal value is obtained from measurements taken at predetermined increments as the temperature of the sample is slowly increased from a temperature at which substantially all of the complementary nucleic acid strands in the sample are in the double-stranded state, to a temperature at which no double-stranded nucleic acid is detectable with the reporter molecule. For high resolution reassociation, the signal value is obtained from measurements taken at predetermined increments as the temperature of the sample is slowly decreased from a temperature at which substantially all of the complementary nucleic acid strands in the sample are in the single-stranded state, to a temperature at which substantially all of the nucleic acid is in a double-stranded state as detected with the reporter molecule. Typically, the signal value is measured over a range of temperatures from about 60 degrees Celsius to about 95 degrees Celsius; however, the temperature range may be increased or decreased as needed to analyze a specific nucleic acid sequence.
In accordance with embodiments of the invention, the signal value is obtained as the temperature increases by fractions of a degree over at least a portion of the melting temperature range. In an embodiment, the signal value is obtained at about every 0.1 degrees Celsius over at least a portion of the melting temperature range. In an alternative embodiment, the signal value is obtained at about every 0.2 degrees Celsius over at least a portion of the melting temperature range. In an alternative embodiment, the signal value may be obtained at about every 0.04 degrees Celsius to about 5.0 degrees Celsius over at least a portion of the melting temperature range.
With reference to
HRM curves, such as shown in
In contrast to routine methods of analyzing HRM data, embodiments of the invention analyze the HRM curves of samples using a plot of the first negative derivative of the HRM curves or the visualization of HRM data using a virtual standard. Embodiments of these methods are referred to herein as the “derivative plot methods” of
With reference to
With continued reference to
The internal smoothing process (block 30) may employ any process that internally removes insignificant variations in the data that are not associated with changes in the concentration of double-stranded nucleic acid. For example, in one embodiment, the smoothing process employs a rolling average method that averages the product values for a plurality of consecutive data points from the HRM data. In one embodiment, the user may optionally designate the number of data points used for the rolling average. In another embodiment, the data are smoothed with a Savitzky-Golay smoothing filter by fitting an nth degree polynomial to a plurality of consecutive data points and calculating a smoothed product value for one or several data points within the plurality of data points.
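The rolling average embodiment may be sketched, for illustration only, as a centered moving average with a user-designated window. The handling of the first and last few data points (here, a window that shrinks at the ends) is an assumption, as is the function name `rolling_average`.

```python
def rolling_average(values, window=5):
    """Centered moving average over `window` consecutive points.

    Near either end of the data the window shrinks so that only
    available points are averaged (an assumption of this sketch).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# A window of three points applied to a short illustrative series.
smoothed = rolling_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

A larger window removes more noise but also flattens and broadens the melt transition, so the window would ordinarily be kept small relative to the width of the melt region.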
The exponential decay removal (block 32) process removes decreasing signal value trends that are not related to changes in the double-stranded nucleic acid concentration. Exponential decay can be removed by known processes, such as mathematical processes that calculate the amount of decay observed in the saturation region 12 (
After generating the HRM curve, the first negative derivative is plotted for the HRM curve (block 32 of
As illustrated in
In an alternative embodiment, illustrated in
The melt peak data point from a first sample may be compared with the melt peak(s) of one or more other samples, or compared to known standard values, to determine if the sequences of the samples are the same or different. If the melt peaks from the one or more samples are different from one another, i.e., have a different signal value, a different temperature value, or both the signal value and the temperature value are different, then the sequences of the nucleic acids in the samples are not identical. In contrast, if the melt peaks from the samples are the same, then the sequences of the nucleic acids in the samples are likely to be the same. Processes for analyzing the melt peaks are discussed in greater detail below.
Comparing the peak values between samples can be difficult due to variability in the peak values that is not associated with the double-stranded nucleic acids in the samples. For example, the melt peak values can vary due to the position of the reaction well on the thermal block or due to inaccuracies in measuring the reagents used in the analysis. In an embodiment illustrated in
HRM data may be normalized by any process that normalizes the data along the thermal axis (x-axis), the signal axis (y-axis) or along both the thermal axis and the signal axis. For thermal axis normalization, each HRM curve is shifted on the thermal axis based on its location on the thermal block as determined by the thermal characteristics of the thermal block. For example, the detected melt temperature for each well may be multiplied by a standard adjustment multiplier that corresponds to the typical variation of that well from the mean of the block. The signal axis may be normalized based on user defined areas of interest in the saturation region and the background regions, or preliminary areas of interest in these regions may be automatically calculated. In one embodiment, the areas of interest are identified from a first negative derivative plot of the HRM curve. The areas of interest are the areas of the first negative derivative plot having low values that correspond to areas of the HRM curves wherein the change in slope is small. The same area of interest is used for all curves being normalized to one another. The signal values in the areas of interest across all curves being normalized are averaged and set to a first normalized signal value, such as 100, for the area of interest associated with the saturation region, and a second normalized value, such as 0, for the area of interest associated with the background region. The remaining data points are normalized relative to the first normalized signal value and the second normalized signal value.
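One possible reading of the signal axis normalization is sketched below. It assumes that each curve is linearly rescaled so that its mean signal in the saturation area of interest maps to 100 and its mean signal in the background area of interest maps to 0; the specification may also be read as averaging across curves before rescaling, which this sketch does not attempt. The function name and the temperature windows are hypothetical.

```python
def normalize_signal(temps, signals, saturation, background, top=100.0, bottom=0.0):
    """Linearly rescale one HRM curve using two temperature windows.

    `saturation` and `background` are (low, high) temperature bounds for the
    areas of interest; the same bounds should be used for every curve.
    """
    def window_mean(low, high):
        values = [s for t, s in zip(temps, signals) if low <= t <= high]
        if not values:
            raise ValueError("no data points fall inside the area of interest")
        return sum(values) / len(values)

    sat = window_mean(*saturation)
    bg = window_mean(*background)
    if sat == bg:
        raise ValueError("saturation and background signal levels are identical")
    scale = (top - bottom) / (sat - bg)
    return [bottom + (s - bg) * scale for s in signals]

# Illustrative raw fluorescence values for a single well.
temps = [70.0, 71.0, 72.0, 73.0, 74.0, 75.0]
raw = [500.0, 500.0, 300.0, 100.0, 50.0, 50.0]
normalized = normalize_signal(temps, raw, saturation=(70.0, 71.0), background=(74.0, 75.0))
```

After this step, curves from wells with different absolute fluorescence levels share a common 0 to 100 scale, which makes their derivative peak heights more directly comparable.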
In addition to providing insights into the similarities and differences of HRM curves generated from different samples, the first negative derivative plot from a sample may be analyzed to identify the presence of two or more peaks in the sample, indicating that the sample includes a heterogeneous mixture of double-stranded nucleic acids.
In another embodiment illustrated in
In another embodiment, also illustrated in
In another embodiment, a Gaussian probability function is fit to the first negative derivative data and the AUC is calculated for the Gaussian probability function. This technique is particularly useful to discriminate data wherein there are multiple peaks contained in the data, such as HRM data obtained from a heterogeneous mixture of double-stranded nucleic acids. In this circumstance, the total AUC will include the AUC from each of the multiple peaks. Thus, the total AUC from a heterogeneous mixture of double-stranded nucleic acids will be greater than the AUC for data from samples having a single peak.
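For illustration, the area under the curve may be computed either numerically from the first negative derivative data or in closed form from the parameters of a Gaussian. The trapezoidal rule used below is an assumption of this sketch; the closed form (height times sigma times the square root of two pi) is the standard integral of a Gaussian. Function names are hypothetical.

```python
import math

def area_under_curve(temps, deriv):
    """Trapezoidal-rule area under first negative derivative data."""
    return sum(
        (temps[i + 1] - temps[i]) * (deriv[i] + deriv[i + 1]) / 2.0
        for i in range(len(temps) - 1)
    )

def gaussian_area(amp, sigma):
    """Closed-form area under a Gaussian of height `amp` and spread `sigma`."""
    return amp * sigma * math.sqrt(2.0 * math.pi)

# Total area for a hypothetical two-peak sample is the sum of the per-peak areas.
total_area = gaussian_area(10.0, 0.8) + gaussian_area(4.0, 0.8)
```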
Regardless of how the data points are identified in the embodiments described herein, the data points 107 are plotted on a scatter plot for analysis (
In one embodiment, each melt peak data point on the scatter plot is compared with one or more standards. The standards include known values for one or more target sequences, such as the temperature value, height value, width value and/or area under the curve for each genotype of a heterogeneous gene. The Euclidean distance between the melt peak data point and the one or more standards is calculated. For samples resulting in the identification of two or more melt peaks, the distance calculated between the first identified melt peak data point and the standard value is weighted more heavily than the distances calculated for each subsequent melt peak data point. For example, if a sample yields an HRM curve that results in three melt peaks being identified from the first negative derivative plot, the distance between the first melt peak data point and a standard value receives more weight during the subsequent analysis than the distance calculated for the second melt peak data point, which is in turn weighted more heavily than the distance calculated for the third melt peak data point. In one embodiment, a multiplier is used to weight the first peak more heavily than subsequent peaks. For example, the first peak may be multiplied by a value of four whereas the subsequent melt peaks are multiplied by multipliers decreasing in value, such as three, two, or one. For each data point, a ratio of the weighted distances between the closest standard and the second closest standard may be calculated. The ratio is used to indicate the confidence with which a sample may be called as being like one of the first or the second closest standard. The ratio may be converted to a percentage efficiency between 100 percent and about 50 percent to indicate this confidence.
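The comparison against standards may be sketched as follows, under several assumptions that are not specified in the text: each data point is a (temperature, peak height) pair, the peaks of a sample and of a standard are compared in the order in which they were identified, and the per-peak distances are combined as a weighted mean using the exemplary multipliers four, three, two and one. The conversion of the ratio to a percentage is not shown because its formula is not given. All names are hypothetical.

```python
import math

def weighted_distance(sample_peaks, standard_peaks, weights=(4, 3, 2, 1)):
    """Weighted mean of Euclidean distances between corresponding melt peaks."""
    total, weight_sum = 0.0, 0.0
    # zip stops at the shortest input, so unmatched peaks are ignored in this sketch.
    for peak, reference, weight in zip(sample_peaks, standard_peaks, weights):
        total += weight * math.dist(peak, reference)
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no peaks to compare")
    return total / weight_sum

def closest_standards(sample_peaks, standards):
    """Return (closest name, second closest name, ratio of their distances)."""
    ranked = sorted(
        (weighted_distance(sample_peaks, peaks), name)
        for name, peaks in standards.items()
    )
    (d1, best), (d2, second) = ranked[0], ranked[1]
    ratio = d1 / d2 if d2 > 0 else 0.0
    return best, second, ratio

# Hypothetical single-peak standards and one sample data point.
standards = {"S1": [(78.0, 10.0)], "S2": [(81.0, 10.0)]}
best, second, ratio = closest_standards([(78.3, 10.4)], standards)
```

A ratio near zero indicates that the sample lies much closer to one standard than to the next, while a ratio near one indicates an ambiguous call. Because temperature and peak height have different units, a practical implementation would likely rescale the axes before computing distances.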
Another aspect of the invention is directed to visualizing HRM data utilizing a virtual standard 110 (
In an alternative embodiment, the first negative derivative is plotted for the virtual standard and the HRM curves (
As mentioned above, the virtual standard contrasts with methods in routine use wherein an HRM curve from one sample in an experiment is used as a baseline against which other samples in the experiment are compared. Embodiments of the virtual standard are useful when comparing HRM data across experiments and even across different platforms, which contrasts with the routine methods of analyzing HRM data which do not allow for such comparisons.
The virtual standard may be generated using a number of techniques as described in the various embodiments below. In each of the virtual standard curve embodiments, the resulting virtual standard curve can then be saved and recalled for use with other data sets.
In one embodiment illustrated in
In another embodiment, the virtual standard is a theoretical standardized curve based on the theoretical melting profile of the nucleic acid sequences being analyzed. The theoretical melting profile of a nucleic acid sequence can be based on the known parameters affecting the melt temperature of a double-stranded nucleic acid, such as the percent of each type of nucleic acid, the sequence of the nucleic acids and the length of the nucleic acid strand. For example, Visual OMP™ Nucleic Acid software from DNA Software could be used to generate a theoretical melt profile for a sequence of nucleic acids.
In an alternative embodiment, the virtual standards are user defined using a mathematical equation. This is done by allowing the user to define where the exponential region starts or stops, a maximum slope, the inclusion of desired inflection points and combinations thereof for an equation describing a theoretical standard curve. For example, in an embodiment, the user may generate a virtual standard by modifying an ideal melt curve defined by the Formula 1 below by setting the maximum signal value (RFUmax), the minimum signal value (RFUmin) and the melting temperature (Tm), wherein C is the curve parameter describing the steepness of the signal change and Ti is the x-axis value.
$$\mathrm{RFU} = \mathrm{RFU}_{\min} + \mathrm{RFU}_{\max}\left(1 - \frac{1}{1 + e^{C\,(T_m - T_i)}}\right) \qquad \text{(FORMULA 1)}$$
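As a worked illustration of Formula 1, the following sketch evaluates the ideal melt curve at a given temperature, taking the exponential term as e raised to the power C(Tm − Ti). The function and parameter names are hypothetical; the numerical values are illustrative only.

```python
import math

def ideal_melt_curve(t_i, rfu_min, rfu_max, t_m, c):
    """Signal value at temperature t_i for the ideal sigmoidal melt curve.

    Well below t_m the signal approaches rfu_min + rfu_max; well above t_m
    it approaches rfu_min; at t_m it is rfu_min + rfu_max / 2.
    """
    return rfu_min + rfu_max * (1.0 - 1.0 / (1.0 + math.exp(c * (t_m - t_i))))

# A virtual standard sampled every 0.1 degrees Celsius from 60 to 95.
grid = [60.0 + 0.1 * i for i in range(351)]
virtual_standard = [ideal_melt_curve(t, 5.0, 100.0, 80.0, 1.0) for t in grid]
```

Increasing C steepens the transition, and changing Tm shifts it along the temperature axis, which is how a user could adapt the standard to a particular amplicon.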
In another embodiment, the user may generate a virtual standard by combining two weighted, ideal melt curves, such as two ideal curves generated by Formula 1 above. The combined curve is defined by Formula 2 below. In this embodiment, the user may set the maximum signal value (RFUmax), the minimum signal value (RFUmin), the melting temperatures for each curve (Tm1 and Tm2), the steepness of the signal change for each curve (C1 and C2), and the weight given to each curve (W1 and W2).
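$$\mathrm{RFU} = \mathrm{RFU}_{\min} + \mathrm{RFU}_{\max}\left(1 - \frac{W_1}{1 + e^{C_1 (T_{m1} - T_i)}} - \frac{W_2}{1 + e^{C_2 (T_{m2} - T_i)}}\right) \qquad \text{(FORMULA 2)}$$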
Virtual curves based on other modifications to ideal curves may be used as well.
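The two-curve embodiment may be sketched as shown below. The sketch assumes that the combined curve subtracts two weighted sigmoid terms of the same form as the single term in Formula 1, one per melting temperature, and that the weights W1 and W2 sum to one; this functional form is an assumption made for illustration. The function name is hypothetical.

```python
import math

def combined_melt_curve(t_i, rfu_min, rfu_max, t_m1, t_m2, c1, c2, w1, w2):
    """Signal value for two weighted ideal melt transitions (assumed form)."""
    term1 = w1 / (1.0 + math.exp(c1 * (t_m1 - t_i)))
    term2 = w2 / (1.0 + math.exp(c2 * (t_m2 - t_i)))
    return rfu_min + rfu_max * (1.0 - term1 - term2)

# Equal weights with transitions at 78 and 83 degrees Celsius (illustrative values).
midpoint_signal = combined_melt_curve(80.5, 0.0, 100.0, 78.0, 83.0, 2.0, 2.0, 0.5, 0.5)
```

With equal weights and equal steepness, the curve passes through half of the maximum signal midway between the two melting temperatures, producing the two-step shape typical of a mixed sample.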
In an alternative embodiment, a spline curve in a sigmoidal shape similar to a typical melt curve is provided, which the user then forms into a desired shape using a computer interface, such as by clicking with a mouse to control the shape of the spline.
The user controlled virtual curves allow the user to match details from one or more of the sample HRM curves. By making portions of the virtual standard match the details seen on one or more of the HRM curves generated from the HRM data, while other portions of the virtual standard match portions of other HRM curves, the differences between the HRM curves can be made more obvious than simply picking one curve as a standard could, as was previously done. By allowing the user or the machine to adjust the shape of the virtual standard, some HRM curves in a data set may peak in the difference plot at one temperature, while the peak for other HRM curves would be at a different temperature or be in the negative direction from the first. Additionally, HRM curves that are distinct from those used to construct the virtual standard would exhibit different shapes distinct from the others.
The analytical processes of the invention may be embodied as a method, a computer program product that includes program code 200 to execute the method, and/or a computer system 202 configured to execute the method.
The program code 200 includes instructions executable on a computer system for carrying out the steps of the method. In one embodiment, the program code 200 includes instructions for analyzing HRM data and in particular, code for generating and displaying on a display 214 a scatter plot that includes the data points associated with the melting temperature. The scatter plot has a first axis for plotting the melt value for each data point and a second axis for plotting one of a peak value, a width value, or an area under the curve value for each data point. In another embodiment, the scatter plot has a third axis for plotting a third value for each data point that is the value remaining from the peak value, the width value, and the area under the curve value. Embodiments of the invention, whether implemented as part of an operating system 204, application, component, program code 200, object, module or sequence of instructions executed by one or more processing units 206 are referred to herein as “program code.” The program code 200 typically comprises one or more instructions that are resident at various times in various memory 202 and storage devices 208 in the computer system 200 that, when read and executed by one or more processors 204 thereof cause that computer system 200 to perform the steps necessary to execute the instructions embodied in the program code 200 embodying the various aspects of the invention.
While embodiments of the invention are described in the context of fully functioning computing systems 200, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product on a computer readable storage medium. The program product may embody a variety of forms. The invention applies equally regardless of the particular type of computer readable storage medium used to actually carry out the distribution of the program code 200. Examples of appropriate computer readable storage media for the program product include, but are not limited to, non-transitory recordable type media such as volatile and nonvolatile memory devices, floppy and other removable disks, hard disk drives, USB drives, and optical disks (e.g., CD-ROMs, DVDs, Blu-Ray discs, etc.), among others.
Any of the individual processes described above or illustrated in
In addition, the systems for analyzing HRM data may further include a module for collecting the HRM data (i.e., a HRM data generator) 210 and a module for receiving HRM data 212. The HRM data collection module may include a thermal cycler and a device for detecting the signal values that result from HRM analysis, such as a change in fluorescence from double-stranded nucleic acid over a range of temperatures. HRM data collection modules as known in the art may be used in accordance with the invention. The HRM data receiving module includes components and/or program code to receive HRM data from the HRM data collection module.
A set of samples representing two homogeneous sample types and one heterogeneous mix of the two was run in a PikoReal instrument and analyzed by the melt peak method. For this example, olfactory receptor, family 10, subfamily J, member 5 (OR10J5) amplicons having a single nucleotide polymorphism (G or A) identified as rs4656837 were amplified using a standard PCR protocol in the presence of SYBR Green. After amplification, the samples were slowly heated and fluorescence was measured at regular intervals. As shown in
These data indicate that the samples include three different mixtures of double-stranded nucleic acids. The first group of samples is homozygous for an amplicon having the G SNP. The second group of samples is homozygous for an amplicon having the A SNP. The third group is a heterozygous mixture of the A and G SNP amplicons. As shown in Table 1, below, the analytical methods described herein clearly distinguish the difference between the homozygous and heterozygous clusters of data points. Samples 1-6 clustered nearest to standard S1 in the upper left hand quadrant of the scatter plot. Samples 7-12 clustered to standard S2 in the central lower region of the scatter plot and samples 13-18 clustered to standard S3 in the upper right region of the scatter plot. By plotting the peak heights on the scatter plot as a function of the melt temperature, the clusters are easily further analyzed, such as by determining their Euclidean distance from the first and second closest standards and calculating the ratio of these distances. The ratio was then used to calculate the percentage efficiency for the sample.
In this example, the peak width was plotted to further discriminate the signal values of the HRM melt curves. When mixed populations of double-stranded nucleic acids were present in the same sample undergoing HRM analysis, they tended to make the resulting HRM curve wider, especially when the difference in the melting temperatures of the individual probes was greater. As observed with the data presented below, this method was found to be particularly useful when analyzing multiplexed reactions.
For this example, ToxA and ToxB were amplified using a standard PCR protocol in the presence of two Solaris primers and fluorescent probes for ToxA and ToxB in C. difficile.
While the present invention has been illustrated by the description of specific embodiments thereof, and while the embodiments have been described in considerable detail, it is not intended to restrict or in any way limit the scope of the appended claims to such detail. The various features discussed herein may be used alone or in any combination. Additional advantages and modifications will readily appear to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, representative apparatus and methods and illustrative examples shown and described. Accordingly, departures may be made from such details without departing from the scope or spirit of the general inventive concept.
This application claims the benefit of and priority to prior filed pending Provisional Application Ser. No. 61/660,581, filed Jun. 15, 2012, which is expressly incorporated herein by reference.
Number | Date | Country
---|---|---
61660581 | Jun 2012 | US