Embodiments of the present invention are related to microarray probes, and, in particular, to a method for determining a set of dye-normalization probes that consistently hybridize with target molecules over a wide range of species, tissues, and hybridization conditions.
The present invention is related to microarrays. In order to facilitate discussion of the present invention, a general background for particular kinds of microarrays is provided below. In the following discussion, the terms “microarray,” “molecular array,” and “array” are used interchangeably. The terms “microarray” and “molecular array” are well known and well understood in the scientific community. As discussed below, a microarray is a precisely manufactured tool which may be used in research, diagnostic testing, or various other analytical techniques to analyze complex solutions of any type of molecule that can be optically or radiometrically detected and that can bind with high specificity to complementary molecules synthesized within, or bound to, discrete features on the surface of a microarray. Because microarrays are widely used for analysis of nucleic acid samples, the following background information on microarrays is introduced in the context of analysis of nucleic acid solutions following a brief background of nucleic acid chemistry.
Deoxyribonucleic acid (“DNA”) and ribonucleic acid (“RNA”) are linear polymers, each synthesized from four different types of subunit molecules.
The DNA polymers that contain the organization information for living organisms occur in the nuclei of cells in pairs, forming double-stranded DNA helices. One polymer of the pair is laid out in a 5′ to 3′ direction, and the other polymer of the pair is laid out in a 3′ to 5′ direction, or, in other words, the two strands are anti-parallel. The two DNA polymers, or strands, within a double-stranded DNA helix are bound to each other through attractive forces including hydrophobic interactions between stacked purine and pyrimidine bases and hydrogen bonding between purine and pyrimidine bases, the attractive forces emphasized by conformational constraints of DNA polymers. FIGS. 2A-B illustrate the hydrogen bonding between the purine and pyrimidine bases of two anti-parallel DNA strands. AT and GC base pairs, illustrated in FIGS. 2A-B, are known as Watson-Crick (“WC”) base pairs. Two DNA strands linked together by hydrogen bonds form the familiar helix structure of a double-stranded DNA helix.
Double-stranded DNA may be denatured, or converted into single-stranded DNA, by changing the ionic strength of the solution containing the double-stranded DNA or by raising the temperature of the solution. Single-stranded DNA polymers may be renatured, or converted back into DNA duplexes, by reversing the denaturing conditions, for example by lowering the temperature of the solution containing complementary single-stranded DNA polymers. During renaturing or hybridization, complementary bases of anti-parallel DNA strands form WC base pairs in a cooperative fashion, leading to reannealing of the DNA duplex.
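The complementarity rule described above can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of the described method; the sketch assumes sequences written 5′ to 3′ using only the letters A, C, G, and T, and it checks only for perfect WC complementarity, ignoring the thermodynamics of hybridization.

```python
# Watson-Crick partners for DNA bases (A pairs with T, G pairs with C).
WC_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the anti-parallel complementary strand, written 5' to 3'."""
    return "".join(WC_PARTNER[base] for base in reversed(strand.upper()))


def can_form_duplex(strand_a: str, strand_b: str) -> bool:
    """True when two 5'->3' strands are perfectly complementary and anti-parallel."""
    return strand_b.upper() == reverse_complement(strand_a)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(can_form_duplex("AAGC", "GCTT"))
```

Because the two strands of a duplex are anti-parallel, the partner strand is the reversed complement rather than the simple complement.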
Once a microarray has been prepared, the microarray may be exposed to a sample solution of target DNA or RNA molecules (410-413 in
Finally, as shown in
One, two, or more than two data subsets within a data set can be obtained from a single microarray by scanning or reading the microarray for one, two or more than two types of signals. Two or more data subsets can also be obtained by combining data from two different arrays. When optical detection is used to detect fluorescent or chemiluminescent emission from chromophore labels, a first set of signals, or data subset, may be generated by reading the microarray at a first optical wavelength, a second set of signals, or data subset, may be generated by reading the microarray at a second optical wavelength, and additional sets of signals may be generated by detection or reading the microarray at additional optical wavelengths. Different signals may be obtained from a microarray by radiometric detection of radioactive emissions at one, two, or more than two different energy levels. Target molecules may be labeled with either a first chromophore that emits light at a first wavelength, or a second chromophore that emits light at a second wavelength. Following hybridization, the microarray can be read at the first wavelength to detect target molecules, labeled with the first chromophore, hybridized to features of the microarray, and can then be read at the second wavelength to detect target molecules, labeled with the second chromophore, hybridized to the features of the microarray. 
In one common microarray system, the first chromophore emits light at a near infrared wavelength, and the second chromophore emits light at a yellow visible-light wavelength, although these two chromophores, and corresponding signals, are referred to as “red” and “green.” The data set obtained from reading the microarray at the red wavelength is referred to as the “red signal,” and the data set obtained from reading the microarray at the green wavelength is referred to as the “green signal.” While it is common to use one or two different chromophores, it is possible to use three, four, or more than four different chromophores and to read a microarray at three, four, or more than four wavelengths to produce three, four, or more than four data sets. With the use of quantum-dot dye particles, the emission is tunable by suitable engineering of the quantum-dot dye particles, and a fairly large set of such quantum-dot dye particles can be excited with a single-color, single-laser-based excitation.
Microarray data processing may reveal systematic variation in the different data sets produced for a single microarray or across several microarrays. As one example, intensities obtained from a sample labeled with the green chromophore may be of larger magnitude, in general, than intensities obtained from a sample labeled with the red chromophore. The differences in signal intensities may be produced by differing labeling efficiencies, differences in the power of electromagnetic radiation used to excite the different labels, differing amounts of target molecules labeled in the different channels, or spatial biases in ratios across the surface of the microarray. Researchers, microarray designers, and manufacturers of microarrays and microarray data processing systems have therefore recognized a need for a reliable and efficient method for determining a set of dye-normalization probes that can be used to normalize intensity data generated from analysis of microarrays.
Various embodiments of the present invention are directed to methods for determining a set of dye-normalization probes that consistently hybridize to approximately identical numbers of target molecules in a wide range of sample solutions. One embodiment of the method of the present invention generates a set of candidate probe molecules. The set of candidate probe molecules is arrayed on one or more replicate microarrays. Sample solutions are made from one or more tissues of one or more species. Microarray-based hybridization assays are conducted by using the replicate microarrays and different sample solutions. A subset of the candidate probe molecules that are functional for the microarray-based hybridization assays is determined.
FIGS. 2A-B illustrate the hydrogen bonding between the purine and pyrimidine bases of two anti-parallel DNA strands.
FIGS. 8A-B illustrate red-signal-intensity to green-signal-intensity ratio plots.
FIGS. 22A-B show two of many possible dye-normalization probe feature arrangements.
The present invention is directed to various types of synthetic microarray probes that span the entire intensity distribution of any given microarray experiment, consistently producing an intensity log ratio converging to “0” for different labels that hybridize with target molecules of a variety of species and tissues under various hybridization conditions. The following discussion includes three subsections, a first subsection including additional information about molecular arrays, a second subsection including additional information about dye-normalization probes, and a third subsection describing embodiments of the present invention with reference to
A microarray may include any one-, two- or three-dimensional arrangement of addressable regions, or features, each bearing a particular chemical moiety or moieties, such as biopolymers, associated with that region. Any given microarray substrate may carry one, two, or four or more microarrays disposed on a front surface of the substrate. Depending upon the use, any or all of the microarrays may be the same or different from one another and each may contain multiple spots or features. A typical microarray may contain more than ten, more than one hundred, more than one thousand, more than ten thousand, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm². For example, square features may have widths, or round features may have diameters, in the range from 10 μm to 1.0 cm. In other embodiments each feature may have a width or diameter in the range of 1.0 μm to 1.0 mm, usually 5.0 μm to 500 μm, and more usually 10 μm to 200 μm. Features other than round or square may have area ranges equivalent to that of circular features with the foregoing diameter ranges. At least some, or all, of the features may be of different compositions (for example, when any repeats of each feature composition are excluded the remaining features may account for at least 5%, 10%, or 20% of the total number of features). Inter-feature areas are typically, but not necessarily, present. Inter-feature areas generally do not carry probe molecules. Such inter-feature areas typically are present where the microarrays are formed by processes involving drop deposition of reagents, but may not be present when, for example, photolithographic microarray fabrication processes are used. When present, inter-feature areas can be of various sizes and configurations.
Each microarray may cover an area of less than 100 cm², or even less than 50 cm², 10 cm² or 1 cm². In many embodiments, the substrate carrying the one or more microarrays will be shaped generally as a rectangular solid having a length of more than 4 mm and less than 1 m, usually more than 4 mm and less than 600 mm, more usually less than 400 mm; a width of more than 4 mm and less than 1 m, usually less than 500 mm and more usually less than 400 mm; and a thickness of more than 0.01 mm and less than 5.0 mm, usually more than 0.1 mm and less than 2 mm and more usually more than 0.2 mm and less than 1 mm. Other shapes are possible, as well. With microarrays that are read by detecting fluorescence, the substrate may be of a material that emits low fluorescence upon illumination with the excitation light. Additionally in this situation, the substrate may be relatively transparent to reduce the absorption of the incident illuminating laser light and subsequent heating if the focused laser beam travels too slowly over a region. For example, a substrate may transmit at least 20%, or 50% (or even at least 70%, 90%, or 95%), of the illuminating light incident on the front as may be measured across the entire integrated spectrum of such illuminating light or alternatively at 532 nm or 633 nm.
Microarrays can be fabricated using drop deposition from pulsejets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide. Such methods are described in detail in, for example, U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic microarray fabrication methods may be used. Interfeature areas need not be present particularly when the microarrays are made by photolithographic methods as described in those patents.
A microarray is typically exposed to a sample including labeled target molecules, or, as mentioned above, to a sample including unlabeled target molecules followed by exposure to labeled molecules that bind to unlabeled target molecules bound to the microarray, and the microarray is then read. Reading of the microarray may be accomplished by illuminating the microarray and reading the location and intensity of resulting fluorescence at multiple regions on each feature of the microarray. For example, a scanner similar to the AGILENT MICROARRAY SCANNER manufactured by Agilent Technologies, Palo Alto, Calif., may be used for this purpose. Other suitable apparatus and methods are described in published U.S. patent applications 20030160183A1, 20020160369A1, 20040023224A1, and 20040021055A, as well as U.S. Pat. No. 6,406,849. However, microarrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques, such as detecting chemiluminescent or electroluminescent labels, or electrical techniques, in which each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,251,685, and elsewhere.
A result obtained from reading a microarray, followed by application of a method of the present invention, may be used in that form or may be further processed to generate a result such as that obtained by forming conclusions based on the pattern read from the microarray, such as whether or not a particular target sequence may have been present in the sample, or whether or not a pattern indicates a particular condition of an organism from which the sample came. A result of the reading, whether further processed or not, may be forwarded, such as by communication, to a remote location if desired, and received there for further use, such as for further processing. When one item is indicated as being remote from another, this means that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. Communicating information refers to transmitting the data representing that information as electrical signals over a suitable communication channel, for example, over a private or public network. Forwarding an item refers to any means of getting the item from one location to the next, whether by physically transporting that item or, in the case of data, physically transporting a medium carrying the data or communicating the data.
As pointed out above, microarray-based assays can involve other types of biopolymers, synthetic polymers, and other types of chemical entities. A biopolymer is a polymer of one or more types of repeating units. Biopolymers are typically found in biological systems and particularly include polysaccharides, peptides, and polynucleotides, as well as their analogs such as those compounds composed of, or containing, amino acid analogs or non-amino-acid groups, or nucleotide analogs or non-nucleotide groups. This includes polynucleotides in which the conventional backbone has been replaced with a non-naturally occurring or synthetic backbone, and nucleic acids, or synthetic or naturally occurring nucleic-acid analogs, in which one or more of the conventional bases has been replaced with a natural or synthetic group capable of participating in Watson-Crick-type hydrogen bonding interactions. Polynucleotides include single or multiple-stranded configurations, where one or more of the strands may or may not be completely aligned with another. For example, a biopolymer includes DNA, RNA, oligonucleotides, and PNA and other polynucleotides as described in U.S. Pat. No. 5,948,902 and references cited therein, regardless of the source. An oligonucleotide is a nucleotide multimer of about 10 to 100 nucleotides in length, while a polynucleotide includes a nucleotide multimer having any number of nucleotides.
As an example of a non-nucleic-acid-based microarray, protein antibodies may be attached to features of the microarray that would bind to soluble labeled antigens in a sample solution. Many other types of chemical assays may be facilitated by microarray technologies. For example, polysaccharides, glycoproteins, synthetic copolymers, including block copolymers, biopolymer-like polymers with synthetic or derivatized monomers or monomer linkages, and many other types of chemical or biochemical entities may serve as probe and target molecules for microarray-based analysis. A fundamental principle upon which microarrays are based is that of specific recognition, by probe molecules affixed to the microarray, of target molecules, whether by sequence-mediated binding affinities, binding affinities based on conformational or topological properties of probe and target molecules, or binding affinities based on spatial distribution of electrical charge on the surfaces of target and probe molecules.
Scanning of a microarray by an optical scanning device or radiometric scanning device generally produces an image comprising a rectilinear grid of pixels, with each pixel having a corresponding signal intensity. These signal intensities are processed by a microarray-data-processing program that analyzes data scanned from a microarray to produce experimental or diagnostic results which are stored in a computer-readable medium, transferred to an intercommunicating entity via electronic signals, printed in a human-readable format, or otherwise made available for further use. Microarray experiments can indicate precise gene-expression responses of organisms to drugs, other chemical and biological substances, environmental factors, and other effects. Microarray experiments can also be used to diagnose disease, for gene sequencing, and for analytical chemistry. Processing of microarray data can produce detailed chemical and biological analyses, disease diagnoses, and other information that can be stored in a computer-readable medium, transferred to an intercommunicating entity via electronic signals, printed in a human-readable format, or otherwise made available for further use.
Multiple data sets may be obtained from a single microarray, and multiple microarrays can generate multiple data sets. These data sets have different meanings, depending on the different types of experiments in which the microarrays are exposed to target-molecule-containing solutions. Frequently, data sets read from multiple microarrays are experimentally related, and data sets read at different optical frequencies from a single microarray are commonly related to one another. However, in order to meaningfully analyze and compare multiple data sets, the multiple data sets need to be normalized with respect to one another.
FIGS. 8A-B illustrate red-channel-to-green-channel-ratio plots for two hypothetical microarray data sets. The hypothetical microarray data sets are obtained from two hypothetical microarray-based assays conducted with sample solutions composed of a variety of different target-molecule pairs, each pair composed of red-labeled and green-labeled target molecules having identical nucleotide sequences and concentrations. In the plots, the horizontal axes, such as horizontal axis 801, correspond to the green-signal intensities, and the vertical axes, such as vertical axis 802, correspond to the red-signal intensities. In FIGS. 8A-B, each data point, such as data point 803, corresponds to the ratio of red-signal intensity to green-signal intensity for a particular feature of a hypothetical microarray data set. The central tendency of the hypothetical data points plotted in
In general, dye-normalization probes are utilized in an attempt to normalize signal intensities, such as the systematic variation shown in
One of many possible embodiments of the present invention is directed to a method for determining a set of dye-normalization probes that consistently hybridize to approximately the same number of target molecules in a wide range of sample solutions and provide signal intensities that span most or all intensities of the entire intensity range of any microarray data set. An initial step of the method of the present invention is to generate a set of candidate probe molecules. A typical microarray probe can be notationally represented as:
Equation (1): [NS]n-X-surface
In general, target molecules have complex nucleotide sequences. In other words, the target-molecule nucleotide sequence generally lacks discernable sequence patterns, and has relatively high information content, or high entropy. If the nucleotide sequence of a specific target molecule has already been determined, a probe can be designed for hybridization with a specific target molecule by synthesizing a complementary, complex nucleotide sequence [NS]n. A probe designed to hybridize with a specific target molecule is unlikely to hybridize with other target molecules present in the sample solution, due to the low probability of a high entropy sequence of length about 8 or more occurring in two different target molecules. By contrast, the probe-design method of the present invention determines a set of candidate probe molecules that are likely to hybridize non-specifically with a wide variety of target molecules. The set of candidate probe molecules obtained by the embodiments of the present invention contains probes having low-complexity, low-entropy nucleotide sequences [NS]n.
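The contrast between high-entropy and low-entropy sequences can be made concrete with a minimal sketch. It assumes single-base Shannon entropy as the complexity measure, which is an illustrative choice and not a measure the description specifies; the function name is hypothetical. Single-base entropy does not detect periodic patterns such as alternating dinucleotides, so it is only a rough proxy for sequence complexity.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Single-base Shannon entropy in bits per base (0.0 to 2.0 for DNA)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    total = len(sequence)
    # Sum p * log2(1/p) over the bases that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    for seq in ("AAAAAAAAAAAA", "AAAAAACCCCCC", "ACGTTGCAGTCA"):
        print(seq, round(shannon_entropy(seq), 3))
```

A homopolymer scores 0 bits per base, while a sequence using all four bases equally often scores the maximum of 2 bits per base.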
Shorter sequences of homopolymers can be bound together to generate additional kinds of low-complexity nucleotide sequences [NS]n. For example, homopolymer sequences can be combined to give the following nucleotide sequence:
Equation (2): [NS]n = [A]i[C]j[T]k[G]l
The nucleotide sequence [NS]n may be composed of repeating homopolymer subsequences:
Equation (3): [NS]n = [C]i[T]j[C]k[G]l
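The block notation of Equations (2) and (3) can be expanded into explicit sequences with a minimal sketch. The function name and the run lengths chosen for i, j, k, and l are hypothetical, selected only for illustration.

```python
from typing import Iterable, Tuple

VALID_BASES = ("A", "C", "G", "T")


def homopolymer_blocks(blocks: Iterable[Tuple[str, int]]) -> str:
    """Concatenate homopolymer runs given as (base, run_length) pairs."""
    parts = []
    for base, run_length in blocks:
        if base not in VALID_BASES:
            raise ValueError(f"unexpected base: {base!r}")
        if run_length < 0:
            raise ValueError("run length must be non-negative")
        parts.append(base * run_length)
    return "".join(parts)


if __name__ == "__main__":
    # Equation (2) with hypothetical run lengths i=3, j=2, k=3, l=2.
    print(homopolymer_blocks([("A", 3), ("C", 2), ("T", 3), ("G", 2)]))
    # Equation (3) with hypothetical run lengths i=j=k=l=2.
    print(homopolymer_blocks([("C", 2), ("T", 2), ("C", 2), ("G", 2)]))
```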
The low-complexity nucleotide sequences [NS]n may also be composed of repeated subsequences, such as the following:
In addition to varying the nucleotide sequence, as described above with reference to
The microarray feature signal intensity can be modulated by varying the GC content of the nucleotide sequence [NS]n. The higher the GC content, the more tightly the nucleotide sequences [NS]n will hybridize to non-specific target molecules in the sample solution. The set of candidate probe molecules can be expanded to include other low-complexity probes, such as low-complexity probes selected from Agilent's Human 1A Probe Selection Probe Database and probes synthesized from rat and mouse tissues, using the methods described in pending Agilent U.S. patent application Ser. No. 10/303,160 entitled “Methods for Identifying Suitable Nucleic Acid Normalization Probe Sequences for Use in Nucleic Acid Arrays,” filed Oct. 14, 2003, and Agilent U.S. patent application Ser. No. 10/686,092, entitled “Methods for Identifying Suitable Nucleic Acid Probe Sequences for Use in Nucleic Acid Arrays,” filed Nov. 22, 2003, which are incorporated by reference.
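GC content, as used above, is the fraction of bases in a sequence that are G or C. The following minimal sketch computes it and orders a few hypothetical candidate sequences from lowest to highest GC content; the function name and the example sequences are illustrative assumptions.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C (0.0 to 1.0)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Hypothetical low-complexity candidates, sorted by increasing GC content.
    candidates = ["AAAATTTT", "AAAACCCC", "CCCCGGGG"]
    for seq in sorted(candidates, key=gc_content):
        print(seq, gc_content(seq))
```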
Subsequent steps of one method of the present invention identify “functional” candidate probe molecules. Functional candidate probe molecules consistently span the signal intensity range of a microarray, have a log ratio of approximately “0,” and hybridize with target molecules synthesized from different tissues of various species under a variety of hybridization conditions. Functional candidate probe molecules are determined by arraying a large number of candidate probe molecules on microarrays and conducting microarray-based hybridization assays with sample solutions having two or more different target molecules.
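The log-ratio criterion for functional candidate probes can be sketched as follows. This is a simplified illustration under stated assumptions: the log ratio is taken as the base-10 logarithm of red-signal intensity over green-signal intensity, the tolerance value is hypothetical, and the requirement that functional probes span the intensity range is not checked. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Measurement = Tuple[float, float]  # (red intensity, green intensity)


def log_ratio(red: float, green: float) -> float:
    """Assumed form of the log ratio: log10(red / green)."""
    return math.log10(red / green)


def is_functional(measurements: List[Measurement], tolerance: float = 0.1) -> bool:
    """True when every assay gives a log ratio within +/- tolerance of 0."""
    return all(abs(log_ratio(r, g)) <= tolerance for r, g in measurements)


def select_functional(candidates: Dict[str, List[Measurement]],
                      tolerance: float = 0.1) -> List[str]:
    """Return the ids of candidates whose log ratios stay near 0 in all assays."""
    return [pid for pid, m in candidates.items() if is_functional(m, tolerance)]


if __name__ == "__main__":
    # Hypothetical intensities for two candidate probes across assays.
    data = {"probe_1": [(1000.0, 1050.0), (200.0, 190.0)],
            "probe_2": [(1000.0, 400.0)]}
    print(select_functional(data))
```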
In an initial step, a microarray feature arrangement having from about 10,000 to about 22,000 or more different candidate probe molecules is designed. Typically, the microarray features are separated into different groups of one or more features, each group of one or more features having identical candidate probe molecules. The one or more features having identical candidate probe molecules are referred to as “replicate features.”
Next, a number of microarray-based hybridization assays are conducted using sets of two or more identical microarrays, each of which has an identical arrangement of replicate features. The two or more identical microarrays are referred to as “replicate microarrays.”
The sample solutions used in the microarray-based hybridization assays are prepared by first selecting two or more species, and then selecting two or more tissues from each species. Table 1 displays a hypothetical set of ten possible species and a number of tissues used to determine the functionality of candidate probe molecules:
In Table 1, 10 different tissues are selected for the species “Human,” “Mouse,” and “Rat,” and 2 different tissues are selected for the remaining species listed. For example, the two different tissues selected for the species “Rice” may be the bran and grain tissues. Note that the present invention is not limited to the particular species nor to the number of species displayed in Table 1. In alternate embodiments, the number of different species may range from about 2 to about 20 or more, and the number of tissues selected for each species may range from about 2 to about 20 or more.
Next, target molecules for each sample solution are isolated from the nucleic acid molecules of each tissue. The target molecules can be either cDNA or amplified RNA copies of all expressed mRNA molecules in a given tissue. The target molecules synthesized from different tissues of a species are grouped in pairs called “target-molecule pairs.” Table 2 displays one of many possible target-molecule-pair combinations for the species “Human,” listed above in Table 1:
In Table 2, target-molecule pair 1 is composed of target molecules isolated from lung and heart tissues. Note that the present invention is not limited to any particular set of tissues for determining target molecule nucleotide sequences. In alternate embodiments, an entirely different set of tissues can be selected. Note further that the present invention is not limited to the particular target-molecule pairs displayed in Table 2. For a species with 10 different tissues, such as the Human species, there are 45 possible target-molecule pair combinations. For example, target molecules extracted from lung tissue can be paired with target molecules extracted from liver tissue. The third column of Table 2 identifies the labels assigned to all target molecules of a particular tissue.
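The count of 45 follows from choosing 2 tissues out of 10 without regard to order, which is 10 × 9 / 2. A minimal sketch enumerates the pairs; the tissue labels are placeholders, since the actual tissue list is given in Table 1, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def target_molecule_pairs(tissues: Sequence[str]) -> List[Tuple[str, str]]:
    """Enumerate every unordered pair of distinct tissues."""
    return list(combinations(tissues, 2))


if __name__ == "__main__":
    # Placeholder labels standing in for the ten tissues of one species.
    tissues = [f"tissue_{i}" for i in range(1, 11)]
    pairs = target_molecule_pairs(tissues)
    print(len(pairs), comb(len(tissues), 2))
    print(pairs[:3])
```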
Next, for each target-molecule pair of each species, a separate sample solution is prepared.
When microarrays are exposed to a sample solution, target molecules are allowed to hybridize through nucleotide pairing interactions with complementary sequences of candidate probes bound to the surface of the microarray.
The replicate microarrays are then read and the image data analyzed to determine those candidate probes that are functional across tissue and species. One of many possible means for analyzing the functionality of candidate probes is to plot the intensity log ratio versus red and green signal intensity. The log ratio for each target-molecule pair experiment is computed according to the following expression:
Next, the candidate probe molecules that satisfy the tolerance interval requirements described above with reference to
In order to determine candidate probe molecules that are suitable for a variety of hybridization conditions, the sample solution conditions, such as temperature, acidity, alkalinity, and salinity, may be varied for hybridization assays having one or more identical sample solutions. Each condition can be varied without variation of the other conditions. For example, in order to determine which candidate probe molecules are functional for a variety of hybridization temperatures, candidate probe molecules are tested by hybridizing identical sample solutions at different hybridization temperatures, such as 50°, 55°, 60°, 65° and 70° Celsius. Moreover, combinations of the conditions can be varied, such as varying the temperature and acidity.
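The two ways of varying conditions described above, one condition at a time or in combination, can be sketched as the construction of an assay plan. The temperatures are those listed above; the pH values, the baseline, and all names are hypothetical and serve only to illustrate the bookkeeping.

```python
from itertools import product
from typing import Dict, List, Sequence

Conditions = Dict[str, float]


def one_at_a_time(baseline: Conditions,
                  variations: Dict[str, Sequence[float]]) -> List[Conditions]:
    """Vary each condition in turn while holding the others at the baseline."""
    runs = []
    for name, values in variations.items():
        for value in values:
            run = dict(baseline)
            run[name] = value
            runs.append(run)
    return runs


def full_grid(variations: Dict[str, Sequence[float]]) -> List[Conditions]:
    """Vary the conditions jointly, producing every combination of values."""
    names = list(variations)
    return [dict(zip(names, combo))
            for combo in product(*(variations[n] for n in names))]


if __name__ == "__main__":
    baseline = {"temperature_c": 60, "ph": 7.0}                # hypothetical baseline
    variations = {"temperature_c": [50, 55, 60, 65, 70],       # from the text
                  "ph": [6.5, 7.0, 7.5]}                       # hypothetical values
    print(len(one_at_a_time(baseline, variations)))
    print(len(full_grid(variations)))
```

In this sketch the one-at-a-time plan repeats the baseline run once per varied condition; a practical plan would likely remove the duplicates.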
The surviving set of candidate probe molecules that satisfy the tolerance requirements, as described above with reference to
Five separate sample solutions composed of tissue pairs isolated from each of the three species “Human,” “Mouse,” and “Rat” are prepared. For example, the hypothetical target-molecule pairs, described above with reference to Table 2, can be isolated for each species. The five different sample solutions composed of target-molecule pairs are spiked with synthetic targets that are complementary to the eQC probes, for which different expression results have already been determined. The synthetic target molecules are referred to as “eQC target molecules.”
After the 8-pack microarray-based hybridization assays are completed, the data is examined using Agilent's Feature Extraction software described in detail in U.S. Pat. No. 6,591,196, entitled “Method and System for Extracting Data from Surface Array Deposited Features,” filed Jun. 6, 2000, which is incorporated by reference. For each 8-pack microarray, approximately 300 dye-normalization probes are used to normalize the intensity data using the “Norm file editor” method in Agilent's Feature Extraction software. The log ratio results to be derived from each eQC probe are known. The effectiveness of the dye-normalization probes in normalizing 8-pack microarray data is indicated by the accuracy of the differential expression values generated from the eQC probes. The data normalized using the 300 dye-normalization probes is compared to data from identical microarrays that have been normalized using Agilent's standard rank consistency dye-normalization method described in U.S. Patent Application No.: U.S. 2003/0215807, entitled “Method and System for Normalization of Microarray Data Based on Local Normalization of Rank-Ordered, Globally Normalized Data,” filed May 9, 2002, which is incorporated by reference.
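The general idea of normalizing with dedicated probes can be illustrated with a deliberately simple sketch. It is not the algorithm of Agilent's Feature Extraction software, nor the rank-consistency method cited above; it assumes a single global offset, taken as the median base-10 log ratio of the dye-normalization probe features, which is subtracted from the log ratio of every feature. All names and intensities are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Iterable, Tuple


def normalize_log_ratios(features: Dict[str, Tuple[float, float]],
                         norm_probe_ids: Iterable[str]) -> Dict[str, float]:
    """Shift all log10(red/green) ratios so the normalization probes center on 0.

    features maps a feature id to its (red, green) intensities.
    """
    raw = {fid: math.log10(red / green) for fid, (red, green) in features.items()}
    # Global dye bias estimated from the dye-normalization probes only.
    offset = median(raw[fid] for fid in norm_probe_ids)
    return {fid: value - offset for fid, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical data: the red channel reads about twice as bright overall.
    features = {"norm_1": (200.0, 100.0), "norm_2": (400.0, 200.0),
                "norm_3": (2000.0, 1000.0), "gene_x": (800.0, 100.0)}
    result = normalize_log_ratios(features, ["norm_1", "norm_2", "norm_3"])
    for fid, value in result.items():
        print(fid, round(value, 3))
```

In this example the twofold dye bias is removed, so the remaining log ratio for `gene_x` reflects a fourfold rather than an eightfold difference. A known expected log ratio, such as that of an eQC probe, could then be compared against the normalized value to gauge accuracy.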
A subset of the set of dye-normalization probes can be used to normalize the signal data from a variety of microarray experiments by dedicating approximately 10% of the features of a microarray to dye-normalization probes. FIGS. 22A-B show two of many possible dye-normalization probe feature arrangements for two hypothetical microarrays. In FIGS. 22A-B, the dye normalization probes are identified by shaded square features, such as features 2202 and 2204, respectively. In
Although the present invention has been described in terms of a particular embodiment, it is not intended that the invention be limited to this embodiment. Modifications within the spirit of the invention will be apparent to those skilled in the art. For example, an almost limitless number of different implementations of the many possible embodiments of the method of the present invention can be performed. In alternate embodiments, features in alternative types of molecular arrays may be arranged to cover the surface of the molecular array at higher densities, such as offsetting the features in adjacent rows in order to produce a more densely packed feature arrangement. In alternate embodiments, one, three, four or more tissues can be used in an experiment to determine functional candidate probes that span tissues of a single species. In alternate embodiments, the number of tissue pairs selected from a single species can range from about 2 to 16 or 20 or more different tissues, and can include diseased tissues, such as leukemia, HeLa, MG63, and K-562 cells. In an alternate embodiment, the steps used to determine the set of dye-normalization probes described above with reference to
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents: