Embodiments of the present invention are related to data extraction from images of microarrays and, in particular, to a general method and system for rectilinearizing a double-density microarray having a non-rectilinear, outermost, feature-position arrangement.
The present invention is related to microarrays. In order to facilitate discussion of the present invention, a general background for microarrays and examples of their use is provided below. In the following discussion, the terms “microarray,” “molecular array,” and “array” are used interchangeably. The terms “microarray” and “molecular array” are well known and well understood in the scientific community. As discussed below, a microarray is a precisely manufactured tool which may be used in research, diagnostic testing, or various other analytical techniques.
Microarray technologies have gained prominence in biological research and in diagnostics. Currently, microarray techniques are most often used to determine the concentrations of particular nucleic-acid polymers in complex sample solutions. Microarray-based analytical techniques are not, however, restricted to analysis of nucleic acid solutions, but may be employed to analyze complex solutions of any type of molecule that can be optically or radiometrically scanned and that can bind with high specificity to complementary molecules synthesized within, or bound to, discrete features on the surface of a microarray. Because microarrays are widely used for analysis of nucleic acid samples, the following background information on microarrays is introduced in the context of analysis of nucleic acid solutions following a brief background of nucleic acid chemistry.
Deoxyribonucleic acid (“DNA”) and ribonucleic acid (“RNA”) are linear polymers, each synthesized from four different types of subunit molecules.
The DNA polymers that contain the organization information for living organisms occur in the nuclei of cells in pairs, forming double-stranded DNA helices. One polymer of the pair is laid out in a 5′ to 3′ direction, and the other polymer of the pair is laid out in a 3′ to 5′ direction; in other words, the two strands are anti-parallel. The two DNA polymers, or strands, within a double-stranded DNA helix are bound to each other through attractive forces, including hydrophobic interactions between stacked purine and pyrimidine bases and hydrogen bonding between purine and pyrimidine bases, the attractive forces emphasized by conformational constraints of DNA polymers. Because of a number of chemical and topographic constraints, double-stranded DNA helices are most stable when deoxy-adenylate subunits of one strand hydrogen bond to deoxy-thymidylate subunits of the other strand, and deoxy-guanylate subunits of one strand hydrogen bond to corresponding deoxy-cytidylate subunits of the other strand. FIGS. 2A-B illustrate the hydrogen bonding between the purine and pyrimidine bases of two anti-parallel DNA strands. AT and GC base pairs, illustrated in FIGS. 2A-B, are known as Watson-Crick (“WC”) base pairs. Two DNA strands linked together by hydrogen bonds form the familiar helix structure of a double-stranded DNA helix.
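The pairing rule just described (A with T, G with C, on anti-parallel strands) can be illustrated with a minimal C sketch. The function names wc_complement and reverse_complement are hypothetical and chosen only for this illustration, and the single-letter base encoding is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Watson-Crick partner of a single DNA base letter.
 * Returns 'N' for any character other than A, T, G, or C. */
static char wc_complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'G': return 'C';
    case 'C': return 'G';
    default:  return 'N';
    }
}

/* Writes into `out` the strand that pairs with `seq`. Both strands are
 * written 5' to 3', so the complement is also reversed (anti-parallel).
 * `out` must have room for strlen(seq) + 1 bytes. */
static void reverse_complement(const char *seq, char *out)
{
    assert(seq != NULL && out != NULL);
    size_t n = strlen(seq);
    for (size_t i = 0; i < n; i++) {
        out[i] = wc_complement(seq[n - 1 - i]);
    }
    out[n] = '\0';
}
```

For the strand ATGC read 5′ to 3′, the pairing strand read 5′ to 3′ is GCAT.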
Double-stranded DNA may be denatured, or converted into single-stranded DNA, by changing the ionic strength of the solution containing the double-stranded DNA or by raising the temperature of the solution. Single-stranded DNA polymers may be renatured, or converted back into DNA duplexes, by reversing the denaturing conditions, for example by lowering the temperature of the solution containing complementary single-stranded DNA polymers. During renaturing or hybridization, complementary bases of anti-parallel DNA strands form WC base pairs in a cooperative fashion, leading to reannealing of the DNA duplex.
The ability to denature and renature double-stranded DNA has led to the development of many extremely powerful and discriminating assay technologies for identifying the presence of DNA and RNA polymers having particular base sequences or containing particular base subsequences within complex mixtures of different nucleic acid polymers, other biopolymers, and inorganic and organic chemical compounds.
Once a microarray has been prepared, the microarray may be exposed to a sample solution of target DNA or RNA molecules (410-413 in
As shown in
When a microarray is scanned or otherwise analyzed, data may be collected as a two-dimensional digital image of the microarray, each pixel of which represents the intensity of phosphorescent, fluorescent, chemiluminescent, or radioactive emission from an area of the microarray corresponding to the pixel. A microarray data set may comprise a two-dimensional image or a list of numerical or alphanumerical pixel intensities, or any of many other computer-readable data sets. An initial series of steps employed in processing digital microarray images includes constructing a regular coordinate system for the digital image of the microarray by which the features within the digital image of the microarray can be indexed and located. For example, when the features are laid out in a periodic, rectilinear pattern, a rectilinear coordinate system is commonly constructed so that the positions of the centers of features lie as closely as possible to intersections between horizontal and vertical grid lines of the rectilinear coordinate system or, alternatively, exactly halfway between a pair of adjacent horizontal grid lines and a pair of adjacent vertical grid lines. Then, regions of interest (“ROIs”) are computed, based on the initially estimated positions of the features in the coordinate grid, and centroids for the ROIs are computed in order to refine the positions of the features. Once the position of a feature is refined, feature pixels can be differentiated from background pixels within the ROI, and the signal corresponding to the feature can then be computed by integrating the intensity over the feature pixels.
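The ROI processing described above can be sketched in C as follows. This is a simplified illustration, not the procedure used by any particular feature extraction program. It assumes a row-major image of 16-bit intensities and separates feature pixels from background pixels with a single fixed threshold. The type feature_signal and the function measure_roi are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: refined feature position and integrated signal. */
struct feature_signal {
    double cx;      /* intensity-weighted centroid, column coordinate */
    double cy;      /* intensity-weighted centroid, row coordinate */
    double signal;  /* sum of the intensities of the feature pixels */
};

/* Scans the ROI [x0, x1) x [y0, y1) of a row-major image `img` whose rows
 * are `width` pixels long. Pixels with intensity above `threshold` are
 * treated as feature pixels (an assumed, simplistic criterion); all other
 * pixels are treated as background and ignored. */
static struct feature_signal measure_roi(const unsigned short *img, size_t width,
                                         size_t x0, size_t y0,
                                         size_t x1, size_t y1,
                                         unsigned short threshold)
{
    assert(img != NULL && x0 <= x1 && y0 <= y1 && x1 <= width);
    struct feature_signal r = {0.0, 0.0, 0.0};
    double sx = 0.0, sy = 0.0;
    for (size_t y = y0; y < y1; y++) {
        for (size_t x = x0; x < x1; x++) {
            double v = img[y * width + x];
            if (v > threshold) {
                r.signal += v;          /* integrate intensity */
                sx += v * (double)x;    /* accumulate weighted position */
                sy += v * (double)y;
            }
        }
    }
    if (r.signal > 0.0) {
        r.cx = sx / r.signal;
        r.cy = sy / r.signal;
    }
    return r;
}
```

The sketch computes the centroid and the integrated signal in a single pass for brevity. A practical program would more likely refine the position first and then apply a more careful feature and background segmentation.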
Scanning of a microarray by an optical scanning device or radiometric scanning device generally produces an image comprising a rectilinear grid of pixels, with each pixel having a corresponding signal intensity. These signal intensities are processed by a microarray-data-processing program that analyzes data scanned from a microarray to produce experimental or diagnostic results which are stored in a computer-readable medium, transferred to an intercommunicating entity via electronic signals, printed in a human-readable format, or otherwise made available for further use. Microarray experiments can indicate precise gene-expression responses of organisms to drugs, other chemical and biological substances, environmental factors, and other effects. Microarray experiments can also be used to diagnose disease, for gene sequencing, and for analytical chemistry. Processing of microarray data can produce detailed chemical and biological analyses, disease diagnoses, and other information that can be stored in a computer-readable medium, transferred to an intercommunicating entity via electronic signals, printed in a human-readable format, or otherwise made available for further use.
The microarrays illustrated in
Many currently available programs that extract feature information from scanned or otherwise analyzed images of microarrays rely on a rectilinear, outermost, feature-position arrangement in order to generate an initial, or seed, location for each feature. Attempts to extract data from microarrays with non-rectilinear, outermost feature boundaries by current feature extraction programs often result in errors. Therefore, designers, manufacturers, and users of microarrays have recognized a need for a method and system to enable extraction of data from microarrays having non-rectilinear, outermost feature boundaries.
One embodiment of the present invention comprises a method and system for rectilinearizing a double-density, non-rectilinear microarray of features within a scanned or otherwise analyzed image of a microarray. A feature-coordinate grid of horizontal and vertical grid lines is superimposed over the microarray image so that the center of each feature of the microarray image coincides with a unique intersection of a horizontal and vertical grid line. Three corner features are selected and indexed. The coordinates of the three selected corner features are used to determine three feature positions defining three corners of a rectilinear, outermost, feature-position arrangement of the non-rectilinear microarray of features. A fourth feature position of the rectilinear, outermost, feature-position arrangement is determined from two of the three feature positions.
FIGS. 2A-B illustrate the hydrogen bonding between the purine and pyrimidine bases of two anti-parallel DNA strands.
FIGS. 8A-B illustrate two types of arrangements of densely-packed, disk-shaped microarray features.
The present invention is directed toward a method and system for rectilinearizing a scanned or otherwise analyzed image of a non-rectilinear grid of microarray features.
The following discussion includes two subsections, a first subsection including additional information about molecular arrays, and a second subsection describing embodiments of the present invention with reference to
An array may include any one-, two- or three-dimensional arrangement of addressable regions, or features, each bearing a particular chemical moiety or moieties, such as biopolymers, associated with that region. Any given array substrate may carry one, two, or four or more arrays disposed on a front surface of the substrate. Depending upon the use, any or all of the arrays may be the same or different from one another and each may contain multiple spots or features. A typical array may contain more than ten, more than one hundred, more than one thousand, more than ten thousand, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm². For example, square features may have widths, or round features may have diameters, in the range from 10 μm to 1.0 cm. In other embodiments each feature may have a width or diameter in the range of 1.0 μm to 1.0 mm, usually 5.0 μm to 500 μm, and more usually 10 μm to 200 μm. Features other than round or square may have area ranges equivalent to that of circular features with the foregoing diameter ranges. At least some, or all, of the features may be of different compositions (for example, when any repeats of each feature composition are excluded the remaining features may account for at least 5%, 10%, or 20% of the total number of features). Inter-feature areas are typically, but not necessarily, present. Inter-feature areas generally do not carry probe molecules. Such inter-feature areas typically are present where the arrays are formed by processes involving drop deposition of reagents, but may not be present when, for example, photolithographic array fabrication processes are used. When present, inter-feature areas can be of various sizes and configurations.
Each array may cover an area of less than 100 cm², or even less than 50 cm², 10 cm², or 1 cm². In many embodiments, the substrate carrying the one or more arrays will be shaped generally as a rectangular solid having a length of more than 4 mm and less than 1 m, usually more than 4 mm and less than 600 mm, more usually less than 400 mm; a width of more than 4 mm and less than 1 m, usually less than 500 mm and more usually less than 400 mm; and a thickness of more than 0.01 mm and less than 5.0 mm, usually more than 0.1 mm and less than 2 mm and more usually more than 0.2 mm and less than 1 mm. Other shapes are possible, as well. With arrays that are read by detecting fluorescence, the substrate may be of a material that emits low fluorescence upon illumination with the excitation light. Additionally in this situation, the substrate may be relatively transparent to reduce the absorption of the incident illuminating laser light and subsequent heating if the focused laser beam travels too slowly over a region. For example, a substrate may transmit at least 20%, or 50% (or even at least 70%, 90%, or 95%), of the illuminating light incident on the front as may be measured across the entire integrated spectrum of such illuminating light or alternatively at 532 nm or 633 nm.
Arrays can be fabricated using drop deposition from pulsejets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide. Such methods are described in detail in, for example, U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic array fabrication methods may be used. Interfeature areas need not be present particularly when the arrays are made by photolithographic methods as described in those patents.
A molecular array is typically exposed to a sample including labeled target molecules, or, as mentioned above, to a sample including unlabeled target molecules followed by exposure to labeled molecules that bind to unlabeled target molecules bound to the array, and the array is then read. Reading of the array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at multiple regions on each feature of the array. For example, a scanner similar to the AGILENT MICROARRAY SCANNER manufactured by Agilent Technologies, Palo Alto, Calif., may be used for this purpose. Other suitable apparatus and methods are described in published U.S. patent applications 20030160183A1, 20020160369A1, 20040023224A1, and 20040021055A, as well as U.S. Pat. No. 6,406,849. However, arrays may be read by any method or apparatus other than the foregoing, with other reading methods including other optical techniques, such as detecting chemiluminescent or electroluminescent labels, or electrical techniques, in which each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,251,685, and elsewhere.
A result obtained from reading an array, followed by application of a method of the present invention, may be used in that form or may be further processed to generate a result such as that obtained by forming conclusions based on the pattern read from the array, such as whether or not a particular target sequence may have been present in the sample, or whether or not a pattern indicates a particular condition of an organism from which the sample came. A result of the reading, whether further processed or not, may be forwarded, such as by communication, to a remote location if desired, and received there for further use, such as for further processing. When one item is indicated as being remote from another, this means that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. Communicating information refers to transmitting the data representing that information as electrical signals over a suitable communication channel, for example, over a private or public network. Forwarding an item refers to any means of getting the item from one location to the next, whether by physically transporting that item or, in the case of data, physically transporting a medium carrying the data or communicating the data.
As pointed out above, array-based assays can involve other types of biopolymers, synthetic polymers, and other types of chemical entities. A biopolymer is a polymer of one or more types of repeating units. Biopolymers are typically found in biological systems and particularly include polysaccharides, peptides, and polynucleotides, as well as their analogs such as those compounds composed of, or containing, amino acid analogs or non-amino-acid groups, or nucleotide analogs or non-nucleotide groups. This includes polynucleotides in which the conventional backbone has been replaced with a non-naturally occurring or synthetic backbone, and nucleic acids, or synthetic or naturally occurring nucleic-acid analogs, in which one or more of the conventional bases has been replaced with a natural or synthetic group capable of participating in Watson-Crick-type hydrogen bonding interactions. Polynucleotides include single or multiple-stranded configurations, where one or more of the strands may or may not be completely aligned with another. For example, a biopolymer includes DNA, RNA, oligonucleotides, and PNA and other polynucleotides as described in U.S. Pat. No. 5,948,902 and references cited therein, regardless of the source. An oligonucleotide is a nucleotide multimer of about 10 to 100 nucleotides in length, while a polynucleotide includes a nucleotide multimer having any number of nucleotides.
As an example of a non-nucleic-acid-based molecular array, protein antibodies may be attached to features of the array that would bind to soluble labeled antigens in a sample solution. Many other types of chemical assays may be facilitated by array technologies. For example, polysaccharides, glycoproteins, synthetic copolymers, including block copolymers, biopolymer-like polymers with synthetic or derivatized monomers or monomer linkages, and many other types of chemical or biochemical entities may serve as probe and target molecules for array-based analysis. A fundamental principle upon which arrays are based is that of specific recognition, by probe molecules affixed to the array, of target molecules, whether by sequence-mediated binding affinities, binding affinities based on conformational or topological properties of probe and target molecules, or binding affinities based on spatial distribution of electrical charge on the surfaces of target and probe molecules.
Scanning of a molecular array by an optical scanning device or radiometric scanning device generally produces an image comprising a rectilinear grid of pixels, with each pixel having a corresponding signal intensity. These signal intensities are processed by an array-data-processing program that analyzes data scanned from an array to produce experimental or diagnostic results which are stored in a computer-readable medium, transferred to an intercommunicating entity via electronic signals, printed in a human-readable format, or otherwise made available for further use. Molecular array experiments can indicate precise gene-expression responses of organisms to drugs, other chemical and biological substances, environmental factors, and other effects. Molecular array experiments can also be used to diagnose disease, for gene sequencing, and for analytical chemistry. Processing of molecular-array data can produce detailed chemical and biological analyses, disease diagnoses, and other information that can be stored in a computer-readable medium, transferred to an intercommunicating entity via electronic signals, printed in a human-readable format, or otherwise made available for further use.
One embodiment of the method of the present invention is described by applying the method to an example hypothetical microarray having a double-density, non-rectilinear, outermost, feature-position arrangement.
Three features of the microarray are selected from the corner features of two rectilinear lattices of the hypothetical microarray 1002.
After the three corner features have been selected, the selected corner features are indexed 1, 2, 3, or 4. In the present example, the indexes are assigned in a clockwise manner, beginning with assigning the index value 1 to the selected corner feature located in the top, left-hand corner of the microarray 1002, as denoted by pair 1111, and ending with assigning the index value 4 to the selected corner feature located in the bottom, left-hand corner of the microarray, as denoted by pair 1114. The selected corner features are referred to as “FCorner
In order to rectilinearize a microarray having a non-rectilinear, outermost, feature-position arrangement, a feature-coordinate grid identifying the centroid coordinates of each feature of the microarray is needed.
Next, the x and y coordinates of each of the three selected corner features FCorner
L1CCorner
L1CCorner
L2CCorner
L2CCorner
where L1CCorner
Next, the selected corner features FCorner
(1) For selected corner feature F1, if the points L1C1 and L2C1 both coincide with features of the microarray, then selected corner feature F1 is a corner feature of lattice L2. If L1C1 does not coincide with a feature of the microarray, and L2C1 coincides with a feature of the microarray, then selected corner feature F1 is a corner feature of lattice L1.
(2) For selected corner feature F2, if the point L1C2 coincides with a feature of the microarray, and L2C2 does not coincide with any feature of the microarray, then selected corner feature F2 is a corner feature of lattice L2. If L1C2 does not coincide with a feature of the microarray, and L2C2 coincides with a feature of the microarray, then selected corner feature F2 is a corner feature of lattice L1.
(3) For selected corner feature F3, if the points L1C3 and L2C3 both coincide with features of the microarray, then selected corner feature F3 is a corner feature of lattice L1. If L1C3 coincides with a feature of the microarray, and L2C3 does not coincide with a feature of the microarray, then selected corner feature F3 is a corner feature of lattice L2.
(4) For selected corner feature F4, if the point L1C4 coincides with a feature of the microarray, and L2C4 does not coincide with any feature of the microarray, then selected corner feature F4 is a corner feature of lattice L2. If L1C4 does not coincide with a feature of the microarray, and L2C4 coincides with a feature of the microarray, then selected corner feature F4 is a corner feature of lattice L1.
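The four rules above can be transcribed into a small decision function, sketched here in C. The sketch only encodes the combinations listed in rules (1) through (4). How a test point is found to coincide with a feature is left to the caller, and any combination not covered by the rules is reported as unknown. The names lattice and classify_corner are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum lattice { LATTICE_UNKNOWN = 0, LATTICE_L1 = 1, LATTICE_L2 = 2 };

/* corner:  index 1..4 of the selected corner feature (F1..F4).
 * l1c_hit: true when the point L1C for that corner coincides with a feature.
 * l2c_hit: true when the point L2C for that corner coincides with a feature.
 * Returns the lattice named by rules (1)-(4), or LATTICE_UNKNOWN when the
 * combination of inputs is not covered by those rules. */
static enum lattice classify_corner(int corner, bool l1c_hit, bool l2c_hit)
{
    assert(corner >= 1 && corner <= 4);
    switch (corner) {
    case 1:                                   /* rule (1) */
        if (l1c_hit && l2c_hit)  return LATTICE_L2;
        if (!l1c_hit && l2c_hit) return LATTICE_L1;
        break;
    case 2:                                   /* rule (2) */
    case 4:                                   /* rule (4), same pattern */
        if (l1c_hit && !l2c_hit) return LATTICE_L2;
        if (!l1c_hit && l2c_hit) return LATTICE_L1;
        break;
    case 3:                                   /* rule (3) */
        if (l1c_hit && l2c_hit)  return LATTICE_L1;
        if (l1c_hit && !l2c_hit) return LATTICE_L2;
        break;
    }
    return LATTICE_UNKNOWN;
}
```

Returning an explicit unknown value, rather than defaulting to one lattice, lets a caller detect a mis-selected corner feature before any corner positions are derived from it.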
Next, the coordinates of three of the four feature positions defining the four corner features of the rectilinear, outermost, feature-position arrangement of the microarray of features are determined. The feature positions defining the four corner features of a derived, outermost, rectilinear arrangement are referred to as “PCorner
(1) If selected corner feature F1 is a corner feature of lattice L1, then
P1
P1
(2) If selected corner feature F1 is a corner feature of lattice L2, then
P1
P1
(3) If selected corner feature F2 is a corner feature of lattice L1, then
P2
P2
(4) If selected corner feature F2 is a corner feature of lattice L2, then
P2
P2
(5) If selected corner feature F3 is a corner feature of lattice L1, then
P3
P3
(6) If selected corner feature F3 is a corner feature of lattice L2, then
P3
P3
(7) If selected corner feature F4 is a corner feature of lattice L1, then
P4
P4
(8) If selected corner feature F4 is a corner feature of lattice L2, then
P4
P4
The three feature positions P1 1601, P2 1602, and P4 1604 define two vectors v12 and v14 given by:
v12=P2−P1=<7,0>
v14=P4−P1=<0,5>
where the brackets “<>” represent vector coordinates. The vectors v12 and v14 are orthogonal, because their dot product is zero:
v12·v14=<7,0>·<0,5>=0
Because a vector is a directed line segment having an initial point and an ending point, the line segments associated with the two vectors v12 1701 and v14 1702 form two sides of the derived, rectilinear, outermost, feature-position arrangement of the microarray.
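The vector construction and the orthogonality test can be checked with a short C sketch. The structure name xy_coord follows the pseudocode description later in this document, but its integer field types are an assumption. The helper functions vec_between, dot, and sides_orthogonal are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Grid coordinates of a feature position; integer fields are assumed. */
struct xy_coord { int x; int y; };

/* Vector from point a to point b, that is, b - a. */
static struct xy_coord vec_between(struct xy_coord a, struct xy_coord b)
{
    struct xy_coord v = { b.x - a.x, b.y - a.y };
    return v;
}

/* Dot product of two vectors. */
static int dot(struct xy_coord u, struct xy_coord v)
{
    return u.x * v.x + u.y * v.y;
}

/* True when the sides P1->P2 and P1->P4 meet at a right angle,
 * which is the case exactly when v12 . v14 == 0. */
static bool sides_orthogonal(struct xy_coord p1, struct xy_coord p2,
                             struct xy_coord p4)
{
    return dot(vec_between(p1, p2), vec_between(p1, p4)) == 0;
}
```

As a usage example with hypothetical grid coordinates, P1 = (1,1), P2 = (8,1), and P4 = (1,6) give v12 = <7,0> and v14 = <0,5>, so sides_orthogonal returns true.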
The fourth feature position completing the derived, rectilinear, outermost, feature-position arrangement of the microarray is determined according to the following four conditions:
(1) If the feature positions P1, P2, and P3 are known, then
P4
P4
(2) If the feature positions P2, P3, and P4 are known, then
P1
P1
(3) If the feature positions P1, P3, and P4 are known, then
P2
P2
(4) If the feature positions P1, P2, and P4 are known, then
P3
P3
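One way to compute the missing corner in all four cases is parallelogram completion: the missing corner equals the sum of its two neighboring corners minus the opposite corner. The C sketch below uses that rule as an assumption. The expressions of the described embodiment may be written differently, for example by copying individual coordinates, but for a rectangle they identify the same point. The function name complete_fourth_corner, the zero-based indexing, and the integer field types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Grid coordinates of a feature position; integer fields are assumed. */
struct xy_coord { int x; int y; };

/* p[0..3] hold P1..P4 in clockwise order. `missing` is the zero-based index
 * of the one corner that is not yet known; the other three must be valid.
 * The missing corner is set to (neighbor + neighbor - opposite). */
static void complete_fourth_corner(struct xy_coord p[4], int missing)
{
    assert(p != NULL && missing >= 0 && missing < 4);
    const struct xy_coord a = p[(missing + 1) % 4];  /* one neighbor   */
    const struct xy_coord o = p[(missing + 2) % 4];  /* opposite corner */
    const struct xy_coord b = p[(missing + 3) % 4];  /* other neighbor */
    p[missing].x = a.x + b.x - o.x;
    p[missing].y = a.y + b.y - o.y;
}
```

With the hypothetical positions P1 = (1,1), P2 = (8,1), and P4 = (1,6), the call complete_fourth_corner(p, 2) sets P3 = (8,6), which corresponds to condition (4) above.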
In
v23=P3−P2=<0,5>
v43=P3−P4=<7,0>
where v12·v23=0 and v14·v43=0
The vector v23 1704 is orthogonal to the vector v12 1701, and the vector v43 1705 is orthogonal to the vector v14 1702. The line segments associated with the vectors v23 1704 and v43 1705 provide the two remaining sides of a rectilinear, outermost, feature-position arrangement of the microarray 1002. Finally, the two feature positions P2 1602 and P4 1604 are added as imaginary features to the non-rectilinear, outermost, feature-position arrangement to give a rectilinear, outermost, feature-position arrangement of the hypothetical microarray 1002.
A C-like pseudocode implementation of an embodiment of the present invention is provided below. Note that the pseudocode implementation is not intended to describe a complete rectilinearization program for the microarray feature data, but to provide sufficient detail to illustrate one possible embodiment of the rectilinearization methodology as the embodiment might occur within a microarray feature extraction program or in microarray feature extraction and data processing equipment. The rectilinearization program utilizes a feature-coordinate grid similar to that described above in relation to
First, the pseudocode implementation includes several constants:
Next, the pseudocode implementation includes the structure “xy_coord” provided below:
The variables “x” and “y” are the rectangular Cartesian x and y coordinates used to describe coordinate locations in the feature-coordinate grid as described above in relation to
Next, the pseudocode implementation contains the two function prototypes:
The functions “corner_vertices” and “fourth_point” are declared as void functions. The function “corner_vertices” expects the structure pointers “f” and “p” as arguments. The function “fourth_point” expects the structure pointer “p” and the pointer “ci” as arguments.
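Declarations matching this description might look like the following C sketch. Only the names “xy_coord,” “x,” “y,” “corner_vertices,” “fourth_point,” “f,” “p,” and “ci” come from the description. The integer field types and the pointed-to type of “ci” are assumptions made for illustration.

```c
/* Cartesian coordinates of a location in the feature-coordinate grid.
 * Integer fields are an assumption; the description names only x and y. */
struct xy_coord {
    int x;
    int y;
};

/* f: selected corner features; p: derived corner feature positions. */
void corner_vertices(struct xy_coord *f, struct xy_coord *p);

/* p: derived corner feature positions; ci: assumed here to point to int. */
void fourth_point(struct xy_coord *p, int *ci);
```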
The function “corner_vertices” contains the code relevant to one embodiment of the present invention. An implementation of the function “corner_vertices” is provided below:
The function “corner_vertices” takes the following arguments: (1) “f,” a pointer to the structure elements FCorner
The function “fourth_point” contains the code relevant to one embodiment of the present invention. An implementation of the function “fourth_point” is provided below:
The function “fourth_point” takes the following arguments: (1) “p,” a pointer to the structure elements PCorner
Although the present invention has been described in terms of a particular embodiment, it is not intended that the invention be limited to this embodiment. Modifications within the spirit of the invention will be apparent to those skilled in the art. For example, an almost limitless number of different implementations of the many possible embodiments of the method of the present invention can be written in any of many different programming languages, embodied in firmware, embodied in hardware circuitry, or embodied in a combination of one or more of the firmware, hardware, or software, for inclusion in microarray data processing equipment employing a computational processing engine to execute software or firmware instructions encoding techniques of the present invention or including logic circuits that embody both a processing engine and instructions. In various embodiments of the present invention, the process of selecting corner features can be performed automatically by a computer program, or the corner features can be selected manually. In various embodiments, the selected corner features can be ordered in various ways such as in a clockwise or counter-clockwise manner beginning with any one of the four corners of the microarray. In various embodiments, different orderings of the corner features may be employed. In various embodiments, different possible selected corner features may be considered.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents: