The present invention relates generally to the field of molecular biology and particularly to the artificial synthesis of long DNA fragments including fragments encompassing a gene or multiple genes.
Significant efforts have been made to synthesize genes from oligonucleotides, and the assembly of viral and bacteriophage genomes has been reported. See, e.g., J. Cello, et al., Science, 297, 2002, pp. 1016-1018; H. O. Smith, et al., Proc. Natl. Acad. Sci. USA, 100, 2003, pp. 15440-15445. Assembly of these long sequences required hundreds of commercially synthesized, gel-purified oligonucleotides. Such approaches are therefore not economically feasible for the routine synthesis of genes for research and clinical purposes.
Over the last decade, techniques have been developed for the synthesis of DNA (deoxyribonucleic acid) on solid substrates for use in genetics studies, particularly for hybridization experiments with microarrays. These developments have included systems to carry out precision patterning and fluorescence analysis. See, e.g., P. B. Garland, et al., Nucleic Acids Res., 30, 2002, pp. e99 et seq.; A. Relogio, et al., Nucleic Acids Res., 30, 2002, pp. e51 et seq. DNA “chips” formed in this manner offer the potential for acquiring a large number of user-defined DNA oligonucleotide sequences for subsequent use in biological applications. Although oligonucleotides grown on slide surfaces have been extensively employed in this manner, there remains some uncertainty concerning the amount and relative proportion of failure sequences on the chip surface. Previous studies have estimated that a total of about 10 to 30 pmol/cm² of oligonucleotides are synthesized on the chip surface. G. McGall, et al., J. Am. Chem. Soc., 119, 1997, pp. 5081-5090; E. LeProust, et al., Nucleic Acids Res., 29, 2001, pp. 2171-2180. However, it is not clear whether this estimate represents the population of full-length product or a mixture of full-length and truncated or mutated sequences. In studies using photogenerated acids during DNA synthesis, it has been postulated that proximity to the synthesis surface leads to lower fidelity, owing to inefficient reactions of various reagents. It is unclear, however, whether such surface effects occur in photolithographic DNA synthesis based on photodeprotection of the photolabile 2-(2-nitrophenyl)propoxycarbonyl (NPPOC) group.
Historically, scientists have used gene synthesis to produce genes that are recalcitrant to cloning because of high organismal A-T or G-C content, or to modify genes for optimal protein expression in heterologous hosts. Such expression targets are generally less than three thousand bp (base pairs) in length. Gene synthesis has also been used to create larger assemblages (e.g., 7-8 kb), but the conventional techniques have often required long periods (e.g., months) to obtain the final product. J. Cello, supra.
New techniques have been developed for the assembly of genes, including the ligase chain reaction (LCR) and suites of polymerase chain reaction (PCR) strategies. While most gene assembly protocols start with pools of overlapping synthesized oligonucleotides and end with PCR amplification of the assembled gene, the pathway between those two points can be quite different. In LCR, the initial oligonucleotides must have phosphorylated 5′ ends so that Pfu DNA ligase can covalently join these building blocks to form the initial template. Single-stranded (ss) PCR assembly, by contrast, uses unphosphorylated oligonucleotides, which undergo repetitive PCR cycling to extend and create a full-length template. A variant of this method, termed double-stranded (ds) PCR, combines all of the single-stranded PCR oligonucleotides with their reverse-complement oligonucleotides for assembly. Additionally, the LCR process requires oligonucleotide concentrations in the μM (10⁻⁶ M) range, whereas both ss and ds PCR have much lower concentration requirements (nM, 10⁻⁹ M range). The relative efficiencies and mutation rates inherent in these different strategies are not necessarily well understood. In addition to the assembly method, the size of the initial oligonucleotides may also have a significant impact on the final product and on the efficiency of the process. Prior synthesis attempts have generally used oligonucleotides ranging in size from 20 to 70 bp, assembled through hybridization of overlaps in the range of 6-40 bp. Since many factors in the process are determined by the length and composition of the oligonucleotides (Tm, secondary structure, etc.), the size and heterogeneity of the initial oligonucleotide population can significantly affect both the efficiency of assembly and the quality of the final assembled genes.
In accordance with the present invention, synthesis of long chain molecules such as DNA is carried out rapidly and efficiently to produce relatively large quantities of the desired product. The synthesis of an entire gene or multiple genes formed of many hundreds or thousands of base pairs can be accomplished rapidly and, if desired, in a fully automated process requiring minimal operator intervention, and in a matter of a day or a few days rather than many days or weeks.
In the present invention, production of a desired gene or set of genes having a specified base pair sequence is initiated by analyzing the specified target sequence and determining a set of subsequences of base pairs that can be assembled to form the desired final target sequence. For example, a target sequence having several hundreds or thousands of base pairs may be divided up into a set of subsequences each having a much smaller number of base pairs, e.g., 400 to 600 bp, which are then further divided into oligonucleotide sequences, e.g., in the range of 20 to 100 bp, which may be conveniently synthesized utilizing automated oligonucleotide synthesis techniques. An exemplary oligonucleotide synthesis technique utilizes a maskless array synthesizer (MAS) by which large numbers of different oligonucleotide sequences (e.g., 50 to 100 bases in length) are generated in an array on a support in a few hours under computer control utilizing phosphoramidite chemistry, without moving parts or operator intervention, although other synthesis materials and techniques may also be utilized. The synthesized oligonucleotides are subsequently selectively released from the support to be used in a sequential assembly process. The oligonucleotides may be released utilizing, for example, base-labile linkers or photo-cleavable linkers. In a preferred process, the oligonucleotide sequences include not only the desired subsequences for the final product but also end sequences that may be utilized as primers in the polymerase chain reaction (PCR), allowing the initial set of oligonucleotides to be greatly amplified using PCR techniques. After the oligonucleotides have been amplified by PCR, the primer sequences are removed, leaving only the desired oligonucleotides.
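The hierarchical division described above can be sketched as string bookkeeping. This is a minimal illustration under stated assumptions: fixed window lengths and overlaps, tiling of the top strand only, and hypothetical primer flank sequences (`FWD_FLANK`, `REV_FLANK`). Flank removal is shown as simple slicing, standing in for whatever enzymatic step is actually used; no thermodynamic design is attempted.

```python
import random

# Hypothetical PCR primer flanks (illustrative only, not taken from the source).
FWD_FLANK = "GACTACGTAGCTAGC"
REV_FLANK = "GCTAGCTACGTAGTC"


def tile(seq: str, length: int, overlap: int) -> list[str]:
    """Split seq into windows of at most `length` bases, each overlapping the next by `overlap`."""
    if length <= overlap:
        raise ValueError("window length must exceed the overlap")
    step = length - overlap
    pieces, start = [], 0
    while True:
        pieces.append(seq[start:start + length])
        if start + length >= len(seq):
            return pieces
        start += step


def merge(pieces: list[str], overlap: int) -> str:
    """Rejoin overlapping windows (an in-silico analogue of assembly)."""
    out = pieces[0]
    for piece in pieces[1:]:
        out += piece[overlap:]
    return out


def add_flanks(oligo: str) -> str:
    """Attach the hypothetical primer flanks used to amplify the oligo pool."""
    return FWD_FLANK + oligo + REV_FLANK


def strip_flanks(amplicon: str) -> str:
    """Remove the flanks after amplification, leaving the designed oligo."""
    return amplicon[len(FWD_FLANK):len(amplicon) - len(REV_FLANK)]


# Usage: a 2,000 bp target -> ~500 bp subsequences -> 60-base oligonucleotides.
rng = random.Random(0)
target = "".join(rng.choice("ACGT") for _ in range(2000))
blocks = tile(target, length=500, overlap=40)
oligos = [tile(block, length=60, overlap=20) for block in blocks]
amplicons = [add_flanks(o) for block_oligos in oligos for o in block_oligos]
```

Because each window overlaps its neighbor by a fixed amount, `merge` recovers the parent sequence exactly, which is a convenient sanity check on any division scheme before synthesis.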
DNA error filtering is preferably carried out on short double-stranded oligonucleotides and longer DNA fragments before and during the assembly process. An exemplary error filtering technique is DNA coincidence filtering, which utilizes the bacterial MutS protein to bind DNA duplexes containing mismatched bases while allowing error free duplexes to pass through. Assembly chambers are utilized for mixing and thermal cycling during the DNA fragment assembly. Oligonucleotides or intermediate sized DNA fragments flow into the chambers along with PCR buffer, deoxynucleotide triphosphates, and thermostable DNA polymerase. These reagents are then mixed, e.g. by ultrasonic mixing, and then thermal cycled for assembly and amplification reactions. An integrated fluidic system collects the released oligonucleotides from the synthesis chamber and routes them through the error filters to and from the assembly chambers. The system also delivers reagents needed for fragment assembly and error filtering. The fluidic system is preferably constructed of microfluidic channels and includes integrated micro-valves, flow sensors, heaters, ultrasonic mixers, and appropriate connections to external reagents, pumps and waste containers.
Further objects, features and advantages of the invention will be apparent from the following detailed description when taken in conjunction with the accompanying drawings.
In the drawings:
For purposes of exemplifying the invention,
An exemplary oligonucleotide synthesis system in accordance with the invention uses the intrinsic parallelism of optical imaging that allows very high densities (>300,000 cm−2) of oligonucleotide sequences to be synthesized on a support such as a glass surface. By releasing selected oligonucleotides from the support in an effective and controllable way, long dsDNA can be created by assembling the short oligonucleotide pieces. Thus, after release and step-wise assembly, the desired dsDNA sequence is formed. The gene assembly system is thus based on four capabilities: (1) the ability to synthesize arbitrary sequences of short oligomers in a massively parallel way, in situ, starting from monomers; (2) the ability to selectively release from the synthesis support whichever oligomer sequences are desired in order to perform a partial assembly; (3) the ability to assemble these intermediate length oligomers into a full length final product; and (4) the ability to filter and eliminate assembly or synthesis errors. The functional features (3) and (4) may be carried out in multiple steps and be interleaved with one another.
The synthesis of oligonucleotides traditionally occurs in the 3′-5′ direction for optimal synthesis yields. For the purpose of creating oligonucleotide microarrays useful in bioassays requiring enzymatic processing of the 3′ ends of the DNA, synthesis in the 5′-3′ direction is required. The quality of oligonucleotides synthesized by inverse 5′-3′ chemistry has been shown to be comparable to that obtained in the normal 3′-5′ direction. Oligonucleotides may be synthesized in either or both directions as needed. For the purposes of gene synthesis, the oligonucleotides need to be released from the support surface, and thus a cleavable linker is required. Standard oligonucleotide synthesis on controlled pore glass substrates utilizes a base-labile linker that is cleaved along with the nucleobase protecting groups by ammonium hydroxide or ethylene diamine at the end of the synthesis. Although the base-labile linker approach should be sufficient for the release of oligonucleotides from the glass surface, it requires additional features: (1) the chip surface reactions must be divided into microchannels for the independent release of two or more groups of oligonucleotides for separate assembly, and (2) the DNA is released along with the nucleobase and phosphate protecting group cleavage products, requiring a purification/buffer exchange before the oligonucleotides can be used for assembly. A safety catch photolabile (SCPL) linker is preferably used to allow both the light-directed synthesis and light mediated surface release of oligonucleotides, as illustrated in
The quality of synthetic oligonucleotides is governed by a number of factors including: (1) the yield of photodeprotection, which must be as high as possible to obtain acceptable amounts of full-length product from a multi-step (e.g., up to 80-step) linear synthesis, (2) the efficiency of attachment of the bases to the deprotected sites (coupling efficiency), and (3) the amount of damage done to the growing oligonucleotide strands by excess light energy. To address these issues, methods may be used to speed up the photoreaction and minimize damage to the growing oligonucleotide chains by shifting the deprotection wavelength from the UV to the visible range and suppressing unwanted side reactions during photodeprotection.
Due to the extremely small quantities of oligonucleotides produced per chip (~10-20 pmol/cm²) utilizing a maskless array synthesizer, highly sensitive methods are required to analyze the quality of the oligonucleotides. Oligonucleotides produced on the MAS chip's surface have been analyzed by cleaving the silicon tether between the linker and the glass slide through extended treatment with ammonium hydroxide, phosphorylating the released oligonucleotides with [γ-³²P]ATP, and separating the oligonucleotides on a denaturing PAGE gel to visualize the distribution of oligonucleotide lengths produced and to provide a quantitative assessment of synthesis efficiency. The ladders show that full-length products are produced as the primary products, but they also reveal a ladder of truncates, indicating that purification will be required to isolate full-length oligonucleotides from truncates and synthesis by-products.
Four examples of specialized photolabile nucleoside phosphoramidites with base-labile linkers are shown in
It has been determined that thioxanthone sensitizers increase the quantum efficiency of NPPOC deprotection, that is, the use of sensitizers generates more “light-activated” molecules per photon. New photolabile groups have been developed with faster deprotection rates, improving the speed of photocleavage by about a factor of three.
Experiments clearly indicate that sensitized deprotection is a viable option for shifting the irradiation wavelength into the visible (>400 nm) region, because the energy gap between the relevant excited states is smaller in the sensitizer than in NPPOC. Through this indirect excitation, the wavelength needed to populate the state that leads to deprotection, the NPPOC triplet (T1), is shifted from 365 nm to about 405 nm. As can be seen in the graph of
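To put the wavelength shift in energy terms, the photon energy E = hc/λ can be computed for the two excitation wavelengths mentioned above. This is a back-of-the-envelope sketch only, using hc ≈ 1239.84 eV·nm; it says nothing about the actual excited-state energies of NPPOC or of the sensitizer, which are not given here.

```python
# Photon energy E = h*c / wavelength, using h*c ~= 1239.84 eV*nm.
HC_EV_NM = 1239.84


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon, in electron-volts, at the given wavelength."""
    return HC_EV_NM / wavelength_nm


direct = photon_energy_ev(365.0)      # direct NPPOC excitation (UV)
sensitized = photon_energy_ev(405.0)  # sensitized excitation (visible edge)
print(f"365 nm: {direct:.2f} eV, 405 nm: {sensitized:.2f} eV, "
      f"ratio {sensitized / direct:.2f}")
```

Each 405 nm photon carries roughly 10% less energy than a 365 nm photon, which is the sense in which the sensitizer lets lower-energy light drive the same deprotection.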
To improve the quality of released oligonucleotides prior to assembly, a reverse phase C18 purification step may be implemented to isolate oligonucleotides that received a base in the final synthesis cycle from those that did not. This should separate primarily full-length oligonucleotides from truncated sequences. In the final cycle, standard dimethoxytrityl (DMT)-protected nucleoside phosphoramidites may be used in place of the NPPOC-protected phosphoramidites such that, after deprotection of the nucleobase/phosphate protecting groups and activation of the safety-catch, oligonucleotides containing a DMT group will be selectively retained on C18-silica. After cleavage of the DMT group with aqueous acid, primarily full-length oligonucleotides will be eluted for use in assembly reactions. This trityl-on synthesis and C18 purification is a standard protocol in oligonucleotide synthesis. If this purification is insufficient for assembly, full-length oligonucleotides may be isolated by electrophoresis and/or ion exchange chromatography prior to assembly. If separation by oligonucleotide length is required, the oligonucleotide design may be restricted so that all oligonucleotides used in an assembly reaction are the same length. Where a C18 purification step may be required to remove truncates, a base-activated SCPL-linker may be utilized. A synthesis of a base-activated SCPL-linker is discussed further below and illustrated in
Although the “building block” oligonucleotides can undergo filtering and subsequent purification to reduce the number of error-containing DNAs, the size of the oligonucleotides themselves may play a vital role in assembly success. Since step-wise base addition is not 100% efficient, longer oligonucleotides are more likely to contain errors and truncated species. However, although longer oligonucleotides have more errors, fewer of these “blocks” are needed for assembly. The size of the “building block” can therefore have a significant effect on the amount of error introduced into the assembled gene.
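Both sides of this trade-off can be tabulated with simple arithmetic. The sketch assumes every addition cycle succeeds independently with the same stepwise efficiency (a hypothetical 99% here) and uses the 2N/S approximation for the number of oligonucleotides needed to cover both strands of a gene of length N, as in the consensus-shuffling model later in this description.

```python
import math


def full_length_fraction(stepwise_efficiency: float, length: int) -> float:
    """Expected fraction of strands completing all `length` addition cycles,
    assuming each cycle succeeds independently with the same efficiency."""
    return stepwise_efficiency ** length


def blocks_needed(gene_length: int, oligo_length: int) -> int:
    """Approximate oligonucleotide count, 2N/S, with both strands represented."""
    return math.ceil(2 * gene_length / oligo_length)


# Hypothetical 99% stepwise efficiency, 1 kb gene.
for length in (20, 40, 60, 80, 100):
    print(f"{length:>3}-mer: full length {full_length_fraction(0.99, length):.2f}, "
          f"oligos needed {blocks_needed(1000, length)}")
```

Longer oligonucleotides lower the full-length fraction of each block while reducing the number of blocks and junctions; which side dominates depends on the actual stepwise efficiency and assembly chemistry.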
One approach for gene assembly in accordance with the invention involves a two stage process in which the synthesized oligonucleotides are first eluted and concentrated prior to assemblage into dsDNA. Assembly (the second stage) occurs in two steps: initially, the 20-50 bp short ssDNA are hybridized together and extended into ever-increasing lengths of dsDNA. After denaturation, this cycle is repeated until the oligonucleotides form the full length template. Next the full length template is amplified by PCR using primers directed against sequences present at the 5′ and 3′ ends of the assembled gene. Amplified products may be cloned and sequenced for quality control. However, depending on the use of the product, large sets of unassembled oligonucleotides or the PCR amplified DNA itself may be provided to the end-user, if desired. In this manner, the picomole concentrations of oligonucleotides present on the glass surface are converted into the nanomole and micromole amounts of DNA needed for cloning.
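The picomole-to-nanomole conversion mentioned above can be checked with exponential-amplification arithmetic. This sketch assumes an idealized PCR with constant per-cycle efficiency and no plateau phase; real reactions saturate, so the result is an optimistic estimate of the cycle count.

```python
import math


def pcr_cycles_needed(start_mol: float, target_mol: float, efficiency: float = 1.0) -> int:
    """Smallest n with start * (1 + efficiency)**n >= target, assuming constant
    per-cycle efficiency (0 < efficiency <= 1) and no plateau."""
    if target_mol <= start_mol:
        return 0
    return math.ceil(math.log(target_mol / start_mol) / math.log(1.0 + efficiency))


# Picomole to nanomole is a 1000-fold increase.
print(pcr_cycles_needed(1e-12, 1e-9))        # perfect doubling each cycle
print(pcr_cycles_needed(1e-12, 1e-9, 0.9))   # 90% efficient cycles
```

About ten ideal doublings give a thousand-fold gain, which is why a modest number of PCR cycles can bridge the gap between on-chip yields and the quantities needed for cloning.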
The two stages (elution and assembly) may be done in one step, but there is a predicted risk of creating truncated amplification products since hybridization is occurring at very low total mass concentrations. Another option involves performing the assembly reaction with the 5′ or 3′ oligonucleotides covalently attached to a small domain on the glass surface. The linker attaching this terminal oligonucleotide to the glass may be either chemically or photolytically labile so that the surface-assembled dsDNA molecule can be released into solution and amplified with the addition of micromole amounts of universal primers.
Results with PCR assembled genes have shown that errors in the initial assembly products are commonplace. These errors limit the immediate usefulness of assembled double stranded DNA for all applications requiring perfect DNA sequences, such as gene expression. Indeed, this problem may significantly lengthen the time required to produce any given sequence, since correcting errors is a time-consuming process. To address these problems, general approaches to reduce or eliminate errors in assembled DNA sequences are utilized. There are two distinct phases in which addition, deletion, and transversion errors are introduced into synthetic DNA: during oligonucleotide synthesis and during the assembly process. During synthesis, errors can occur through unintended photodeprotection by stray photons, incomplete photodeprotection, incomplete coupling, incomplete nucleobase or phosphate backbone deprotection, and a plethora of other side reactions. During assembly, errors can be introduced via mis-hybridization or mis-incorporation of bases by the polymerase. Most errors will occur randomly, although some may occur systematically and possibly be sequence dependent. The general preferred approach is termed “consensus filtering” because it utilizes DNA shuffling, error removal, and reassembly to convert a population of DNA molecules with random or partially systematic errors into a population of DNA enriched with molecules containing the consensus sequence of the original population. The error removal process utilizes the mismatch binding protein MutS to remove duplexes containing mismatches from a population of dsDNA molecules via affinity capture. The MutS filter may be considered a “coincidence filter”, a concept similar to an “AND” gate in electronic circuitry, wherein signal 1 AND signal 2 must both be present for an event to be counted.
The adaptation of this concept for DNA error filtering works as follows: for every oligonucleotide synthesized on the chip surface, its complement oligonucleotide will also be synthesized. Because the vast majority of the oligonucleotides are wild type (wt), or error-free, the error-containing or mutant type (mt) oligonucleotides will most likely hybridize with wild type, thus creating double-stranded oligonucleotides containing mismatches. The mismatched bases cause a bulge in the double-stranded oligonucleotide at the position of incorrect base pairing, so such duplexes are trapped by an immobilized MutS protein while error-free pairs flow through. To ascertain the effectiveness of MutS filtering, a 160 bp region of the green fluorescent protein (GFP) gene was assembled from unpurified 40mer oligonucleotides. The assembly product was either directly cloned into an expression vector, or heat denatured, re-annealed and subjected to MutS filtering before cloning. Although there were no apparent differences at the functional level (as assayed by visual inspection of the GFP fluorescing transformants), sequence analysis revealed that the control population lacking the MutS filter was 81% wt, whereas the “filtered” population was 100% wt. This experiment demonstrated that MutS filtering can increase the percentage of wt clones. From these and other assembly reactions using PCR, overall mutation rates are between 0.2 and 1.2 errors/kilobase (data not shown). Consensus filtering is essentially equivalent to DNA shuffling with a MutS mismatch removal step. The pool of dsDNA molecules containing mutations is fragmented into sets of overlapping fragments via restriction digestion and re-assembled into full-length molecules by primerless PCR and amplification PCR.
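The “AND gate” behavior of the coincidence filter can be expressed as a short expected-value calculation. This is a sketch under simplifying assumptions not stated in the text: strands pair at random, errors in the two strands never happen to restore a perfect match, and a fraction M (here `muts_factor`) of mismatched duplexes escapes the filter. The 40-base length and 0.5% per-base error rate are hypothetical inputs, not measured values.

```python
def error_free_strand_fraction(length: int, error_rate_per_base: float) -> float:
    """Probability that a single synthesized strand carries no errors."""
    return (1.0 - error_rate_per_base) ** length


def perfect_duplex_fraction(w: float, muts_factor: float) -> float:
    """Fraction of perfectly matched duplexes after one pass through a MutS filter.

    w           -- fraction of error-free strands on each side of the duplex
    muts_factor -- fraction of mismatched duplexes escaping the filter (M)
    """
    perfect = w * w                 # both strands must be error-free ("AND")
    mismatched = 1.0 - perfect      # any error on either strand gives a mismatch
    return perfect / (perfect + muts_factor * mismatched)


# Hypothetical 40-mers synthesized with a 0.5% per-base error rate.
w = error_free_strand_fraction(40, 0.005)
unfiltered = perfect_duplex_fraction(w, 1.0)   # M = 1.0: no filtering
filtered = perfect_duplex_fraction(w, 0.1)     # M = 0.1: 90% efficient filter
print(f"unfiltered {unfiltered:.3f}, filtered {filtered:.3f}")
```

With M = 1.0 the perfect fraction is just w², the coincidence probability; any M below 1 enriches it, and M = 0 would leave only perfect duplexes.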
Although DNA shuffling has traditionally been used as a method for creating diverse populations of DNA molecules with all possible combinations of mutations present in the original population, the creation of diversity from a fixed population of mutants also demands an equivalent reduction in diversity among the shuffled products. Indeed, with this approach it is possible to start with a population of DNA molecules wherein every individual in the population contains errors, and create a new population of molecules in which the dominant species have the consensus sequence of the original population.
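The idea that a population in which every member carries an error can still define a correct consensus can be made concrete with a per-position majority vote. This is only an in-silico illustration of the target that consensus shuffling aims for; it is a majority-vote calculation, not a model of the shuffling or MutS chemistry, and the 18-base sequence is an arbitrary example.

```python
from collections import Counter


def consensus(sequences: list[str]) -> str:
    """Per-position majority base of equal-length, aligned sequences."""
    if not sequences or len({len(s) for s in sequences}) != 1:
        raise ValueError("need one or more sequences of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


wild_type = "ATGGTGAGCAAGGGCGAG"
population = []
for i in range(6):
    # Each member carries one substitution, each at a different position.
    pos = 3 * i
    wrong = "A" if wild_type[pos] != "A" else "C"
    population.append(wild_type[:pos] + wrong + wild_type[pos + 1:])
```

No member of `population` is correct, yet every column has a wild-type majority, so `consensus(population)` returns the wild-type sequence. The argument depends on errors being scattered rather than shared.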
As illustrated in
The following simple mathematical model can be used to predict some parameters of consensus shuffling.
Where
P=percentage of clones with no errors
S=average size of fragments
E=errors per 1000 bases of input DNA population
M=MutS factor (fraction of mismatches escaping filter)
C=cycles of MutS filter
An input population of dsDNA molecules of length N, containing E errors/kb is fragmented into shorter dsDNA fragments of average length S. The fraction of oligonucleotide fragments with correct sequences (on average) will be 1−S*E/1000. The likelihood of the assembled product also containing the correct sequence will be the product of the likelihoods of all the individual oligonucleotides used in the assembly having the correct sequence. A reasonable approximation for the required number of oligonucleotides of average length S to assemble a gene of length N is 2N/S, assuming both strands must be represented. If a MutS error filter is applied to the re-annealed dsDNA fragments, the fraction of error containing dsDNA hybrids will be reduced by fraction M, the MutS factor. If the MutS process is iterated to increase the population of correct sequences, the fraction of error-containing sequences (S*E/1000) can be multiplied by the MutS factor M each cycle.
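The model described above can be written out explicitly. The expression below is a reconstruction from the verbal description, because the equation itself does not appear here. With N the target length in bases, each of roughly 2N/S fragments is error-free with probability 1 − M^C·S·E/1000, so

P = 100 × (1 − M^C · S · E / 1000)^(2N/S)

A minimal sketch follows. Clamping the per-fragment term at zero, for cases where S·E/1000 ≥ 1, is an added assumption.

```python
def percent_error_free(n: int, s: float, e: float, m: float = 1.0, c: int = 0) -> float:
    """Predicted percentage of error-free clones after consensus shuffling.

    n -- target length in bases (N)
    s -- average fragment length (S)
    e -- errors per 1000 bases in the input population (E)
    m -- MutS factor, the fraction of mismatches escaping the filter (M)
    c -- number of MutS filter cycles (C)
    """
    fragment_error = (s * e / 1000.0) * m ** c      # error fraction per fragment
    fragment_ok = max(0.0, 1.0 - fragment_error)    # clamp: cannot go below zero
    fragments = 2.0 * n / s                         # both strands represented
    return 100.0 * fragment_ok ** fragments


# Example: 1 kb gene, 100-base fragments, 2 errors/kb, M = 0.25, two cycles.
print(round(percent_error_free(1000, 100, 2, m=0.25, c=2), 1))
```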
Several interesting predictions emerge from this model. First, some realistic assumptions are made about the variables: error rates in the initial assembly product are between 1 and 5 errors/kb, target sequence lengths are between 500 bases and 5 kb, average fragment lengths are between 50 and 200 bases, and MutS factors of 1.0 (no filtering), 0.5 (50% efficient), 0.25 (75% efficient), or 0.1 (90% efficient) are considered. From the results of the theoretical calculations shown in Table 1 below, fewer than three rounds of consensus shuffling with a MutS filter should be sufficient to convert a population of DNA sequences in which all molecules contain multiple errors into a population in which the correct sequence is the dominant sequence. The model also predicts that fragment size within the 50 to 200 base range will not be a critical factor, and that MutS filtering, even if poorly efficient (50%), is effective upon multiple iterations.
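One way to read the claim that a few rounds suffice is to ask how many MutS cycles C are needed, for a given set of parameters, before error-free molecules make up more than half of the population. The sketch below restates the reconstructed model so that the block runs on its own. The 50% threshold for “dominant” and the 20-cycle search limit are assumptions.

```python
from typing import Optional


def percent_error_free(n: float, s: float, e: float, m: float, c: int) -> float:
    """Reconstructed model: P = 100 * (1 - M**C * S*E/1000) ** (2N/S), clamped at 0."""
    fragment_ok = max(0.0, 1.0 - (s * e / 1000.0) * m ** c)
    return 100.0 * fragment_ok ** (2.0 * n / s)


def min_cycles_for_dominance(n: float, s: float, e: float, m: float,
                             threshold: float = 50.0, max_cycles: int = 20) -> Optional[int]:
    """Smallest number of MutS cycles at which error-free clones exceed
    `threshold` percent, or None if that never happens within max_cycles."""
    for c in range(max_cycles + 1):
        if percent_error_free(n, s, e, m, c) > threshold:
            return c
    return None


# Worst case from the stated ranges: 5 kb target, 200-base fragments, 5 errors/kb.
for m in (1.0, 0.5, 0.25, 0.1):
    print(m, min_cycles_for_dominance(5000, 200, 5, m))
```

With M = 1.0 (no filtering) the worst-case population never becomes error-free dominant, which matches the point that shuffling alone cannot remove errors without the MutS step.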
Consensus shuffling will be necessary whenever a significant portion of the DNA population contains errors. By fragmenting the full length DNA into shorter fragments, the MutS filter will be able to remove the mismatched fragments while allowing a much greater proportion of the DNA to pass through the filter. In the case where all members of the population contain errors, coincidence filtering of the product alone would be ineffective.
Gene sequence fidelity and production efficiency depend on specificity and completeness of sub-sequence hybridization. The primary bioinformatics objectives are to ensure that each assembly sub-sequence has one and only one complementary target sequence and to ensure that each component sequence is free of any secondary structure that would preclude gene assembly. Thus, the problem of breaking down a complete gene (2,000-10,000 base pairs) into assembly sequences is solved when each of the sequences is unique and structure free.
Bioinformatics software may be utilized to divide a target DNA sequence into oligonucleotides capable of assembly. Effective gene assembly begins with careful planning. The bioinformatics software deconstructs the whole gene into the small oligonucleotide building blocks from which it will be constructed. Several critical factors affect the choice of lines of demarcation between assembly sequences. The first step in actual gene assembly is hybridization of sub-sequences. Hybridization between any two individual complements should be complete and specific. This means that the thermodynamic stability of the duplex should be known and that the annealing temperature should be appropriate to that value. When a sub-sequence has strong secondary structure, it cannot effectively hybridize to its complement. Therefore, the potential for secondary structure must be evaluated for each elementary sequence. Next, the potential for mishybridization must be evaluated by identifying gene sequences with a high level of homology to the sub-sequence under consideration. With a fixed annealing temperature, it is possible to predict the extent of mishybridization by calculating the thermodynamic free energy of formation between the sub-sequence and the sequence at the improper target location. The levels of tolerance for secondary structure and mishybridization are difficult to predict without supporting experimental validation.
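A crude first pass at the homology search is to list every location, on either strand, whose sequence lies within a few substitutions of the sub-sequence. The sketch below uses Hamming distance as a stand-in for the free-energy calculation described above. It ignores insertions and deletions and only flags candidates for thermodynamic evaluation. Any hit other than the intended site would be such a candidate.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def near_matches(gene: str, probe: str, max_mismatches: int = 2) -> list[tuple[int, str, int]]:
    """Find every window of `gene`, on either strand, within `max_mismatches`
    substitutions of `probe`. Returns (start, strand, mismatches), with start
    positions given on the top strand; indels are not considered."""
    k = len(probe)
    hits = []
    # Searching the reverse complement along the top strand is equivalent
    # to searching the probe along the bottom strand.
    for strand, query in (("+", probe), ("-", revcomp(probe))):
        for i in range(len(gene) - k + 1):
            mismatches = sum(a != b for a, b in zip(gene[i:i + k], query))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Toy gene with the intended site at position 4 and a near copy at position 16.
gene = "TTTT" + "AAACCCGG" + "TTTT" + "AAACCCGA" + "TTTT"
hits = near_matches(gene, "AAACCCGG", max_mismatches=1)
```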
A relatively simple gene assembly design software breaks the complete gene down into fixed length (N) oligonucleotides. The length is typically 20-60 bases. The length of the overlap between sub-sequences is set at N/2. To find the “best” set of oligonucleotides for assembly, the algorithm divides the sequence into all possible N-mers with N/2 overlap and then calculates the Tm (Tm=81.5+0.41(% GC)−500/length+16.6 log[salt]) of all overlapping portions. The highest score is given to the set with the most uniform set of melting temperatures. The algorithm also scans each overlap sequence for complete uniqueness for its identified target within the context of the entire gene. If more than one target is identified for a sub-sequence, assembly is split to separate the intended target from the unintended target into separate subassembly steps. Sub-assemblies are completed and then combined for the final assembly. Sets with only a few sub-assembly steps are scored more favorably than those with multiple assembly steps. The output of the software is the set of oligonucleotides with the best overall score. In a more sophisticated software approach, the gene is still divided into fixed length (N) sub-sequences, but instead of simply having fixed N/2 overlaps, overlap length is adjusted to achieve a specific melting temperature (% G/C method).
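The fixed-length design procedure can be sketched as follows. Several details are assumptions not stated in the text. “All possible N-mers with N/2 overlap” is read as every phase (starting offset) of an N/2 tiling. Uniformity is scored by the population standard deviation of the overlap Tm values. Salt defaults to a hypothetical 0.05 M. Phases containing a non-unique overlap are simply skipped rather than split into sub-assemblies.

```python
import math
import random
import statistics


def tm_gc(seq: str, salt_molar: float = 0.05) -> float:
    """%GC estimate from the text: Tm = 81.5 + 0.41(%GC) - 500/length + 16.6 log10[salt]."""
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 0.41 * gc_percent - 500.0 / len(seq) + 16.6 * math.log10(salt_molar)


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_unique(gene: str, sub: str) -> bool:
    """True if `sub` occurs exactly once across both strands of the gene."""
    hits = 0
    for strand in (gene, revcomp(gene)):
        pos = strand.find(sub)
        while pos != -1:
            hits += 1
            pos = strand.find(sub, pos + 1)
    return hits == 1


def overlaps_for_phase(gene: str, n: int, offset: int) -> list[str]:
    """N/2-long overlap regions, spaced N/2 apart, for one phase of the tiling."""
    half = n // 2
    return [gene[i:i + half] for i in range(offset, len(gene) - half + 1, half)]


def best_design(gene: str, n: int = 40, salt_molar: float = 0.05):
    """Return (offset, Tm spread, overlaps) for the phase with the most uniform
    overlap Tm, skipping phases that contain a non-unique overlap."""
    best = None
    for offset in range(n // 2):
        overlaps = overlaps_for_phase(gene, n, offset)
        if len(overlaps) < 2 or not all(is_unique(gene, o) for o in overlaps):
            continue
        spread = statistics.pstdev([tm_gc(o, salt_molar) for o in overlaps])
        if best is None or spread < best[1]:
            best = (offset, spread, overlaps)
    return best


rng = random.Random(1)
gene = "".join(rng.choice("ACGT") for _ in range(200))
design = best_design(gene, n=40)
```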
The software may have a web-based graphical user interface modeled on the familiar NCBI BLAST interface. The user can paste or upload a sequence file of the desired DNA sequence into the sequence window. The user then chooses the sub-sequence length and the desired assembly temperature. The user can also specify the coordinates of the open reading frame and choose from a menu of codon preferences for the output oligonucleotides. This feature enables sequences from one species to be efficiently expressed in another. The output is displayed in two formats. The text mode displays lists of oligonucleotides with their melting temperatures, broken up into assembly steps. The graphics mode visually shows the oligonucleotides and overlaps. Each image of a fragment is a link to a text string representation of that fragment sequence. Both modes have clickable links to an output tab-delimited file listing the oligonucleotide sequences to be synthesized, together with each sequence's assembly step and overlap melting temperature. The links allow the user to open or save the file.
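The tab-delimited output file might look like the following sketch. The column names, the one-decimal Tm formatting, and the example rows are hypothetical. The text specifies only that each row lists an oligonucleotide sequence, its assembly step, and its overlap melting temperature.

```python
import csv
import io


def write_oligo_table(rows, handle) -> None:
    """Write (sequence, assembly_step, overlap_tm) rows as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["oligo_sequence", "assembly_step", "overlap_tm_c"])
    for sequence, step, tm in rows:
        writer.writerow([sequence, step, f"{tm:.1f}"])


# Hypothetical rows: a top-strand oligo and its reverse complement.
buffer = io.StringIO()
write_oligo_table([("ATGGTGAGCAAGGGCGAGGAG", 1, 58.24),
                   ("CTCCTCGCCCTTGCTCACCAT", 1, 57.96)], buffer)
print(buffer.getvalue())
```

Using the standard `csv` module with an explicit tab delimiter keeps the file readable by spreadsheet software and by most synthesizer import tools without custom parsing.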
Various adjustments and enhancements may be made to the basic software structure. A first adjustment updates the method of calculating melting temperature to one that uses nearest neighbor (NN) free energies. The NN method is significantly more accurate than the % GC method. A second adjustment eliminates the requirement for fixed-length sub-sequences. Instead, an assembly Tm can be defined, and the length of each sub-sequence product adjusted to be the sum of two variable-length sequences chosen to agree with the design Tm. Once the entire gene is broken down into parts, each part can be evaluated for secondary structure (e.g., hairpin information) using the publicly available Mfold or similar software packages. Such programs have been used to evaluate large combinatorial libraries (17 million individual sequences) of long 100mer oligonucleotides for secondary structure and cross-hybridization between individual members. Oligonucleotide sets that have little or no secondary structure at the assembly temperature can be scored highly for synthesis. The overlapping sequences are tested for uniqueness in the gene, and near-identical sequences can be evaluated as potential sources of error. Specifically, partial match sequences, which may contain mismatches, insertions, or deletions, can be identified and their thermodynamic binding energy calculated. The error-prone sequences (those whose free energies indicate unacceptable levels of formation at the design Tm) can either be separated during assembly, or an alternate set that divides the conflicting sequences can be chosen. Finally, the software can automatically perform a BLAST search for each gene sequence to ensure that it does not contain significant sub-sequences of forbidden pathogens (Anthrax, Plague, Ebola, etc.).
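The nearest-neighbor calculation and the Tm-driven variable-length overlap can be sketched together. The stack parameters are the SantaLucia (1998) unified DNA/DNA set as commonly tabulated. The entropy salt correction, the 250 nM total strand concentration, the 0.05 M Na+ default, and the 15-40 base search range are all assumptions to check against the intended reaction conditions. Self-complementary duplexes, mismatches, and dangling ends are not handled.

```python
import math

# SantaLucia (1998) unified nearest-neighbor parameters, keyed by the 5'->3'
# top-strand dinucleotide: (delta H in kcal/mol, delta S in cal/(K*mol)).
NN_PARAMS = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)   # initiation with a terminal G.C pair
INIT_AT = (2.3, 4.1)    # initiation with a terminal A.T pair
R_CAL = 1.987           # gas constant, cal/(K*mol)


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def nn_tm(seq: str, strand_conc: float = 250e-9, na_molar: float = 0.05) -> float:
    """Nearest-neighbor Tm (deg C) of a perfectly matched, non-self-complementary duplex."""
    dh = ds = 0.0
    for end in (seq[0], seq[-1]):
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh, ds = dh + h, ds + s
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # The six stacks not in the table are listed under their reverse complement.
        h, s = NN_PARAMS[pair] if pair in NN_PARAMS else NN_PARAMS[revcomp(pair)]
        dh, ds = dh + h, ds + s
    ds += 0.368 * (len(seq) - 1) * math.log(na_molar)   # salt correction on entropy
    return dh * 1000.0 / (ds + R_CAL * math.log(strand_conc / 4.0)) - 273.15


def choose_overlap(seq: str, start: int, target_tm: float,
                   min_len: int = 15, max_len: int = 40):
    """Shortest overlap beginning at `start` whose NN Tm reaches target_tm, or None."""
    for length in range(min_len, max_len + 1):
        if start + length > len(seq):
            break
        if nn_tm(seq[start:start + length]) >= target_tm:
            return length
    return None
```

Growing each overlap until it reaches the design Tm is one simple way to realize the variable-length scheme. A production tool would also weigh secondary structure and mishybridization when choosing among candidate lengths.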
There are four critical aspects of the multiplexed surface invasive cleavage reaction bioinformatics that deserve attention. First, one must consider the uniqueness of each probe and its specificity for the desired target in the context of the complete sample. While it is quite straightforward to ensure that the complete probe sequence is unique, one also must consider non-specific hybridization, which would inhibit proper signal generation. Second, one must consider the uniformity of duplex formation temperature. For the invasive cleavage reaction, the optimum reaction temperature is identical to the melting temperature of the target:probe duplex. Duplexes whose formation temperatures differ from the reaction temperature may not produce large signals because of limited cleavage. Third, it is becoming well known that the duplex formation energies are lower on surfaces than in solution. The reasons are just now being elucidated. This fact must be accounted for when choosing sequences and reaction temperatures. Fourth, in one of its current forms, the surface invasive cleavage reaction requires addition of invader oligonucleotides in solution. It is important that these oligonucleotides also have high specificity for the target and additionally do not hybridize to any probes at the reaction temperature. This concern is obviously eliminated for the second format of the reaction where both invader and probe are co-immobilized on the same array element.
After the set of oligonucleotides has been selected, the oligonucleotides are preferably synthesized using an automated DNA synthesizer system. Because of its flexibility and addressability, a large, massively parallel optical DNA maskless array synthesizer (MAS) based on a high-density spatial light modulator (e.g., as described in U.S. Pat. No. 6,375,903, incorporated herein by reference) is a preferred system for oligonucleotide synthesis. An image locking system as described below is preferably used to eliminate image drift during synthesis of the set of oligonucleotides.
A maskless array synthesizer can drift by several μm over several hours because of thermal expansion of optical parts and other sources. The optical path between the DMD 20 and the DNA cell 22 is about 1 meter. Expansion caused by fluctuations in the temperature and humidity of the surrounding environment, as well as by heating from UV exposure, may slightly shift or rotate the primary spherical mirror and other optical parts. This slight change may cause several μm of drift in the projected image. Since the space between adjacent digital micromirrors is only 1 μm, this drift can shift the projected image so that UV light exposes the wrong oligonucleotide spots, generating defects in the oligonucleotide sequences and their spatial distribution. The image locking system 16 confines the image shift within a certain range to minimize image drift.
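As an order-of-magnitude check, assume (as a simplification, not the actual Offner relay geometry) that the drift comes from a small tilt Δθ of a single mirror and take the full 1 m optical path as the lever arm L. By the law of reflection, tilting a mirror by Δθ deviates the reflected beam by 2Δθ, so the image displacement δ and the tilt needed to hold the 200 nm lock tolerance given later in this section are approximately:

```latex
\delta \approx 2L\,\Delta\theta, \qquad
L = 1\ \text{m},\ \Delta\theta = 1\ \mu\text{rad} \;\Rightarrow\; \delta \approx 2\ \mu\text{m}

\Delta\theta_{\max} \approx \frac{\delta_{\max}}{2L}
  = \frac{200\ \text{nm}}{2 \times 1\ \text{m}} = 0.1\ \mu\text{rad}
```

Micro-radian-scale tilts, easily produced by thermal expansion of a mount, are therefore enough to produce the several-μm drift described above.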
The system 28 can be a 0.08 numerical aperture reflective imaging system based on a variation of the 1:1 Offner relay. Such reflective optical systems are described in A. Offner, “New Concepts in Projection Mask Aligners,” Optical Engineering, Vol. 14, pp. 130-132 (1975). The DMD 30 can be a micromirror array available from Texas Instruments, Inc. The reaction cell 38 includes a quartz block 47, a glass slide 49, a projected image 51, a radiochromic film 52, and a reference mark 53. The UV lamp 44 can be a 1000 W Hg arc lamp (e.g., Oriel 6287, 66021), which can provide a UV line at 365 nm (or anywhere in a range of 350 to 450 nm). Other sources, such as Ar-ion lasers and Hg-Xe high-pressure lamps, may also be used.
The laser 42 projects a laser beam onto beam splitter 36 which reflects a portion of the beam onto DMD 30. DMD 30 has a two-dimensional array of individual micromirrors which are responsive to the control signals supplied to the DMD 30 to tilt in one of at least two directions. A telecentric aperture may be placed in front of the convex mirror 34.
The camera 40 is a charge-coupled device (CCD) camera used to capture an image of one or more alignment marks. The captured image is transferred to a computer 46 for image processing. When a misalignment is detected, correction signals are generated by the computer 46 and sent to actuators 48 and 50 as feedback to adjust the mirror 32, so that correct alignment is reestablished. In at least one alternative embodiment, three electrostrictive actuators (instead of actuators 48 and 50) are used to provide a minimum incremental movement of 60 nm and to control the rotation and movement of the mirror 32. The displacement of the projected image at the glass slide is highly sensitive to the rotation and movement of the mirror 32.
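The feedback loop can be sketched in one dimension. This is a toy model, not the control software: it assumes the image error is measured once per camera frame, that drift accumulates linearly between frames, and that each actuator step moves the image by a fixed `image_nm_per_step`. The true mirror-to-image lever factor is not given here, so 60 nm per step is only a placeholder, and the 100 nm deadband and all function names are hypothetical.

```python
def correction_steps(displacement_nm, image_nm_per_step):
    """Whole actuator steps that move the image back toward zero displacement."""
    return -round(displacement_nm / image_nm_per_step)


def residual_after_correction(displacement_nm, image_nm_per_step):
    """Image error left over after applying the quantized correction."""
    steps = correction_steps(displacement_nm, image_nm_per_step)
    return displacement_nm + steps * image_nm_per_step


def lock_loop(drift_nm_per_cycle, cycles, image_nm_per_step=60.0, deadband_nm=100.0):
    """Simulate drift plus correction; return (worst pre-correction error, final error)."""
    offset = 0.0
    worst = 0.0
    for _ in range(cycles):
        offset += drift_nm_per_cycle          # slow drift accumulated between camera frames
        worst = max(worst, abs(offset))       # error seen by the camera before correcting
        if abs(offset) > deadband_nm:         # correct only when outside the deadband
            offset = residual_after_correction(offset, image_nm_per_step)
    return worst, offset
```

Because each correction leaves at most half a step of residual error, the worst error stays below the deadband plus one frame of drift; with these placeholder numbers that is under the 200 nm lock requirement given later in this section.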
In an exemplary embodiment, an image processing procedure calculates the image displacement from the images captured by the camera 40 by calculating the cross-correlation between a captured input image described with reference to
or, using the Wiener-Khintchine Theorem, as
$$c_{gh}(X,Y) = \mathrm{IFFT2}\Big(\mathrm{FFT2}\big(g(X,Y)\big)\cdot\mathrm{FFT2}\big(\mathrm{rot90}(h(X,Y),\,2)\big)\Big)$$
The new locations of the reference mark and the projected image are marked by correlation peaks (i.e., the highest value of cgh(X,Y)). Based on the new locations, correction signals are computed and sent to the actuators to move the mirror. This correction procedure continues until the synthesis is completed.
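The displacement estimate can be sketched with NumPy's FFT routines using the cross-correlation theorem, corr = IFFT2(FFT2(g) · conj(FFT2(h))), which for real-valued images plays the same role as convolving with the 180°-rotated template. This is an integer-pixel sketch under assumed conditions (same-size images, circular boundaries, no sub-pixel peak fitting); `image_shift` is a hypothetical name, not the program referred to in the text.

```python
import numpy as np


def image_shift(reference, captured):
    """Estimate the integer (dy, dx) shift of `captured` relative to `reference`.

    The shift is read from the peak of the circular cross-correlation,
    computed as IFFT2(FFT2(captured) * conj(FFT2(reference))).
    """
    g = captured - captured.mean()    # remove the mean so the DC term does not dominate
    h = reference - reference.mean()
    corr = np.fft.ifft2(np.fft.fft2(g) * np.conj(np.fft.fft2(h))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    ny, nx = corr.shape
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around).
    if dy > ny // 2:
        dy -= ny
    if dx > nx // 2:
        dx -= nx
    return int(dy), int(dx)
```

For example, if the captured frame is the reference mark moved 3 pixels down and 5 pixels left, `image_shift` returns `(3, -5)`; multiplying by the pixel size at the slide converts this to a physical displacement for the correction signal.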
In an exemplary embodiment, computer programs control the actuators and generate the correction signals by image processing. A log file of displacements can also be recorded and analyzed to measure, indirectly, the actual displacement and its direction, for further refinement of the algorithm. Various mark shapes (e.g., crosses, chevrons, circles) can be used as the reference mark 53.
Since each pixel is approximately 15 μm in size, it is necessary to keep the image locked to less than 200 nm. Since the distance from the concave mirror 32 (
Other designs are possible, involving different schemes for detecting the displacements. The actuators 48 and 50 can be used to effectively align the optics. In another exemplary embodiment, diffractive marks can also be used, alleviating the need for microscopes. Partially transmitting (half-toned) marks can be used for other detection schemes.
The synthesis stage may utilize the technology that has been developed for the fabrication of rapid-turnaround microarray DNA chips and that is being commercialized by NimbleGen, Inc. See, e.g., F. Cerrina, et al., Microelectronic Engineering, 61-2, 2002, pp. 33-40. In this process, oligonucleotides are attached to the substrate by a stable linker and are terminated with a photolabile protecting group. Exposure to light removes the photolabile protecting group, making the attachment point available to chemicals that are flowed into the reaction cell. These chemicals can be phosphoramidite based, or can be other types of more general chemicals, and carry the photolabile protecting group. After attachment of the base (the chemicals to be attached will be referred to as “base,” although other molecules are possible), the base is connected to the pre-existing oligonucleotide and the photolabile group protects it from further reaction. After four of these steps, one per base, the surface of the chip will have an array of the four different “colors,” i.e., A, C, T or G. In the next round of exposure, selected sites are again deprotected by light exposure and the next base is attached. In this way, if N illuminated pixels are used to form the exposure, at the end of 20 cycles N different oligonucleotides will be distributed on the surface of the chip in separate and distinct locations. The areas where the oligonucleotides have been synthesized are “tiled” on the surface and are separated from each other by a region where no exposure takes place. This reduces the problem of light being scattered from one tile into another and causing unwanted reactions. The use of digital micromirror device (DMD) based optics as discussed above allows great flexibility in the DNA chip layout. Complete deprotection of a site requires about 60 seconds at an intensity of about 100 mW/cm2 of Hg I-line radiation (365 nm).
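The exposure logic of these cycles (four light-directed steps per cycle, one per base) can be sketched as a planner that, for each cycle and base, lists the pixels whose target sequence needs that base next. This assumes a fixed A, C, G, T delivery order in every cycle and writes sequences in order of addition; a real synthesizer may reorder bases to shorten the run, and the function names are hypothetical.

```python
BASES = "ACGT"  # assumed fixed delivery order within each synthesis cycle


def exposure_masks(targets):
    """Plan one exposure (set of pixel indices to illuminate) per base per cycle."""
    n_cycles = max(len(seq) for seq in targets)
    plan = []
    for cycle in range(n_cycles):
        for base in BASES:
            lit = {i for i, seq in enumerate(targets)
                   if cycle < len(seq) and seq[cycle] == base}
            plan.append((cycle, base, lit))
    return plan


def simulate(n_pixels, plan):
    """Apply a plan: each illuminated pixel is deprotected, then couples the base flowed in."""
    grown = [""] * n_pixels
    for _cycle, base, lit in plan:
        for i in lit:
            grown[i] += base
    return grown
```

With the fixed order, N sequences of length 20 need 20 cycles, or 80 exposures; at roughly 60 s per full deprotection that is about 80 minutes of illumination alone, before coupling and wash steps.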
Throughout the system, great care is taken to contain stray and diffracted light, because photons that reach unwanted sites will cause unwanted deprotection reactions and thus errors in the synthesis. Stray light must be kept to an absolute minimum. This may be done by using high-quality optical mirrors and anti-reflection coatings on all surfaces throughout the system.
In the formation of the oligonucleotides for gene synthesis, the dimensions of the features are usually relatively large, approximately 100×150 microns. That means that the geometrical depth of focus of the image is of the order of 1400 microns at a NA of 0.07, while the cavity of the typical reaction chamber is only of the order of 100 microns. As shown in
Synthesis may also be carried out by other types of systems, for example, systems based on an array of light emitting diodes (LEDs) or solid state lasers. Such an array can be placed at the focal plane of the mirror assembly, replacing the micromirror spatial light modulator and lamp. Several types of LEDs based on gallium nitride and/or aluminum nitride formulations, with different lifetimes and wavelength characteristics, are commercially available from companies such as Nichia, Cree and Uniroyal. An array of solid state lasers may also be used instead of an array of LEDs.
Other types of automated synthesis systems may also be utilized that do not rely on optical image formation to form an array. For example, synthesis can also be carried out utilizing a column packed with microspheres as illustrated in
In the apparatus 110 shown in
A plurality of controllable light sources 130 are mounted at spaced positions along the length of the transparent wall 120 of the conduit to allow selective illumination of separated sections of the conduit and of the particles held in those sections. Light emitted from the sources 130 may be focused by lenses 131 before passing through the wall 120 of the conduit to illuminate separated sections 133 of the particles within the conduit. Light-absorbing or light-blocking elements 135 may be mounted between the light sources 130 to keep stray light from one light source from reaching the region to be illuminated by an adjacent light source. The light sources 130 may be any convenient light source, for example, light emitting diodes (LEDs), which are selectively supplied with power on lines 136 from a computer controller 137, such that any combination of the light sources can be turned on at a particular point in time. Any other controllable light source may be utilized, including individual lamps of any type that can be turned on and off, constantly burning lamps with mechanical shutters (including movable mirrors as well as light blocking shutters) or electronic shutters (e.g., liquid crystal light valves), and fiber optic or other light pipes transmitting light from single or multiple sources, etc. The controller 137 is also connected to controllable valves 140 and 141, which are connected to an output line 138 that receives the fluid from the outlet end 119 of the conduit. The controller 137 can control the valves 140 and 141 either to discharge the reagents that have been passed through the conduit onto a waste (collection) line 143, or to direct oligomers that have been released from the conduit onto a discharge line 145, which can be directed to further processing equipment or to readers, etc.
In operation, the reagent supply initially provides fluid flowing through the conduit that creates a photodeprotective group covering the surfaces of the carrier particles 122. The flow of reagent is then stopped and the controller 137 turns on a selected combination of the light sources 130 (typically at ultraviolet (UV) wavelengths) to illuminate selected ones of the separated sections 133 of the packed particles within the conduit. In a conventional manner, the light emitted from each active source 130 renders the photodeprotective group susceptible to removal by a reagent that is passed through the conduit by the reagent supply 111, after which the reagent supply can be controlled to provide a desired molecular element, such as a nucleotide base (A, G, T, C), which will bind to the surfaces of the carrier particles from which the photodeprotective group has been removed. Thereafter, the reagent supply can provide further photodeprotective group material through the conduit to protect all bases, followed by activation and illumination from selected sources 130 to allow removal of the photodeprotective group from the particles in selected sections of the conduit. After removal of the susceptible photodeprotective material, the reagent supply 111 can provide another base material that is flowed through the conduit to attach to existing bases on the exposed carrier particles. The process described above can be repeated multiple times until chain molecules of sufficient length are created. Each of the light sources 130 can separately illuminate one of the separated sections of packed particles, allowing oligomers with different sequences (e.g., of nucleotides) to be formed in each of the separated sections.
Although it is preferable that the controller 137 be an automated controller, for example, under computer control, with the desired sequence of reagents and activated light sources 130 programmed into the controller, it is also understood that the reagent supply 111 and the light sources 130 can be controlled manually or by analog or digital control equipment that does not require the use of a computer.
The surfaces of the carrier particles 122 are coated with a material that acts as a linker group between the surface of the particle and the chain molecule to be formed. The carrier particles may have a diameter substantially less than the width of the channel, so that multiple carrier particles may pack each section of the channel between the walls of the channel. The carrier particles are otherwise free from attachment to each other or to the walls of the conduit. As illustrated in
The light sources emit light within a range of a selected wavelength, and lenses and/or mirrors may be mounted with the sources to couple and focus the light from the sources onto the sections of the channel. The sources may also be mounted to the conduit such that a face of the source (e.g., a light emitting diode) from which light is emitted forms a portion of the transparent wall of the conduit. Light blocking material may be mounted between adjacent sources in position to prevent light from one source passing into a section of the channel that is to be illuminated by an adjacent source. The conduit may be filled with an index matching fluid to minimize scattering losses. The apparatus may further include a transparent window spaced from the transparent wall of the conduit and including an enclosure forming an enclosed region with the window and the transparent wall of the conduit. An index matching fluid within the enclosed region has an index of refraction near that of the transparent wall of the conduit to minimize reflections at the transparent wall of the conduit. The light sources may be mounted outside of the window in position to project light through the window, the index matching fluid, and the transparent wall of the conduit. The window can include an antireflective coating thereon to minimize unwanted reflections and dispersion of light. Where the conduit has walls which are all transparent to light, a material may be formed adjacent to the conduit, between the separated sections to be illuminated, which absorbs or reflects light transmitted through the walls of the conduit to minimize stray light.
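Why index matching helps can be seen from the normal-incidence Fresnel reflectance, R = ((n1 - n2)/(n1 + n2))². The indices used below (air 1.00, water about 1.33, glass about 1.5) are typical illustrative values, not values given in this description, and the function name is hypothetical.

```python
def normal_incidence_reflectance(n1, n2):
    """Fraction of light reflected at a flat interface between indices n1 and n2 (normal incidence)."""
    return ((n1 - n2) / (n1 + n2)) ** 2


air_glass = normal_incidence_reflectance(1.00, 1.50)    # bare wall in air
water_glass = normal_incidence_reflectance(1.33, 1.50)  # water-like fluid against the wall
matched = normal_incidence_reflectance(1.50, 1.50)      # fluid matched to the wall's index
```

An air-glass interface reflects about 4% at each surface, a water-like fluid against glass reflects under 0.4%, and a fluid matched to the wall's index reflects essentially nothing, which is why the index-matching fluid and antireflective window reduce stray light reaching neighboring sections.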
After synthesis and elution, volumes of materials may be handled through a repetitive process. The post-synthesis steps can be automated using a microtiter plate preparation robotic workstation. In this approach, the oligonucleotide sets are selectively eluted to individual wells in a (e.g., 96-well) microtiter plate. Then, these oligonucleotides are purified using an array of C18 pipette tips mounted on the robotic tool head, as illustrated in
Each assembled 500mer pool is purified using another C18 array to remove the polymerase enzyme and then dispensed in equal volumes into three wells (pools) to perform consensus filtering. Each pool undergoes complete digestion with one or more restriction enzymes. The digested pools of DNA are denatured and re-annealed using the cycler. The MutS filtering step can also be accomplished using parallel pipettes and fluid dispensing. The MutS pipette tips may be formed as shown in
Before the 500mers are assembled into the final 10 kbp gene, a small volume of each individual 500mer can be sampled and sequenced. The retained 500mer samples can be used for quality control. For example, if the final gene is found to have an error in its sequence, only the particular 500mer responsible for the error needs to be resynthesized, rather than the entire library of 500mers. The final assembly can combine all the individual 500mers with the necessary PCR reagents and proceed in a thermal cycler. If desired, a robotic system similar, for example, to the Beckman Coulter Biomek can be integrated with the automated gene synthesizer.
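The quality-control step depends on knowing which 500mer covers a given position in the final gene. A sketch, assuming (hypothetically) that 500mers tile a 10 kbp gene with 20-base overlaps; a position inside an overlap maps to two fragments, both of which are candidates for resynthesis. The function names and the overlap length are illustrative only.

```python
def tile(gene_len, seg_len=500, overlap=20):
    """Half-open (start, end) windows of seg_len bases, consecutive windows sharing `overlap` bases."""
    step = seg_len - overlap
    segs = []
    start = 0
    while True:
        end = min(start + seg_len, gene_len)
        segs.append((start, end))
        if end == gene_len:
            break
        start += step
    return segs


def segments_covering(pos, segs):
    """Indices of the fragments that contain 0-based gene position `pos`."""
    return [i for i, (s, e) in enumerate(segs) if s <= pos < e]
```

For a 10,000-base gene these assumptions give 21 fragments (the last one shorter), so a single sequencing error flags at most two fragments out of 21 for resynthesis.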
A hybrid microfluidic fabrication technology may be used to provide both flexible integration and inexpensive manufacturing, preferably using liquid-phase photopolymerization methods to fabricate post-synthesis fluidic features between two glass plates, and a top PDMS (polydimethylsiloxane) layer to implement fluid control valve elements. It is desirable to reduce the synthesis chamber volume to reduce reagent cost. In the synthesis chamber, the volume is preferably reduced to ~500 nL by using capillaries as synthesis cells. However, the reduction in release volume makes post-synthesis fluid handling more difficult. Pipette manipulation is more difficult with smaller volumes, but microfluidics provides a more suitable approach that can be easily integrated into the post-processing steps. Microfluidics can also improve the concentration of the final product by two mechanisms: the reduction of material lost due to fewer fluid transfer steps, and the reduction of the final assembly reaction volume. In the robotic approach, each 500mer assembly requires up to 14 transfers of the oligonucleotides between microtiter plates (if the consensus filter is repeated 3 times), and each of these transfers is done with pipette tips. During these handling steps, oligonucleotides may be lost in residual transfer volumes. The microfluidics approach greatly reduces the amount of fluid handling, and hence the reagent costs. Furthermore, the final assembly steps can be performed in smaller volumes than previously possible, resulting in higher oligonucleotide concentrations in the final product without complicated concentration steps. Individual functional components can be implemented and integrated into a microfluidic platform. Instead of storing the eluted oligonucleotides in wells and purifying them using pipette tips (20 to 100 μL volumes), flow-through elements can be used to purify and filter the synthesis product as it is eluted from the synthesis chamber.
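The two concentration mechanisms compound. If each transfer keeps a fraction r of the material, n transfers keep rⁿ, and the final concentration also scales inversely with the final volume. The 95% retention per transfer, the 3-transfer microfluidic path, and the 50 μL versus 5 μL final volumes below are hypothetical values chosen only to illustrate the arithmetic; the function names are likewise illustrative.

```python
def recovery(per_transfer_retention, transfers):
    """Fraction of material remaining after `transfers` steps, each keeping the same fraction."""
    return per_transfer_retention ** transfers


def relative_concentration(per_transfer_retention, transfers, final_volume_ul):
    """Final concentration per unit of starting material (arbitrary units per microliter)."""
    return recovery(per_transfer_retention, transfers) / final_volume_ul


robotic = relative_concentration(0.95, 14, 50.0)      # up to 14 pipette transfers
microfluidic = relative_concentration(0.95, 3, 5.0)   # hypothetical flow-through path
gain = microfluidic / robotic
```

Under these assumptions, 14 pipette transfers keep about 49% of the material, and the combination of fewer transfers and a tenfold smaller volume raises the final concentration roughly 17-fold.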
The μFT method as illustrated in
In each 10 kbp assembly, multiple microfluidic chips are preferably operated simultaneously to achieve maximum efficiency. This can be done by minimizing the chip area for each assembly process and placing multiple copies of the system on the same wafer. However, this approach is limited by the volume requirements and the usable area on a substrate. Another approach is to use a 3D stackable architecture and arrange the individual assembly chips so that they share common fluidic interconnects.
Depending on the chemistry utilized, many stages throughout the synthesis and assembly process can be assayed for quality control. Where photorelease chemistry is utilized, oligonucleotides can be released in a spatially and temporally controlled manner. Therefore, it is possible to synthesize and leave a variety of “control” oligonucleotides tethered to each chip. A diagram of a control process is shown in
There is currently great interest in the use of small molecule microarrays and in the high-throughput identification of new bioactive compounds. Indeed, it is hoped that microarrays of ligands will accelerate chemical genomics in much the same way that DNA microarrays have accelerated genomics. Small molecule microarrays can be formed by physical spotting of compounds into arrays with robotics, by assembly of DNA/RNA-small molecule conjugates into DNA arrays, or by in situ synthesis. A new approach to in situ synthesis uses photolabile protecting group chemistry for light-directed combinatorial synthesis of small molecule arrays.
The use of light-directed combinatorial chemistry has thus far been limited to the synthesis of linear polymers (DNA, polypeptides, etc.), primarily because of the lack of photolabile protecting groups that allow independent, selective deprotection of multiple protecting groups on the same molecular framework. The ability to cleave multiple protecting groups independently using light would open the door to in situ light-directed combinatorial chemistry for building drug-like small molecule libraries in arrays with the MAS. Although several approaches can be envisioned to solve this problem, many have drawbacks that make them unattractive. One approach involves the development of protecting groups that are sensitive to different wavelengths of light; another uses photogenerated cleavage reagents. The former approach has difficulties associated with specificity of cleavage and demands specialized light sources; the latter suffers from a loss of spatial resolution due to the generation of diffusible chemical reagents. A preferred approach is a set of multiple orthogonal safety-catch photolabile (SCPL) protecting groups that can each be independently photocleaved with a 365 nm light source through the use of a chemical pre-activation step that converts a photo-inert protecting group into a photocleavable group. These latent photocleavable protecting groups enable a large variety of small molecule combinatorial chemistry to be carried out using a MAS modified to allow the introduction of many independent reagents during the diversity introduction steps of the synthesis. In combination with a surface-sensitive method for imaging the binding of unlabeled proteins to small molecule arrays, this platform enables high-throughput synthesis and screening of small molecule combinatorial libraries (>10,000 compounds/chip) to identify library members that selectively bind to proteins.
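The logic of sequential safety-catch deprotection can be modeled as a small state machine: the parent group is photocleavable from the start, each safety-catch group stays photo-inert until its own chemical pre-activation, and a 365 nm exposure removes only the groups that are currently photocleavable. This models bookkeeping only. The group numbering and function names are hypothetical, real activation chemistries (for example, the acid and periodate treatments described for the Lys-Asp scaffold) are abstracted as group numbers, and spatial selection by the MAS is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ProtectedSite:
    name: str
    group: int                  # 0 = parent photolabile group; 1-3 = safety-catch groups (hypothetical labels)
    protected: bool = True
    photocleavable: bool = False


def new_scaffold():
    """One scaffold molecule with four orthogonally protected sites."""
    sites = [ProtectedSite(f"site{g}", g) for g in range(4)]
    sites[0].photocleavable = True  # the parent group is directly photolabile
    return sites


def pre_activate(sites, group):
    """Chemical pre-activation: convert the latent (photo-inert) `group` into a photocleavable one."""
    for s in sites:
        if s.protected and s.group == group:
            s.photocleavable = True


def expose_365nm(sites):
    """Illuminate: remove every protecting group that is currently photocleavable; return freed sites."""
    freed = []
    for s in sites:
        if s.protected and s.photocleavable:
            s.protected = False
            freed.append(s.name)
    return freed


def run_schedule(sites):
    """Direct photolysis first, then activate-and-expose each safety-catch group in a fixed linear order."""
    rounds = [expose_365nm(sites)]
    for group in (1, 2, 3):
        pre_activate(sites, group)
        rounds.append(expose_365nm(sites))
    return rounds
```

Each exposure in the schedule frees exactly one site, which is the property that lets a building block be coupled at each position in turn; it also shows why the groups only need to be orthogonal within one fixed sequence of activation and cleavage steps.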
In this approach, as illustrated with reference to
A first scheme (Scheme 1) as shown in
At least three orthogonal SCPL-protecting groups can be synthesized. Along with the parent photolabile group, this provides four independent orthogonal photolabile protecting groups (direct photodeprotection plus three safety catch). The SCPL-protecting groups need only be orthogonal to one another within a linear sequence of activation and cleavage conditions, and thus each group need not be fully orthogonal to all others. A synthetic route is outlined in
To provide a set of reagents, S2-1, S2-3 and S2-6 are converted to the active carbonates S2-2, S2-4 and S2-7 for introduction into scaffold molecules. S2-1, S2-3 and S2-6 can also be converted to esters for the protection of carboxylic acids. To characterize each of the SCPL-protecting groups, a series of protected benzylamines S3-1 is produced as shown in
A suitably protected scaffold may be used to test up to three orthogonal SCPL-protecting groups. One scaffold may be based upon the dipeptide Lys-Asp. A synthetic route to this scaffold is shown in
Compound 4 is subjected to UV photolysis to deprotect the α-carboxyl of aspartic acid, which is then coupled to benzylamine with PyAOP. Treatment with 5% trifluoroacetic acid can unveil the photolabile group protecting the α-amine of lysine. Photodeprotection and coupling with benzoyl chloride will cap the amine. Deprotection of the dithiane with periodate will activate the final safety-catch for photolysis and coupling to benzoyl chloride. The allyl ester of S4-4 can be deprotected with Pd to allow covalent attachment to amine-terminated glass slides. Various fluorescent dyes may be used on the three sites of the Lys-Asp dipeptide for independent, orthogonal deprotection of the SCPL-protecting groups. Using a set of orthogonal SCPL-protecting groups, biologically interesting scaffolds can be chosen for the creation and screening of microarrayed combinatorial libraries through in situ synthesis.
It is understood that the invention is not limited to the embodiments set forth herein as illustrative, but embraces all such forms thereof as come within the scope of the following claims.
This application claims the benefit of provisional patent application No. 60/715,623, filed Sep. 9, 2005, the disclosure of which is incorporated herein by reference.
This invention was made with United States government support awarded by the following agency: DOD ARPA DAAD 19-02-2-0026. The United States government has certain rights in this invention.
Number | Date | Country
---|---|---
60715623 | Sep 2005 | US