This application pertains to substance identification methods using pooling.
Pooled biological material, such as DNA, RNA, proteins, and the like, may be screened by a wide variety of methods, such as sequencing, PCR (Polymerase Chain Reaction), DNA/DNA hybridization, DNA/RNA hybridization, RNA/RNA hybridization, single strand DNA probing, protein/protein hybridization, and a wide variety of additional methods. References describing many of these methods include Ausubel et al., “Short Protocols in Molecular Biology,” Wiley and Sons, New York, and Sambrook et al., “Molecular Cloning, A Laboratory Manual,” Cold Spring Harbor Press, New York, as well as numerous others. Also referenced are U.S. Pat. No. 5,780,222 (Method of PCR Testing of Pooled Blood Samples) and the references cited therein. Further referenced are U.S. Pat. Nos. 6,126,074 and 6,477,669 and the references cited therein, including those pertaining to Viterbi, Reed-Solomon, and other error-correction and data-compression coding schemes.
Some embodiments are described below with reference to the following accompanying drawings.
FIGS. 9a and 9b combined are a grid representing wells of a logical array including five 384-well plates;
The embodiments herein allow the incorporation of “loss-less information compression and error correction,” or other known error-correction strategies, to increase the robustness of identification while significantly reducing the number of samples the end user must process. By pooling the samples again after collection, it is possible to drastically reduce manipulations by the end user while still retaining very fine detail in the identification of the individual samples or populations originally pooled. Error-correction methods are well known in the computer data transmission field but have not been used in the pooling of biological or chemical samples. Such methods allow a large reduction in the number of experiments needed to identify the specific biological sample or population containing a region of interest.
The embodiments herein include screening methods in which the entire “pooling strategy” is determined a priori. It is also possible to conceive adaptive strategies, in which the strategy used at subsequent levels of processing depends on the outcome of previous levels; such methods may increase efficiency.
The pooled material can be from individuals or a population. To reduce analysis time, materials, and expense, pooling small, high-resolution pools in a matrix allows a lower number of samples to be analyzed. The high-resolution data obtained from screening the matrix pools are equivalent to the data that would be obtained if the researcher had analyzed the complete set of small pools (which is much more expensive, time consuming, and difficult). The embodiments also give the added advantage of providing two positive signals for identification. This reduces the errors associated with false positives when, as in known methods, only one signal is obtained for identification.
The matrix pooling can be just for one superpool. Alternatively, it can be a matrix across a variety of different superpools and/or a variety of different types of pools, allowing the complete library to be screened with just one round of experiments. To do this, each small pool would be added to between 6 and 20 of the re-pooled intermediate or final pools. Then, with a total of between 40 and 100 pools, the complete library (or any set of biological samples) could be screened with high confidence and with the ability to resolve multiple hits. If the library had a large redundancy of signal, the total number of pools could be increased to maintain the resolving power of the matrix method. Positive controls incorporated in a matrix pattern can be used for quality assurance and, if desired, to assist in deconvolution.
Biological materials may include Bacterial Artificial Chromosome (BAC) genomic DNA libraries and other biological or chemical libraries, such as cDNA libraries, protein libraries, RNA libraries, DNA libraries, cellular metabolic libraries, and chemical libraries. The current state of the art in pooling biological materials for screening is to collect all of the indexed microtiter plates containing the BAC library and stack them into a cube. These indexed plates are generally 96-, 384-, or 864-well, or sometimes even 1536-well, microtiter plates. The cube is then transected by a number of different planes (usually 4 to 8), each of which produces a large number of pools. The collection of all the pools from all of the planes is then screened to identify the clones of interest. This state-of-the-art scheme can, with some degree of reliability, identify multiple clone hits (i.e., multiple BAC clones carrying the target) and assign them to specific coordinates.
According to Klein et al., their scheme with 6 planes, applied to a collection of 24,576 BACs, could detect between 2 and 6 BACs, and over 90% could be reliably assigned to a specific coordinate using 184 screening pools (that is, 184 user experiments). S. Asakawa et al., “Human BAC Library: Construction and Rapid Screening,” Gene, Vol. 191, pp. 69-79 (1997) may disclose some initial steps similar to the embodiments herein in the “Methods I” section on page 72. However, Asakawa requires pooling clones before growth and constructs each screening pool directly from the pooled clones after growth.
Embodiments herein may provide improved accuracy and efficiency, and reduced cost, by using at least one additional step: repooling the intermediate subpools of genomic DNA clone DNA into final screening pools. Each individual genomic DNA clone may be present in at least three unique final screening pools, such as three to ten, or four to eight, unique final screening pools.
If the BAC library is from an organism with a genome larger than 1,000 megabases (Mb, millions of base pairs), the researcher may find that there are very few ambiguous hits on a plate, row, column, and diagonal (PRCD) plate. In most cases the Plate, Row, and Column pools alone correctly identify the clone of interest, without the need for the Diagonal Pools. If the Diagonal Pools are screened only to resolve the infrequent ambiguities, the number of PCR experiments is reduced.
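A minimal sketch of this decision, assuming the screening results are given as sets of positive plate, row, and column pool identifiers: every combination of one positive plate, one positive row, and one positive column is a candidate, and the Diagonal Pools only need to be screened when more than one candidate remains. The function names are hypothetical.

```python
from itertools import product

def prc_candidates(pos_plates, pos_rows, pos_cols):
    """Every (plate, row, column) coordinate consistent with the positive pools."""
    return sorted(product(sorted(pos_plates), sorted(pos_rows), sorted(pos_cols)))

def needs_diagonal(pos_plates, pos_rows, pos_cols):
    """Diagonal Pools are screened only if plate/row/column leave an ambiguity."""
    return len(prc_candidates(pos_plates, pos_rows, pos_cols)) > 1

# One positive clone (plate 7, row C, column 15): unambiguous.
print(prc_candidates({7}, {"C"}, {15}))   # [(7, 'C', 15)]
print(needs_diagonal({7}, {"C"}, {15}))   # False
# Two positive clones differing in plate, row, and column: 2 x 2 x 2 candidates.
print(len(prc_candidates({3, 7}, {"C", "J"}, {2, 15})))  # 8
```

With two true positives, six of the eight candidates are false, which is the kind of infrequent ambiguity the Diagonal Pools are reserved for.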
A Bac-Bank is a way of storing fragments of DNA that together constitute the whole genome of an organism. The DNA of an organism is (semi-)randomly cut into pieces, and these fragments are inserted into bacteria, which are then plated out so that each colony grows from a single modified bacterium. Only modified bacteria are allowed to grow: the bacterium used is potentially resistant to a certain antibiotic, its resistance is “switched on” by the presence of a foreign DNA fragment (insert), and the growth medium contains that antibiotic. The resulting (potentially) unique colonies of bacteria are then picked individually and transferred to the wells of 384-well plates. The resulting stack of plates, holding a large number of unique bacteria and ideally containing the whole genome of the original organism, is known as a “Bac-Bank”. It serves as a research database of the genome of the original organism. This database can be searched for fragments of DNA using PCR techniques.
Pooling is a method that allows one to quickly and economically search a Bac-Bank for the presence of certain DNA fragments. A Bac-Bank normally contains a large number of clones (~100,000). Testing all clones individually for the presence of a fragment of DNA occurring only a few times (typically fewer than 100 times) in the original organism's genome is prohibitively expensive and laborious. When pooling is used, the DNA of several clones is gathered into a much lower number of wells (pools), every well containing DNA from several clones and every clone's DNA being present in multiple wells. The distribution pattern (“pooling method,” “pooling strategy,” or “rule-set”) is designed in such a way that, when PCR reactions are used to screen the pools, a pattern of PCR reaction results emerges that may be unique to the clone(s) having the required properties.
A simple example: take a 384-well plate with 16 rows of 24 columns, and imagine pooling all wells horizontally and vertically, resulting in 16 row pools and 24 column pools. If a single clone on this plate has a certain property, only the row pool and the column pool containing that clone will display a positive reaction when screened; the other 38 pools will be negative. Using only 40 PCR reactions instead of 384, it is therefore possible to pinpoint the positive clone on this plate, almost a tenfold reduction in labor and cost. As long as relatively few individuals have a certain property, there are few errors. For properties shared among many individuals, pooling methods may break down (yielding incorrect results, either false positives or, worse, false negatives), and when this happens one has to resort to screening the clones individually.
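The row/column example above can be simulated directly. This is an illustrative sketch, not a tool from this application: a pool is assumed to react positively whenever it contains at least one positive well, and decoding intersects the positive row pools with the positive column pools.

```python
import string

ROWS = string.ascii_uppercase[:16]   # rows A..P
COLS = range(1, 25)                  # columns 1..24

def build_pools():
    """Map each of the 16 row pools and 24 column pools to its wells."""
    pools = {f"row {r}": {(r, c) for c in COLS} for r in ROWS}
    pools.update({f"col {c}": {(r, c) for r in ROWS} for c in COLS})
    return pools

def screen(pools, positive_wells):
    """A pool reacts positively if it contains at least one positive well."""
    return {name for name, wells in pools.items() if wells & positive_wells}

def decode(positive_pools):
    """Candidate wells: every positive row crossed with every positive column."""
    rows = [p.split()[1] for p in positive_pools if p.startswith("row")]
    cols = [int(p.split()[1]) for p in positive_pools if p.startswith("col")]
    return sorted((r, c) for r in rows for c in cols)

pools = build_pools()
hits = screen(pools, {("E", 11)})
print(len(pools), sorted(hits))   # 40 ['col 11', 'row E']
print(decode(hits))               # [('E', 11)]
# Two positive clones in different rows and columns yield four candidates.
print(decode(screen(pools, {("E", 11), ("K", 3)})))
# [('E', 3), ('E', 11), ('K', 3), ('K', 11)]
```

The last call shows why the scheme breaks down as the number of positives grows: two real hits produce two additional false-positive candidates.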
Most often the individual clones in a Bac-Bank are identified/labeled according to a hierarchical structure dictated by the physical properties of the Bac-Bank. The number of dimensions of a Bac-Bank is then related to the hierarchical structure of the storage format.
An example: the clones of a Bac-Bank are individually stored in wells on a plate. The wells are arranged in a rectangular pattern of rows and columns. If the plate constitutes the whole Bac-Bank, then the Bac-Bank can be viewed as one-dimensional if all the wells on the plate are numbered consecutively from left to right and top to bottom. A single parameter (the well number) suffices to address every individual clone/well on the plate, and therefore the Bac-Bank is one-dimensional. A more natural approach in this example would be to address each well by its column and row numbers; two parameters would then be needed to address an individual well, and the same Bac-Bank can therefore equally be viewed as two-dimensional.
For a larger Bac-Bank one plate would not suffice, and we could give each plate a separate ID code. This would add one coordinate to the number of coordinates required to address each individual well, and therefore there is one more dimension in this case than there would be for a single plate. Another approach is to store the well-plates in boxes of (for example) 10; each plate itself would then have two parameters (coordinates) for an address: the box number and (within that box) the plate number.
All these items are a matter of choice and, therefore, the number of dimensions of a Bac-Bank is a choice as well. It is even possible to use several different addressing schemes without imposing any structure upon this number/code. But, one may also choose to address an individual well as “[C,23,4,A,6]”, when the clone is located in fridge C, box 23, plate 4, column A, and row 6.
It is noteworthy that it is most convenient to have some sort of logical structure related to the physical location of a clone. Such logical structure helps in finding individual clones faster and, most often, there is also a relation between this logical/physical organization and the way the Bac-Bank is pooled. In most examples herein, it is assumed that the Bac-Bank includes 300 plates of 24×16 wells and that the Bac-Bank is three-dimensional.
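Moving between the one-dimensional and three-dimensional views described above is a mixed-radix conversion. The sketch below assumes the 300-plate, 16-row by 24-column Bac-Bank, 0-based consecutive clone numbers running left to right and top to bottom within a plate, and rows lettered A to P; the `Address` type and function names are hypothetical.

```python
from typing import NamedTuple

N_PLATES, N_ROWS, N_COLS = 300, 16, 24
ROW_LETTERS = "ABCDEFGHIJKLMNOP"

class Address(NamedTuple):
    plate: int   # 1..300
    row: str     # 'A'..'P'
    column: int  # 1..24

def to_address(clone_number: int) -> Address:
    """One-dimensional (0-based) clone number -> three-dimensional address."""
    if not 0 <= clone_number < N_PLATES * N_ROWS * N_COLS:
        raise ValueError("clone number outside the Bac-Bank")
    plate, within = divmod(clone_number, N_ROWS * N_COLS)
    row, col = divmod(within, N_COLS)   # left to right, then top to bottom
    return Address(plate + 1, ROW_LETTERS[row], col + 1)

def to_number(addr: Address) -> int:
    """Inverse mapping: three-dimensional address -> clone number."""
    row = ROW_LETTERS.index(addr.row)
    return ((addr.plate - 1) * N_ROWS + row) * N_COLS + (addr.column - 1)

print(to_address(0))                     # Address(plate=1, row='A', column=1)
print(to_address(385))                   # Address(plate=2, row='A', column=2)
print(to_number(Address(300, "P", 24)))  # 115199
```

Adding a box or fridge level, as in the “[C,23,4,A,6]” address, would simply add one more digit to the mixed-radix number.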
It will be readily apparent that a number of benefits may arise from the embodiments, including but not limited to:
1. Higher resolution deconvolution of complex data without as many analysis reactions.
2. Analyzing a two, three, or more-dimensional matrix of pools allows significant reduction in analysis reactions while retaining a high degree of specificity.
3. The incorporation of loss-less compression and error-correction into the pooling strategy allows improved robustness of analysis and identification of individuals from the pools with increased effectiveness while reducing the numbers of analyses.
4. Significantly fewer analysis reactions than other, less sophisticated pooling systems when a matrix re-pooling design is used.
5. As analytical methods improve, the ability to re-pool pools that are currently at the limits of detection is another significant benefit.
Embodiments herein are distinguished from known methods in that a collection of substances is systematically divided into smaller subsets which are then re-pooled to make the final screening pools. The pooled material can be from individual samples or a population of samples. In order to reduce the analysis time, materials and expense, the pooling of high resolution small pools in a matrix allows for a lower number of user experiments to have higher resolution (as if the researcher had analyzed the complete set of small pools).
One embodiment includes a two-step method that first screens for the superpool in which an item of interest appears. Then, that specific superpool's pools are re-pooled into matrix pools (36 matrix pools instead of 76 pools). The matrix pools screened in this method also give the added advantage of providing two or more positive signals for identification. This reduces the current state-of-the-art limitations associated with false-positive and/or false-negative experimental results when only one signal is obtained for identification.
Round I PCR may be performed on all of the Superpools, which together contain all BAC clones in the library. Each Superpool may contain 4,608 individual BAC clones. The results of Round I PCR identify the Superpool(s) containing BAC clone(s) with the sequence of interest (more than one Superpool may be identified). The researcher may choose to pursue one or more of the positive hits from Round I PCR.
Round II PCR may be performed on the Matrix Pools of each specific Superpool identified in Round I PCR. Round II PCR uses 36 PCR experiments plus controls for each positive hit pursued from Round I PCR. The results from Round II PCR allow the researcher to identify the plate and well position of several positive hits and to rule out many potential false positives in the particular Superpool(s) being pursued. In comparison, with a known plate/row/column/diagonal strategy, Round II PCR screening of the PRCD pools requires 76 PCR reactions plus controls. The Matrix system therefore reduces the number of PCR experiments by roughly half (36 instead of 76).
The Matrix Pools are PRCD pools combined so that EACH PRCD pool is contained in TWO unique Matrix Pools. There are a total of 36 Matrix Pools for each Superpool: eight Matrix Plate Pools (MPP), eight Matrix Row Pools (MRP), ten Matrix Column Pools (MCP), and ten Matrix Diagonal Pools (MDP). Each Matrix Pool well contains at most 1,152 individual BAC clones.
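Tables 1-4 give the actual assignments. As an illustrative sketch only, one way to place each PRCD pool in exactly TWO Matrix Pools, with no two PRCD pools sharing the same pair (so a single positive subpool is identified by its two positive Matrix Pools), is a circulant pair design. The shift choices below are assumptions; they reproduce the stated sizes for plates and rows (3 plate pools × 384 clones and 4 row pools × 288 clones, i.e., 1,152 clones per well) but not the 6/4 column and diagonal split of Tables 3 and 4.

```python
from collections import Counter
from itertools import islice

def circulant_pairs(m, shifts):
    """Yield unordered Matrix Pool pairs (i, (i + s) mod m) for each shift s.
    When 2*s == m each pair would occur twice, so only half are emitted."""
    for s in shifts:
        count = m // 2 if 2 * s == m else m
        for i in range(count):
            yield tuple(sorted((i, (i + s) % m)))

def pair_design(n_subpools, m_matrix_pools, shifts):
    """Assign each subpool to TWO Matrix Pools, never reusing a pair."""
    pairs = list(islice(circulant_pairs(m_matrix_pools, shifts), n_subpools))
    if len(pairs) < n_subpools or len(set(pairs)) != len(pairs):
        raise ValueError("shifts do not give enough distinct pairs")
    return dict(enumerate(pairs, start=1))

def load(design):
    """Number of subpools placed in each Matrix Pool."""
    return Counter(pool for pair in design.values() for pool in pair)

plates = pair_design(12, 8, shifts=[1, 4])  # 12 plate pools -> 8 MPPs
rows = pair_design(16, 8, shifts=[1, 2])    # 16 row pools -> 8 MRPs
print(set(load(plates).values()), set(load(rows).values()))  # {3} {4}
print(3 * 384, 4 * 12 * 24)  # clones per MPP and per MRP: 1152 1152
```

With two or more positive subpools of the same type, their Matrix Pools can combine into spurious pairs; this is where the other pool types (and the diagonal pools in particular) help resolve the result.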
The matrix pooling can be just in one superpool. Alternatively, it can be a matrix across a variety of different superpools and/or a variety of different types of pools, allowing the complete library to be screened with just one round of experiments. To do this, each small pool may be combined into any number (generally between six and many thousands, depending on the sensitivity/robustness of the user's experimental screening strategy) of final collection pools (which are re-pooled intermediate pools). For this example, we use a range of between 6 and 20 collection pools (fully compatible with a PCR-based screening technology). Then, with a total of between 40 and 180 pools, or between 80 and 96 pools, the complete library may be screened with high confidence and with the ability to resolve multiple samples in the library that contain an identical region of interest. If the library had a large redundancy of signal, the total number of pools could be increased to maintain the resolving power of the matrix methodology. Positive controls incorporated in a matrix pattern can be used for quality assurance and, if desired, to assist in deconvolution.
After compiling the entire BAC Library 10, the researcher receives two identical Superpool Collection Plates that are then used for Round I PCR. Specifically,
DNA or protein samples in Superpool SP1 from the plates of plate pool 30, the rows of row pool 40, the columns of column pool 50, and the diagonals of diagonal pool 60 are sequentially pooled as represented in
Material from the intermediate subpools is further combined and repooled into the Matrix Pool Plate of
This description is based on 384 well index plates, but it could be used with other plate formats as well with appropriate considerations. It is also based on a BAC genomic DNA library comprised of individual BAC clones, but it could be used with a large variety of biological sample collections or chemical sample collections. The system includes a collection of multiple Superpools that are screened during First Round PCR, to determine which set of Matrix Pools to screen during Second Round PCR. The total number of Superpools is determined by the total number of clones in the BAC library. Each Superpool has its own 96-well plate of corresponding Matrix Pools.
Superpools: Each superpool includes twelve consecutive 384-well plates from a BAC library. DNA is prepared by growing EACH BAC CLONE separately (to avoid growth competition between BAC clones) then combining the 4,608 cultures into one large-scale BAC prep. The Superpool of BAC DNA is then aliquoted onto a 96-well plate. Superpool SP-1 has all the BAC clones in the first twelve plates of the BAC library (Plate 001 to Plate 012). Superpool SP-2 has all the BAC clones in the second twelve plates of the BAC library (Plate 013 to Plate 024). This naming continues for the entire library.
Matrix Pools: For each superpool there is one set of 36 Matrix Pools, which is aliquoted onto a Matrix Pool Plate. The Matrix Pools of Superpool SP1 are named Matrix Plate Pools, Matrix Row Pools, Matrix Column Pools, and Matrix Diagonal Pools.
Matrix Plate Pools 1MPP-A1 through 1MPP-H1 occupy the 8 wells that contain the matrix of plates 1-12 in Superpool SP1. Each Matrix Plate Pool contains 1,152 clones (3 plates × 384 wells/plate). Table 1 indicates the clones in each well.
Matrix Row Pools 1MRP-A2 through 1MRP-H2 occupy the 8 wells that contain the matrix of rows A-P in Superpool SP1. Each Matrix Row Pool contains 1,152 clones from the twelve 384-well plates (4 rows × 24 wells/row × 12 plates). Table 2 shows the composition of each well in the Matrix Row Pools.
Matrix Column Pools 1MCP-A3 through 1MCP-B4 occupy the 10 wells that contain the matrix of columns 1-24 in Superpool SP1. The Matrix Column Pools in wells A3 through D3 contain 1,152 clones (6 different columns × 192 clones per column across the 12 plates = 1,152 clones per Matrix Column Pool). The Matrix Column Pools in wells E3 through B4 contain 768 clones (4 different columns × 192 clones per column = 768 clones per Matrix Column Pool). Table 3 shows the composition of each well in the Matrix Column Pools.
Matrix Diagonal Pools 1MDP-G4 through 1MDP-H5 occupy the 10 wells that contain the matrix of diagonals 1-24 in Superpool SP1. The diagonal pools are collections of clones from all twelve plates in one superpool, transected by a plane that runs diagonally in the XY plane and diagonally in the XZ plane through the 12 plates. Each diagonal is named by the column number of its row-A clone on plate 1. The Matrix Diagonal Pools in wells G4 through B5 contain 1,152 clones (6 different diagonals × 12 plates/diagonal × 16 wells/plate = 1,152 clones per Matrix Diagonal Pool). The Matrix Diagonal Pools in wells C5 through H5 contain 768 clones (4 different diagonals × 12 plates/diagonal × 16 wells/plate = 768 clones per Matrix Diagonal Pool). Table 4 shows the exact location, by plate number, row letter, and column number, of each well included in each diagonal pool. Notably, as the diagonal number (column number) approaches 24, the diagonal pool wraps back to column 1 for a 16-row by 24-column plate. Diagonal pool composition is depicted graphically by
Table 4 is but one example of a diagonal scheme that is non-redundant with the other pools. The embodiments are not limited to one specific diagonal scheme, since additional diagonal schemes can be used as alternatives.
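As one such alternative, and explicitly not a reproduction of Table 4, the sketch below assumes a rule consistent with the description: the column advances by one for each row down a plate (the XY diagonal) and by one for each plate through the stack (the XZ diagonal), wraps past column 24, and diagonal d starts at column d in row A of plate 1. Under this assumption the 24 diagonals partition each plate, no row/column position repeats within a diagonal across the 12 plates, and a known row, column, and diagonal fix the plate.

```python
ROWS = "ABCDEFGHIJKLMNOP"
N_PLATES, N_COLS = 12, 24

def diagonal_of(plate, row, column):
    """Diagonal pool (1..24) holding a well, under the assumed one-column
    shift per row and per plate, wrapping past column 24."""
    return (column - 1 - ROWS.index(row) - (plate - 1)) % N_COLS + 1

def diagonal_pool(d):
    """All (plate, row, column) wells of diagonal d in a 12-plate Superpool."""
    return [(p, r, (d - 1 + i + p - 1) % N_COLS + 1)
            for p in range(1, N_PLATES + 1) for i, r in enumerate(ROWS)]

def plate_from_row_col_diag(row, column, d):
    """Return the single plate consistent with row, column, and diagonal."""
    matches = [p for p in range(1, N_PLATES + 1)
               if diagonal_of(p, row, column) == d]
    return matches[0] if len(matches) == 1 else None

wells = diagonal_pool(5)
print(len(wells), wells[:3])  # 192 [(1, 'A', 5), (1, 'B', 6), (1, 'C', 7)]
print(all(diagonal_of(*w) == 5 for w in wells))  # True
print(plate_from_row_col_diag("C", 10, diagonal_of(4, "C", 10)))  # 4
```

The last property is what makes a diagonal pool non-redundant with the plate pools: because twelve plates are fewer than 24 columns, the plate offset never wraps onto itself.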
After the matrix pools are screened by one of many possible methods, the identity of a specific positive clone from the library can be determined. This identification can be made in a number of ways. If the pool design and matrix design are written down or available in electronic form, the unique clone can be identified by a visual or electronic search. Algorithms based on the pool and matrix designs can also be written to identify the unique clone.
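One simple algorithm of this kind is sketched below. The three small design dictionaries are hypothetical stand-ins for Tables 1-3 (each subpool sits in two matrix wells). A subpool is called positive only if every matrix well it was repooled into is positive, and the decoded plate, row, and column subpools are then intersected into clone coordinates.

```python
from itertools import product

def positive_subpools(design, positive_matrix_pools):
    """A subpool is positive only if ALL of its matrix wells are positive."""
    return {sub for sub, wells in design.items()
            if set(wells) <= positive_matrix_pools}

def candidate_clones(plate_pos, row_pos, col_pos):
    """Intersect decoded plate, row, and column subpools into coordinates."""
    return sorted(product(sorted(plate_pos), sorted(row_pos), sorted(col_pos)))

# Hypothetical toy designs: subpool -> the two matrix wells it was added to.
plate_design = {1: ("A1", "B1"), 2: ("B1", "C1"), 3: ("A1", "C1")}
row_design = {"A": ("A2", "B2"), "B": ("B2", "C2"), "C": ("A2", "C2")}
col_design = {1: ("A3", "B3"), 2: ("B3", "C3"), 3: ("A3", "C3")}

# Screening result for a single positive clone at plate 2, row C, column 1.
positives = {"B1", "C1", "A2", "C2", "A3", "B3"}
plates = positive_subpools(plate_design, positives)
rows = positive_subpools(row_design, positives)
cols = positive_subpools(col_design, positives)
print(candidate_clones(plates, rows, cols))  # [(2, 'C', 1)]
```

When several clones are positive, more subpools pass the "all wells positive" test; a further pool type, such as the diagonal pools, can then be used to prune the candidate list.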
The second example describes a method to form a matrix across a variety of different superpools and/or a variety of different types of pools, allowing the complete library to be screened with just one round of experiments. To do this, each small pool or subpool would be added to between 6 and 20 of the re-pooled intermediate or final pools. Then, with a total of between 40 and 180 pools, or between 80 and 94 pools, the complete library could be screened with high confidence and with the ability to resolve multiple hits. If the library had a large redundancy of signal, the total number of pools could be increased to maintain the resolving power of the matrix solution. Note: 94 experiments is a convenient number, because current screening technologies use a 96-well index plate format (94 experiments leave room for a positive control and a negative control).
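As a hedged illustration of these parameters (it is not one of the tested designs of Tables 8-11), the sketch below places each of 880 subpools in 6 of 94 final screening pools chosen at random, redrawing whenever a subpool would receive exactly the same set of final pools as an earlier one. The seed, the value k = 6, and the function name are assumptions.

```python
import random

def random_repooling(n_subpools=880, n_final=94, k=6, seed=0):
    """Assign each subpool to k distinct final screening pools at random,
    redrawing until its set of pools (its 'signature') is unused."""
    rng = random.Random(seed)
    used, design = set(), {}
    for sub in range(1, n_subpools + 1):
        while True:
            signature = frozenset(rng.sample(range(1, n_final + 1), k))
            if signature not in used:
                break
        used.add(signature)
        design[sub] = sorted(signature)
    return design

design = random_repooling()
# How many subpools were loaded into each of the 94 final screening pools.
sizes = [sum(pool_id in pools for pools in design.values())
         for pool_id in range(1, 95)]
print(len(design), min(sizes), max(sizes))
```

Unique signatures only guarantee that a single positive subpool can be identified; resolving several simultaneous hits needs larger separations between signatures, which is one reason structured and randomized designs were both tested.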
In the second example, an additional method allows the complete library to be screened in one step while still maintaining the resolution of the individual superpool pools formed in Example 1. Example 2 further illustrates the benefits and possibilities of the embodiments. This example is also based on 384-well index plates, but it could be used with other plate formats as well, with appropriate considerations. It is also based on a BAC genomic DNA library comprised of individual BAC clones, but it could be used with a large variety of biological collections. Each superpool will be composed of eight 384-well plates, and 10 superpools will be combined into one large set of matrix pools. Therefore, there will be 80 plates (30,720 individual BAC clones in the library) in this one matrix screening, which can be tested with a limited number of tests while still maintaining good resolution down to an individual clone, or which may require screening a few clones directly during the clone confirmation test. This scheme also requires only a single set of experiments (instead of the two sets of experiments described in Example 1).
In this scheme, the pools of the individual superpools are numbered so that each individual ⅓ plate, row, column, and diagonal pool has a unique number. Since there are 88 pools per superpool (24 ⅓ plate, 16 row, 24 column, and 24 diagonal pools) and ten superpools in this example, there are a total of 880 individual pools that will be combined into one large set of matrix pools. Depending on the number of redundant clones in the BAC library (a function of the genome size and the insert size of the BAC clones), the idealized degree of redundancy can dramatically improve the ability to identify multiple positive clones in one screening and thus minimize ambiguous results when the user analyzes data from the screening experiments.
The first ⅓ plate pool is formed by collecting all of the clones in columns 1-8 of plate 1. The second ⅓ plate pool is all of the clones from columns 9-16 of plate 1. This continues until the 24th ⅓ plate pool, which holds columns 17-24 of plate 8. The twenty-four ⅓ plate pools from superpool two would be pools 89-112, and so on, until the tenth superpool, whose ⅓ plate pools would be pools 793-816.
The row pools would be built the same way as in Example 1, but since there are only 8 plates in each superpool, each pool would have 192 clones. All of the clones in row A of the eight plates would be pooled together as pool number 25. This would continue in a similar fashion, so that all of the clones in row B of all eight plates of the superpool would belong to pool 26 (and so on), until finally the pool of all of the clones in row P of the eight plates would be pool number 40. Similarly, the row pools from the second superpool would be pools 113-128. This would continue until the clones of every superpool belong to row pools, each assigned a unique number.
The column pools would be formed the same way as in Example 1, but since there are only 8 plates in each superpool, each pool would have 128 clones. All of the clones in column 1 of the eight plates would be pooled together as pool number 41, all of the clones in column 2 of all eight plates of the superpool would belong to pool 42, and so on, until finally the pool of all of the clones in column 24 of the eight plates would be pool number 64. Similarly, the column pools from the second superpool would be pools 129-152. This would continue until the clones of every superpool belong to column pools, each assigned a unique number.
The diagonal pools would be formed the same way as in Example 1, but since there are only 8 plates in each superpool, each pool would have 128 clones. See Table 6 for the 8-plate superpool diagonal composition. All of the clones in diagonal 1 of the eight plates would be pooled together as pool number 65, all of the clones in diagonal 2 of all eight plates of the superpool would belong to pool 66, and so on, until finally the pool of all of the clones in diagonal 24 of the eight plates would be pool number 88. Similarly, the diagonal pools from the second superpool would be pools 153-176. This would continue until the clones of every superpool belong to diagonal pools, each assigned a unique number.
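The numbering just described amounts to a fixed offset per pool type plus 88 per superpool. The sketch below reproduces the ranges stated in the text (1-24, 25-40, 41-64, and 65-88 within superpool one, 89-112 and 153-176 in superpool two, 793-816 in superpool ten); it is not a copy of Table 7, and because the 8-plate diagonal rule of Table 6 is not reproduced here, the diagonal number is passed in rather than computed. Names are hypothetical.

```python
POOLS_PER_SUPERPOOL = 88
# Offset of each pool type within one superpool, per the numbering above.
OFFSET = {"third": 0, "row": 24, "column": 40, "diagonal": 64}
ROWS = "ABCDEFGHIJKLMNOP"

def pool_number(superpool, kind, index):
    """Unique pool number; index is 1-based (third-plate 1..24, row 1..16,
    column 1..24, diagonal 1..24)."""
    return (superpool - 1) * POOLS_PER_SUPERPOOL + OFFSET[kind] + index

def third_plate_index(plate, column):
    """Plates 1..8; columns 1-8, 9-16, and 17-24 form the three thirds."""
    return (plate - 1) * 3 + (column - 1) // 8 + 1

def pools_for_clone(superpool, plate, row, column, diagonal):
    """The four unique pool numbers that contain one clone."""
    return (pool_number(superpool, "third", third_plate_index(plate, column)),
            pool_number(superpool, "row", ROWS.index(row) + 1),
            pool_number(superpool, "column", column),
            pool_number(superpool, "diagonal", diagonal))

print(pool_number(10, "third", 1), pool_number(10, "third", 24))      # 793 816
print(pool_number(2, "row", 1), pool_number(2, "row", 16))            # 113 128
print(pool_number(2, "diagonal", 1), pool_number(2, "diagonal", 24))  # 153 176
print(pools_for_clone(1, 1, "A", 1, 1))                               # (1, 25, 41, 65)
```

Each clone therefore carries four unique pool numbers, and these numbers are what a repooling design distributes among the final screening pools.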
For one of many possible schemes for assigning unique pool numbers to the complete set, see Table 7. Table 7 is designed for 88 pools in each subset (superpool) and ten subsets (superpools) in the complete set. These unique pool numbers are used to construct the various screening-pool designs that were tested. Notably, as the column number approaches 24, the diagonal pool wraps back to column 1 for a 16-row by 24-column plate.
Table 6 describes an alternate embodiment for constructing the diagonal pool composition for an 8 plate Superpool.
Table 7 sequentially assigns numbers to the individual small pools, or subpools, of ten consecutive superpools of eight plates each, so that the subpools may be repooled into final screening pools according to the alternative example embodiments depicted in Tables 8-11.
Tables 8-11 describe various embodiments of the systematic or randomized loading of the small pool or subpooled plate, row, column, and diagonal pooled DNA (
Tables 8, 9, 10 and 11 show four of the many specific repooling designs that were tested to demonstrate the utility of this patent.
Tables 12-16 are data showing multiple embodiments of various randomization schemes for pooling a quantification of data loaded into the Matrix Pool Plate (
Tables 13, 14, 15 and 16 show data collected from various pooling designs.
In order to facilitate quick and accurate analysis of user screening data, we have developed a computer program, which identifies the appropriate plate and well position of all potential positive clones. The results will be processed with error correction algorithms to enhance the reliability of the results and compensate for false negative data and false positive data (inherent in many screening technologies like PCR). The results will be displayed as probability scores indicating the likelihood of the resulting plate and well position being correct.
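The program itself is not reproduced here. As a hedged illustration of how probability scores can absorb false-negative and false-positive results, the sketch below uses a simple model that is an assumption, not the method of this application: exactly one clone is positive, every pool result is independent, a pool containing the clone is missed with probability `fn`, a pool without it reacts with probability `fp`, and all clones are equally likely beforehand. The clone labels and pool signatures are hypothetical.

```python
def likelihood(signature, positives, all_pools, fn, fp):
    """P(observed results | this clone is the only positive) under the
    independent-error model described above."""
    prob = 1.0
    for pool in all_pools:
        p_positive = (1 - fn) if pool in signature else fp
        prob *= p_positive if pool in positives else (1 - p_positive)
    return prob

def score_clones(signatures, positives, fn=0.05, fp=0.02):
    """Normalised probability score per candidate clone (uniform prior)."""
    all_pools = set().union(*signatures.values())
    lik = {clone: likelihood(sig, positives, all_pools, fn, fp)
           for clone, sig in signatures.items()}
    total = sum(lik.values())
    return {clone: value / total for clone, value in lik.items()}

# Hypothetical clones, each repooled into three of six screening pools.
signatures = {"P1-A01": {1, 2, 3}, "P1-A02": {1, 4, 5}, "P1-B01": {2, 4, 6}}
# Pool 3 missed the target (a false negative); only pools 1 and 2 are positive.
scores = score_clones(signatures, positives={1, 2})
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Even with one of its three pools reading negative, P1-A01 receives by far the highest score, which illustrates how placing each clone in several pools lets a single failed reaction be tolerated.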
While the invention has been described with reference to more than one embodiment, it is to be clearly understood by those skilled in the art that the invention is not limited to those embodiments. The general concept involves separating the large library set into multiple superpools and then making one, or more than one, set(s) of matrix pools formed by re-pooling a subset of the unique pools into screening pools that will be screened. Each unique pool can be placed in 0, 1, or more than one screening pools, depending on the redundancy of identification required.
Some of the embodiments contemplated herein pertain to construction of pooled biological material, such as DNA, RNA, proteins and the like, that are able to be screened by a wide variety of methods, such as sequencing, PCR (Polymerase Chain Reaction), DNA/DNA hybridization, DNA/RNA hybridization, RNA/RNA hybridization, single strand DNA probing, protein/protein hybridization, and a wide variety of additional methods. Construction of pools and superpools for screening as described herein differs from known methods in that the biological material set is systematically divided into a variety of smaller subsets, which are then re-pooled to make the final screening pools. This pooled material can be from individual samples or a population of samples. In order to reduce the analysis time, materials, and expense, the pooling of high resolution small pools in a matrix allows for a lower number of user experiments to have higher resolution (as if the researcher had analyzed the complete set of small pools).
In one embodiment, a substance identification method includes using a collection of segregated substances placed in respective wells of a plurality of collection plates physically or logically arranged in a stack. The wells are arranged in a plurality of rows and a plurality of columns and individual substances have a unique coordinate locating a well position defined by a plate identifier, a row identifier, and a column identifier. The method includes combining the substances into four or more intermediate subpools in respective wells of a subpool plate. The four or more intermediate subpools are of at least one type of intermediate subpool. One to four of the types of subpool are selected from the group consisting of a plate pool from wells having a common plate identifier, a row pool from wells having a common row identifier, a column pool from wells having a common column identifier, and a diagonal pool from wells having column and/or row identifiers per plate that are offset with respect to column and/or row identifiers per plate of any adjacent plate in the stack.
The four or more intermediate subpools are repooled into a number of final screening pools less than the four or more intermediate subpools. The final screening pools are placed in respective wells of a matrix pool plate based on a repooling design providing the subpooled substances in at least three different final screening pools. The method includes screening the final screening pools and identifying the presence of an item of interest associated with a substance. By using the repooling design, the coordinate is determined locating the well position in the collection for the substance associated with the item of interest.
By way of example, the subpooled substances may be different. However, as discussed above, some redundancy of substances may exist in a collection. The collection may be a portion of a BAC library, or may be an entire BAC library. Other possibilities for the substances may be selected from the group consisting of biological material clones or fragments, expressed proteins, purified proteins, materials exhibiting biological activity, chemicals expressed in biological processes, and combinations thereof. The item of interest may be selected from the group consisting of a nucleotide sequence in a biological material clone or fragment, a biological activity exhibited by a material, a chemical composition, and combinations thereof. Biological activity for proteins could include binding to specific chemicals, receptor sites, or antibodies, regulating proteins for transcription or translation, DNA binding proteins that turn other genes on or off, etc. Accordingly, the substances may be biological material clones including genomic DNA clones and the item of interest may be a DNA nucleotide sequence in a genomic clone DNA insert.
When the substances are biological material clones and the item of interest is a nucleotide sequence, the method may further include culturing the collection of clones, producing respective individual clone cultures, and forming the intermediate subpools using the individual clone cultures. Biological material fragments may be isolated from the four or more intermediate subpools and stored in a stable form prior to the repooling.
The at least one type of intermediate subpool may include four types of subpool including the plate pool, the row pool, the column pool, and the diagonal pool. The offset column and/or row identifiers of the diagonal pool may be offset by one column and/or row with respect to adjacent plates and might not be repeated in the diagonal pool for any other plate. The screening may be selected from the group consisting of sequencing, PCR probing, DNA to DNA hybridization probing, RNA to DNA probing, protein to protein probing, antibody to protein probing, DNA to protein probing, RNA to protein probing, chemical compound to protein probing, ligand to protein probing, and combinations or modifications thereof.
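By way of illustration only, the four types of intermediate subpool, and the way a diagonal pool can narrow the candidate wells, may be sketched in Python. The sketch assumes a stack of twelve 384-well plates (16 rows by 24 columns) with 0-based identifiers. It also assumes a hypothetical diagonal rule in which each plate's diagonal is shifted by one column relative to the adjacent plate and wraps around the columns. The function names are illustrative and are not part of the embodiments.

```python
from itertools import product

# Hypothetical stack: 12 plates, each 16 rows x 24 columns (0-based indices).
N_PLATES, N_ROWS, N_COLS = 12, 16, 24


def subpools(plate: int, row: int, col: int) -> dict[str, int]:
    """Intermediate subpools that receive the well at (plate, row, col).

    The diagonal rule is an assumption: each plate's diagonal is shifted by
    one column relative to the adjacent plate and wraps around the columns.
    """
    return {"plate": plate, "row": row, "column": col,
            "diagonal": (col - row - plate) % N_COLS}


def screen(positive_wells: list[tuple[int, int, int]]) -> dict[str, set[int]]:
    """Simulate screening: record every subpool holding a positive well."""
    hits: dict[str, set[int]] = {"plate": set(), "row": set(),
                                 "column": set(), "diagonal": set()}
    for well in positive_wells:
        for kind, index in subpools(*well).items():
            hits[kind].add(index)
    return hits


def candidate_wells(hits: dict[str, set[int]]) -> list[tuple[int, int, int]]:
    """Wells consistent with the positive plate, row, column and diagonal pools."""
    return [(p, r, c)
            for p, r, c in product(sorted(hits["plate"]), sorted(hits["row"]),
                                   sorted(hits["column"]))
            if subpools(p, r, c)["diagonal"] in hits["diagonal"]]


print(candidate_wells(screen([(3, 7, 11)])))            # [(3, 7, 11)]
print(candidate_wells(screen([(0, 0, 0), (1, 2, 5)])))  # [(0, 0, 0), (1, 2, 5)]
```

Without the diagonal pool, the two-positive case would leave eight candidate wells (two plates by two rows by two columns). In this particular example the diagonal pool removes the six false candidates, although other combinations of positives may remain partly ambiguous.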
The repooling design may provide the subpooled substances in four to eight of the final screening pools to establish the benefits enumerated above. When the collection is a three-dimensional array, the combining of substances may use four types of intermediate subpools to provide four dimensions of intermediate subpools. A sum of the plurality of plates, the plurality of rows, and the plurality of columns may be less than a number of the intermediate subpools sufficient to identify the well position of any substance in the array. The repooling design may then produce a number of final screening pools sufficient to identify the well position of any substance in the array, even though the number of final screening pools is less than the sum.
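A minimal, hypothetical sketch of repooling follows. It assumes a simplified scheme in which each intermediate subpool is sent to a unique pair of final screening pools drawn from a block of final pools reserved for its subpool type, and it assumes a single positive substance. The scheme, the subpool counts, and the helper names are illustrative assumptions; the sketch is not intended to reproduce any particular repooling design described herein.

```python
from itertools import combinations

# Hypothetical counts of intermediate subpools per type for a 12-plate stack
# of 16-row x 24-column plates (diagonal count assumed equal to columns).
DIMENSIONS = {"plate": 12, "row": 16, "column": 24, "diagonal": 24}


def block_size(n_subpools: int) -> int:
    """Smallest m whose C(m, 2) distinct pairs can label n_subpools subpools."""
    m = 2
    while m * (m - 1) // 2 < n_subpools:
        m += 1
    return m


def build_repooling(dimensions: dict[str, int]) -> dict[tuple[str, int], tuple[int, int]]:
    """Send each intermediate subpool to a unique pair of final screening pools.

    Each subpool type gets its own block of final pools, so a substance, which
    sits in one subpool of each type, lands in two final pools per type.
    """
    design: dict[tuple[str, int], tuple[int, int]] = {}
    offset = 0
    for kind, n in dimensions.items():
        m = block_size(n)
        pairs = list(combinations(range(offset, offset + m), 2))[:n]
        for index, pair in enumerate(pairs):
            design[(kind, index)] = pair
        offset += m
    return design


def decode(design: dict[tuple[str, int], tuple[int, int]],
           positive: set[int]) -> dict[str, int]:
    """Recover one subpool index per type, assuming a single positive substance."""
    return {kind: index for (kind, index), pair in design.items()
            if set(pair) <= positive}


design = build_repooling(DIMENSIONS)
n_final = sum(block_size(n) for n in DIMENSIONS.values())
# Illustrative substance: plate 3, row 7, column 11, diagonal subpool 1 (0-based).
substance = {"plate": 3, "row": 7, "column": 11, "diagonal": 1}
positive = set().union(*(design[item] for item in substance.items()))
print(n_final, len(positive), decode(design, positive) == substance)  # 29 8 True
```

Under these assumptions, 76 intermediate subpools (12 + 16 + 24 + 24) are repooled into 29 final screening pools, fewer than the sum of 52 plates, rows, and columns, and each substance appears in eight final screening pools.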
It follows, in one embodiment, that a method for identifying an individual genomic clone DNA insert from a collection of genomic DNA clones includes the following features. The individual genomic DNA clones are arrayed in a plurality of respective wells of a plurality of collection plates comprised of rows and columns. Individual genomic DNA clones have a specific coordinate locating a well position defined by three or four pools chosen from the group consisting of a plate pool, a row pool, a column pool, and a diagonal pool. The pools are in a hierarchical structure that is composed of a plate identifier, a row identifier, and a column identifier.
The method includes culturing the collection of genomic DNA clones and constructing at least four intermediate subpools by combining individual genomic DNA clone cultures in accordance with the hierarchical structure. Genomic DNA clone DNA is isolated from the at least four intermediate subpools and stored in a stable form. The at least four intermediate subpools are repooled into a number of Final Screening Pools based on a chosen repooling design. The subpooled individual genomic DNA clone DNA is present in at least four and no more than eight Final Screening Pools. The Final Screening Pools are screened for a DNA sequence of interest, and the specific coordinate is determined using the chosen repooling design, identifying the well position of the DNA sequence of interest.
Additional embodiments involve using a still further pooling design. In one embodiment, a substance identification method includes using a collection of segregated substances placed in respective wells physically or logically arranged in a two-dimensional array. The wells are arranged in a plurality of rows and a number of columns that is at least 1.5 times the plurality of rows. Individual substances have a unique coordinate locating a well position defined by a row identifier and a column identifier.
The method includes combining the substances into a number of screening pools in respective wells of a matrix pool plate. A plurality of individual screening pools include substances from wells having a row identifier in common with one other well. Pools are based on a pooling design that provides the pooled substances in two different screening pools. The method also includes screening the screening pools and identifying the presence of an item of interest associated with a substance. The pooling design is used to determine the coordinate locating the well position in the collection for the substance associated with the item of interest.
By way of example, further features described above for other embodiments may also be used in the present embodiment, if pertinent and supportive thereof. The number of columns may be at least two times the plurality of rows, or at least two times the plurality of rows plus one. The number of screening pools may match the number of columns. Given that the array may be logically arranged, instead of physically arranged, the array may reside on a plurality of microtiter well plates and at least some of the screening pools may extend across a plurality of the plates. The pooling design may provide screening pools from contiguous wells or, instead, one or more of the wells in a pool may be non-contiguous. In the context of the present document, contiguous wells are those that are adjacent in the sense that they are not separated from one another by another well, whether in a horizontal, vertical, or diagonal direction. A number of the screening pools sufficient to identify the well position of any substance in the array may be less than a sum of the plurality of rows and the number of columns.
The pooling design may reduce the two-dimensional array to a one-dimensional array. The one-dimensional array may include pseudo-column pools from a plurality of wells having a common column identifier and another equal number of wells having a row identifier in common with the plurality of wells. Instead, the one-dimensional array may include bi-diagonal pools from a plurality of wells that do not have row or column identifiers in common and another equal number of wells having a row identifier in common with the plurality of wells. The other number of wells in the pseudo-column pools or the bi-diagonal pools might not have row or column identifiers in common.
The pooling design may instead reduce a part of the two-dimensional array to a one-dimensional array. The screening pools from another part of the two-dimensional array may form screening pools in a second dimension, such as a row dimension. The number of screening pools may be at least the number of wells of the matrix pool plate, as shown in Table 17 below, that corresponds to a total number of wells in the pooling design. The combination function formula below Table 17, used to calculate the contents of Table 17, may also be used to determine values for any number of screening pools not shown in Table 17.
When the substances are biological material clones and the item of interest is a nucleotide sequence, the method may include culturing the collection of clones. The method may further include producing respective individual clone cultures, forming the screening pools using the individual clone cultures, and isolating biological material fragments from the screening pools. The isolated fragments may be stored in a stable form prior to the screening.
With the logical arrangement of
Any of the wells in columns 2-29 of
Hence, a pseudo-column pool may be created that includes column 1 and “extends” the column through the two-dimensional array in
A second screening pool may have the same form as the lightly shaded wells in
The intersection shown in
For consistency in the
Screening pool 16 includes column 16 and the diagonal that begins with well 306 at row M, column 18 and extends up to well 374 at row B, column 29. However, the diagonal does not end at column 29 and wraps to column 1 as though it were present next to column 29 to include well 1 at row A, column 1. Thus, well 1 is the intersection of screening pool 16 and screening pool 1.
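One cyclic construction that reproduces the screening pool 16 example can be sketched as follows. The rule in `second_pool` is an assumption inferred from that example: row and column identifiers are 1-based, and every position of the logical array is treated as occupied. The helper names are hypothetical.

```python
from itertools import combinations


def second_pool(row: int, col: int, n_rows: int, n_cols: int) -> int:
    """Pseudo-column pool a well joins in addition to its own column pool.

    Assumed rule (1-based rows and columns): the diagonal part of each pool
    runs up and to the right and wraps from the last column to column 1.
    """
    return (col - 1 - (n_rows + 1 - row)) % n_cols + 1


def build_pools(n_rows: int, n_cols: int) -> dict[int, set[tuple[int, int]]]:
    """Map each screening pool (1-based) to the (row, column) wells it holds."""
    pools: dict[int, set[tuple[int, int]]] = {k: set() for k in range(1, n_cols + 1)}
    for row in range(1, n_rows + 1):
        for col in range(1, n_cols + 1):
            pools[col].add((row, col))  # the column part of the pool
            pools[second_pool(row, col, n_rows, n_cols)].add((row, col))  # diagonal part
    return pools


def ambiguous_pairs(pools: dict[int, set[tuple[int, int]]]) -> list[tuple[int, int]]:
    """Pool pairs that do not intersect in exactly one well."""
    return [(a, b) for a, b in combinations(sorted(pools), 2)
            if len(pools[a] & pools[b]) != 1]


def locate(pools: dict[int, set[tuple[int, int]]], a: int, b: int) -> set[tuple[int, int]]:
    """Candidate wells when screening pools a and b are the two positives."""
    return pools[a] & pools[b]


pools_29 = build_pools(14, 29)
print(ambiguous_pairs(pools_29))   # []
print(locate(pools_29, 1, 16))     # {(1, 1)}
```

With 29 columns for 14 rows, every pair of screening pools meets in exactly one well. With 48 rows and only 96 columns, the same assumed rule leaves 48 pool pairs, such as pools 1 and 49, that share two wells.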
If column 29 were absent from the logical array in
It follows, then, that a number of columns that is 1.5 to 2 times the number of rows may introduce ambiguity that could warrant an additional test to resolve. Some level of nonspecific deconvolution may be acceptable, for example when the ultimate purpose of the data does not require absolute deconvolution. With oversampling, for instance, a desired region of interest or substance may well be deconvoluted even though not every individual substance is absolutely deconvoluted.
Column 30 may be added to the array so that the number of columns is greater than two times the number of rows plus one. One way to add column 30 and still keep 14 rows includes duplicating wells 301 to 312 of row M (or other wells) in column 30. Of course, the duplication would create redundant testing. Another way to add column 30 is to move wells 313 to 324 from row N to a new column 30. No duplication would exist, but the total number of tests to resolve the location of all 384 substances would increase from 29 to 30.
It is also worth noting that the 29 screening pools (29 columns) designed as in
Another variation in the embodiments includes the one-dimensional array having bi-diagonal pools from a plurality of wells that do not have row or column identifiers in common and another equal number of wells having a row identifier in common with the plurality of wells. Like the pseudo-column pools embodiment, such an embodiment also fits within the general criteria of screening pools including wells having a row identifier in common with one other well based on a pooling design that provides the pooled substances in two different screening pools.
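A bi-diagonal pooling design can be sketched in Python under stated assumptions: an odd number of columns equal to two times the number of rows plus one, 1-based identifiers, and a hypothetical rule in which each pool combines one diagonal running down to the right with one running down to the left, both wrapping around the columns. Each diagonal contains one well per row, so its wells share no row or column identifier, and the two diagonals of a pool cover the same rows.

```python
from itertools import combinations


def bi_diagonal_pools(n_rows: int) -> dict[int, set[tuple[int, int]]]:
    """Bi-diagonal screening pools for an n_rows x (2 * n_rows + 1) array.

    Assumed rule (1-based rows and columns): pool k holds the diagonal of
    wells at column k + row and the diagonal at column k - row, both taken
    modulo the number of columns.
    """
    n_cols = 2 * n_rows + 1  # odd, so the two pools of a well are always distinct
    pools: dict[int, set[tuple[int, int]]] = {k: set() for k in range(1, n_cols + 1)}
    for row in range(1, n_rows + 1):
        for col in range(1, n_cols + 1):
            pools[(col - 1 - row) % n_cols + 1].add((row, col))  # down-right diagonal
            pools[(col - 1 + row) % n_cols + 1].add((row, col))  # down-left diagonal
    return pools


pools = bi_diagonal_pools(14)
# Every pair of pools shares exactly one well, so two positives locate it.
assert all(len(pools[a] & pools[b]) == 1 for a, b in combinations(pools, 2))
print(pools[3] & pools[20])  # {(6, 26)}
```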
The considerations for pseudo-column pools discussed above also apply to the bi-diagonal pools of
Lightly shaded screening pool 1 (column 1) in
However, row pools L to P, designed as shown for row pool L in
Per Table 17 below, 24 matrix wells could be used to determine unique locations for 276 wells when there are 24 screening pools with two screening pools in each combination. Known row and column pooling for five rows and 24 columns can determine unique locations for 120 (=5×24) wells. In total, 396 well locations can be identified with the
Conceivably, a three-dimensional array may also be reduced to a one-dimensional array using the principles described herein. For example, an additional diagonal or column might be added to the pools for
For example, with 28 matrix wells available, 378 wells combined into 28 two-dimensional screening pools could be uniquely located. The wells could all be on one 384-well plate. Also, with 28 matrix wells available, 3,276 wells combined into 28 three-dimensional screening pools could be uniquely located. The wells could all be on nine 384-well plates. It will be further appreciated that up to 4,560 wells from twelve 384-well plates could be combined into 96 two-dimensional screening pools and placed in a single 96-well plate. The full 4,608 wells on twelve 384-well plates would use 97 matrix wells, as further shown below with respect to
Values in the 2-D Wells and 3-D Wells columns are calculated using the following formula for a combination function:

$$C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

The function returns the number of combinations, where n = a given number of items (matrix wells) and k = the number of items in each combination (k = 2 for 2-D Wells; k = 3 for 3-D Wells).
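The combination function is the binomial coefficient, which Python's `math.comb` computes directly. The short sketch recomputes several of the values cited in the text; the helper name `table_row` is illustrative.

```python
from math import comb  # comb(n, k) == n! / (k! * (n - k)!)


def table_row(matrix_wells: int) -> tuple[int, int, int]:
    """Return (n, C(n, 2), C(n, 3)): matrix wells, 2-D Wells, 3-D Wells."""
    return matrix_wells, comb(matrix_wells, 2), comb(matrix_wells, 3)


for n in (24, 28, 96, 97):
    print(table_row(n))
```

For example, C(28, 2) = 378 and C(28, 3) = 3,276, matching the values cited for 28 matrix wells, while C(96, 2) = 4,560 falls just short of the 4,608 wells on twelve 384-well plates and C(97, 2) = 4,656 covers them.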
FIGS. 9a and 9b show a grid representing five 384-well plates and the use of the embodiments herein for a collection of segregated substances placed in respective wells logically arranged in a two-dimensional array extending across all five plates. Only a portion of the logical array appears in
Columns 1-6 of the fifth plate, labeled as section 1, form columns 25-30 for rows 1-16 of the logical array. Columns 7-12 of the fifth plate, labeled as section 2, form columns 25-30 for rows 17-32 of the logical array. Columns 13-18 of the fifth plate, labeled as section 3, form columns 25-30 for rows 33-48 of the logical array. Columns 19-24 of the fifth plate, labeled as section 4, form columns 25-30 for rows 49-64 of the logical array. Darkly shaded screening pool 1, including the wells of column 1, extends across plate 1 and plate 5, section 1, as well as plate 2 and plate 5, section 2. Lightly shaded screening pool 31, including the wells of column 31, extends across plate 2 and plate 5, section 2, as well as plates 3 and 4 and plate 5, section 4. Screening pool 1 and screening pool 31 intersect at one well on plate 5 in section 2.
Using 61 experiments, the embodiment of
Sometimes it is advantageous to use less-than-ideal pooling schemes because they often allow significantly improved cost effectiveness and reduced effort with very little loss in data. It may be beneficial to gather data even when not every clone can be absolutely deconvoluted. For example, if whole genome sequencing is one of the goals, it is not critical that every individual clone have a known sequence; only as much of the genome as possible need be known. In such cases, the rule that the number of columns be two times the number of rows plus one need not be followed strictly and may serve merely as a guide. Since the number of individuals on a given plate is fixed by the physical choice of columns and rows, it is sometimes beneficial to accept that some individuals cannot be absolutely deconvoluted. The benefit of processing the entire plate and gathering data for all wells outweighs not knowing exactly which individual every piece of data came from.
For 48 rows, two times the number of rows plus one would yield 97 columns.
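The relationship between the two-times-plus-one rule and the combination function can be checked numerically. With n rows and 2n + 1 columns, the array holds n(2n + 1) wells, which equals C(2n + 1, 2), so every pair of pools can be assigned exactly one well. The helper names are hypothetical.

```python
from math import comb


def min_pools_2d(n_wells: int) -> int:
    """Smallest pool count N whose C(N, 2) pool pairs can cover n_wells wells."""
    n_pools = 2
    while comb(n_pools, 2) < n_wells:
        n_pools += 1
    return n_pools


def columns_for_rows(n_rows: int) -> int:
    """Two times the number of rows plus one."""
    return 2 * n_rows + 1


rows = 48
cols = columns_for_rows(rows)
print(cols, rows * cols == comb(cols, 2))   # 97 True
print(min_pools_2d(12 * 384))               # 97
print(min_pools_2d(384))                    # 29
```

The 384-well case returns 29 pools, consistent with the 29 screening pools discussed for a single 384-well plate, and the 4,608 wells of twelve 384-well plates require 97 pools for every well to have its own pair.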
Because the logical array includes 96 pools (96 columns) instead of 97 pools, pool 1 and pool 49 overlap at two wells in the same manner as discussed above regarding
In comparison, the plate, row, column, and diagonal intermediate subpool embodiment herein, utilizing repooling for a 12-plate stack of 384-well plates, uses 34 experiments. Even though the number of experiments is significantly lower, in practice the complexity of a three-dimensional pooling design is significantly greater than the complexity of a one-dimensional pooling design. A known two-dimensional row pool and column pool design screening the 12 plates would use 144 experiments (=96+48).
Accordingly, the two-dimensional to one-dimensional approach in
Further, building the intermediate subpools by pooling 12 plates, rows on each of 12 plates, columns on each of 12 plates, and diagonals on each of 12 plates is more complex and time intensive in comparison to the pseudo-column pools of
In the three-dimensional pooling design, the number of experiments was reduced by first increasing the number of dimensions from three to four (from plate, row, and column to plate, row, column, and diagonal) and then repooling to reduce the number further. Thus, experiments were reduced by increasing the number of dimensions. In the two-dimensional pooling design, by contrast, it was determined that the number of dimensions could be decreased while still reducing the number of experiments. The two-dimensional pooling design thus produces a surprising result.
In compliance with the statute, the embodiments have been described in language more or less specific as to structural and methodical features. It is to be understood, however, that the embodiments are not limited to the specific features shown and described. The embodiments are, therefore, claimed in any of their forms or modifications within the proper scope of the appended claims appropriately interpreted in accordance with the doctrine of equivalents.
The present application is a continuation-in-part of U.S. application Ser. No. 10/841,375, filed May 5, 2004, entitled “Pool and Superpool Matrix Coding and Decoding Designs and Methods” and now U.S. Pat. No. 8,301,388, which claims the benefit of priority under 35 U.S.C. §119 to U.S. Provisional App. No. 60/467,912, filed May 5, 2003, entitled “Pool and Superpool matrix provisional application,” both of which are herein incorporated by reference.