DESCRIPTION (provided by applicant): Obtaining complete genome DNA sequences of organisms from bacteria to man is a hallmark achievement of 20th-century science and has had a huge impact on the biological sciences as well as the practice of medicine. Sequence assembly software played a critical role in making this possible. Now the cost of sequence data acquisition is falling rapidly and dramatically, owing to the emergence of revolutionary new sequencing machines that can turn out data in great quantities for a fraction of the previous cost. This is initiating a second wave of sequencing on a far greater scale and opening up many new applications, in addition to sequencing new genomes, that were previously undreamed of or too costly to carry out. Projects such as using whole-genome "snapshots" to identify crucial sequence changes that occur during evolution, metagenomic sequencing of communities, and medical diagnostic applications that can help track down alterations in genomic DNA related to disease are all becoming feasible. Medical scientists and entrepreneurs are even dreaming of the time when complete human genomes can be determined for about $1000, although this awaits yet another revolutionary advance in sequence data acquisition. Utilizing the present flood of new data will require corresponding progress in assembly software performance. We propose here to develop and evaluate prototype software approaches that can take advantage of all the new data-gathering techniques. This will require developing new algorithms that exploit the abundant data to overcome bottlenecks that have previously limited the accuracy and completeness of assemblies and necessitated expensive manual intervention to correct.
As an independent software developer without vested ties to particular technical approaches, DNASTAR is uniquely positioned to provide a trusted implementation that can effectively combine data from all approaches, allowing researchers the freedom to use whatever combination of technologies best fits their needs. Our proposed new software techniques will make repeat handling, scaffold ordering and annotation far more efficient, and we predict that the speed of assembly will also increase dramatically. These feasibility experiments will be conducted on the strong foundation of DNASTAR's high-performance SeqMan Genome Assembler platform. This will maximize the probability that a robust, commercially attractive product will emerge after Phase II with the high performance and accuracy needed by the large government-sponsored sequencing centers, by commercial and clinical sequencing operations, and by individual research laboratories. We are embarking on a new era in medicine in which the genetic basis for diseases will be determined, both for the general population and for individuals. This is being made possible, in part, by dramatic improvements in the ability to decipher vast amounts of DNA at a greatly reduced cost. This proposal aims to develop the computational software needed to convert that mass of coded data into medically useful information.