DESCRIPTION (provided by applicant): Our goal is to advance our understanding of predicted and/or unexpected DNA-binding proteins by identifying their preferred binding sites using a novel combinatorial approach, Restriction Endonuclease Protection, Selection and Amplification (REPSA). While the Human Genome Project has yielded a wealth of information on the genomes of humans and of model organisms, much remains to be determined regarding individual genes and the biological roles played by their encoded products. For example, while the bacterium Escherichia coli strain K-12 has 4364 open reading frames, only about half of these genes have been well characterized by genetic, biochemical or molecular biological means. Many of the known genes (260+) encode proteins that presumably bind specific DNA sequences. However, for most of these proteins (>200), the preferred DNA-binding sites have not been determined empirically. We have developed a combinatorial approach, REPSA, which does not require any prior knowledge of a ligand in order to determine its preferred binding site on duplex DNA. We therefore hypothesize that REPSA can be used to identify the preferred DNA-binding sites of uncharacterized proteins in the model organism E. coli K-12. We propose the following four Specific Aims: (1) Identify preferred DNA-binding sites using REPSA, with inputs ranging from purified E. coli proteins to bacterial extracts, followed by massively parallel DNA sequencing and bioinformatics analyses (e.g., Multiple Expectation Maximization for Motif Elicitation, MEME). (2) When necessary, identify the proteins bound to the identified preferred DNA-binding sites by DNA-affinity/magnetic bead capture and peptide fingerprinting. (3) Validate protein-DNA binding and determine binding specificity using recombinant proteins, point-mutated oligonucleotides, and fluorescence polarization binding assays.
(4) Propose potential biological functions for these proteins and their binding sites by mapping the binding sites on the E. coli genome using Find Individual Motif Occurrences (FIMO) and by consulting annotations in available databases (e.g., EcoCyc, EcoliWiki, EcoGene, RegulonDB, UniProtKB). We expect our research to ultimately catalyze numerous studies, by us and by other laboratories, that lead to a better understanding of E. coli biology at the molecular level and provide a framework for similar studies in other organisms.