DESCRIPTION (provided by applicant): The widespread adoption of picture archiving and communication systems (PACS) in radiology, together with the deployment of the DICOM communication standard, represents an opportunity to link multiple PACS at multiple sites into a distributed data warehouse of great potential utility for investigators in oncology research and epidemiology. Although the federal HIPAA privacy regulations have largely been seen as an emerging impediment to oncology research, from the creation, management and use of cancer registries to large-scale retrospective studies addressing rarer forms of neoplasia, the digital nature of PACS-based imaging data in fact lends itself to automated de-identification. Such de-identification could transform multiple distributed clinical information systems into a readily accessible treasure trove of research data that falls within the "safe harbor" provisions of HIPAA's privacy regulations. Our firm has developed a platform, originally intended for clinical use, that securely links multiple PACS and RIS from multiple vendors beneath a web interface, giving users transparent access to a "virtual archive" spanning an arbitrary number of institutions. In this Phase I SBIR application, we propose to explore the feasibility of extending our system to grant researchers access to large volumes of dynamically de-identified imaging data while addressing each of the major criticisms of the viability of such data for research purposes. We propose developing an open web-services architecture that will enable straightforward integration with any other information system, with a design that adheres to existing industry standards while laying the groundwork for compliance with future standards and informatics initiatives.
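As an illustration of the automated de-identification described above, the following is a minimal sketch and not the applicant's implementation. It assumes an image header is available as a plain dictionary keyed by DICOM attribute keywords. The keep-list, the date handling, and the function name `deidentify` are hypothetical choices made for this example, and the sketch does not claim to cover the full list of identifiers named in the safe harbor provisions.

```python
import secrets

# Hypothetical policy for this sketch: attributes copied unchanged.
KEEP = {"Modality", "BodyPartExamined", "PatientSex"}
# Hypothetical policy: date attributes reduced to the year only.
DATE_FIELDS = {"PatientBirthDate", "StudyDate"}


def deidentify(header, pseudonyms):
    """Return a de-identified copy of a header dictionary.

    `pseudonyms` maps real patient IDs to random tokens and is assumed
    to stay inside the originating institution.
    """
    out = {}
    for key, value in header.items():
        if key in KEEP:
            out[key] = value
        elif key in DATE_FIELDS:
            # DICOM dates are formatted YYYYMMDD; keep only YYYY.
            out[key] = value[:4]
        elif key == "PatientID":
            # A random token, not derived from the real identifier.
            out[key] = pseudonyms.setdefault(value, secrets.token_hex(8))
        # Every other attribute is dropped by default.
    return out


if __name__ == "__main__":
    table = {}
    header = {
        "PatientName": "DOE^JANE",
        "PatientID": "MRN-0042",
        "PatientBirthDate": "19510317",
        "StudyDate": "20040922",
        "Modality": "CT",
    }
    print(deidentify(header, table))
```

Dropping unknown attributes by default, rather than listing the attributes to remove, means that an attribute the policy did not anticipate is withheld instead of leaked.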
This study will also examine the regulation of re-identification through the use of threshold cryptography, as well as the feasibility of a probabilistic sampling search engine intended to prevent unauthorized identification of patients through multiple intersecting queries on narrowing criteria, while still permitting researchers to choose the appropriate resolving power of the engine to suit a particular investigation. These studies will include benchmarking the performance of these dynamic processes, quantifying the load they place on live clinical information systems, and optimizing the design to minimize that load. Should feasibility be demonstrated, Phase II would involve a proof-of-concept demonstration across multiple academic medical institutions, as well as steps to prepare for commercialization, including indexing studies based on structured reporting and natural language processing, content-based information retrieval, refinement and usability testing of the web interfaces, and extension of the system to permit IRB-approved research on individually identifiable data. Commercialization is expected as a subscription service not unlike current bioinformatics databases, granting investigators access to a large-scale, globally distributed data warehouse composed of participating PACS-enabled medical centers.
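To make the idea of regulating re-identification with threshold cryptography more concrete, the sketch below uses Shamir secret sharing, one common building block of threshold schemes. The application does not specify which scheme would be used, so this choice is an assumption. A re-identification key is split among several parties so that any `threshold` of them can rebuild it and fewer cannot. The function names, the field modulus, and the three-of-five example are all hypothetical.

```python
import secrets

# Field modulus for this sketch: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold - 1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)  # stands in for a re-identification key
    parts = split_secret(key, threshold=3, shares=5)
    print(recover_secret(parts[:3]) == key)
```

In a setting like the one proposed, the shares might be held by separate parties such as an IRB, the originating institution, and the service operator, so that no single party could re-identify a patient alone. That assignment is illustrative and is not stated in the application.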