CNRI is proposing to develop a framework and a related set of infrastructural tools that will greatly improve the ability of research organizations to register scientific data sets, either those they hold directly or those which they have funded and which are held elsewhere, and expose them for discovery, analysis, and further processing. The project will build the tools and low-level APIs to be suitable for use by data producers as well as organizations that are expert in metadata and data organization.<br/><br/>There is no widely adopted infrastructure currently in place for sharing research data. Individual pieces certainly exist, especially within given domains, but transparent and seamless sharing of scientific data requires a level of standardization and acceptance that simply doesn't exist today. To realize the potential of widely available scientific data, it must be discoverable, reference-able, and understandable, and it must be so without the investment of enormous amounts of time and effort on the part of those who are providing the data or those consuming the data. Research institutions currently expose their data through institution-specific web sites and APIs. The PI propose to build a pair of registries that will enable the use of a common API as well as the ability to federate registries across institutions when it makes sense, without requiring the existing underlying storage and management systems to change. We also propose to design basic metadata schemas to be used in those registries. <br/><br/>The first of the two registries is a metadata registry in which data sets can be registered and described. A common API will be built both for the registration process as well as for access to the resulting metadata objects. Each metadata object and, if required, each data set, will be given a unique, persistent identifier. These identifiers will resolve to the metadata objects and data sets respectively and their assignment will be part of the deposit API. We will also enable related objects to be associated with each other through the registry and through identifier resolution, depending on the specific cases in hand. This will be transparent to users of the access API. <br/><br/>The second of the two registries is a type registry. The metadata objects and data sets will each be typed and the type registry will provide the information needed to decipher those types. The goal is to be able to answer the question of, given a specific identifier or piece of data, what does it represent and how should I interpret it. This interaction will be made as transparent as possible to the access API. The interaction between these two registries is key to the proposed framework. <br/><br/>The proposed deliverables will include an open source release of the metadata registry and the type registry software, the basic metadata schemas applicable for those registries, and a prototype service that demonstrates the infrastructure capability by federating research data from at least two sources.