The current disclosure relates to ensuring the integrity of data transmitted from one location to another, e.g., to ensure that the transmission did not introduce any errors in the data or that the data was not otherwise tampered with.
Data transmitted from one location to another may need to be verified to ensure that the transmission did not introduce any errors in the data. One example of such data transmission occurs in clinical studies, also known as clinical trials. These studies are typically conducted to evaluate the safety and efficacy of medicines, medical devices, or other medical treatments by monitoring and studying their effects on groups of people. Using clinical studies, doctors and researchers may find new and better ways to prevent, detect, diagnose, or treat diseases. A clinical study is often sponsored by a drug manufacturer (sometimes called the “sponsor”) and may be carried out by a contract research organization (“CRO”), and may involve numerous entities such as hospitals, doctors (principal investigators), nurses, patients, and site monitors. Findings or results from these clinical studies may then be sent by the sponsor to regulatory agencies such as the United States Food and Drug Administration (“FDA”) or the European Medicines Agency (“EMA”).
During the course of a clinical study, a large amount of clinical data and information may be gathered at various investigator sites, such as hospitals and clinics, by personnel such as doctors, patients, nurses, and technicians. These data may be entered into a system where they may be recorded and stored. These data may then be transmitted by the sites to, for example, CROs, sponsors, and/or regulatory agencies. In some cases, an investigator site may transmit the data to a CRO, which may in turn forward that data to a sponsor that may finally submit the data to a regulatory agency, such as the FDA or EMA.
Where considered appropriate, reference numerals may be repeated among the drawings to indicate corresponding or analogous elements. Moreover, some of the blocks depicted in the drawings may be combined into a single function.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be understood by those of ordinary skill in the art that the embodiments of the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present invention. The present invention is not intended to be limited to any particular operating system, software application, or market. Additionally, any examples of particular software applications or markets used herein are included for illustration purposes and are not intended to be limiting.
With the advent of computer and network technologies, data may be collected using electronic means during the course of a clinical study. Electronic data collection may present challenges in ensuring that the data transmitted from one organization to another are accurate and valid. It may be a challenge to keep track of updates or changes made to the clinical data over the course of a clinical study. It may also be difficult to trace back to such updates and changes that may be made at a given time during the clinical study.
A regulatory agency does not generally have the ability to accurately and rapidly assess whether the data that it receives from a life sciences company, such as a drug sponsor, for regulatory purposes have been altered in any way. For example, the FDA may receive, at the end of a clinical study, a copy of the data from the sponsor, which certifies that the data are as accurate as the data collected at the source. However, even though current clinical applications may include auditing capabilities, it may be difficult (if not impossible) for the FDA to fully verify quickly whether the data have been altered, either inadvertently or intentionally, by the sponsor or someone else in the data transmission chain. Thus, a regulatory agency would like to ensure there has not been any data tampering, corruption, or change between the time the clinical data were collected and the time when it receives the data. Regulatory agencies also often require site personnel to certify at the end of a study or when a patient completes his or her participation in a study that the data transmitted from the site to the sponsor are the same as the data that were entered by site personnel into various eClinical systems during the course of the study, i.e., that the site has been in control of its data throughout the process of data capture, cleaning, and submission to the agency.
A system for ensuring that clinical data submitted to a regulatory agency are accurate and valid has been developed. This system may collect data from a clinical study and then may apply an algorithm to the stream of collected data to generate a single number representative of the collected data stream. The collected data may then be transmitted to another entity, such as a sponsor, which then prepares a submission to the regulatory agency in support of regulatory approval of the item being studied. The submission may include the sponsor's version of the collected data. The regulatory agency may then verify that the data from the sponsor are the same as the data collected during the study by applying the same algorithm to the sponsor's data and comparing the representative number from that algorithm to the representative number previously generated. If the representative numbers differ, the regulatory agency knows that the data from the sponsor are not the same as the data transmitted to the sponsor. The system may also be used by site personnel to verify that the data the site generated are being transmitted to the sponsor and the regulatory agency.
The algorithm applied to the data streams may be a hashing algorithm and the single number generated that is representative of the data stream may be a hash number. Generally, hashing is a transformation of a set of data into, for example, a value of a pre-determined length that reflects that set of data. A set of data that may be hashed includes, for example, a string or a page of alphanumerical characters, an entire electronic data file, and an electronic form with multiple fields. Hashing algorithms that may be used in conjunction with this system may include, but are not limited to, the MD5 algorithm, the MD6 algorithm, and customized hashing programs. Hashing the data stream allows for much more rapid verification of data integrity than comparing the two sets of data line-by-line or field-by-field, which may be time consuming, cost prohibitive, cumbersome, and error prone.
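By way of illustration only, the transformation of a set of data into a value of a pre-determined length may be sketched in Python as follows. The sketch assumes the MD5 algorithm as provided by Python's standard `hashlib` module; the function name `hash_number` and the sample inputs are hypothetical and are not part of the described system.

```python
import hashlib


def hash_number(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string.

    MD5 is used only because it is one of the algorithms named above;
    hashlib.new("sha256", data) would be a drop-in alternative.
    """
    return hashlib.md5(data).hexdigest()


# Two hypothetical inputs of very different sizes.
short_input = b"SBP=120"
long_input = b"SBP=120;" * 10_000

short_hash = hash_number(short_input)
long_hash = hash_number(long_input)

# Both digests have the same pre-determined length (32 hex characters),
# regardless of how much data was hashed.
print(len(short_hash), len(long_hash))

# Hashing the same data again reproduces the same value.
print(short_hash == hash_number(short_input))
```

In this sketch, comparing two short digests takes the place of comparing the underlying data sets line-by-line or field-by-field.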
A further feature of the present invention is the ability to take into account all of the information related to a set of clinical data, which information may be represented by a set of audits. As used herein, an audit may be a record of a transaction occurring at one or more clinical data sources. An audit may include clinical data, operational data, or both, generated as a result of the transaction executed at the data source. Clinical data may include height, weight, blood tests, blood pressure, activity metrics, glucose levels, ECG data, and other pharmacokinetic and pharmacovigilance data. Operational data may include time stamps, vector stamps, and, more broadly, causality-determining markers associated with an executed transaction. Operational data may also include data regarding what action was taken, who took the action, the identity of a device used to take the action (e.g., record some data), on whose behalf the action was taken, when the action was taken, what was changed from a previous state, the reason for the change, and what other audits may be related to it (e.g., identified by transaction ID), along with other information. (An “action” as used herein may include recording, calculating, converting, or transmitting data, and may be a subset of or coextensive with a transaction.) Audits may ultimately provide a permanent and indelible record, in keeping with the regulatory requirements that govern many clinical study systems. Thus, embodiments of the present invention involve hashing audit streams rather than just clinical data streams.
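As a non-limiting sketch, an audit carrying both clinical and operational data might be represented in Python as shown below. The class name `Audit`, every field name, the sample values, and the choice of a sorted-key JSON serialization are assumptions made for illustration; the disclosure does not prescribe a particular record layout or encoding.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Audit:
    """Hypothetical record of one transaction at a clinical data source."""
    transaction_id: str            # links related audits together
    action: str                    # what action was taken
    actor: str                     # who took the action
    device_id: str                 # device used to take the action
    on_behalf_of: str              # on whose behalf the action was taken
    timestamp: str                 # when the action was taken
    field_name: str                # which clinical datum was affected
    previous_value: Optional[str]  # prior state, if any
    new_value: Optional[str]       # state after the action
    reason: str                    # reason for the change


def serialize_audit(audit: Audit) -> bytes:
    """Serialize an audit deterministically so equal audits yield equal bytes."""
    return json.dumps(
        asdict(audit), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


# Illustrative values only.
example_audit = Audit(
    transaction_id="T-0001",
    action="record",
    actor="nurse-17",
    device_id="bp-monitor-3",
    on_behalf_of="patient-P",
    timestamp="2013-12-26T09:30:00Z",
    field_name="systolic_blood_pressure",
    previous_value=None,
    new_value="120",
    reason="scheduled visit",
)

# A later correction is a new audit that records the previous state.
correction_audit = replace(
    example_audit,
    transaction_id="T-0002",
    action="modify",
    previous_value="120",
    new_value="122",
    reason="transcription error",
)

print(serialize_audit(example_audit))
```

A deterministic serialization matters in such a sketch because any difference in byte representation, even between logically identical audits, would produce a different hash number.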
The system is not limited to ensuring the integrity of data submitted to a regulatory agency from a sponsor in the context of a clinical study, but may encompass situations in which the integrity of data that are transmitted to multiple entities needs to be ensured.
Reference is now made to
Data sources 110 may include sources that provide, for example, electronic data, medical image data, medical instrument data, blood test results, pharmacy records, various clinical analysis data, and scanned paper document data, just to name some of the types of sources. More specific examples of such data are patient x-ray images or CT scan images from an imager, a patient's body temperature measured from a digital thermometer, various blood measurements obtained from a digital blood analysis machine, a pharmacy record obtained from a pharmaceutical dispensing management system, and a physician's analysis scanned from a paper-based document. Besides patient-related data, there may be other data related to a clinical study, such as operational data, summary data, and payment data.
In a clinical study, such data may come from patients, principal investigators, nurses, technicians, and clinical research associates (CRAs), among others. eClinical systems 120 may include electronic data capture (EDC) systems, electronic medical records (EMR) systems, electronic health records (EHR) systems, eCRF (electronic case report form) systems, clinical data management (CDM) systems, randomization systems, coding systems, health or activity tracking devices, and ECG and glucose monitors, among other electronic and/or web-based systems used for the capture of clinical trial data.
Audit system 130 collects audits from the various eClinical systems and, because audits may be used as a permanent record of the clinical study, may format the audits in accordance with rules provided by the data checker. In one embodiment of the present invention, audit system 130 may be operated by a third party (that is, a party that is different from final data provider 150 and data checker 160) that collects and assembles the audit stream and then transmits it to data provider 150 and to data checker 160, along with audit stream hash 145. The third party may be considered to be a “trusted” or “independent” third party by data checker 160.
Reference is now made to
Each of the eClinical systems may produce audits and transmit them to audit system 230. The audits may be appended by audit system 230 into audit stream 235, which may then be input to hash number generator 240, producing audit stream hash 245. Audit system 230 may then provide audit stream 235 to sponsor 250, possibly along with data stream 238. Audit system 230 may provide audit stream hash 245 to regulatory agency 260. Sponsor 250 may provide a package to regulatory agency 260, so as to meet the requirements of the regulatory agency with respect to, for example, approval for a drug based on the clinical study. This package may include sponsor audit stream 255 (and may also include a sponsor data stream (not pictured)). Regulatory agency 260 then may review the package submitted by the sponsor. If the regulatory agency wants to quickly determine whether sponsor audit stream 255 is the same as audit stream 235 that was actually produced during the clinical study, regulatory agency 260 may hash sponsor audit stream 255 using hash number generator 270 to generate sponsor audit stream hash 275 and may then use comparator 280 to compare audit stream hash 245 and sponsor audit stream hash 275. Discrepancies in the hash numbers indicate differences in the audit streams, which may indicate errors in the data or that at least one part of the data from the study has been inadvertently or intentionally changed or tampered with.
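The flow just described may be sketched, under stated assumptions, as follows. The function names `append_audits`, `hash_number_generator`, and `comparator` are hypothetical names chosen to mirror the blocks described above; the newline-terminated byte encoding of audits, the sample audit contents, and the use of MD5 from Python's `hashlib` are likewise assumptions rather than requirements.

```python
import hashlib
from typing import Iterable, List


def append_audits(batches: Iterable[List[bytes]]) -> bytes:
    """Append audits from each eClinical system into one audit stream.

    Each audit is terminated with a newline (an assumed framing).
    """
    stream = bytearray()
    for batch in batches:
        for audit in batch:
            stream += audit + b"\n"
    return bytes(stream)


def hash_number_generator(stream: bytes) -> str:
    """Produce the hash number of a stream (MD5 assumed)."""
    return hashlib.md5(stream).hexdigest()


def comparator(audit_stream_hash: str, sponsor_audit_stream_hash: str) -> bool:
    """Return True when the two hash numbers match."""
    return audit_stream_hash == sponsor_audit_stream_hash


# Audit system: collect audits from two hypothetical eClinical systems.
edc_audits = [b"T-0001|record|SBP=120", b"T-0002|record|SBP=118"]
ecg_audits = [b"T-0003|record|HR=72"]
audit_stream = append_audits([edc_audits, ecg_audits])
audit_stream_hash = hash_number_generator(audit_stream)  # sent to the agency

# Sponsor: forwards the stream it received, unchanged in this example.
sponsor_audit_stream = audit_stream

# Regulatory agency: hash the sponsor's stream and compare.
sponsor_audit_stream_hash = hash_number_generator(sponsor_audit_stream)
streams_match = comparator(audit_stream_hash, sponsor_audit_stream_hash)
print(streams_match)
```

In this sketch the regulatory agency needs only the sponsor's stream and the previously received hash number to perform the check; it does not need a second copy of the original audit stream.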
In a manner similar to the way the regulatory agency may verify data integrity by using the hashing techniques of the present invention, so too may site personnel, such as a doctor, principal investigator, or other health care professional who may have input the data, use such hashing techniques, as illustrated in
As was also discussed with respect to
The blocks shown in
The benefit of the type of hashing used in the present invention is that if there is any tampering with the data and/or audits, a single hashing of the altered audit stream will uncover such tampering because it will differ from the audit stream hash. That situation is demonstrated in
Sponsor 250 may receive audit stream 235 and notice that the SBP readings for patient P are not favorable. Sponsor 250 may then attempt to modify the SBP readings of patient P to follow trace 402, shown in graph (b), that removes episodes A and B. (Graph (c) shows both traces superimposed.) Trace 402 would then be included in sponsor audit stream 255. Sponsor 250 may then provide sponsor audit stream 255 to regulatory agency 260.
Upon receiving sponsor audit stream 255, regulatory agency 260 may then perform a hash of sponsor audit stream 255 and compare sponsor audit stream hash 275 to audit stream hash 245 and determine at 295 that the data were actually changed.
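The tampering scenario above may be illustrated with the following sketch. The SBP values are invented solely for illustration (two elevated values stand in for episodes A and B), and the helper name `stream_hash`, the text encoding of each reading, and the use of MD5 are assumptions.

```python
import hashlib
from typing import List


def stream_hash(readings: List[int]) -> str:
    """Encode SBP readings for patient P as a stream and hash it (MD5 assumed)."""
    stream = "\n".join(f"patient=P;sbp={value}" for value in readings)
    return hashlib.md5(stream.encode("utf-8")).hexdigest()


# Hypothetical readings as originally collected; 164 and 171 stand in
# for the unfavorable episodes.
collected_readings = [118, 121, 164, 122, 119, 171, 120]

# Hypothetical readings after the unfavorable episodes are smoothed away.
modified_readings = [118, 121, 123, 122, 119, 121, 120]

audit_stream_hash = stream_hash(collected_readings)
sponsor_audit_stream_hash = stream_hash(modified_readings)

# A mismatch signals that the submitted stream differs from the original.
data_changed = audit_stream_hash != sponsor_audit_stream_hash
print(data_changed)
```

A mismatch in such a comparison indicates that the streams differ but does not, by itself, identify which readings were changed; locating the change would require further review of the audits.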
Examples of appended data streams are shown in
Next, in operation 635, regulatory agency 260 may compute the hash number of sponsor audit stream 255 using hash number generator 270 and compare that hash number to audit stream hash 245 in operation 640. If there are any discrepancies detected in operation 695, then the regulatory agency knows that the audit stream has been altered or that there are errors in the data.
Besides the operations shown in
Data and audits from a clinical study are only one example of how the invention may be used; other scenarios exist in which data may need to be verified. One scenario is ensuring quality in pharmaceutical manufacturing facilities, where certain data, such as temperature, pH, etc., may need to be collected for each bottle, and the manufacturing facility keeps audit records that may be checked later by an assurance agency. Another scenario is airline maintenance, where records may need to be kept to ensure ongoing quality and, in the event of an investigation, to determine whether anything went wrong. More generally, the present invention may be used in industries and scenarios in which there is a requirement (whether legal or not) to keep data and records.
In addition, the present invention may also be used to operate on data that do not comprise the complete data stream from a study. Hash numbers of pieces of data or of cumulative data may be transmitted to the data checker, for example, during a study, and then the hash number may be updated at a different time, for example, the next day. Such updates may occur regularly, at consistent intervals, or periodically, at varying intervals. Because the updated data or audit stream may include more bits, the hash number becomes stronger. The data and audit streams may also have associated time stamps, further strengthening the resulting hash numbers.
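One way such cumulative updating might be sketched is with the incremental interface of Python's `hashlib`, in which `update()` feeds additional data into a running hash and `hexdigest()` reports the hash number of everything supplied so far. The variable names and sample audit contents are hypothetical, and MD5 is assumed only for consistency with the algorithms named earlier.

```python
import hashlib

# Running hash maintained over the cumulative audit stream.
running_hash = hashlib.md5()

# Hypothetical audits collected on the first day, with an assumed time stamp.
day_one_audits = b"2013-12-26T09:30:00Z|T-0001|record|SBP=120\n"
running_hash.update(day_one_audits)
day_one_hash = running_hash.hexdigest()  # may be sent to the data checker

# The next day, newly collected audits are appended and the hash is updated.
day_two_audits = b"2013-12-27T09:45:00Z|T-0002|record|SBP=118\n"
running_hash.update(day_two_audits)
day_two_hash = running_hash.hexdigest()  # updated hash number

# The updated hash equals the hash of the whole cumulative stream,
# so the checker can verify it from the complete stream alone.
full_stream_hash = hashlib.md5(day_one_audits + day_two_audits).hexdigest()
print(day_two_hash == full_stream_hash)
```

Under these assumptions, the party maintaining the running hash need not re-read earlier audits each time an update is produced.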
The present invention may keep track of and record every data entry event, including adding, modifying, and deleting data. The audit stream includes the data plus all the details about the data, such as operational data and metadata. By assembling the audits into a cumulative audit stream and then computing a hash number based on the cumulative audit stream, the present invention allows a data checker to rapidly verify the integrity of clinical data it receives. In addition, the present invention accumulates audits from a number of clinical applications (e.g., eClinical systems) and hashes the resulting cumulative stream, whereas prior auditing capabilities were generally limited to a single application, with no comprehensive auditing capability across applications.
Aspects of the present invention may be embodied in the form of a system, a computer program product, or a method. Similarly, aspects of the present invention may be embodied as hardware, software or a combination of both. Aspects of the present invention may be embodied as a computer program product saved on one or more computer-readable media in the form of computer-readable program code embodied thereon.
For example, the computer-readable medium may be a computer-readable storage medium. A computer-readable storage medium may be, for example, an electronic, optical, magnetic, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
Computer program code in embodiments of the present invention may be written in any suitable programming language, including C, Objective-C, C# (C-sharp or .NET), JavaScript, Ruby, and others. The program code may execute on a single computer or on a plurality of computers. The computer may include a processing unit in communication with a computer-usable medium, wherein the computer-usable medium contains a set of instructions, and wherein the processing unit is designed to carry out the set of instructions.
The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This application is a continuation-in-part of and claims priority from U.S. application Ser. No. 14/140,734, filed Dec. 26, 2013, the entirety of which is incorporated herein by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | 14140734 | Dec 2013 | US
Child | 16991915 | | US