The present invention relates to a system and method for design and execution of life science processes, in particular, design and implementation of physical and digital processes via automated tools.
Researchers across multiple disciplines in the life science domain have their own custom workflows which have been developed and fine-tuned to their requirements. Though these workflows vary across disciplines, they can be simplified and grouped into the following stages—
If the outcome is successful, the researcher publishes the results. If the outcome is unsuccessful or inconclusive, the researcher goes back to Stage 1 to read and refine the process and repeats the rest of the stages. Depending upon the domain (e.g. Synthetic Biology), these 5 stages are sometimes referred to as the Design, Build, Test and Analyse (DBTA) cycle or the Design, Build, Test and Learn (DBTL) cycle. Irrespective of how these cycles are referred to in different domains, Stages 1-5 are commonly observed across all domains. Every cycle of this workflow needs to be meticulously documented for troubleshooting and ensuring reproducibility. However, this is very difficult, and it is only recently that researchers have begun to address the elephant in the room, which is the non-reproducibility of research.
Research and development in life sciences is plagued with non-reproducibility of results. Non-reproducibility of research is a critical problem as it slows down the time to market of drugs, biomolecules, cleaner bio-alternatives in industry etc., and hinders human knowledge and progress. There are studies and surveys which state that less than 70% of researchers failed to reproduce other researchers' work [1]. Hundreds of billions of dollars are lost to non-reproducibility annually, and it is therefore critical to develop novel tools and approaches to solve this issue.
There are many reasons which cause non-reproducibility, and the reasons can be very specific or unique to the type of research, i.e. wet-lab or computational research [2], and specific to the domain of research, e.g. psychology [3], neuroscience [4], cancer biology [5] and drug discovery [6]. For research in the bio-medical, biological and chemical domains and their respective sub-domains, which include both wet-lab and computational work, the primary causes of non-reproducibility can be attributed to the following:
There are multiple tools available which capture one or more stages in different combinations. These tools can also be integrated together to form an end-to-end custom solution. However, working with these tools is cumbersome for users, as they are difficult to set up and not seamlessly connected, which forces users to modify their natural research workflow to adapt it to the available solutions. This discourages users from using the tools, which in turn increases non-reproducibility. Furthermore, the tools are also very expensive and difficult to maintain. Some tools also require system administrators for setup and maintenance, which adds to the cost.
Lack of adoption of common standards/format for information storage—Different tools store information in different proprietary formats which makes it very difficult for users who are working with two different tools to exchange and work with the data. This silos users based on the tools they use and exacerbates the non-reproducibility problem. There are continuous attempts to use open-source standards but such standards have limited adoption because of a very high cost of switching from legacy solutions [7].
Limitations of information sharing medium—Currently research information is disseminated by publishing in scientific journals. These journals put multiple restrictions on users in terms of publishing formats and article size (in pages and words), which prevents users from sharing their complete research, which is critical for reproducibility. Even though journals are evolving their tools, the new tools push the burden onto users to store different types of data in different tools, such as university or institute specific repositories, public repositories like GitHub (for code), or supplementary information attached to journal articles. It becomes a tedious task for interested researchers to find and access this data on multiple platforms. The tools are not always easily accessible, and the information loses context when stored across multiple tools.
Multiple Stakeholders with limited cross-domain knowledge—Research is increasingly multi-disciplinary and requires researchers and technicians from various backgrounds to work together and have an understanding of each other's work. Since different disciplines use different tools which are preferred in their domains (e.g. Perl is preferred for bioinformatics, Python and C are preferred by computational scientists, and R is preferred by statisticians), it becomes very difficult to share, investigate and reproduce other users' research work.
Lack of common infrastructure—Different users require different wet-lab (labware, instrumentation etc.) and different computational (Operating Systems, libraries, applications, compute and storage capacity) infrastructure to perform their research. It is very difficult for researchers to be able to reproduce research, as the research specification is tied to infrastructure used. Translating or porting research to other platforms/infrastructure is a tedious task and requires a lot of effort in optimisation.
Communication gap—Research is a global initiative and researchers globally publish their research work and progress in scientific journals. The de-facto standard of written research communication is natural languages like English. Research is also published in multiple other languages, such as Japanese, Mandarin and Spanish. Natural languages (like English) are not the best medium for communication, as they can be easily misinterpreted. The communication can be incorrect depending on the researcher's command of the language (e.g. native and non-native speakers), their ability to articulate an idea, and differences arising when translating from their native language to English. This problem is further amplified when technical and non-technical information is intentionally or unintentionally omitted from the communication. This leads to incomplete information and makes it very difficult for other researchers to reproduce a research work and further build on it.
Formal languages (e.g. XML, JSON), programming languages (e.g. C, C++) and other high-level languages (e.g. Python, Domain Specific Languages) are useful for communicating accurately with hardware. However, the use of such languages to communicate research methodology and results would be less than ideal. These languages are accurate but very verbose, so there is a trade-off between ease of use and accuracy [8].
Further, it is pertinent to note that the currently available specification formats and tools for wet-lab and computational experiment designs are very hardware dependent. The names of the physical instruments and software applications/libraries are required to be specified in the experimental design along with parameters and terminology unique to them. This is a major problem when trying to reproduce the experiment design using a similar but different instrument or application. Consider, for example, an instruction like ‘Pipette 2 ml of Sample from Eppendorf A to 1.5 ml Eppendorf B’. Here, Pipette is the name of the instrument, but for the instruction to be clearly understood/interpreted by other stakeholders it should be replaced with the term ‘Transfer’. Further, ‘Eppendorf A and B’ refer to tubes containing the samples. To ensure that the experiment designs are not misinterpreted by other people, it is required to use terminology which is easily understood by people viewing and using the specification format. Currently, no such standards exist. In addition, in order to ensure that the designs are future proofed, viz. they do not become obsolete as hardware evolves (e.g. millifluidic, microfluidic and nanofluidic platforms), it is required to abstract the experimental design from the execution hardware. This will ensure longevity and portability of experimental designs.
Further, the use of natural language for communicating research, especially experimental designs, is not optimal. There have been attempts to use graphical specification, but these are limited to specific stages and have a very narrow objective [9]. There is a critical need for an unambiguous method of communicating the different stages of the research workflow.
It is thus the objective of the present invention to provide a system that overcomes the above issues, problems and drawbacks of the existing tools. Further described herein, in relation to one or more embodiments, are methods and systems for generating an entity state-based, stage-wise formal specification of processes.
One of the primary objectives of the present invention is to reduce research non-reproducibility using an easy-to-use graphical research specification system that allows researchers to accurately specify their research in different stages of their workflow with minimal effort. The specification system addresses all the previously described causes of non-reproducibility of research including lack of adequate tools, lack of common standards/format for information storage, limitations of information sharing medium, multiple stakeholders with limited cross-domain knowledge, lack of common infrastructure and communication gap.
Another objective of the present invention is to provide a computer-implemented system and method for generating graphical formal specification based on entity states for each stage of any life science processes.
Another objective of the present invention is to provide a graphical specification system focussing on the entity and the automatic generation of its subsequent states upon application of any change rather than the changes and their processes tied to specific hardware.
Yet another objective of the present invention is to provide a graphical specification system that allows any type of entity viz. physical materials and digital files unlike other existing specification systems which allow specification of only one or the other type of entity.
Still another objective of the present invention is to provide a graphical specification system that aids in debugging/troubleshooting and reproducing the experiment in different stages of the workflow.
The present invention has overcome the problems associated with the prior art by providing methods and systems for reproducible and scalable process workflows. Accordingly, the first aspect of the present invention provides a method for generating graphical specification of process design, planning, execution and analysis stages of a plurality of life science processes, for each such stage of the plurality of life science processes. The said method comprises the steps of specifying the workflow of the process in terms of a plurality of entities and/or batches of a plurality of entities, wherein entities include physical materials (including reagents, patient samples, cells, tissues, organs, animals, chemicals etc., in solid or liquid form) and digital files (including images, genome sequences, protein sequences etc.); applying information causing state change to at least one of the plurality of entities and/or batches of a plurality of entities to form its respective new entities with a modified state by way of specifying change specific parameters (including quantity manipulation, transformation and measurement for physical materials, and data manipulation and visualization for digital files); and generating each of the plurality of processes of their respective stages in a graphical specification comprising a plurality of layers with layer-specific information, each said layer further including a plurality of nodes and edges, wherein a layer represents each stage of the process; a node of any shape or size represents one entity and/or a batch of multiple entities including their states; and an edge represents information causing the state change in the entity and/or batch of multiple entities.
The method further enables at least two processes to be combined to form a complex process by merging and/or linking their boundary nodes for arrangement of the said processes in sequential order or to create a single process; at least one step in the at least one process of the plurality of processes to be traversed by a certain pre-selected time point to minimize its execution time; and at least one process to be automatically translated to a preferred natural language for further processing by a human.
The second aspect of the present invention provides that nodes of the graphical specification are mapped to materials and data, and edges are mapped to compatible instruments and software applications during the planning stage of the process.
According to the third aspect, instrument-specific execution instructions are automatically sent to instruments during process execution, and configurable virtual machines are deployed on the cloud or locally for applications. The entire virtual machine is saved for future use along with data, dependencies and environment. Further, all data generated during the process execution is stored in the context of the respective edge and is represented on the node. The said data can then be visualised and analysed in the analysis stage using configurable virtual machines.
Fourth aspect of the present invention provides a system for generating graphical specification of process design, planning, execution and analysis stages of a plurality of life science processes, for each such stage of the plurality of life science processes.
The invention is further described in the detailed description that follows, by reference to the noted drawings by way of illustrative embodiments of the invention, in which like reference numerals represent similar parts throughout the drawings. The invention is not limited to the precise arrangements and illustrative examples shown in the drawings:
Detailed embodiments of the present invention are disclosed herein with reference to the drawings. However, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the invention.
The present invention discloses a graphical specification system enabling users to specify their research workflow and capture such detailed information with a focus on crucial stages of life science processes viz. experiment design, planning, execution and analysis. Currently available approaches are limited to using text editing tools for research workflow specification. However, the present invention proposes an entity-state change based graphical system for specification of experimental designs.
According to one of the embodiments of the present invention, the system enables users to specify their experimental designs based on/using a plurality of entities (101) and a plurality of state changes (102) that are applied to the respective entities as shown in
According to another embodiment of the present invention, the plurality of entities that are specified in experimental designs as above are represented as ‘nodes’ in the graphical specification format, which are of any shape and/or size. A node/vertex is used to represent a single entity or a batch of multiple entities and their states. The shape of the node is customizable, for example, square, circle and polygon. Further, state changes (102) being applied to respective entities automatically creates a new entity with a modified state (101′).
Such state changes (102) for a physical material can be categorized into quantity manipulation, transformation and measurement. Quantity manipulation includes, without limitation, transfer, adding and subtracting similar or dissimilar quantities of the same or different materials. Transformation includes, without limitation, moving, heating, cooling, incubation and mixing materials. Measurement is used to investigate and estimate different biological, chemical and physical properties of materials and includes, without limitation, weighing, imaging, spectroscopy, sequencing and calorimetry. Changes for a digital file include data manipulation and visualization, wherein data manipulation includes, without limitation, data extraction, wrangling, cleaning, preparation and statistical analysis; and said data visualization further includes generating figures of biological, chemical, medical and experimental data in the form of sequences, structures and models, to identify trends and gain a better understanding of the data. These changes (102) are represented by edges in the graphical format, and require change specific parameters to be specified to change the state of an entity.
Edges connect entities to their states, and they represent information which causes the state change in the entity material. Entities can have multiple outgoing and incoming edges connecting to their respective new states.
It is to be noted that the present system focuses on the entity (node) and the automatic generation of its subsequent states upon applying a change (edge). Other specification formats focus on the change (edge) and their processes which are usually tied to specific hardware. The advantage of using an entity-based graphical specification format is that it is independent of hardware (containers, instruments and equipment) used for executing the process. Here, the hardware is mapped separately at a later stage. As a result, the present system allows different types of execution such as manual, semi-automated and automated, depending upon the availability of compatible hardware and drivers.
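The entity-centred model described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the present system: the class names (`Entity`, `StateChange`, `Process`), the integer state counter and the example parameters (`cycles`, `wavelength_nm`) are all hypothetical. The sketch shows a node holding an entity and its state, an edge holding hardware-independent change information, and a new-state node being generated automatically when a change is applied.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class EntityKind(Enum):
    PHYSICAL = "physical"  # e.g. reagents, samples, cells
    DIGITAL = "digital"    # e.g. images, sequence files


@dataclass(frozen=True)
class Entity:
    """A node: one entity (or batch of entities) in a particular state."""
    name: str
    kind: EntityKind
    state: int = 0  # hypothetical counter: 0 is the initial state


@dataclass
class StateChange:
    """An edge: hardware-independent information causing a state change."""
    source: Entity
    target: Entity
    action: str  # e.g. "Thermocycle" or "Measure OD"
    parameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class Process:
    nodes: List[Entity] = field(default_factory=list)
    edges: List[StateChange] = field(default_factory=list)

    def add_entity(self, name: str, kind: EntityKind) -> Entity:
        entity = Entity(name, kind)
        self.nodes.append(entity)
        return entity

    def apply(self, entity: Entity, action: str, **parameters) -> Entity:
        """Apply a change; the node for the new state is created automatically."""
        new_entity = Entity(entity.name, entity.kind, entity.state + 1)
        self.nodes.append(new_entity)
        self.edges.append(StateChange(entity, new_entity, action, parameters))
        return new_entity


# No instrument is named anywhere: only entities and the changes applied to them.
process = Process()
sample = process.add_entity("Sample A", EntityKind.PHYSICAL)
amplified = process.apply(sample, "Thermocycle", cycles=30)            # assumed parameter
measured = process.apply(amplified, "Measure OD", wavelength_nm=600)   # assumed parameter
```

In this sketch the user only ever calls `apply`; the graph grows by one node and one edge per change, which mirrors the focus on the entity and its successive states rather than on the executing hardware.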
According to another embodiment of the present invention, entities viz. nodes (201) and their respective state changes viz. edges (202 and 203) together represent a process (200) in a graphical specification [Refer
In one of the illustrations as shown in
In
Further,
According to another embodiment, certain processes are modular in nature viz. a process can be attached to other processes to form a longer complex process. The present system allows only two processes to be attached at a single time. Processes are combined by attaching them to each other by their respective boundary nodes. According to such embodiment, boundary nodes are of two types viz. (i) starting/parent/zero/initial state nodes and (ii) child/edge/final state nodes. As the name suggests, starting/parent/zero/initial state node is a node that is present at the beginning of a process; and child/edge/final state node is a node that is present at the tail end of a process. A parent node of one entity can be switched/replaced with another entity of the same type i.e. physical entity with a physical entity and a digital entity with another digital entity. Depending on the process requirements, the attachment as suggested above is done by way of merging or linking two processes.
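Attaching two processes by their boundary nodes can be sketched as follows. The exact semantics of merging and linking are assumptions made for this illustration: here, merging collapses the final node of the first process and the starting node of the second into a single node, while linking keeps both nodes and joins them with an additional edge. The `Process` structure, the function names and the node identifiers are hypothetical, and node identifiers are assumed to be unique across the two processes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Process:
    nodes: Dict[str, str]              # node id -> "physical" or "digital"
    edges: List[Tuple[str, str, str]]  # (source id, target id, action)

    def start_nodes(self) -> List[str]:
        """Boundary nodes with no incoming edge."""
        targets = {dst for _, dst, _ in self.edges}
        return [n for n in self.nodes if n not in targets]

    def final_nodes(self) -> List[str]:
        """Boundary nodes with no outgoing edge."""
        sources = {src for src, _, _ in self.edges}
        return [n for n in self.nodes if n not in sources]


def _check_boundary(first: Process, tail: str, second: Process, head: str) -> None:
    if tail not in first.final_nodes() or head not in second.start_nodes():
        raise ValueError("processes can only be attached at boundary nodes")
    if first.nodes[tail] != second.nodes[head]:
        raise ValueError("boundary nodes must be of the same entity type")


def merge(first: Process, tail: str, second: Process, head: str) -> Process:
    """Assumed semantics: `head` is replaced by `tail`, giving one shared node."""
    _check_boundary(first, tail, second, head)
    nodes = {**first.nodes, **{n: k for n, k in second.nodes.items() if n != head}}
    renamed = [(tail if s == head else s, d, a) for s, d, a in second.edges]
    return Process(nodes, first.edges + renamed)


def link(first: Process, tail: str, second: Process, head: str,
         action: str = "Link") -> Process:
    """Assumed semantics: both nodes are kept and joined by a new edge."""
    _check_boundary(first, tail, second, head)
    return Process({**first.nodes, **second.nodes},
                   first.edges + [(tail, head, action)] + second.edges)


extract = Process({"cells": "physical", "dna": "physical"},
                  [("cells", "dna", "Extract")])
amplify = Process({"template": "physical", "amplicon": "physical"},
                  [("template", "amplicon", "Thermocycle")])

merged = merge(extract, "dna", amplify, "template")
linked = link(extract, "dna", amplify, "template")
```

The type check reflects the rule that a physical entity can only be replaced with a physical entity and a digital entity with a digital entity.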
With reference to
In accordance with yet another embodiment of the present invention, depending on the requirements, users will be able to time-order a process to minimize the time of execution or to complete/traverse different steps involved in a process by a certain pre-selected timepoint. Such time-ordering is done using standard methods of calculating the duration required for different steps and trying to minimize or maximise the process traversal time. The processes can be traversed/executed subject to conditions which are captured as a part of edges. The conditions can be parameters which need to be satisfied or met or it can be explicit approval required by a user for further traversal of the process. The processes can be automatically sorted to minimize or maximise time of traversal/execution. This is useful to minimise time of execution for complex long processes which have multiple branches.
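The description does not name a particular time-ordering method. One standard option is a critical-path style calculation, sketched below under the assumption that steps on independent branches can run in parallel. The step names, durations (in minutes) and the deadline are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def earliest_finish(durations: Dict[str, int],
                    depends_on: Dict[str, Set[str]]) -> Dict[str, int]:
    """Earliest finish time of every step, assuming branches run in parallel."""
    graph = {step: depends_on.get(step, set()) for step in durations}
    finish: Dict[str, int] = {}
    # static_order() yields each step only after all of its prerequisites.
    for step in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[step]), default=0)
        finish[step] = start + durations[step]
    return finish


# Hypothetical process with two branches that join at "amplify".
durations = {"lyse": 10, "extract": 30, "prepare_buffer": 15,
             "amplify": 90, "measure": 5}
depends_on = {"extract": {"lyse"},
              "amplify": {"extract", "prepare_buffer"},
              "measure": {"amplify"}}

finish = earliest_finish(durations, depends_on)
makespan = max(finish.values())   # minimum traversal time of the whole process
deadline = 180                    # hypothetical pre-selected time point
meets_deadline = makespan <= deadline
```

Conditions captured on edges, such as a parameter threshold or an explicit user approval, are not modelled here; they would gate whether a step may start at all.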
In accordance with yet another embodiment of the present invention, the graphical specification system as disclosed removes the requirement of specifying hardware during the experimental design specification stage of a process (viz. Stage 2). It allows users to map the hardware of their choice in the planning stage (viz. Stage 3). Depending upon the availability of hardware infrastructure (viz. instruments, software applications, compute and storage), users can map compatible hardware to different steps in a process to execute their experimental designs seamlessly. In other words, during process planning, nodes can be assigned/mapped to materials and data, and edges can be mapped to compatible instruments (e.g. thermocyclers, sequencers) and software applications. Any extra parameters required for execution are added during the planning and/or execution stage of such process (viz. Stage 4). During process execution, instrument-specific execution instructions can be automatically sent to the instruments, and configurable virtual machines can be deployed on the cloud or locally for applications. The entire virtual machine is saved for future use along with data, dependencies and environment. The data can then be visualised and analysed in the analysis stage using configurable virtual machines.
The present system enables hardware specific input requirements to be prompted to the researcher (technical operator) performing the experiment. This allows experimental designs to be executed manually by a human operator, or in a semi or fully-automated manner on a robotic platform. Another advantage of this hardware abstraction is to be able to capture information in the context of the hardware used which helps in troubleshooting or debugging the experiment.
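The planning-stage mapping of hardware-independent edges to compatible hardware can be sketched as a simple capability lookup. This is an illustration only: the `Instrument` record, its `capabilities` field, the inventory contents and the rule of picking the first compatible instrument are assumptions, not features stated in this description.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Instrument:
    name: str
    capabilities: FrozenSet[str]  # design-stage actions the instrument can execute
    automated: bool               # True if instructions can be sent to it directly


def compatible(action: str, inventory: List[Instrument]) -> List[Instrument]:
    """All available instruments able to perform the given action."""
    return [inst for inst in inventory if action in inst.capabilities]


def plan(actions: List[str],
         inventory: List[Instrument]) -> Dict[str, Optional[Instrument]]:
    """Map each action to the first compatible instrument, or None.

    None means no compatible hardware is available, so the step would be
    left for a human operator to perform manually.
    """
    mapping: Dict[str, Optional[Instrument]] = {}
    for action in actions:
        candidates = compatible(action, inventory)
        mapping[action] = candidates[0] if candidates else None
    return mapping


inventory = [
    Instrument("PCR machine", frozenset({"Thermocycle"}), automated=True),
    Instrument("Spectrophotometer", frozenset({"Measure OD"}), automated=False),
]
mapping = plan(["Thermocycle", "Measure OD", "Centrifuge"], inventory)
manual_steps = [action for action, inst in mapping.items()
                if inst is None or not inst.automated]
```

Because the design only names actions, the same design can be re-planned against a different inventory without being edited.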
Still another embodiment of the present system enables all data generated during the experiment to be captured and stored in the context of the step in which it was generated. The graphical specification system takes into account the stages of the research workflow and attaches stage specific information to the process. This makes it very useful for debugging/troubleshooting and reproducing the experiment in different stages of the workflow. Researchers can analyse the data generated (viz. Stage 5 of the process) using software applications of their choice with customisable compute configuration like CPU capacity and memory capacity. All the resulting data from the analysis is further automatically stored in a resulting node.
An illustration of stage-wise representation of a process on the present system is embodied in
In the planning stage (800), 50 μl of Sample A (801) is amplified in a PCR machine (802), and the resulting Sample A′ (801′) is further subjected to a spectrophotometer (803). Here, the 50 μl of Sample A of the planning stage (801) is mapped to Sample A of the design stage (701), and it is amplified using the PCR machine (802), which is mapped to Thermocycle (702). The resulting Sample A′ (801′) is further subjected to a spectrophotometer (803), which is mapped to Measure OD (703), and this results in the final Sample A″ (801″).
In the execution stage (900), 50 μl of Sample A of the execution stage (901) is amplified using the PCR machine (902) with the required parameters, which results in an amplified (state changed) Sample A′ (901′). The Sample A′ of the execution stage (901′) is further measured using a spectrophotometer with the required parameters (903), which results in the final Sample A″ (901″). The final step also results in a data file/s (901″) which is stored in the context of the spectrophotometer edge (903).
In the analysis stage, data file/s generated as result of spectrophotometer state change (903) is analysed. The data file/s represented as a single node (1000) and other relevant/reference files are analysed using a software application of choice (1003) which results in a new node (1002/1001′) where all the modified and any new files generated are stored.
Each stage (Stage 2 to Stage 5) of the research workflow is a separate layer appended to the graphical specification system of the present invention. Attaching a layer to the graphical specification system automatically captures layer-specific information with very granular context, which aids in reproducibility of the research work. This feature is being explained with reference to certain exemplary illustrations as below.
In the Experimental Design layer (viz. Stage 2), along with the experimental design, in particular the process details of materials/data and the state changes applied, versioning of the processes is maintained. Versions behave like standalone and independent experiment designs. Each version has its own set of layers.
In the Planning Layer (viz. Stage 3) for every experimental design version selected to be executed, details of mapped materials/files including booking volumes and mapped hardware including booking durations are captured and maintained as shown in
Further in the execution Layer (viz. Stage 4) for every planned experimental design version, its execution details are captured and maintained.
In the Analysis Layer (viz. Stage 5) all the data generated in the execution layer is highlighted on the graph and is available to be analysed using different tools.
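The layer-per-stage idea can be sketched as one annotation store per stage, keyed by the node or edge the information belongs to. This is a minimal sketch under assumptions: the class `LayeredSpecification`, the element identifier `edge:thermocycle` and the example values (booking duration, operator, cycle count) are hypothetical and serve only to show how stage-specific information stays attached to the same graph element.

```python
from collections import defaultdict
from typing import Dict

STAGES = ("design", "planning", "execution", "analysis")


class LayeredSpecification:
    """One version of an experimental design with its own set of layers."""

    def __init__(self) -> None:
        # layer name -> graph element id -> layer-specific information
        self.layers = {stage: defaultdict(dict) for stage in STAGES}

    def annotate(self, stage: str, element: str, **info) -> None:
        """Attach stage-specific information to a node or edge."""
        if stage not in self.layers:
            raise ValueError(f"unknown stage: {stage}")
        self.layers[stage][element].update(info)

    def context(self, element: str) -> Dict[str, dict]:
        """Everything recorded for one node or edge, grouped by layer."""
        return {stage: dict(self.layers[stage][element])
                for stage in STAGES if element in self.layers[stage]}


spec = LayeredSpecification()
spec.annotate("design", "edge:thermocycle", action="Thermocycle")
spec.annotate("planning", "edge:thermocycle",
              instrument="PCR machine", booked_minutes=120)   # assumed values
spec.annotate("execution", "edge:thermocycle",
              operator="hypothetical-user", cycles=30)         # assumed values

ctx = spec.context("edge:thermocycle")
```

Looking up a single edge returns its design, planning and execution details together, which is the kind of granular context that helps when troubleshooting a failed run.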
According to still another embodiment of the present invention, the system automatically translates the process specified in the graphical specification format to natural language (viz. English or any other language of choice) and/or formal language/code (viz. XML, Python, JavaScript, etc.) for further processing by a human or a compatible robotic platform respectively. Each step of the process is translated which contains all the information needed to perform the experiment.
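Step-wise translation can be illustrated with a small Python sketch that renders the same steps both as English sentences and as JSON. The step dictionary layout, the sentence template and the choice of JSON as the formal output are assumptions made for this example; the description itself lists XML, Python and JavaScript as possible formal targets.

```python
import json
from typing import Dict, List

# Hypothetical steps, following the Sample A example used in this description.
steps: List[Dict] = [
    {"action": "Thermocycle", "entity": "Sample A", "result": "Sample A'",
     "parameters": {"volume_ul": 50}},
    {"action": "Measure OD", "entity": "Sample A'", "result": "Sample A''",
     "parameters": {}},
]


def to_english(step: Dict) -> str:
    """Render one step as a sentence for a human operator."""
    sentence = f"{step['action']} {step['entity']} to obtain {step['result']}"
    params = ", ".join(f"{key}={value}" for key, value in step["parameters"].items())
    return f"{sentence} ({params})." if params else f"{sentence}."


def to_json(all_steps: List[Dict]) -> str:
    """Render the whole process in a machine-readable form."""
    return json.dumps({"process": all_steps}, indent=2)


protocol = [to_english(step) for step in steps]
machine_readable = to_json(steps)
```

Both outputs are derived from the same step records, so the human-readable protocol and the machine-readable form cannot drift apart.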
According to another aspect of the invention, the present invention also discloses a computer-implemented method enabling users to specify their research workflow and capture such detailed information with a focus on crucial stages of life science processes viz. experiment design, planning, execution and analysis. In particular, the present invention further discloses an entity-state change based graphical method for specification of experimental designs.
In the process design stage, users may specify their experimental designs based on/using a plurality of entities and a plurality of state changes that are applied to the respective entities. In one of the embodiments as in
In another embodiment of the present method (Refer
In yet another embodiment of the method of creating a process (4000) as in
In yet another embodiment (refer
Still another embodiment of the method (as in
Still another embodiment of the method (as in
| Number | Date | Country | Kind |
|---|---|---|---|
| 202141045332 | Oct 2021 | IN | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/IN2022/050885 | 10/4/2022 | WO | |