1. Field
The present disclosure relates to selection of enzymes for catalysing biochemical reactions, and more particularly to devices and methods for identifying enzymes for biochemical transformations.
2. Description of Related Art
Advances in metabolic and recombinant genetic engineering have facilitated the design of novel synthetic pathways. These developments have facilitated the biosynthesis of chemicals. Industrially scalable organisms can be optimally designed for biosynthesis of molecules. Often, a proposed synthetic pathway comprises non-native molecules (molecules not naturally observed in the model system under consideration). Biological transformation of non-native molecules is challenging, as it requires selection and engineering of appropriate enzymes to improve thermodynamic and kinetic properties for industrial application. Given their structure and flexibility, individual enzymes are amenable to a degree of adaptation. This impacts the efficiency of a process, and thus the selection of appropriate enzymes is critical.
Favourability of an enzyme to perform a biochemical reaction is dependent on the enzyme's binding properties towards a given target molecule. Enzyme-target molecule binding is traditionally assessed through a 3-dimensional docking/molecular dynamics study. Application of the approach is computationally intensive and is thus not preferred for screening a large enzyme dataset. A “quantitative structure activity relationship” (QSAR) methodology can also be applied for selecting enzymes. Though rapid in processing, accuracy of the QSAR approach depends on the quality of the prediction model. To enhance the model performance, a well-represented training data set is needed. Thus, obtaining stable QSAR models for a diverse set of Enzyme Commission (EC) numbers is challenging.
Alternatively, a reaction similarity approach can be used for large-scale datasets. Unfortunately, such approaches, though efficient, are limited in their accuracy.
Therefore, there is a need for a method to select and identify potential enzymes for biochemical transformation which is efficient as well as accurate.
The present disclosure provides methods and devices for determining one or more enzymes for biochemical transformations.
An embodiment of the present disclosure provides a computer implemented method of determining enzyme(s) for biochemical transformation(s). The method steps include receiving input of reaction(s) and/or target molecule(s) along with data associated with information regarding chemical conversion; determining functional region(s) and linker region(s) in the reaction(s) and/or the target molecule(s); scanning a transformation library for the determined functional region(s) to find similar functional region(s) within the transformation library; assigning the reaction(s) and/or the target molecule(s) to group(s) of the transformation library showing a high similarity of the functional region(s); computing a metabolite similarity score of the reaction(s) and/or the target molecule(s) with respect to reaction(s) of the assigned group(s); and identifying enzyme(s) associated with the reaction(s) of the assigned group(s) having a high metabolite similarity score.
Another embodiment of the present disclosure provides a computer implemented method of determining enzyme(s) for biochemical transformation(s) that further includes statistically evaluating flexibility of the identified enzyme(s) for the input of the reaction(s) and/or the target molecule(s).
Another embodiment of the present disclosure provides a device for determining enzyme(s) for biochemical transformation(s). The device includes memory; and processor(s) operatively coupled to the memory. The processor(s) is/are configured to perform the steps of receiving input of reaction(s) or/and target molecule(s) along with data associated with their chemical conversion; determining functional region(s) in the reaction(s) or/and the target molecule(s); scanning a transformation library for the determined functional region(s) to find similar functional region(s) within the transformation library; assigning the reaction(s) or/and the target molecule(s) to group(s) of the transformation library showing a high similarity of the functional region(s); computing a metabolite similarity score of the reaction(s) or/and the target molecule(s) with respect to reaction(s) of the assigned group(s); and identifying enzyme(s) associated with the reaction(s) of the assigned group(s) having a high metabolite similarity score.
Another embodiment of the present disclosure provides a computer implemented method of generating a transformation library. The method steps include obtaining a plurality of reactions and enzyme(s) catalysing the same from knowledgebase as input; identifying transformation region(s) in molecule(s) participating in the plurality of reactions; extracting the identified transformation region(s) in the molecule(s) participating in the plurality of reactions; identifying functional region(s) and linker region(s) for each of the molecule(s) participating in the plurality of reactions based on the extracted transformation region(s); collecting the identified functional region(s) and associated linker region(s) of the molecule(s) participating in the plurality of reactions; selecting the functional region(s) of the plurality of reactions, wherein the functional regions comprise the collected functional regions of the molecule(s) participating in the plurality of reactions; selecting linker region(s) of the plurality of reactions, wherein the linker region(s) comprise the collected linker region(s) of the molecule(s) participating in the plurality of reactions; grouping the plurality of reactions based on similarity of the functional region(s) along with associated information together to create the transformation library; and identifying functional region(s) for each group from the functional region(s) of the reaction(s) comprising the group as representative functional region(s).
Yet another embodiment of the present disclosure provides a device for generating transformation library. The device includes memory; and processor(s) operatively coupled to the memory, the processor(s) is/are configured to perform the steps of obtaining a plurality of reactions and enzyme(s) catalysing the same from biochemical database(s) as input; identifying transformation region(s) in molecule(s) participating in the plurality of reactions; extracting the identified transformation region(s) in the molecule(s) participating in the plurality of reactions; identifying functional region(s) and linker region(s) for each of the molecule(s) participating in the plurality of reactions based on the extracted transformation region(s); collecting the identified functional region(s) and associated linker region(s) of the molecule(s) participating in the plurality of reactions; selecting the functional region(s) of the plurality of reactions, wherein the functional regions comprise the collected functional regions of the molecule(s) participating in the plurality of reactions; selecting linker region(s) of the plurality of reactions, wherein the linker region(s) comprise the collected linker region(s) of the molecule(s) participating in the plurality of reactions; grouping the plurality of reactions based on similarity of the functional region(s) along with associated information together to create the transformation library; and identifying functional region(s) for each group from the functional region(s) of the reaction(s) comprising the group as representative functional region(s).
The aforementioned aspects and other features of the present disclosure will be explained in the following description, taken in conjunction with the accompanying drawings, wherein:
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
The embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. However, the present disclosure is not limited to the embodiments. The present disclosure can be modified in various forms. Thus, the embodiments of the present disclosure are provided only to explain the present disclosure more clearly to those of ordinary skill in the art. In the accompanying drawings, like reference numerals are used to indicate like components.
The specification may refer to “an”, “one” or “some” embodiment(s) in several locations. This does not necessarily imply that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms “includes”, “comprises”, “including” and/or “comprising” when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations and arrangements of one or more of the associated listed items.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The term “similarity” in the context of the present disclosure refers to chemical similarity or any other equivalent such as, but not limited to, similarity based on structure, functional groups, function, chemical/physical characteristics, etc.
Prediction of Enzymes for Reaction(s) and/or Target Molecules
Screening and selecting enzymes for synthetic biochemical reactions is challenging. The present disclosure provides for methods and devices for determining one or more enzymes for biochemical transformation.
The present disclosure provides embodiments that advantageously predict enzyme(s) for a given biochemical transformation. The prediction is made with respect to inputs, which could include data about reaction(s) and/or target molecule(s) along with information pertaining to their chemical conversion. These inputs are analysed to determine functional region(s) and linker region(s). The determined functional region(s) and linker region(s) are used to evaluate similarity against reactions in a transformation library and to identify the enzyme(s).
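By way of non-limiting illustration, the prediction flow described above may be sketched as follows. The sketch assumes that functional and linker regions have already been reduced to sets of string features, that the similarity measure is supplied by the caller, and that the two region similarities are combined by a simple weighted sum. The names `LibraryReaction`, `LibraryGroup`, `predict_enzymes`, `group_cutoff` and `weight` are hypothetical and are not taken from the disclosure.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Tuple


class LibraryReaction(NamedTuple):
    """One library reaction with its regions and catalysing enzymes."""
    name: str
    functional: FrozenSet[str]
    linker: FrozenSet[str]
    enzymes: Tuple[str, ...]


class LibraryGroup(NamedTuple):
    """A group of reactions sharing a representative functional region."""
    representative: FrozenSet[str]
    reactions: Tuple[LibraryReaction, ...]


Similarity = Callable[[FrozenSet[str], FrozenSet[str]], float]


def predict_enzymes(
    functional: FrozenSet[str],
    linker: FrozenSet[str],
    library: List[LibraryGroup],
    similarity: Similarity,
    group_cutoff: float = 0.5,   # assumed threshold for "high similarity"
    weight: float = 0.5,         # assumed weighting, not the disclosed formula
) -> List[Tuple[float, str, str]]:
    """Return (score, enzyme, reaction name) tuples, best score first."""
    ranked = []
    for group in library:
        # Group assignment: compare the input functional region with the
        # representative functional region of each group.
        if similarity(functional, group.representative) < group_cutoff:
            continue
        for rxn in group.reactions:
            # Hypothetical combined score over functional and linker regions.
            score = (weight * similarity(functional, rxn.functional)
                     + (1.0 - weight) * similarity(linker, rxn.linker))
            ranked.extend((score, enzyme, rxn.name) for enzyme in rxn.enzymes)
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Passing the similarity measure as an argument keeps the sketch independent of any particular metric, so a Tanimoto-based measure or another equivalent metric may be substituted.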
The flow diagram as given in
Identification of Functional Region(s) and Linker Region(s) of the Input(s) (Step 104):
(A) When Reaction(s) is/are Input
Determination of functional region(s) for the input reaction starts with identification of transformation region(s) in the molecule(s). Based on the identified transformation region(s), functional and linker regions are identified.
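As a non-limiting sketch of this step, the transformation region may be found by comparing bond tables of the reactant side and the product side, taking atoms whose bond connectivity or bond order differs. The sketch assumes that atoms carry consistent atom-map numbers on both sides and that each bond is stored as a `frozenset` of two atom numbers mapped to a bond order. The function names and the `radius` parameter, which stands in for the immediate or extended neighbourhood, are hypothetical.

```python
def transformation_region(reactant_bonds, product_bonds):
    """Atoms whose bond connectivity or bond order changes in the reaction.

    Each argument maps frozenset({i, j}) of atom-map numbers to a bond order.
    """
    changed = set()
    for bond in set(reactant_bonds) | set(product_bonds):
        # A bond missing on one side (connectivity change) or present with a
        # different order (bond order change) marks both of its atoms.
        if reactant_bonds.get(bond) != product_bonds.get(bond):
            changed |= bond
    return changed


def split_regions(all_atoms, bonds, region, radius=1):
    """Grow the transformation region by `radius` bonds; the rest is linker."""
    functional = set(region)
    for _ in range(radius):
        frontier = set()
        for bond in bonds:
            if bond & functional:      # bond touches the current region
                frontier |= bond
        functional |= frontier
    linker = set(all_atoms) - functional
    return functional, linker


# Illustrative input: oxidation of a primary alcohol C1-C2-C3-O4 to the
# aldehyde (heavy atoms only); only the C3-O4 bond order changes.
reactant = {frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({3, 4}): 1}
product = {frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({3, 4}): 2}
region = transformation_region(reactant, product)
functional, linker = split_regions({1, 2, 3, 4}, reactant, region, radius=1)
```

With `radius=0` the functional region equals the transformation region itself; larger values approximate an extended neighbourhood.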
(B) When Target Molecule(s) is/are Input
Computation of Metabolite Similarity Score (Step 110):
In one of the embodiments for an input molecule, the molecule is represented as a function of two components, the functional region(s) and the linker region(s), given by
A=f(α,β) (1)
where,
Under this specific implementation, a metabolite similarity (MS) score between an input molecule A and a representative reaction within a group in transformation library is defined by
where,
This embodiment uses Tanimoto-coefficient-based similarity. Further, similarity could be assessed or determined through other equivalent metrics such as, but not limited to, root mean square deviation, equivalence overlap, etc., for structural similarity; dice, cosine, etc., for chemical similarity; feature based; etc.
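For illustration only, the Tanimoto coefficient and two of the alternative chemical similarity metrics named above (Dice and cosine) may be computed on binary fingerprints as sketched below. The sketch assumes that each region is encoded as a Python integer whose set bits mark the features present; this encoding and the function names are assumptions and are not part of the disclosure.

```python
import math


def popcount(bits: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Shared bits divided by bits set in either fingerprint."""
    total = popcount(a | b)
    return popcount(a & b) / total if total else 0.0


def dice(a: int, b: int) -> float:
    """Twice the shared bits divided by the sum of set bits."""
    denom = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / denom if denom else 0.0


def cosine(a: int, b: int) -> float:
    """Shared bits divided by the geometric mean of set-bit counts."""
    denom = math.sqrt(popcount(a) * popcount(b))
    return popcount(a & b) / denom if denom else 0.0
```

All three metrics return 1.0 for identical non-empty fingerprints and 0.0 when no bits are shared; they differ only in how partial overlap is scaled.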
Statistical Evaluation of Flexibility of the Identified Enzyme(s)
The present disclosure also provides for statistically evaluating flexibility of the identified one or more enzymes for the input(s). Once the enzyme(s) from the list of enzymes associated with the assigned group having high metabolite similarity score(s) are identified, the flexibility of the identified enzyme(s) is evaluated by a statistical approach such as, but not limited to, Z-score, variation, dispersion, etc. Additionally, flexibility can be assessed through structural flexibility of the enzyme structure(s) through root mean square variation at each residual point.
An embodiment is illustrated in
In one of the embodiments, a functional similarity index (ξ) is used for assessing flexibility of the enzymes. The functional similarity index is computed using Z-score.
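As a minimal, non-limiting sketch, a Z-score standardises each value in a sample by subtracting the sample mean and dividing by the standard deviation. Which quantities enter the functional similarity index is not reproduced in this text, so the input list below is a placeholder, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values as (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no dispersion, so every Z-score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

A value with a Z-score near zero lies close to the mean of the sample, while a large positive or negative Z-score marks a value far from it.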
Functional Similarity Index (FSI) ξ is represented as given below:
where:
The present disclosure also provides for a device for determining enzyme(s) for biochemical transformation.
The processor(s) 804, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
The memory 802 includes a plurality of modules stored in the form of executable program code which instructs the processor(s) 804 to perform the method steps illustrated in
The input receiving module 806 instructs the processor(s) 804 to perform the step 102 (
The functional and linker region(s) determination module 808 instructs the processor(s) 804 to perform the step 104 (
The transformation library scanning module 810 instructs the processor(s) 804 to perform the step 106 (
The group assigning module 812 instructs the processor(s) 804 to perform the step 108 (
The metabolite similarity score computing module 814 instructs the processor(s) 804 to perform the step 110 (
The enzyme identification module 816 instructs the processor(s) 804 to perform the step 112 (
In another embodiment, the processor(s) 804 is further configured to statistically evaluate flexibility of the identified one or more enzymes for the input of the reaction(s) and/or target molecule(s) by performing the step 714 (
The flexibility evaluation module 822 instructs the processor(s) 804 to perform the step 714 (
Transformation Library
The present disclosure also provides for a method and device for generating the transformation library.
Various reactions with their respective catalysing enzymes from various knowledgebases as input are obtained at step 902. Transformation region(s) in molecule(s) participating in the plurality of reactions is/are identified at step 904. The identified transformation region(s) in the molecule(s) participating in the plurality of reactions are extracted at step 906. Functional region(s) and linker region(s) for the molecule(s) participating in the plurality of reactions based on the extracted transformation region(s) are identified at step 908. The functional region(s) for a molecule comprises either its identified transformation region(s) or the transformation region(s) and region(s) of interest. A region of interest comprises one of an immediate neighbourhood and an extended neighbourhood of the transformation region(s). The linker region(s) comprises region(s) remaining after the identified functional region(s). The identified functional region(s) and associated linker region(s) of the molecule(s) participating in the plurality of reactions are collected at step 910. The functional region(s) of the plurality of reactions, wherein the functional region(s) comprise the collected functional region(s) of the molecule(s) participating in the plurality of reactions, are selected at step 912. The linker region(s) of the plurality of reactions, wherein the linker region(s) comprise the collected linker region(s) of the molecule(s) participating in the plurality of reactions are selected at step 914. The plurality of reactions based on similarity of the functional regions along with associated information are grouped together to create the transformation library at step 916. Finally, functional region(s) for each group is/are derived from the functional region(s) of the reaction(s) comprising the group as representative functional region(s) at step 918. 
In one of the embodiments, the functional region derived is a maximum common region of functional regions across all the reactions within the group.
The associated information comprises a list of one or more enzymes catalysing the reaction(s) and the extracted functional region(s) and linker region(s) of the reaction(s). The similar chemical transformation is identified by matching up the one or more functional regions of the input reactions.
Therefore, the transformation library includes a plurality of groups of chemical reactions wherein each of the groups includes chemical reactions undergoing similar chemical transformations along with the list of enzyme(s) catalysing each of the reactions as well as the functional and linker region(s) of each of the reactions. Each group has representative functional region(s). The systematic arrangement of groups in the transformation library makes them useful in terms of group assignment and deriving metabolite similarity scores as explained previously in the disclosure.
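One possible in-memory layout of such a library, together with a simple grouping pass, is sketched below for illustration. The sketch assumes that regions are sets of string features, uses a greedy single pass with a Tanimoto-style set similarity and a fixed threshold, and approximates the maximum common region by set intersection rather than by a graph-based maximum common substructure search. The names `Reaction`, `Group`, `build_library`, `group_enzymes` and `threshold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    name: str
    functional: frozenset
    linker: frozenset
    enzymes: tuple


@dataclass
class Group:
    representative: frozenset            # common functional region of the group
    reactions: list = field(default_factory=list)


def set_similarity(a, b):
    """Tanimoto-style similarity on feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_library(reactions, threshold=0.6):
    """Greedily place each reaction in the most similar group, else open a new one."""
    groups = []
    for rxn in reactions:
        best = max(
            groups,
            key=lambda g: set_similarity(rxn.functional, g.representative),
            default=None,
        )
        if best is not None and set_similarity(rxn.functional, best.representative) >= threshold:
            best.reactions.append(rxn)
            # Set intersection as a simplified stand-in for the maximum common region.
            best.representative = best.representative & rxn.functional
        else:
            groups.append(Group(rxn.functional, [rxn]))
    return groups


def group_enzymes(group):
    """Sorted list of distinct enzymes catalysing reactions in the group."""
    return sorted({e for rxn in group.reactions for e in rxn.enzymes})
```

Because the pass is greedy, the resulting groups can depend on the order of the input reactions; a clustering method that considers all pairwise similarities would avoid this at higher cost.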
An embodiment of transformation grouping and transformation library creation is illustrated in
The device 1100 includes processor(s) 1104, and memory 1102 coupled to the processor(s) 1104.
The processor(s) 1104, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.
The memory 1102 includes a plurality of modules stored in the form of executable program code which instructs the processor(s) 1104 to perform the method steps illustrated in
Computer memory elements may include any suitable memory device(s) for storing data and executable program code, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, a hard drive, a removable media drive, e.g., for handling memory cards, and the like. Embodiments of the present disclosure may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Executable program stored on any of the above-mentioned storage media may be executable by the processor(s) 1104.
The input receiving module 1106 instructs the processor(s) 1104 to perform the step 902 (
The transformation region(s) identification module 1108 instructs the processor(s) 1104 to perform the step 904 (
The transformation region(s) extraction module 1110 instructs the processor(s) 1104 to perform the step 906 (
The functional and linker region(s) identification module 1112 instructs the processor(s) 1104 to perform the step 908 (
The functional region(s) and associated linker region(s) collection module 1114 instructs the processor(s) 1104 to perform the step 910 (
The functional region(s) and linker region(s) selection module 1116 instructs the processor(s) 1104 to perform the steps 912 and 914 (
The reaction grouping module 1118 instructs the processor(s) 1104 to perform the step 916 (
The representative functional region(s) derivation module 1120 instructs the processor(s) 1104 to perform the step 918 (
The present embodiments have been described with reference to specific example embodiments; it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. Furthermore, the various devices, modules, and the like described herein may be enabled and operated using hardware circuitry, for example, complementary metal oxide semiconductor based logic circuitry, firmware, software and/or any combination of hardware, firmware, and/or software embodied in a machine readable medium.
The embodiments of the present disclosure, by efficiently predicting a suitable enzyme for a particular reaction or molecule, facilitate synthetic pathway design. This makes the introduction of non-native metabolites into the system more feasible by accurately suggesting enzymes capable of catalysing the same. This is useful in designing pathways which would yield the desired chemicals at a commercially viable scale.
Enzymes: Enzymes are biomolecules which catalyse biochemical transformations.
EC Numbers: Enzyme Commission Numbers
Transformation: Bond rearrangements associated with biochemical/chemical reactions are termed a Transformation.
Transformation region: Atoms undergoing either bond connectivity change or bond order change define a Transformation region
Number | Date | Country | Kind |
---|---|---|---|
3606/CHE/2014 | Jul 2014 | IN | national |
10-2014-0173233 | Dec 2014 | KR | national |
10-2015-0012314 | Jan 2015 | KR | national |