This invention relates to the fields of Biology, Life Science, Computational Biology, Biocatalysis and Chemistry.
Organofluorine chemistry has a significant impact on various aspects of everyday life and technology. The C—F bond is present in pharmaceuticals, agrochemicals, fluoropolymers, refrigerants, surfactants, anesthetics, material production, nutraceuticals, oil-repellents, and water-repellents, among other applications. Organofluorides constitute approximately 20% of registered pharmaceutical compounds since 1991 (Inoue M., et al., 2020), and about 16% of agrochemicals (Ogawa Y., et al., 2020). The strong binding nature of the C—F bond is highly desirable in developing industrial materials such as thermoplastics, elastomers, membranes, textile finishes, and coatings (Okazoe, T., 2009).
Several common active pharmaceutical ingredients (APIs) contain fluorine, including Atorvastatin, known for reducing cholesterol and the associated risk of heart attack. Gefitinib is another molecule renowned for its anti-cancer properties, while Sitagliptin is a type 2 antidiabetic drug that lowers blood sugar levels in adults (
Enzymatic halogenation of organic compounds, including carbon-fluorine and carbon-chlorine bond formation, has been an active area of study. Enzymes such as fluorinases and chlorinases exhibit catalytic capabilities in this regard. Fluorinases, unlike chlorinases, possess an additional 21 amino acid region (AAKGGARGQWASGAGFERAEG) (Deng, H. et al., 2008). Among various enzyme-catalyzed synthesis methods, the direct formation of the C—F bond by fluorinase is the most effective and promising approach. Fluorinase can catalyze the synthesis of 5′-FDA from S-adenosyl-L-methionine (SAM), a natural substrate of the enzyme, and F— ion through nucleophilic attack, resulting in the formation of a C—F bond (
The objective of the present invention is to provide a computer-implemented method for engineering fluorinase enzymes towards the synthesis of fluorophenyl compounds.
By utilizing advanced modeling techniques and designing specific methionine-sulfonium phenyl substrates, the objective is to gain valuable insights into the catalytic binding mode of synthetic substrates and the F— ion attack conformation, both of which are crucial to the enzyme mechanism required for the synthesis of fluorophenyl compounds. The method aims to overcome the challenges associated with traditional chemical synthesis methods, including environmental concerns, as well as the limited substrate selectivity of fluorinase enzymes.
Another objective is to employ modeling as a powerful tool in engineering fluorinase enzymes, enabling the rational design and optimization of enzyme structures. Through computational analysis and simulations within the active site of the enzyme, the objective is to enhance understanding of the underlying principles governing fluorinase catalysis, thereby guiding the synthesis of fluorophenyl compounds with improved efficiency and selectivity.
This approach holds the potential to revolutionize the field of fluorinase engineering by providing a systematic and efficient framework for enzyme optimization. By harnessing the power of computational modeling, this invention seeks to accelerate the development and commercialization of sustainable and scalable synthesis techniques for fluorophenyl compounds. The proposed method not only addresses the limitations of traditional approaches but also paves the way for the widespread industrial application of fluorophenyl compounds in sectors such as pharmaceuticals, agrochemicals, and materials.
The Fluorinase enzyme was discovered in 2002 in a soil bacterium (O'Hagan, D., et al., 2002; Sananda, M. et al., 1986), and since then, scientists have been working on improving its activity. Two important challenges are the enzyme's narrow substrate specificity and low stability (O'Hagan, D., et al., 2003). The mechanism of Fluorinase, especially the binding site for the F— ion, has not been reported in any experimental structure (Sun, H., et al., 2016; Thompson, S., et al., 2016). There is also a lack of information on a complex that could define a catalytic conformation using a synthetic substrate, especially one in which the F— ion is in an attacking conformation against a substrate that could yield a fluorophenyl product. To address this, a methionine-sulfonium phenyl substrate was designed to fit into the active site of Fluorinase. The active site of Fluorinase, where the natural substrate binds, is quite voluminous. However, this voluminous structure cannot bind smaller phenyl substrates. Therefore, drug molecules were scanned (
Extensive F— ion diffusion studies were conducted (
The main challenge was to achieve the precise conformation of the phenyl group within the active site of the enzyme. During the interaction between the phenyl group and the F— ion, there is a transfer of electron density from the phenyl group to the F— ion through the π electron system. As a result, the modelling of the phenyl moiety in the active site focused on facilitating π-π stacking interactions, which involve the overlap of electron clouds between aromatic rings. These interactions contribute to the stability and shape of the molecular system within the active site but do not directly interact with the F— ion. Consequently, this arrangement leaves the C1 of the substrate available for the F— ion to initiate an attack (
In this study, QM/MM simulations were conducted over different near-attack conformations of the substrate until the reaction proceeded to form the product, trifluorophenyl moiety (as described in the
Further, a fluorinase enzyme demonstrating stable catalytic binding of the compound [(3S)-3-amino-3-carboxypropyl][2,5-difluoro-4-(4-methoxy-2,4-dioxobutyl)phenyl]methylsulfonium in the active site is identified among many fluorinases obtained from a non-redundant database, using a screening protocol that includes metadynamics simulations and free energy surface calculations. The selected fluorinase enzyme incorporates specific mutations derived using residue-residue contact maps to determine hydrophobic residues contributing to major physical contacts near the active site (
“Computer Implemented Method” refers to methods or processes that are implemented using computer technology. In the present context, such methods have several advantages over other methods of problem-solving: (1) Speed and Efficiency: vast amounts of data can be processed and complex calculations executed at high speed, which is particularly valuable because the problem is computationally intensive and would be time-consuming or practically infeasible to solve manually; (2) Scalability: large datasets and numerous iterations can be handled efficiently, providing scalability that cannot be achieved manually; (3) Automation and Repetition: tasks such as data analysis, simulations, optimization, and iterative processes can be automated and repeated; (4) Storage and Retrieval: large datasets, previous results, and reference materials can be stored for quick access and analysis, which allows for more comprehensive problem-solving by leveraging previously processed information and facilitating data-driven decision-making; (5) Visualization and Interaction: powerful visualization capabilities allow users to represent complex data in meaningful ways. Visualization aids in understanding patterns, relationships, and trends within the data, leading to better insights and decision-making. Additionally, computers enable interactive problem-solving through user interfaces, where users can input data, modify parameters, and observe the immediate impact on the results; (6) Iterative Refinement: an iterative process facilitates experimentation and exploration of various scenarios, enabling better optimization and improvement of the problem-solving approach.
“Simulation” refers to the process of using a model to imitate and study the behavior of a real process. In the present context it is used to understand the behavior of a fluorinase enzyme system which has an F— ion and a substrate in the active site. The advantages of simulating such a system include (1) Cost and Time Efficiency: Simulations allow for rapid and cost-effective exploration of different scenarios and designs without the need for extensive resources, (2) Complexity Handling: Simulations are particularly advantageous when dealing with complex systems or phenomena that are difficult to analyze mathematically or solve analytically. By using computational models, simulations can represent and study intricate relationships, interactions, and behaviors of complex systems. F— ion biochemistry is one such phenomenon, (3) Parameter Exploration and Sensitivity Analysis: Simulations enable the exploration of a wide range of parameters and their effects on the system being modelled. Researchers can analyze how changes in variables impact the overall behavior, performance, or outcomes of the system, (4) Optimization and Design: Simulations support optimization by allowing researchers and engineers to test different design alternatives, configurations, or strategies. In the present context, it was possible to evaluate the performance of various options, identify bottlenecks, and optimize the system's behavior or efficiency, (5) Data Generation and Analysis: Simulations generate large amounts of data that can be analyzed to gain insights and inform decision-making. In the present context, it was possible to analyze the output of simulations to identify patterns, correlations, or anomalies within the simulated system. This data-driven approach enhances understanding and facilitates decision-making.
“Methionine Sulfonium Salts” refers to sulfonium salts, that is, compounds containing a tricoordinate sulfur atom bearing a positive charge, in which the sulfonium centre is attached to methionine. In the present context such a moiety is crucial for the activity of the fluorinase enzyme. The enzyme has no activity against S-adenosyl-homocysteine (SAH), the non-sulfonium analogue of SAM, which is the natural substrate of fluorinase (Sergeev, M. E., et al., 2013). Therefore, methionine sulfonium moieties are a logical starting point to explore when expanding the substrate scope of fluorinase. Several methods to synthesize sulfonium salts have been described previously (Aggarwal, V. K. et al., 1994; Sander, K. et al., 2015) and are adopted to synthesize the methionine sulfonium salts required for studying the substrate scope of the engineered fluorinase described in this embodiment.
The term “wild” or “wild-type” refers to a polypeptide sequence naturally occurring within an organism and can be procured from a source found in nature.
The term “Mutagenesis” refers to changing the function of a protein by introducing a mutation at a specific position of the protein. For instance, when the natural phenylalanine at position 143 is changed to tryptophan, the process of incorporating a different amino acid into the protein by mutating that position is known as mutagenesis.
“Molecular dynamics” is a computational simulation method derived from Newtonian physics, used to study the dynamic behavior and movement of atoms and molecules over time. It models the physical interactions between individual particles, considering forces such as electrostatic interactions, van der Waals forces, and bond stretching. By numerically integrating the equations of motion derived from Newton's laws, molecular dynamics simulations provide valuable insights into the structural changes, thermodynamic properties, and dynamic processes of molecular systems. Typically, molecular dynamics simulations consist of multiple steps, such as energy minimization, NVT equilibration (equilibration of the system at constant volume and temperature), and NPT equilibration (equilibration of the system at constant pressure and temperature).
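The numerical integration of Newton's equations mentioned above can be illustrated with a minimal sketch. It assumes a single particle on a one-dimensional harmonic spring rather than a solvated enzyme, and it uses the velocity Verlet scheme, a common integrator in molecular dynamics engines. The function names, force constant, mass, and time step are illustrative assumptions and are not parameters taken from this disclosure.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme.

    Returns the trajectory as a list of (position, velocity) tuples,
    including the initial state.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        # Half-step velocity update, then full-step position update.
        v_half = v + 0.5 * dt * f / mass
        x = x + dt * v_half
        # Recompute the force at the new position and finish the velocity update.
        f = force(x)
        v = v_half + 0.5 * dt * f / mass
        trajectory.append((x, v))
    return trajectory


# Hypothetical toy system: a harmonic spring (arbitrary units).
K = 1.0
MASS = 1.0


def spring_force(x):
    """Restoring force of the toy spring."""
    return -K * x


def total_energy(x, v):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * MASS * v * v + 0.5 * K * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=1000)
```

In this toy case the total energy stays close to its starting value along the trajectory, which is the property that makes this family of integrators suitable for long simulations.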
“Metadynamics” is an extension to the traditional molecular dynamic simulations designed to explore the properties of multidimensional free energy surfaces (FES) in complex many-body systems, wherein a common approach involves employing coarse-grained non-Markovian dynamics within a reduced space defined by a small set of collective variables. These dynamics exhibit a distinctive attribute, a history-dependent potential term, that gradually fills the minima in the FES over time. This unique characteristic enables efficient exploration and precise determination of the FES with respect to the collective variables.
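The history-dependent potential described above can be sketched for a single collective variable. The sketch assumes standard (not well-tempered) metadynamics, where the bias is a sum of Gaussian hills deposited at previously visited CV values and the free energy is estimated, up to a constant, as the negative of that bias. The hill height, hill width, and deposition history below are hypothetical values chosen only for illustration.

```python
import math


def bias_potential(s, hill_centers, height, width):
    """History-dependent bias: a sum of Gaussians centred on visited CV values."""
    return sum(
        height * math.exp(-((s - c) ** 2) / (2.0 * width ** 2))
        for c in hill_centers
    )


def estimate_fes(grid, hill_centers, height, width):
    """Estimate the free energy on a CV grid as the negative accumulated bias.

    The result is shifted so that its lowest point is zero.
    """
    raw = [-bias_potential(s, hill_centers, height, width) for s in grid]
    lowest = min(raw)
    return [value - lowest for value in raw]


# Hypothetical deposition history: most hills were deposited near CV = 1.0.
hills = [1.0, 1.0, 1.0, 1.1, 0.9, 2.5]
grid = [i * 0.1 for i in range(31)]  # CV values from 0.0 to 3.0
fes = estimate_fes(grid, hills, height=0.5, width=0.2)
deepest_cv = grid[fes.index(min(fes))]
```

The region that received the most hills appears as the deepest point of the estimated surface, which is how the filled minima are read back as a free energy profile.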
In this context, the term “Collective Variables” or “CV” refers to a set of atoms or a group of atomic coordinates of amino acids used in metadynamics simulations. The CVs play an important role in metadynamics, where the bias potential is applied directly to the CV atoms or coordinates. The applied bias potential identifies different Gaussian wells or bins over the course of the simulation.
A “trajectory” is represented as a series of coordinates or states across the simulation time, allowing the visualization and analysis of the object's or system's motion.
“Quantum Mechanics/Molecular Mechanics (QM/MM)” is a hybrid sampling approach that applies quantum mechanical calculations to a defined set of atoms in the study and molecular mechanics terms to the remaining atoms in the system. Studying the biochemical system at the electronic and subatomic level is computationally expensive. On the other hand, the accuracy of molecular mechanics is limited to the atom level, which makes it difficult to understand the transition level events that are rate limiting steps in a reaction. The hybrid approach of QM/MM results in a method that computationally allows for studying reaction sites at the atomic level and the rest of the system at a molecular level by defining a QM-MM boundary condition that separates the quantum chemical calculation region from the regions considered under molecular mechanics terms.
“Gaussian accelerated Molecular Dynamics (GaMD)” is an extension to conventional molecular dynamics simulation wherein exploration of conformational transitions across the potential energy landscape of the system is achieved through the application of a harmonic boost potential that follows a Gaussian distribution. In this context GaMD is used to study F— ion entry into the active site.
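The harmonic boost potential can be sketched as follows, using the form in which GaMD is commonly written: a boost of one half of k times (E - V) squared is added whenever the system potential V lies below a threshold energy E, and no boost is added otherwise. In actual GaMD the threshold and the force constant k are derived from potential energy statistics collected during the run; in this sketch they are supplied directly as hypothetical inputs.

```python
def gamd_boost(potential, threshold, k):
    """Harmonic boost added to the potential when it lies below the threshold.

    potential: instantaneous system potential energy V
    threshold: threshold energy E (hypothetical input in this sketch)
    k:         harmonic force constant (hypothetical input in this sketch)
    """
    if potential >= threshold:
        return 0.0
    return 0.5 * k * (threshold - potential) ** 2


def boosted_potential(potential, threshold, k):
    """Modified potential V* = V + boost that the simulation would sample."""
    return potential + gamd_boost(potential, threshold, k)


# Hypothetical energies in arbitrary units.
boost_in_well = gamd_boost(potential=-10.0, threshold=-5.0, k=0.1)
boost_above = gamd_boost(potential=-4.0, threshold=-5.0, k=0.1)
```

Because the boost grows with the depth below the threshold, deep minima are raised more than shallow ones, which lowers the effective barriers between them.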
“The General Atomic and Molecular Electronic Structure System (GAMESS)” is a widely used electronic structure software package for computational chemistry. It provides ab initio quantum chemistry calculations, density functional theory calculations, quantum mechanics/molecular mechanics (QM/MM) calculations, and other semi-empirical calculations.
The term “density functional theory (DFT)” is a computational quantum mechanical modelling technique that helps in studying the electronic structure and characteristics of atoms, molecules, and solids.
“AlphaFold” is a neural network-based deep learning program by DeepMind that predicts protein structures with great accuracy from their amino acid sequences.
“pLDDT” is a per-residue predicted confidence score that indicates the confidence and accuracy with which a modelled residue is predicted. The score is based on the local distance difference test (LDDT), a superposition-free measure of the atom-atom distances in a modelled structure used to validate the accuracy of the structure. The pLDDT confidence score ranges from 0 to 100, and a residue scoring greater than 90 is expected to be modelled with high accuracy. In this context, low pLDDT means any value less than or equal to 75. Residues with low pLDDT scores were considered as hotspots to be mutated into residues with higher pLDDT scores, which in turn indicate greater confidence in the 3D structure of the protein.
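The low-pLDDT criterion above can be sketched as a simple filter over per-residue scores. The cutoff of 75 comes from the definition above; the function name and the example scores are hypothetical and do not correspond to any enzyme in this disclosure.

```python
LOW_PLDDT_CUTOFF = 75.0  # "low pLDDT" threshold stated in the definition above


def low_confidence_positions(plddt_scores, cutoff=LOW_PLDDT_CUTOFF):
    """Return 1-based residue positions whose pLDDT is at or below the cutoff."""
    return [
        position
        for position, score in enumerate(plddt_scores, start=1)
        if score <= cutoff
    ]


# Hypothetical per-residue scores for a six-residue stretch.
example_scores = [96.2, 91.0, 75.0, 62.4, 88.9, 70.1]
hotspots = low_confidence_positions(example_scores)
```

Positions returned by such a filter would be candidate hotspots only; whether a given position is actually mutated depends on the further analysis described in this specification.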
“Substrate binding affinity” refers to the degree of interaction between a substrate molecule and the binding site on an enzyme or receptor. It influences the effectiveness of enzymatic reactions. In this context, it refers to the favourable interaction between the substrate and the active site residues of the enzyme. Better binding affinity is where the steric clashes are minimal.
The “hotspots” are specific amino acid positions on a polypeptide that are chosen after analysis for mutations which can bring about a change in the functional properties of the polypeptide.
The terms “contact score” or “contact map” in this context refer to a method of ranking interactions that evaluates residue-residue interaction as a function of distance and physical van der Waals contacts. A higher contact score indicates greater physical contact of a residue with the target substrate or residue.
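A distance-based contact score can be sketched as below. This is only one possible scoring scheme, assumed here for illustration: it counts atom pairs between a residue and the target that lie within a fixed cutoff. The 4.5 Å cutoff, the residue labels, and the coordinates are hypothetical, and the scoring function actually used in the method may weight contacts differently.

```python
import math


def contact_score(residue_atoms, target_atoms, cutoff=4.5):
    """Count residue-target atom pairs closer than the cutoff (in angstroms)."""
    return sum(
        1
        for a in residue_atoms
        for b in target_atoms
        if math.dist(a, b) <= cutoff
    )


def rank_residues(residues, target_atoms, cutoff=4.5):
    """Rank residues by contact score, highest first."""
    scores = {
        name: contact_score(atoms, target_atoms, cutoff)
        for name, atoms in residues.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coordinates (angstroms) for a two-atom target and three residues.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    "residue_A": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "residue_B": [(10.0, 0.0, 0.0)],
    "residue_C": [(0.0, 4.0, 0.0)],
}
ranking = rank_residues(residues, target)
```

Residues at the top of such a ranking make the most physical contacts with the target and would be the first to inspect when looking for positions that shape the binding site.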
“Free energy surface (FES) graph or plot” refers to a method of visualizing the output of the metadynamics simulation as a function of the collective variables defined for the experiments. The collective variables are plotted on the x and y axes and the resulting surface is coloured according to the free energy of the system under study. For the purposes of this embodiment, deeper wells and wells closer to the origin of the FES graph are considered to be an improvement over the reference FES graph.
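The comparison against a reference FES can be sketched on a small grid. The sketch assumes the FES is available as a two-dimensional table of free energy values over two collective variables, locates the deepest well in each surface, and reports the two criteria named above separately. The grids, CV values, and energy units are hypothetical, and how the two criteria are combined into a single decision is left open.

```python
import math


def deepest_well(fes_grid, cv1_values, cv2_values):
    """Return (free_energy, cv1, cv2) at the lowest point of the grid.

    fes_grid[i][j] is the free energy at (cv1_values[i], cv2_values[j]).
    """
    best = None
    for i, row in enumerate(fes_grid):
        for j, energy in enumerate(row):
            if best is None or energy < best[0]:
                best = (energy, cv1_values[i], cv2_values[j])
    return best


def compare_to_reference(candidate_well, reference_well):
    """Report whether the candidate well is deeper and closer to the origin."""
    cand_energy, cand_cv1, cand_cv2 = candidate_well
    ref_energy, ref_cv1, ref_cv2 = reference_well
    return {
        "deeper": cand_energy < ref_energy,
        "closer_to_origin": math.hypot(cand_cv1, cand_cv2)
        < math.hypot(ref_cv1, ref_cv2),
    }


# Hypothetical 3 x 3 surfaces over the same CV axes (arbitrary units).
cv_axis = [0.2, 0.4, 0.6]
reference_fes = [
    [0.0, -1.0, -2.0],
    [-1.0, -3.0, -4.0],
    [-2.0, -4.0, -5.0],
]
candidate_fes = [
    [-2.0, -8.0, -3.0],
    [-1.0, -4.0, -2.0],
    [0.0, -1.0, 0.0],
]
reference_well = deepest_well(reference_fes, cv_axis, cv_axis)
candidate_well = deepest_well(candidate_fes, cv_axis, cv_axis)
verdict = compare_to_reference(candidate_well, reference_well)
```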
In this context, interactions, both favourable and unfavourable, are those interactions that are contributed by the residues in the active site. Favourable interactions refer to those interactions in the environment of the enzyme or protein that can facilitate stronger binding of the target molecule, be it a substrate or residue. Favourable interactions include charged electrostatic interactions, hydrogen-bonding interactions, and hydrophobic interactions. Unfavourable clashes are those interactions that are caused by overlapping van der Waals radii. Unfavourable clashes tend to force the substrate into an unrealistic or stressed conformation, which can be considered a high energy state. Minimising these high energy states and strengthening binding interactions leads to the substrate attaining a better binding mode in the active site of the enzyme.
“Induced fit modes” in this context refers to a method of structurally modelling the substrate into the active site of an enzyme by using ab initio methods to fit the substrate into the active site of generated ensembles of the enzyme active site structure.
In this context, the terms “percent identity” or “percentage identical” are used to describe comparisons between polypeptides. To obtain this percentage, two sequences are optimally aligned over a comparison window, which may include gaps (i.e., deletions or additions) in the polypeptide sequence compared to the reference sequence, which does not contain gaps. The percentage is calculated by counting the number of positions in which the same nucleic acid base or amino acid residue appears in both sequences, dividing the number of matched positions by the total number of positions in the comparison window, and multiplying the result by 100 to obtain the percentage of sequence identity.
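The calculation described above can be sketched for two sequences that have already been aligned. The sketch assumes the alignment is supplied as two equal-length strings with "-" marking gaps, and it takes the comparison window to be the full alignment length. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_reference):
    """Percentage of window positions with the same residue in both sequences.

    Both inputs must already be aligned and of equal length; "-" marks a gap.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for q, r in zip(aligned_query, aligned_reference)
        if q == r and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical 8-position window: one gap and one substitution in the query.
identity = percent_identity("MAAN-RPI", "MAANSRPV")
```

Here six of the eight window positions match, so the two sequences are 75% identical over this window.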
The acidic amino acids or residues include L-Glu (E) and L-Asp (D); basic amino acids or residues include L-Arg (R) and L-Lys (K); polar amino acids or residues include L-Asn (N), L-Gln (Q), L-Ser (S) and L-Thr (T); non-polar amino acids or residues include L-Gly (G), L-Leu (L), L-Val (V), L-Ile (I), L-Met (M) and L-Ala (A); hydrophilic amino acids or residues include L-Thr (T), L-Ser (S), L-His (H), L-Glu (E), L-Asn (N), L-Gln (Q), L-Asp (D), L-Lys (K) and L-Arg (R); hydrophobic amino acids or residues include L-Pro (P), L-Ile (I), L-Phe (F), L-Val (V), L-Leu (L), L-Trp (W), L-Met (M), L-Ala (A) and L-Tyr (Y); aromatic amino acids or residues include L-Phe (F), L-Tyr (Y) and L-Trp (W); and aliphatic amino acids or residues include L-Ala (A), L-Val (V), L-Leu (L) and L-Ile (I). Although listed here as hydrophilic, L-His (H) is sometimes classified as a basic residue, owing to the pKa of its heteroaromatic nitrogen atom, or as an aromatic residue, as its side chain includes a heteroaromatic ring.
“Amino acid difference or residue difference” refers to a change in the residue at a specified position of a polypeptide sequence when compared to a reference sequence. For example, a residue difference at position X116, where the reference sequence has a phenylalanine, refers to a change of the residue at position X116 to any residue other than phenylalanine. As disclosed herein, an enzyme can include one or more residue differences relative to a reference sequence, where multiple residue differences typically are indicated by a list of the specified positions where changes are made relative to the reference sequence.
“Reference sequence” refers to a defined sequence to which another (e.g., altered) sequence is compared. In this context the reference sequence is Fluorinase from Streptomyces cattleya (Accession no. Q70GK9.1, PDB ID: 5FIU).
“Conservative amino acid substitutions or mutations” refer to the interchangeability of residues having similar side chains, and thus typically involves substitution of the amino acid in the polypeptide with amino acids within the same or similar defined class of amino acids.
“Non-conservative substitution” refers to substitution or mutation of an amino acid in the polypeptide with an amino acid with significantly differing side chain properties.
The engineered fluorinases used to synthesize the trifluorophenyl compounds are designed computationally as described below.
1 Generation of Reference Enzyme-Substrate Complex:
2 Identification of a Fluorinase Enzyme with Optimal Binding Affinity for the Substrate, [(3S)-3-amino-3-carboxypropyl][2,5-difluoro-4-(4-methoxy-2,4-dioxobutyl)phenyl]methylsulfonium.
3 Engineering of a Fluorinase Enzyme to Enhance Substrate Binding Affinity for [(3S)-3-amino-3-carboxypropyl][2,5-difluoro-4-(4-methoxy-2,4-dioxobutyl)phenyl] methylsulfonium
The mutations on the engineered fluorinases are given in Table 1.
The entire above process from section 1 to 3 is depicted as a process diagram in
The disclosed invention provides a pioneering computer-implemented method for engineering fluorinase enzymes towards the synthesis of fluorophenyl compounds. By leveraging computational modeling, the method offers advantages in terms of efficiency, overcoming the challenges of chemical synthesis, expanding substrate scope, and rational enzyme design. The approach represents a significant advancement in fluorinase engineering and holds immense potential for widespread industrial use of fluorophenyl compounds. The key advantages are listed here:
Enhanced Efficiency: By designing specific substrates and conducting modeling studies, the method accelerates the identification of optimal enzyme-substrate interactions, leading to more efficient catalytic activity and synthesis of fluorophenyl compounds.
Overcome Challenges of Chemical Synthesis: Traditional chemical synthesis methods for organofluorine compounds often pose environmental concerns and encounter stability issues. By employing this computer-implemented method, the challenges associated with chemical synthesis are addressed, enabling a more sustainable and environmentally friendly approach to fluorophenyl compound production.
Expanded Substrate Scope: The method's focus on engineering fluorinase enzymes allows for the expansion of substrate scope. Through computational modeling and substrate design, the method facilitates the synthesis of a wide range of fluorophenyl compounds, opening doors to various sectors such as pharmaceuticals, agrochemicals, and materials science.
Enzyme Design: The integration of computational modeling enables a rational and targeted approach to enzyme design and optimization. By gaining valuable insights into catalytic binding modes and F— ion attack conformations, the method enables the selection and modification of fluorinase enzymes to enhance their activity and substrate selectivity, resulting in more effective synthesis of fluorophenyl compounds.
Scalable Industrial Applications: The improved stability, substrate scope, and catalytic activity of the engineered fluorinase enzymes make large-scale production of fluorophenyl compounds feasible. This method paves the way for scalable and commercially viable production processes, benefiting industries such as pharmaceuticals, agrochemicals, and materials science.
Number | Date | Country | Kind |
---|---|---|---|
202241029679 | Jun 2022 | IN | national |