Recent years have seen significant improvements in hardware and software platforms for molecular design in computational drug discovery. In particular, existing systems often utilize computing devices and corresponding models to construct molecules with desired characteristics. In addition, existing systems often preserve certain scaffolds or core chemical substructures that serve as the backbone for the computer-based molecular design process because these scaffolds and constraints are crucial to a molecule's biological activity. In many cases, existing systems utilize a molecular string representation, Simplified Molecular Input Line Entry System (SMILES), within a drug discovery system. Although existing systems utilize molecular string representations, such as SMILES, existing systems often have a number of technical shortcomings with regard to flexibility and accuracy that limit artificial intelligence (AI)-driven molecular design tasks.
Embodiments of the present disclosure provide benefits and/or solve one or more of the foregoing or other problems in the art with systems, non-transitory computer-readable media, and computer-implemented methods for generating a sequential attachment-based fragment embedding (SAFE) molecular string representation that represents a molecular representation as an order agnostic sequence of interconnected fragment blocks. Indeed, the disclosed systems can generate the SAFE molecular string representation for processing via large language models for downstream molecular design tasks. For instance, the disclosed systems can extract fragments (and attachment points) from a molecular string representation (e.g., a SMILES molecular string representation). Moreover, the disclosed systems can concatenate the extracted fragments using separation character connections between the fragments to generate a set of linked fragments (e.g., as a string). In addition, the disclosed systems can iterate over attachment points for the fragments to generate ring link characters in the set of linked fragments to simulate fragment links. Indeed, the resulting SAFE molecular string representation can include an order agnostic sequence of interconnected fragment blocks that represent a molecular compound.
Furthermore, the disclosed systems can utilize the above-mentioned SAFE molecular string representation to enable various downstream fragment-based molecular design tasks via machine learning models (that are not viable using many existing molecular representation notations). For instance, the disclosed systems can train a large language model for the fragment-based molecular design tasks by training the large language model via a measure of loss between a predicted completion of a partial sequence of a training SAFE molecular string representation and the training SAFE molecular string representation. Indeed, the SAFE molecular representation large language model can be utilized for de novo molecular compound generation tasks, scaffold decoration and motif extension tasks, linker design and scaffold morphing tasks, and/or molecular superstructure generation tasks.
Additional features and advantages of one or more embodiments of the present disclosure are outlined in the description which follows, and in part can be determined from the description, or may be learned by the practice of such example embodiments.
The detailed description is described with reference to the accompanying drawings in which:
This disclosure describes one or more embodiments of a digital molecular structure generation system that generates sequential attachment-based fragment embedding (SAFE) molecular string representations representing molecules as order agnostic sequences of interconnected fragment blocks for processing via large language models for downstream molecular design tasks. In one or more implementations, the digital molecular structure generation system converts molecular string representations (e.g., SMILES molecular string representations) into SAFE molecular string representations that include fragments with separation character connections between the fragments and ring link characters to simulate fragment links. In addition, the digital molecular structure generation system can also utilize the SAFE molecular string representations (having order agnostic sequences of interconnected fragment blocks to represent molecular compounds) to train a large language model to generate (or complete) additional SAFE molecular string representations for a variety of fragment-based molecular design tasks (e.g., de novo molecular compound generation tasks, scaffold decoration and motif extension tasks, linker design and scaffold morphing tasks, and/or molecular superstructure generation tasks).
For example,
For instance, as shown in
As illustrated in the act 104, to generate a SAFE molecular string representation 112, the digital molecular structure generation system 1306 can extract fragments from the molecular string representation 102 to generate a set of fragments (e.g., “c1cc([*])ccc1” and “O([*])C”). In some cases, the digital molecular structure generation system 1306 utilizes a bond slicing algorithm to break bonds in a molecular string representation (e.g., the molecular string representation 102) to generate the set of fragments (e.g., “c1cc([*])ccc1” and “O([*])C”). As further shown in
As further illustrated in the act 104, to generate the SAFE molecular string representation 112, the digital molecular structure generation system 1306 concatenates the fragments from the set of fragments (e.g., “c1cc([*])ccc1” and “O([*])C”) utilizing separation characters (e.g., “.”) between the fragments to generate a linked fragment string. Indeed, the digital molecular structure generation system 1306 can generate the linked fragment string (e.g., “c1cc([*])ccc1.O([*])C”) by concatenating the fragments using a separation character. Moreover, as shown in the act 104, the digital molecular structure generation system 1306 also generates ring link characters in the linked fragment string to represent the attachment points for specific fragment links (e.g., to accurately indicate bonds between different fragments). For instance, as shown in the act 104 of
Utilizing the separation character and ring link character in the act 104, the digital molecular structure generation system 1306 generates the SAFE molecular string representation 112 (e.g., “c1cc2ccc1.O2C”) which represents the molecular string representation 102 as an order agnostic sequence of interconnected fragment blocks. Indeed, the digital molecular structure generation system 1306 can generate the SAFE molecular string representation 112 agnostic of fragment block order (e.g., “c1cc2ccc1.O2C” or “O2C.c1cc2ccc1”) while representing the same molecular string representation 102. The digital molecular structure generation system 1306 generating a SAFE molecular string representation is described in greater detail below (e.g., in reference to
Although one or more embodiments described herein illustrate the digital molecular structure generation system 1306 generating a SAFE molecular string representation from a SMILES molecular string representation, the digital molecular structure generation system 1306 can generate a SAFE molecular string representation from a variety of molecular string representation notations in accordance with one or more implementations herein. Furthermore, the digital molecular structure generation system 1306 can generate a SAFE molecular string representation from a molecular string representation having a variety of fragments and/or a variety of bonds per fragments.
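The two-fragment example above (“c1cc([*])ccc1” and “O([*])C” becoming “c1cc2ccc1.O2C”) can be sketched with plain string handling. This is a minimal illustration, not the disclosed implementation: the function name link_two_fragments is hypothetical, the sketch assumes each fragment carries exactly one attachment point written as “([*])”, and it assumes single-digit ring numbers only.

```python
import re

ATTACHMENT = "([*])"  # attachment-point marker as written in the example fragments


def link_two_fragments(frag_a: str, frag_b: str, sep: str = ".") -> str:
    """Join two single-attachment fragments and mark their bond with a ring link digit.

    Hypothetical helper: assumes one "([*])" per fragment and ring digits 1-9 only.
    """
    # Ring digits already used inside the fragments must not be reused for the link.
    used = {int(d) for d in re.findall(r"\d", frag_a + frag_b)}
    digit = next(d for d in range(1, 10) if d not in used)

    # Concatenate with the separation character, then replace both attachment
    # points with the same ring link digit to simulate the fragment link.
    linked = sep.join([frag_a, frag_b])
    return linked.replace(ATTACHMENT, str(digit))


print(link_two_fragments("c1cc([*])ccc1", "O([*])C"))
print(link_two_fragments("O([*])C", "c1cc([*])ccc1"))
```

Swapping the argument order changes only the order of the fragment blocks; the ring link digit still pairs the same two attachment points.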
As used herein, the term “molecular compound” (sometimes referred to as “compound” or “molecule compound”) refers to a chemical compound having atoms with bonds to form a stable molecule (e.g., a drug or medicine). Indeed, in one or more instances, a molecular compound includes a substance composed of molecules designed to interact with specific biological targets (e.g., proteins, enzymes, or receptors).
Furthermore, as used herein, the term “molecular representation” (sometimes referred to as “molecular string representation”) refers to a notation that depicts a structure, composition, and/or function of a molecular compound. For instance, a molecular representation can include, but is not limited to, a molecular formula, a structural formula, or a chemical notation. In one or more instances, a molecular representation can include a chemical notation that represents molecular structures (e.g., ring structures, attachment points) in a text (or string) format for utilization in computational models. As an example, a molecular string representation can include a Simplified Molecular Input Line Entry System (SMILES), an International Chemical Identifier (InChI), and/or a Group Self-Referencing Embedded Strings (Group SELFIES).
Additionally, as used herein, the term “ring structure” refers to a molecular structure in which one or more atoms are connected in a closed loop. In particular, a ring structure can include an opening ring and a closing ring. Furthermore, as used herein, the term “attachment point” refers to a location within a molecule (or a molecule representation) (e.g., a ring structure) where an atom or group of atoms are attached (or connected).
As used herein, the term “sequential attachment-based fragment embedding molecular representation” (sometimes referred to as “SAFE representation” or “SAFE string representation”) refers to a molecular representation that indicates linked fragment blocks in a string with separation characters and ring link characters. Indeed, a SAFE representation depicts linked fragment blocks in a string of order agnostic fragment blocks designated with separation characters to specify individual fragments and ring link characters to specify fragment links with other fragment blocks in a molecular compound. In particular, a SAFE string representation includes a molecular representation generated in accordance with one or more implementations herein.
As used herein, the term “fragment” refers to a portion or piece of a molecule that represents an independent functional group (e.g., one or more atoms) with an identity and property within a molecule. For instance, a molecular compound can include a set of fragments connected to form a structure. As an example, a fragment can include, but is not limited to, a benzene ring or other pharmaceutical compound, an amine, a ketone, an amino acid (protein), and/or a synthetic compound.
As used herein, the term “separation character” refers to a string character or symbol that indicates a marker within a SAFE string representation to differentiate between fragments of a molecular compound. For instance, a separation character can include a string character or symbol that depicts a partition between two fragments in a molecular compound. A SAFE string representation can include multiple separation characters between multiple fragments. As an example, a separation character can include a “.” character, a “|” character, and/or a “-” character between fragments.
As used herein, the term “ring link character” refers to a string character or symbol that indicates an attachment point or fragment link within a SAFE string representation. In particular, a ring link character can specify a particular linkage between two or more fragments. For example, a ring link character can include a specific linking character that designates a specific link between fragments. In some cases, the ring link character includes one or more digits (e.g., a ring link digit) that each represent different fragment links in a SAFE string representation. In one or more instances, a ring link character can include a variety of characters or symbols, such as alphanumerical characters, symbols, or numerical characters.
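To make the roles of the separation character and the ring link character concrete, the following sketch splits a SAFE string into fragment blocks and reports which blocks each ring link digit connects. It is an illustration under stated assumptions, not the disclosed parser: the name fragment_links is hypothetical, only single-digit ring numbers are handled, and a digit is treated as a fragment link when it occurs an odd number of times inside a block (a digit occurring twice in one block is an ordinary ring closure).

```python
import re
from collections import Counter, defaultdict


def fragment_links(safe: str, sep: str = ".") -> dict:
    """Map each ring link digit to the indices of the two blocks it connects.

    Hypothetical helper; single-digit ring numbers only.
    """
    open_in = defaultdict(list)
    for idx, block in enumerate(safe.split(sep)):
        # Drop bracket atoms so isotope, charge, or hydrogen counts are not
        # mistaken for ring digits.
        bare = re.sub(r"\[[^\]]*\]", "", block)
        for digit, count in Counter(re.findall(r"\d", bare)).items():
            # An odd count means the ring bond is opened here but closed in
            # another block, i.e. it is a link between fragments.
            if count % 2 == 1:
                open_in[digit].append(idx)
    return {d: tuple(ix) for d, ix in sorted(open_in.items()) if len(ix) == 2}


print(fragment_links("c1cc2ccc1.O2C"))
```

In “c1cc2ccc1.O2C”, the digit 1 opens and closes the aromatic ring inside the first block, while the digit 2 appears once in each block and therefore marks the link between them.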
As further shown in
As used herein, the term “machine learning model” includes a computer algorithm or a collection of computer algorithms that can be trained and/or tuned based on inputs to approximate unknown functions. For example, a machine learning model can include a computer algorithm with branches, weights, or parameters that change based on training data to improve for a particular task. Thus, a machine learning model can utilize one or more (deep) learning techniques (e.g., supervised or unsupervised learning) to improve in accuracy and/or effectiveness. Example machine learning models include various types of decision trees, support vector machines, Bayesian networks, random forest models, or neural networks (e.g., deep neural networks, generative adversarial neural networks, convolutional neural networks, recurrent neural networks, large language models, or diffusion neural networks). Similarly, the term “machine learning data” refers to information, data, or files generated or utilized by a machine learning model. Machine learning data can include training data, machine learning parameters, or embeddings/predictions generated by a machine learning model.
As used herein, the term “language machine learning model” refers to a machine learning model that analyzes a language input (e.g., text or verbal input) to generate a predicted output. For instance, a language machine learning model includes a neural network that generates text based on an input text or query. The digital molecular structure generation system 1306 can utilize a variety of architectures for a language machine learning model, such as a large language model or other transformer neural network model.
For instance, a large language model includes one or more neural networks capable of processing natural language text to generate outputs such as predictive outputs, analyses, one or more SAFE molecular string representations, or combinations of data within stored content items. In particular, a large language model can include parameters trained (e.g., via deep learning) on large data volumes to learn patterns and rules of language for summarizing and/or generating digital content. Examples of large language models include BLOOM, Bard AI, ChatGPT (e.g., GPT-3, GPT-4, etc.), LaMDA, and/or DialoGPT. Moreover, in some embodiments a language transformer model includes bidirectional encoder representations (BERT), Robustly optimized BERT (RoBERTa), and other text transformer models. Indeed, the digital molecular structure generation system 1306 can utilize a large language model trained to learn patterns and rules defined by molecular compound structures to generate SAFE molecular representations and/or to perform various downstream fragment-based molecular design tasks.
As used herein, the term “prompt” refers to a set of input instructions to a large language model (or other machine learning model) to cause the large language model to generate a particular output (or perform a particular task). Indeed, a prompt can include an input string of text that includes a request for a large language model (e.g., generate a novel molecular compound, generate a molecular compound that includes properties for Central Nervous System (CNS) penetration). In one or more cases, a prompt can include a text input and/or a voice command.
As mentioned above, although existing systems can utilize molecular string representations, such as SMILES, these conventional systems often have a number of technical shortcomings with regard to flexibility and accuracy. For instance, many conventional systems cannot easily or accurately utilize SMILES molecular string representations for AI-driven molecular design tasks and computational drug discovery. In particular, AI-driven molecular design tasks and computational drug discovery often demand the preservation of certain scaffolds or core chemical substructures (which serve as a backbone for molecular design processes). Indeed, preserving these groups and constraints often stems from their crucial role in a molecule's biological activity. In many instances, conventional systems are unable to incorporate such constraints when relying on SMILES molecular string representations (or many other conventional molecular string representations).
To illustrate, in many conventional systems, a SMILES molecular string representation is unable to provide a contiguous representation of molecular substructures. This limitation often hinders tasks, such as adding structures to a molecule's scaffold and connecting fragments. Such limitations also often limit SMILES representations' usefulness in improving potential drug candidates (e.g., during lead optimization efforts, during AI-driven molecular design tasks). Indeed, in many conventional systems, SMILES molecular string representations lack robustness to minor changes and struggle with ensuring validity and integrity of fragments in deep learning-based molecular design. In addition, the SMILES molecular string representations also often underperform in molecular search and substructure matching tasks.
Many other approaches (e.g., Self-Referencing Embedded Strings (SELFIES), Group SELFIES) aim to resolve the deficiencies of SMILES molecular string representations but also have a number of technical shortcomings with regard to flexibility and accuracy. For instance, SELFIES and Group SELFIES improve on the robustness and validity issues via deep generative modeling through a recursive approach; however, such representations lack simplicity, are difficult to interpret, and are not compact. Furthermore, these approaches often also fail to consistently uphold the integrity of scaffolds and fragments for several molecular generation tasks. In addition, such approaches fail to facilitate deep generative fragment-based molecule design without extensive, task-specific engineering of training processes and molecule generation steps, bespoke model architectures, or goal-directed optimization frameworks.
In some instances, many conventional systems utilize graph-based methods to create molecular representations that facilitate AI-driven molecular design tasks. However, many graph-based methods encounter difficulties when extending design tasks to scaffold-based generation, linker-design, and generating molecules with unseen building blocks. Indeed, many of these approaches experience difficulties in creating novel cyclic structures not seen during training. Furthermore, some conventional systems utilize graph-based models that are trained on the SMILES molecular string representations; however, these models often fail to guarantee validity of generated molecules and the presence of input scaffold constraints. In particular, many conventional systems are unable to (e.g., due to incapability or due to additional required engineering) facilitate one or more molecular design tasks, such as de novo molecular compound generation tasks, scaffold decoration and motif extension tasks, linker design and scaffold morphing tasks, and/or molecular superstructure generation tasks.
As suggested by the foregoing, the digital molecular structure generation system 1306 provides a variety of technical advantages relative to conventional systems. Indeed, the digital molecular structure generation system 1306 generates and utilizes sequential attachment-based fragment embedding (SAFE) molecular string representations that represent molecules as order agnostic sequences of interconnected fragment blocks that can flexibly and accurately be utilized with large language models for downstream molecular design tasks. Indeed, the digital molecular structure generation system 1306 can generate SAFE molecular string representations by converting molecular string representations (e.g., SMILES molecular string representations) as an order agnostic sequence of interconnected fragment blocks while maintaining compatibility with existing molecular string representation parsers (e.g., SMILES parsers).
By being order agnostic sequences, the SAFE molecular string representations enable the digital molecular structure generation system 1306 to flexibly and accurately utilize the SAFE molecular string representations with generative models for one or more molecular design tasks. Indeed, the SAFE molecular string representations preserve the integrity of molecular scaffolds and fragments. Additionally, the digital molecular structure generation system 1306 can easily utilize SAFE molecular string representations as simple sequence completion problems that enable accuracy and flexibility in molecular design tasks, such as de novo molecular compound generation tasks, scaffold decoration and motif extension tasks, linker design and scaffold morphing tasks, and molecular superstructure generation tasks. Moreover, the SAFE molecular string representations (generated by the digital molecular structure generation system 1306) also facilitate autoregressive generation which flexibly bypasses the necessity for intricate decoding schemes or graph-based models (in molecular design generative tasks).
Additionally, the digital molecular structure generation system 1306 generates SAFE molecular string representations as a collection of connected fragments that remain valid as other molecular string representations (e.g., a SMILES representation). Accordingly, while enabling many AI-driven molecular design tasks (which is not often possible in conventional systems), the SAFE molecular string representations generated by the digital molecular structure generation system 1306 remain compatible with other molecular string representations (e.g., a SMILES representation), such that the SAFE molecular string representations are backward compatible with many existing molecular string representation parsers (e.g., SMILES parsers). For instance, the underlying molecular graph remains unaffected by the arrangement of fragments within a SAFE molecular string representation to ensure that data augmentation techniques for generative models (corresponding to other molecular string representations), such as randomization, remain applicable to the SAFE molecular string representation.
Indeed, experimental results illustrated in
As mentioned above, the digital molecular structure generation system 1306 can generate sequential attachment-based fragment embedding (SAFE) molecular string representations from other molecular string representations (e.g., SMILES molecular string representations). For example,
As shown in
To convert the SMILES molecular string representation 202 into the SAFE molecular string representation 212, as shown in act 206 of
Furthermore, as shown in an act 208 of
In addition, as shown in an act 210 of
In some cases, the digital molecular structure generation system 1306 utilizes ring link digits (e.g., 1, 2, 3, 4, 5) as the ring link characters. For instance, in the act 206, the digital molecular structure generation system 1306 can generate a ring link character (or digit) of “1” for “RL1” and a ring link character (or digit) of “2” for “RL2.” Although one or more embodiments described herein utilize specific ring link characters (or digits), the digital molecular structure generation system 1306 can utilize a variety of ring link characters (e.g., alphanumerical characters, symbols, numerical characters).
As shown in
Furthermore, in one or more instances, the digital molecular structure generation system 1306 can generate a SAFE molecular string representation (or linked fragment string) utilizing a varying order of the fragments. For instance, the digital molecular structure generation system 1306 can generate, in the act 208, a linked fragment string by concatenating the set of fragments in varying orders (e.g., “Fragment1.FragmentN.Fragment2” or “Fragment2.Fragment1.FragmentN”) and also generate corresponding ring link characters in the act 210. As an example, the digital molecular structure generation system 1306 can generate the SAFE molecular string representation 212 utilizing varying permutations, such as “N18CCCCC1.O=C6C#CC8.N67.c17ccc2ncnc4cnc1.c15cccc(Br)c1.N45”. Indeed, the digital molecular structure generation system 1306 generates and utilizes a SAFE molecular string representation that is permutable while preserving the same fragment link connections (because the ring link characters continue to specify fragment links in different arrangements of the fragment blocks).
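The permutability described above can be illustrated by enumerating every ordering of the fragment blocks in a SAFE string. This is a sketch only: block_orderings is a hypothetical name, and the three-block string “c1cc2ccc1.O23.C3” is an illustrative split chosen for this example rather than one taken from the disclosure.

```python
from itertools import permutations


def block_orderings(safe: str, sep: str = ".") -> list:
    """Return every distinct ordering of the fragment blocks (hypothetical helper).

    The ring link digits travel with their blocks, so each ordering names the
    same fragment links. The number of orderings grows factorially.
    """
    blocks = safe.split(sep)
    return sorted({sep.join(order) for order in permutations(blocks)})


orderings = block_orderings("c1cc2ccc1.O23.C3")
print(len(orderings))
for ordering in orderings:
    print(ordering)
```

Because only whole blocks move, the multiset of blocks (and therefore every ring link digit pairing) is identical across all orderings.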
Furthermore,
In one or more instances, the digital molecular structure generation system 1306 utilizes a bond slicing algorithm to determine fragments on a desired set of bonds from ring structures represented in a molecular string representation (e.g., a SMILES molecular string representation). For instance, in one or more implementations, the digital molecular structure generation system 1306 utilizes a breaking of retrosynthetically interesting chemical substructures (BRICS) algorithm as the bond slicing algorithm as described in Degen et al., On the Art of Compiling and Using ‘Drug-like’ Chemical Fragment Spaces, ChemMedChem: Chemistry Enabling Drug Discovery, 3(10):1503-1507 (2008) (hereinafter “Degen”), which is incorporated herein by reference in its entirety. Although one or more implementations of the digital molecular structure generation system 1306 utilize a BRICS algorithm, the digital molecular structure generation system 1306 can utilize various bond slicing algorithms, such as, but not limited to, a matched molecular pair method as described in Hussain et al., Computationally Efficient Algorithm to Identify Matched Molecular Pairs (MMPs) in Large Data Sets, Journal of Chemical Information and Modeling, 50(3):339-348 (2010) (hereinafter “Hussain”), RECAP as described in Lewell et al., Recap Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry, Journal of Chemical Information and Computer Sciences, 38(3):511-522 (1998) (hereinafter “Lewell”), and/or custom patterns, each of which is incorporated herein by reference in its entirety.
Furthermore, as shown in an act 310 of
Additionally, as shown in
Moreover, as shown in an act 308 of
Furthermore, as shown in an act 314, the digital molecular structure generation system 1306 generates a linked fragment string from the set of fragments with separation characters (in accordance with one or more implementations herein). Furthermore, as shown in act 316 of
In one or more instances, the digital molecular structure generation system 1306 can generate a SAFE molecular string representation in accordance with the following Algorithm 1:
using a bond slicing algorithm
Find the next possible ring digits
To illustrate, in the above-mentioned Algorithm 1, the digital molecular structure generation system 1306 can extract unique ring identifiers from a molecule and fragment the molecule on a desired set of bonds (e.g., using a bond slicing algorithm). Indeed, the fragment substructures can represent synthetically accessible building blocks that are present in drug-like compounds. Moreover, the digital molecular structure generation system 1306 can sort the extracted fragments by size. Furthermore, the digital molecular structure generation system 1306 can concatenate the fragments using a separation character “.” to mark new fragments in the representation (while preserving their corresponding attachment points). To construct the SAFE string representation, the digital molecular structure generation system 1306 can iterate over the numbered attachment points and replace them with a ring link character (e.g., i) to simulate fragment linking. The ring link characters create virtual connections between fragments resulting in a set of linked fragments (indicated by the separation character).
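The steps described above (collect ring identifiers already in use, sort fragments by size, concatenate with the separation character, then replace numbered attachment points with the next free ring digits) can be sketched as follows. This is not the disclosed Algorithm 1: bond slicing is assumed to have been done already, the attachment-point notation “[n*]” is an assumption made for this sketch, fragment size is approximated by string length with the largest fragment first, and an attachment point written at the very start of a fragment is not handled.

```python
import re


def used_ring_numbers(smiles: str) -> set:
    """Ring numbers already present, ignoring digits inside bracket atoms."""
    bare = re.sub(r"\[[^\]]*\]", "", smiles)
    return {int(tok.lstrip("%")) for tok in re.findall(r"%\d{2}|\d", bare)}


def ring_token(number: int) -> str:
    """SMILES writes ring numbers above 9 with a leading percent sign."""
    return str(number) if number < 10 else f"%{number}"


def to_safe(fragments, sep: str = ".") -> str:
    """Hypothetical sketch: fragments use "[n*]" for attachment point n (assumption)."""
    # Sort by size; string length, largest first, is an assumption of this sketch.
    ordered = sorted(fragments, key=len, reverse=True)
    linked = sep.join(ordered)

    used = used_ring_numbers(linked)
    labels = sorted({int(n) for n in re.findall(r"\[(\d+)\*\]", linked)})
    for label in labels:
        # Find the next possible ring number that does not clash with existing rings.
        number = next(n for n in range(1, 100) if n not in used)
        used.add(number)
        token = ring_token(number)
        # Replace the branch form first, then any bare trailing form.
        linked = linked.replace(f"([{label}*])", token).replace(f"[{label}*]", token)
    return linked


print(to_safe(["O([1*])C", "c1cc([1*])ccc1"]))
print(to_safe(["c1cc([1*])ccc1", "O([1*])[2*]", "C[2*]"]))
```

Both attachment points that share a label receive the same ring link token, which is what creates the virtual connection between the two fragment blocks.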
Furthermore, in some cases, the digital molecular structure generation system 1306 can canonicalize SAFE string representations such that multiple valid forms of a molecular representation yield a unique representation by enforcing a decoding order on SMILES characters within each fragment and on fragment orders within the converted SAFE string representation.
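The fragment-order half of such a canonicalization can be sketched with a deterministic sort. This is an assumption-laden illustration: canonical_block_order is a hypothetical name, the ordering rule (longest block first, ties broken lexicographically) is chosen for this sketch, and the character order within each fragment and the numbering of ring link digits are left untouched because normalizing them would require a cheminformatics toolkit.

```python
def canonical_block_order(safe: str, sep: str = ".") -> str:
    """Put fragment blocks in one fixed order (hypothetical rule: longest first,
    then lexicographic), so permuted inputs collapse to a single string."""
    blocks = safe.split(sep)
    return sep.join(sorted(blocks, key=lambda block: (-len(block), block)))


print(canonical_block_order("O2C.c1cc2ccc1"))
print(canonical_block_order("c1cc2ccc1.O2C"))
```

Any permutation of the same blocks maps to the same output, which is the property the fragment-order constraint is meant to provide.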
Additionally, as mentioned above, the digital molecular structure generation system 1306 can utilize SAFE molecular string representations to enable various downstream fragment-based molecular design tasks via large language models. For instance,
As shown in
Additionally, in some cases, the digital molecular structure generation system 1306 can utilize permutations of SAFE molecular string representations as the training SAFE string representation(s). For instance, the digital molecular structure generation system 1306 can generate randomized training SAFE string representation(s) by generating random permutations of SAFE molecular string representations (e.g., by randomizing fragment locations within the string representation). For example, the digital molecular structure generation system 1306 can generate varying permutations of SAFE string representation(s) as described above (e.g., in reference to
Furthermore, as shown in
Indeed, as further illustrated in
As further shown in
Indeed, the digital molecular structure generation system 1306 can utilize the comparison to generate the measure of loss 414 to quantify errors (or inaccuracies) between the predicted sequence of the SAFE molecular string representation 416 and the original (ground truth) training SAFE molecular string representation 410. Moreover, the digital molecular structure generation system 1306 utilizes the measure of loss 414 with the large language model 412 to adjust (or modify) parameters of the large language model 412 (e.g., via back propagation). In one or more instances, the digital molecular structure generation system 1306 iteratively repeats the determination and utilization of measures of losses as shown in
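One common way to quantify the error between a predicted token sequence and the ground truth sequence is the mean negative log-likelihood (cross-entropy) of the ground truth tokens. The sketch below uses that measure as an assumption; the disclosure does not specify which loss function is used, and next_token_loss is a hypothetical name.

```python
import math


def next_token_loss(predicted, target_ids, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the ground truth tokens.

    predicted[i] is a probability distribution over the token vocabulary for
    position i; target_ids[i] is the index of the ground truth token there.
    Cross-entropy is an assumed choice of loss for this sketch.
    """
    if len(predicted) != len(target_ids):
        raise ValueError("one distribution is required per target token")
    # Clamp probabilities so a zero probability does not raise a math error.
    losses = [-math.log(max(dist[t], eps)) for dist, t in zip(predicted, target_ids)]
    return sum(losses) / len(losses)


uniform = [[0.25, 0.25, 0.25, 0.25]] * 3
print(next_token_loss(uniform, [0, 1, 2]))
```

A model that assigns all probability to the correct token at every position has a loss of zero, and the loss grows as probability mass moves away from the ground truth tokens; the gradient of this value is what back propagation would use to adjust parameters.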
In some cases, the digital molecular structure generation system 1306 utilizes a dataset to generate training SAFE string representation(s). For instance, the digital molecular structure generation system 1306 can generate (or identify) SMILES strings from a dataset of molecules. As an example, the digital molecular structure generation system 1306 can utilize a dataset of molecules, such as the ZINC library as described in Irwin et al., ZINC—A Free Database of Commercially Available Compounds for Virtual Screening, Journal of Chemical Information and Modeling, 45(1):177-182 (2005) (hereinafter “Irwin”) and the UniChem library as described in Chambers, et al., UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System, Journal of Cheminformatics, 5(1):3 (2013) (hereinafter “Chambers”), each of which are incorporated herein by reference in their entirety. Moreover, the digital molecular structure generation system 1306 can convert the SMILES strings from the dataset of molecules (described above) into SAFE molecular string representations in accordance with one or more implementations herein.
In addition, the digital molecular structure generation system 1306 can generate tokens (or fragments) for the training SAFE string representation(s) to utilize in a generative model (e.g., a large language model). For instance, in some cases, the digital molecular structure generation system 1306 can identify expressions (e.g., common regular expressions) for SMILES representations as described in Schwaller et al., Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction, ACS Central Science, 5(9):1572-1583 (2019), which is incorporated herein by reference in its entirety. Indeed, the digital molecular structure generation system 1306 can utilize the expressions to generate a vocabulary of tokens represented within a dataset of SMILES representations (or SAFE representations).
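A regular-expression tokenizer of the kind described above can be sketched as follows. The pattern is a simplified one written for this illustration and is not the exact expression from the cited work; the names TOKEN_PATTERN, tokenize, and build_vocabulary are hypothetical.

```python
import re

# Simplified SMILES/SAFE token pattern (an assumption of this sketch):
# bracket atoms, two-letter halogens, %nn ring numbers, single letters,
# single digits, then bond, branch, and separator symbols.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[().=#\-+\\/:~@?>*$]")


def tokenize(text: str) -> list:
    """Split a SMILES or SAFE string into tokens, rejecting unknown characters."""
    tokens = TOKEN_PATTERN.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"untokenizable characters in {text!r}")
    return tokens


def build_vocabulary(strings) -> list:
    """Collect the sorted set of tokens seen across a dataset of strings."""
    return sorted({tok for s in strings for tok in tokenize(s)})


print(tokenize("c1cc2ccc1.O2C"))
print(build_vocabulary(["c1cc2ccc1.O2C", "c15cccc(Br)c1"]))
```

Because the separation character and ring link digits are ordinary SMILES symbols, the same pattern covers both SMILES and SAFE strings without special cases.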
Furthermore, the digital molecular structure generation system 1306 can utilize a tokenizer to generate tokens that represent various expressions for molecular representation syntax (e.g., SMILES syntax and/or SAFE syntax). To illustrate, the digital molecular structure generation system 1306 can utilize a tokenizer to generate tokens to represent the above-described expressions or vocabulary from molecular representation syntax. Indeed, in some cases, the digital molecular structure generation system 1306 utilizes a variety of tokenizers, such as, a byte-pair encoding (BPE) tokenizer, wordpiece tokenization approaches, and/or sentence piece tokenization approaches.
In some instances, the digital molecular structure generation system 1306 also generates one or more special tokens. In particular, the digital molecular structure generation system 1306 can generate tokens to represent an end-of-sequence (e.g., EOS) to indicate the end of a SMILES or SAFE representation, a beginning-of-sequence (e.g., BOS) to indicate the beginning of a SMILES or SAFE representation, and/or a mask token (e.g., MASK) to represent a masked token. Indeed, the digital molecular structure generation system 1306 can generate a variety of special tokens (e.g., EOS, BOS, UNK, MASK, PAD).
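By way of example only, the special tokens can be reserved at the front of a token vocabulary and applied when a token sequence is encoded. The following sketch is illustrative: the bracketed token spellings, the identifier ordering, and the helper names (build_vocabulary_with_specials, encode) are assumptions rather than part of any particular tokenizer.

```python
# Hypothetical spellings for the special tokens named in the text.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[BOS]", "[EOS]", "[MASK]"]


def build_vocabulary_with_specials(tokens):
    """Reserve the first ids for special tokens, then add molecule tokens."""
    vocabulary = {token: index for index, token in enumerate(SPECIAL_TOKENS)}
    for token in tokens:
        vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def encode(tokens, vocabulary, max_length):
    """Wrap tokens in BOS/EOS, map unknown tokens to UNK, and pad with PAD."""
    ids = [vocabulary["[BOS]"]]
    ids += [vocabulary.get(token, vocabulary["[UNK]"]) for token in tokens]
    ids.append(vocabulary["[EOS]"])
    if len(ids) > max_length:
        raise ValueError("sequence longer than max_length")
    return ids + [vocabulary["[PAD]"]] * (max_length - len(ids))
```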
In one or more implementations, the digital molecular structure generation system 1306 utilizes the generative model (e.g., the large language model) to generate (or learn) a token distribution for a predicted token of a (partial) SAFE molecular string representation. In particular, the digital molecular structure generation system 1306 utilizes the generative model to generate (or output) a probability distribution across SAFE tokens (e.g., a token vocabulary as described above) to represent a predicted probability for each SAFE token being the SAFE token that completes a partial SAFE string representation (or fills a subsequent token). Indeed, in some cases, the generative model generates a vector (or array) having a size representative of a token vocabulary with a predicted probability for each token in a token vocabulary being the predicted SAFE token for the partial SAFE string representation (or the next or subsequent SAFE token). In addition, the probability distribution can also include special tokens (e.g., end-of-sequence tokens, beginning-of-sequence tokens) to indicate a probability of the predicted token being an end of sequence (e.g., a prediction that a valid molecule is represented by the predicted sequence of SAFE tokens). In one or more implementations, the digital molecular structure generation system 1306 utilizes the SAFE generative model to generate (or learn) a token distribution utilizing batch training (e.g., of multiple training SAFE string representations) (to train for past sequence context across multiple training SAFE string representations).
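To illustrate the probability distribution described above, the following minimal sketch converts a vector of raw model scores (one score per token in the vocabulary) into a next-token distribution using a softmax. The sketch assumes that the generative model emits such scores; the function names and the example vocabulary are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def next_token_distribution(scores, vocabulary_tokens):
    """Pair each vocabulary token with its predicted probability."""
    if len(scores) != len(vocabulary_tokens):
        raise ValueError("expected one score per vocabulary token")
    return dict(zip(vocabulary_tokens, softmax(scores)))
```

For instance, calling next_token_distribution([2.0, 1.0, 0.0], ["C", ".", "[EOS]"]) assigns the largest probability to "C" and a smaller, nonzero probability to the end-of-sequence token.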
In addition, the digital molecular structure generation system 1306 can utilize predicted token distributions (e.g., as the predicted sequence of the SAFE molecular string representation) and a ground truth training SAFE molecular string representation sequence of tokens to determine whether the predicted token distributions correctly (or incorrectly) indicate the correct ground truth token. Indeed, the digital molecular structure generation system 1306 can utilize the comparison to generate a measure of loss that rewards and/or penalizes the generative model (e.g., the large language model 412) based on the token probability distribution indicating the correct or incorrect predicted SAFE token.
Moreover, the digital molecular structure generation system 1306 can utilize various types of losses to train a SAFE generative model (e.g., the large language model 412). For instance, in some cases, the digital molecular structure generation system 1306 generates a cross-entropy measure of loss as the measure of loss (e.g., the measure of loss 414). In one or more instances, the digital molecular structure generation system 1306 can also utilize a variety of other loss measures, such as, but not limited to, mean-squared error losses and/or negative log-likelihood (NLL) losses.
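As one illustrative example of the cross-entropy measure of loss, the following sketch averages the negative log-probability that each predicted distribution assigns to the corresponding ground truth token. With one-hot ground truth tokens, this quantity coincides with the mean negative log-likelihood. The function name and the input layout (one probability list per sequence position) are assumptions made for illustration.

```python
import math


def cross_entropy(predicted_distributions, target_ids):
    """Mean negative log-probability of the ground truth token per position.

    predicted_distributions: one list of token probabilities per position.
    target_ids: index of the ground truth token at each position.
    """
    if len(predicted_distributions) != len(target_ids) or not target_ids:
        raise ValueError("need one non-empty distribution per target")
    total = 0.0
    for distribution, target in zip(predicted_distributions, target_ids):
        # Clamp to avoid log(0) when the model assigns zero probability.
        total -= math.log(max(distribution[target], 1e-12))
    return total / len(target_ids)
```

The loss approaches zero as the distributions concentrate on the ground truth tokens and grows as probability mass moves to incorrect tokens.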
In some embodiments, the digital molecular structure generation system 1306 can fine-tune a SAFE generative model on a specialized chemical space (e.g., for target tasks). For instance, the digital molecular structure generation system 1306 can fine-tune a SAFE generative model for a particular drug utilizing a fragment-constrained target. Furthermore, in some cases, the digital molecular structure generation system 1306 can fine-tune a SAFE generative model for multi-property optimization (MPO) scenarios, including the integration of a prediction head into the SAFE generative model architecture for simultaneous molecular generation and property prediction.
Although one or more embodiments illustrate the digital molecular structure generation system 1306 utilizing tokens, the digital molecular structure generation system 1306 can utilize string representations (e.g., masked string representations as partial sequences) to train a SAFE generative model in accordance with one or more implementations herein.
Indeed, the digital molecular structure generation system 1306 can train a generative model (e.g., a large language model) to create (or complete) SAFE molecular string representations that represent a valid molecular compound. For instance, as mentioned above, the digital molecular structure generation system 1306 can utilize a SAFE generative model (trained in accordance with one or more implementations herein) to perform a variety of downstream fragment-based molecular design tasks, such as, but not limited to, de novo molecular compound generation tasks, scaffold decoration and motif extension tasks, linker design and scaffold morphing tasks, and/or molecular superstructure generation tasks.
For example,
For instance, the molecular design task input(s) 504 can include a linker generation task 508 and/or a scaffolding morphing task 510. The digital molecular structure generation system 1306 can utilize the inputs of the linker generation task 508 and/or the scaffolding morphing task 510 to complete a SAFE molecular compound sequence representation from a partial (or incomplete) SAFE molecular compound sequence in the linker generation task 508 and/or the scaffolding morphing task 510. Indeed, the digital molecular structure generation system 1306 can complete the SAFE molecular compound sequence representation (as shown in the linker generation task 508 and/or the scaffolding morphing task 510) to generate the SAFE molecular string representation 520.
In particular, the digital molecular structure generation system 1306 can utilize a SAFE generative model to perform linker generation tasks and/or scaffolding morphing tasks as sequence completion tasks. For instance, the digital molecular structure generation system 1306 can utilize input fragments (in a SAFE string) with a request to link the fragments as an initial sequence for the SAFE generative model. Subsequently, the SAFE generative model can generate predicted tokens for the missing linker in the input fragments to generate (or complete) a SAFE molecular string representation. Indeed, the digital molecular structure generation system 1306 can utilize the SAFE generative model (trained in accordance with one or more implementations herein) to perform a sequential completion because the order of fragments in a SAFE molecular string representation does not affect the underlying molecular graph (e.g., the linked fragments are order agnostic).
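The sequence completion framing described above can be sketched as follows. The sketch is illustrative only: next_token_fn stands in for a trained SAFE generative model, scripted_model is a hypothetical stub used in place of a real model, and the helper names and the "[EOS]" spelling are assumptions.

```python
def build_linker_prompt(fragment_token_lists):
    """Place the known fragments first, each followed by the separator."""
    prompt = []
    for fragment_tokens in fragment_token_lists:
        prompt.extend(fragment_tokens)
        prompt.append(".")
    return prompt


def complete_sequence(prompt_tokens, next_token_fn, eos_token="[EOS]",
                      max_new_tokens=64):
    """Greedily append predicted tokens until EOS or the token budget."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = next_token_fn(tokens)
        if token == eos_token:
            break
        tokens.append(token)
    return tokens


def scripted_model(script):
    """Hypothetical stand-in for a model: replays a fixed token script."""
    remaining = iter(script)

    def next_token(_tokens):
        return next(remaining, "[EOS]")

    return next_token
```

For example, a prompt built from the fragment tokens of c1ccccc13 and OC(=O)C4, completed by a stub that emits C, 3, and 4, yields the string c1ccccc13.OC(=O)C4.C34, in which the generated block closes both open ring link characters. Because the fragment blocks are order agnostic, the known fragments can be placed first regardless of where the linker sits in the molecular graph.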
In some cases, the digital molecular structure generation system 1306 can utilize a constrained beam search to perform a linker generation task (with a SAFE generative model as described above). For instance, the digital molecular structure generation system 1306 can utilize a constrained beam search to ensure the presence of each fragment (of a molecular compound) in a final molecular representation. During a scaffold morphing task, the digital molecular structure generation system 1306 can generate new molecular representations (or new molecules) for one or more fragments with connectivity constraints (after which the scaffold is inferred and linked to other fragments).
In addition, as shown in
For instance, the digital molecular structure generation system 1306 can utilize the SAFE generative model to frame the motif extension task and/or the scaffold decoration task as a sequential completion task by predicting a new token to generate novel fragments using the SAFE molecular string representation. Indeed, the digital molecular structure generation system 1306 can begin with an initial sequence corresponding to a scaffold or motif (e.g., as shown in the input motif extension task 512 and/or the scaffold decoration task 514) with marked attachment points to predict fragments to add to generate a completed SAFE molecular string representation that represents a valid (novel or known) molecular compound.
Additionally, as shown in
Furthermore, as shown in
As also shown in
Furthermore, the digital molecular structure generation system 1306 can utilize a target molecular compound constraint(s) 506 to optimize a SAFE generative model to fit one or more of the target profiles defined by the target molecular compound constraint(s) 506. In some cases, the digital molecular structure generation system 1306 can generate a set (or library) of molecule compounds using SAFE representations from the SAFE generative model using a target molecular compound constraint(s) 506.
In some instances, the digital molecular structure generation system 1306 can utilize a SAFE generative model (for downstream fragment-based molecular design tasks) using SMILES (or other molecular string representation) inputs. For instance, the digital molecular structure generation system 1306 can convert the SMILES (or other molecular string representation) inputs into SAFE representations (in accordance with one or more implementations herein). Then, the digital molecular structure generation system 1306 can utilize the converted SAFE representations as inputs for the SAFE generative model to accomplish a downstream fragment-based molecular design task (in accordance with one or more implementations herein).
In one or more instances, the digital molecular structure generation system 1306 can utilize a SAFE generative model (as described herein) and/or a SAFE molecular string representation (as described herein) for a molecule compound with a variety of tech-bio exploration tools of the tech-bio exploration system 1304. For instance, the digital molecular structure generation system 1306 can utilize the SAFE generative model (as described herein) and/or a SAFE molecular string representation to provide (and/or generate) molecule compounds and utilize the molecule compounds as input (or as a component) of the variety of tech-bio exploration tools of the tech-bio exploration system 1304. For instance, the digital molecular structure generation system 1306 can utilize the SAFE generative model (as described herein) and/or a SAFE molecular string representation (as described herein) for tech-bio exploration tools, such as, but not limited to, bio-activity heatmap models as described in UTILIZING MACHINE LEARNING MODELS TO SYNTHESIZE PERTURBATION DATA TO GENERATE PERTURBATION HEATMAP GRAPHICAL USER INTERFACES, U.S. patent application Ser. No. 18/526,707, filed Dec. 1, 2023, ADMET prediction models and/or drug-likeness matching tools as described in UTILIZING COMPOUND-PROTEIN MACHINE LEARNING REPRESENTATIONS TO GENERATE BIOACTIVITY PREDICTIONS, U.S. patent application Ser. No. 18/505,728, filed Nov. 9, 2023, compound exploration program models as described in UTILIZING BIOLOGICAL MACHINE LEARNING REPRESENTATIONS AND A LANGUAGE MACHINE LEARNING MODEL FOR INITIATING COMPOUND EXPLORATION PROGRAMS, U.S. patent application Ser. No. 18/521,910, filed Nov. 28, 2023, digital maps of biology models as described in UTILIZING MACHINE LEARNING AND DIGITAL EMBEDDING PROCESSES TO GENERATE DIGITAL MAPS OF BIOLOGY AND USER INTERFACES FOR EVALUATING MAP EFFICACY, U.S. patent application Ser. No. 18/392,989, filed Dec. 21, 2023, and/or microscopy representation autoencoder models as described in UTILIZING MASKED AUTOENCODER GENERATIVE MODELS TO EXTRACT MICROSCOPY REPRESENTATION AUTOENCODER EMBEDDINGS, U.S. patent application Ser. No. 18/545,399, filed Dec. 19, 2023, each of which are incorporated by reference in their entirety herein.
In some instances, the digital molecular structure generation system 1306 can identify a molecular compound of interest (e.g., a molecular compound from a compound exploration program as described in U.S. patent application Ser. No. 18/521,910 (incorporated by reference above)). Moreover, the digital molecular structure generation system 1306 can utilize the molecular compound of interest as a constraint (e.g., a target molecule compound constraint) for the SAFE generative model to cause the SAFE generative model to generate molecule compound representations (e.g., SAFE representations) that are related to (or variations of) the molecular compound of interest. In some cases, the digital molecular structure generation system 1306 can identify a database of particular compounds (e.g., enamines, amines) and utilize the SAFE generative model to generate (or synthesize) molecule compounds based on the database of particular compounds as a target molecule compound constraint.
Although one or more particular tech-bio exploration tools are described above, in one or more instances, the digital molecular structure generation system 1306 can also enable the SAFE generative model (or SAFE molecular string representations) to interact with a variety of other tech-bio exploration tools. In addition, the digital molecular structure generation system 1306 can also enable the SAFE generative model (or SAFE molecular string representations) to interact with a variety of third-party (or external) tools, such as, third-party vendor systems, third-party automated lab tools, and/or third-party image editing tools.
Furthermore, experimenters utilized an implementation of a SAFE generative model of the digital molecular structure generation system 1306 (as described above) to generate sample output SAFE molecular representations for a variety of tasks (e.g., linker design, scaffold morphing, motif extension, scaffold decoration, superstructure) for fragment-constrained inputs based on a particular molecular compound (representing the drug Maribavir). For instance,
Additionally, experimenters examined the ability of an implementation of a SAFE generative model (of the digital molecular structure generation system 1306 as described above) to perform fragment-constrained generative design tasks, such as scaffold decoration, scaffold morphing, linker generation, motif extension, and superstructure generation. Indeed, the experimenters designed a benchmark that involved working with scaffolds and fragments from 10 existing drugs to demonstrate the accuracy of the implementation of the SAFE generative model (via validity, diversity, uniqueness, distance, and synthetic accessibility scores). Indeed, the experimenters utilized 1000 molecules sampled in each of the above-mentioned fragment-constrained design tasks using an implementation of the SAFE generative model to determine averaged validity, diversity, and uniqueness scores for the outputs of the SAFE generative model. In addition, the experimenters also determined an average Tanimoto distance between the generated molecules and the original drug molecules, along with the average synthetic accessibility (SA) scores (as described in Ertl et al., Estimation of Synthetic Accessibility Score of Drug-Like Molecules Based on Molecular Complexity and Fragment Contributions, Journal of Cheminformatics, 1:1-11 (2009) (hereinafter “Ertl”)). As shown in the following Table (e.g., Table 1), the implementation of the SAFE generative model maintained full validity for the sampled molecules under constraints, while achieving high internal diversity and novelty compared to the original drugs. Moreover, as shown in Table 1, the generated molecules exhibited a low SA score, indicating their ease of synthesis.
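For illustration, the uniqueness, diversity, and distance scores referenced above can be computed along the following lines. This is a minimal sketch and not the experimenters' evaluation code: it assumes each molecule has already been reduced to a canonical string and to a fingerprint represented as a set of on-bit indices, and all function names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(fingerprint_a, fingerprint_b):
    """One minus the ratio of shared on-bits to total on-bits."""
    union = fingerprint_a | fingerprint_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fingerprint_a & fingerprint_b) / len(union)


def uniqueness(canonical_strings):
    """Fraction of generated molecules that are distinct."""
    if not canonical_strings:
        return 0.0
    return len(set(canonical_strings)) / len(canonical_strings)


def internal_diversity(fingerprints):
    """Mean pairwise Tanimoto distance within the generated set."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


def mean_distance_to_reference(fingerprints, reference_fingerprint):
    """Mean Tanimoto distance from each generated molecule to the drug."""
    if not fingerprints:
        return 0.0
    distances = [tanimoto_distance(f, reference_fingerprint) for f in fingerprints]
    return sum(distances) / len(distances)
```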
Furthermore, the experimenters also evaluated an implementation of the SAFE generative model in goal-directed generation. For instance, the experimenters optimized an implementation of the SAFE generative model toward specific values for key molecular properties, including Topological Polar Surface Area (TPSA), Molecular Weight (MW), Calculated Log P (CLogP), and Quantitative Estimation of Drug-likeness (QED), to assess the model's ability for goal-directed generation. Indeed, the experimenters utilized an implementation of the SAFE generative model using Proximal Policy Optimization (PPO) (as described in Schulman et al., Proximal Policy Optimization Algorithms, arXiv Preprint arXiv:1707.06347 (2017) (hereinafter “Schulman”)) with Adaptive KL Penalty to train a policy for generating molecular samples with the targeted property value. The experimenters further fine-tuned agents (of an implementation of the SAFE generative model) for two target values on each molecular property and evaluated their performance. Indeed, the generated samples, from the experiment, were valid and unique.
Indeed,
Additionally, experimenters also utilized an implementation of the SAFE generative model on an optimization task aimed at improving the Central Nervous System (CNS) penetration of EGFR Tyrosine Kinase Inhibitors (e.g., addressing the challenge of CNS metastases in non-small cell lung cancer). Indeed, the experimenters evaluated a CNS-MPO score, a comprehensive metric that assesses physico-chemical properties associated with CNS penetration (with a higher CNS-MPO score indicating better desirability). In addition, the experimenters introduced additional constraints to the optimization task, which required that generated molecules feature a scaffold that has demonstrated activity against EGFR.
Indeed,
Furthermore, experimenters utilized an implementation of the SAFE generative model to generate de novo molecule representations. Indeed,
Furthermore, the experimenters also compared de novo molecule generation between an implementation of the SAFE generative model (e.g., SAFE-GPT-20M) and a Group SELFIES model (e.g., GSELFIES-GPT-20M), both trained on a MOSES dataset. Indeed,
Additionally, as mentioned above, existing systems that utilize GSELFIES representations lack simplicity, are difficult to interpret, and are not compact. In contrast, experimenters demonstrate that an implementation of the SAFE generative model (e.g., the SAFE-GPT-20M described above) generates SAFE representations that are compact and easier to interpret. For instance,
As shown in
For instance, the tech-bio exploration system 1304 can generate and access experimental results corresponding to gene sequences, protein shapes/folding, protein/compound interactions, phenotypes resulting from various interventions or perturbations (e.g., gene knockout sequences or compound treatments), and/or in-vivo experimentation on various treatments in living animals. By analyzing these signals (e.g., utilizing various machine learning models), the tech-bio exploration system 1304 can generate or determine a variety of predictions and inter-relationships for improving treatments/interventions.
To illustrate, the tech-bio exploration system 1304 can generate maps of biology indicating biological inter-relationships or similarities between these various input signals to discover potential new treatments. For example, the tech-bio exploration system 1304 can utilize machine learning and/or maps of biology to identify a similarity between a first gene associated with disease treatment and a second gene previously unassociated with the disease based on a similarity in resulting phenotypes from gene knockout experiments. The tech-bio exploration system 1304 can then identify new treatments based on the gene similarity (e.g., by targeting molecular compounds that impact the second gene). Similarly, the tech-bio exploration system 1304 can analyze signals from a variety of sources (e.g., protein interactions, or in-vivo experiments) to predict efficacious treatments based on various levels of biological data.
The tech-bio exploration system 1304 can generate GUIs comprising dynamic user interface elements to convey tech-bio information and receive user input for intelligently exploring tech-bio information. Indeed, as mentioned above, the tech-bio exploration system 1304 can generate GUIs displaying different maps of biology that intuitively and efficiently express complex interactions between different biological systems for identifying improved treatment solutions. Furthermore, the tech-bio exploration system 1304 can also electronically communicate tech-bio information between various computing devices.
As shown in
As shown in
As also illustrated in
Furthermore, in one or more implementations, the client device(s) 1310 includes a client application. The client application can include instructions that (upon execution) cause the client device(s) 1310 to perform various actions. For example, a user of a user account can interact with the client application on the client device(s) 1310 to initiate, generate, or access one or more SAFE molecular representations or SAFE generative models (e.g., via prompts) in accordance with one or more implementations herein.
As further shown in
In one or more implementations, the digital molecular structure generation system 1306 generates and accesses one or more SAFE molecular representations and/or SAFE generative models. As shown, in
While
For instance,
In one or more instances, the series of acts 1400 can include identifying a molecular string representation comprising ring structure identifiers that indicate virtual connections between atom representations of a molecular compound, generating a set of fragments from the molecular string representation, and generating a sequential attachment-based fragment embedding (SAFE) molecular string representation that represents the molecular string representation as an order agnostic sequence of interconnected fragment blocks by: concatenating fragments from the set of fragments utilizing a separation character between the fragments to generate a linked fragment string and generating ring link characters in the linked fragment string to represent attachment points for fragment links.
Furthermore, the series of acts 1400 can include generating the set of fragments by utilizing a bond slicing algorithm with the molecular string representation. In addition, the series of acts 1400 can include generating the linked fragment string by ordering the fragments from the set of fragments based on fragment size.
In addition, the series of acts 1400 can include generating the SAFE molecular string representation by extracting attachment point indicators from the molecular string representation and utilizing the attachment point indicators to generate the linked fragment string. Moreover, the series of acts 1400 can include generating the SAFE molecular string representation by replacing the attachment point indicators in the linked fragment string with the ring link characters.
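The concatenation and replacement acts described above can be sketched as follows. The sketch is a simplified illustration under stated assumptions, not the disclosed implementation: fragments are assumed to arrive from a bond slicing step with attachment point indicators written as [1*], [2*], and so on; each indicator is assumed to directly follow the atom it attaches to (not at the start of a fragment and not inside a branch); and string length is used as a stand-in for fragment size. The helper names are hypothetical.

```python
import re

ATTACHMENT_PATTERN = re.compile(r"\[(\d+)\*\]")   # e.g. [1*]
RING_TOKEN_PATTERN = re.compile(r"%(\d{2})|(\d)")  # e.g. %10 or 1
BRACKET_ATOM_PATTERN = re.compile(r"\[[^\]]*\]")


def ring_token(number):
    """Spell a ring link number the way SMILES does (1-9, then %10-%99)."""
    if not 1 <= number <= 99:
        raise ValueError("ring link number out of range")
    return str(number) if number < 10 else f"%{number}"


def used_ring_numbers(linked_fragments):
    """Collect ring numbers already in use, ignoring digits inside brackets."""
    stripped = BRACKET_ATOM_PATTERN.sub("", linked_fragments)
    return {int(two or one) for two, one in RING_TOKEN_PATTERN.findall(stripped)}


def assemble_safe(fragments):
    """Join fragments with '.' and turn attachment indicators into ring links."""
    # Assumption: string length approximates fragment size for ordering.
    ordered = sorted(fragments, key=len, reverse=True)
    linked = ".".join(ordered)
    next_free = max(used_ring_numbers(linked), default=0) + 1
    assigned = {}

    def replace(match):
        nonlocal next_free
        label = match.group(1)
        if label not in assigned:
            assigned[label] = ring_token(next_free)
            next_free += 1
        return assigned[label]

    return ATTACHMENT_PATTERN.sub(replace, linked)
```

For example, assemble_safe(["OC(=O)C[1*]", "c1ccccc1[1*]"]) returns "c1ccccc12.OC(=O)C2": the two fragments are joined by the separation character, and the shared ring link character 2 (the first number not already used by a ring structure identifier) marks the attachment point on each side.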
Furthermore, the series of acts 1400 can include generating an additional SAFE molecular string representation from the SAFE molecular string representation by reordering fragment blocks comprising the fragments and the ring link characters, wherein the additional SAFE molecular string representation represents the molecular string representation. For example, a ring link character(s) can include a ring digit(s).
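As a brief illustration of the reordering described above, the following sketch enumerates alternative SAFE strings by permuting the fragment blocks around the separation character. The sketch assumes that each ring link character pair is unique across the string, so that moving a block does not change which attachment points pair up; the function name is hypothetical.

```python
from itertools import permutations


def reordered_safe_strings(safe_string):
    """Return every ordering of the '.'-separated fragment blocks."""
    blocks = safe_string.split(".")
    # A set removes duplicates that arise when two blocks are identical.
    return sorted({".".join(order) for order in permutations(blocks)})
```

For example, reordered_safe_strings("c1ccccc12.OC(=O)C2") returns ["OC(=O)C2.c1ccccc12", "c1ccccc12.OC(=O)C2"], two strings in which the same ring link character 2 joins the same two fragments.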
Moreover, the series of acts 1400 can include generating, utilizing a large language model from the SAFE molecular string representation, an additional SAFE molecular string representation representing an additional molecular compound. In addition, the series of acts 1400 can include generating, utilizing a large language model from the SAFE molecular string representation, a complete SAFE molecular compound sequence representation from a partial SAFE molecular compound sequence representation. Furthermore, the series of acts 1400 can include generating, utilizing a large language model from the SAFE molecular string representation, a linking SAFE molecular string representation for two or more molecular compound sequences. Additionally, the series of acts 1400 can include generating, utilizing a large language model from the SAFE molecular string representation, a molecular compound sequence based on one or more target molecule compound constraints.
Furthermore,
In some implementations, the series of acts 1500 includes generating, for a molecular compound, a training sequential attachment-based fragment embedding (SAFE) molecular string representation comprising order agnostic fragment blocks represented by fragment strings, separation characters, and ring link characters and training a large language model to generate SAFE molecular string representations by: generating, utilizing the large language model, a predicted token for the training SAFE molecular string representation from a tokenized partial sequence of the training SAFE molecular string representation and modifying parameters of the large language model utilizing a comparison between the predicted token and the training SAFE molecular string representation.
Furthermore, the series of acts 1500 can include generating the training SAFE molecular string representation by converting a molecular string representation comprising ring structure identifiers that indicate virtual connections between atom representations of the molecular compound. In addition, the series of acts 1500 can include generating the training SAFE molecular string representation by concatenating fragments identified from the molecular string representation utilizing the separation characters and representing attachment points for fragment links of the fragments utilizing the ring link character.
Additionally, the series of acts 1500 can include generating, utilizing the large language model, the predicted token by utilizing the large language model to generate a SAFE notation token probability distribution and/or selecting the predicted token from the SAFE notation token probability distribution.
Moreover, the series of acts 1500 can include training the large language model to generate the SAFE molecular string representations by determining a measure of loss between the predicted token and the training SAFE molecular string representation and/or modifying the parameters of the large language model utilizing the measure of loss.
Furthermore, the series of acts 1500 can include generating, utilizing the large language model, an end-of-sequence token as the predicted token to indicate a predicted completed molecule representation.
Additionally, the series of acts 1500 can include utilizing the large language model to complete a partial molecular compound sequence or generate a linking SAFE molecular string representation for two or more molecular compound sequences. Furthermore, the series of acts 1500 can include generating, utilizing the large language model, a SAFE molecular string representation based on a prompt requesting a target molecular compound.
Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Implementations within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
Non-transitory computer-readable storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which, when executed by a processor, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. In some implementations, computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Implementations of the present disclosure can also be implemented in cloud computing environments. In this description, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources. For example, cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources. The shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.
In particular implementations, processor 1602 includes hardware for executing instructions, such as those making up a computer program. As an example and not by way of limitation, to execute instructions, processor 1602 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 1604, or storage device 1606 and decode and execute them. In particular implementations, processor 1602 may include one or more internal caches for data, instructions, or addresses. As an example and not by way of limitation, processor 1602 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 1604 or storage device 1606.
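The retrieval order described above can be sketched as a lookup through successively slower tiers. This is only an illustrative model, not the processor's actual design: the function name `fetch_instruction`, the dictionary-based tiers, and the choice to copy a found instruction into every faster tier are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Each tier is (name, contents), ordered fastest first, e.g. cache, memory, storage.
Tier = Tuple[str, Dict[int, str]]


def fetch_instruction(address: int, tiers: List[Tier]) -> Tuple[str, str]:
    """Return (instruction, tier_name) from the first tier holding the address.

    On a hit in a slower tier, the instruction is copied into all faster
    tiers, so that a cached entry is a copy of what memory or storage holds.
    """
    for position, (name, contents) in enumerate(tiers):
        if address in contents:
            instruction = contents[address]
            for _, faster_contents in tiers[:position]:
                faster_contents[address] = instruction  # populate faster tiers
            return instruction, name
    raise LookupError(f"no instruction at address {address}")
```

With tiers for a cache, memory, and storage device, a first fetch of an address held only in memory is served from memory, and a repeated fetch of the same address is then served from the cache copy.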
Memory 1604 may be used for storing data, metadata, and programs for execution by the processor(s). Memory 1604 may include one or more of volatile and non-volatile memories, such as Random Access Memory (“RAM”), Read Only Memory (“ROM”), a solid state disk (“SSD”), Flash, Phase Change Memory (“PCM”), or other types of data storage. Memory 1604 may be internal or distributed memory.
Storage device 1606 includes storage for storing data or instructions. As an example and not by way of limitation, storage device 1606 can comprise a non-transitory storage medium described above. Storage device 1606 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Storage device 1606 may include removable or non-removable (or fixed) media, where appropriate. Storage device 1606 may be internal or external to computing device 1600. In particular implementations, storage device 1606 is non-volatile, solid-state memory. In other implementations, storage device 1606 includes read-only memory (ROM). Where appropriate, this ROM may be a mask programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
I/O interface 1608 allows a user to provide input to, receive output from, and otherwise transfer data to and receive data from computing device 1600. I/O interface 1608 may include a mouse, a keypad or a keyboard, a touch screen, a camera, an optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces. I/O interface 1608 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain implementations, I/O interface 1608 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
Communication interface 1610 can include hardware, software, or both. In any event, communication interface 1610 can provide one or more interfaces for communication (such as, for example, packet-based communication) between computing device 1600 and one or more other computing devices or networks. As an example and not by way of limitation, communication interface 1610 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network.
Additionally or alternatively, communication interface 1610 may facilitate communications with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, communication interface 1610 may facilitate communications with a wireless PAN (WPAN) (such as, for example, a BLUETOOTH WPAN), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination thereof.
Additionally, communication interface 1610 may facilitate communications via various communication protocols. Examples of communication protocols that may be used include, but are not limited to, data transmission media, communications devices, Transmission Control Protocol (“TCP”), Internet Protocol (“IP”), File Transfer Protocol (“FTP”), Telnet, Hypertext Transfer Protocol (“HTTP”), Hypertext Transfer Protocol Secure (“HTTPS”), Session Initiation Protocol (“SIP”), Simple Object Access Protocol (“SOAP”), Extensible Mark-up Language (“XML”) and variations thereof, Simple Mail Transfer Protocol (“SMTP”), Real-Time Transport Protocol (“RTP”), User Datagram Protocol (“UDP”), Global System for Mobile Communications (“GSM”) technologies, Code Division Multiple Access (“CDMA”) technologies, Time Division Multiple Access (“TDMA”) technologies, Short Message Service (“SMS”), Multimedia Message Service (“MMS”), radio frequency (“RF”) signaling technologies, Long Term Evolution (“LTE”) technologies, wireless communication technologies, in-band and out-of-band signaling technologies, and other suitable communications networks and technologies.
Communication infrastructure 1612 may include hardware, software, or both that couples components of computing device 1600 to each other. As an example and not by way of limitation, communication infrastructure 1612 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HYPERTRANSPORT (HT) interconnect, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus or a combination thereof.
In the foregoing specification, the invention has been described with reference to specific example embodiments thereof. Various embodiments and aspects of the invention(s) are described with reference to details discussed herein, and the accompanying drawings illustrate the various embodiments. The description above and drawings are illustrative of the invention and are not to be construed as limiting the invention. Numerous specific details are described to provide a thorough understanding of various embodiments of the present invention.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. For example, the methods described herein may be performed with fewer or more steps/acts, or the steps/acts may be performed in differing orders. Additionally, the steps/acts described herein may be repeated or performed in parallel to one another or in parallel to different instances of the same or similar steps/acts. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/618,172, filed on Jan. 5, 2024, which is incorporated herein by reference in its entirety.
| Number | Date | Country |
|---|---|---|
| 63618172 | Jan 2024 | US |