This disclosure is accompanied by a Computer Program Listing Appendix that includes a compact disk containing the files for an exemplary implementation of an enumerator. A listing of the files appears at the end of the printed specification. The contents of the Appendix are hereby incorporated by reference in this disclosure.
This invention relates to methods for analyzing the content encoded by general chemical structure descriptions such as, for example, Markush structures.
It is well known that the properties of matter, whether man-made or natural, are defined by its chemical composition. Thus, it is common practice to use chemical structure representations in descriptions of the properties or utilities associated with compositions of matter. Chemical structures characterizing compositions of matter can be altered by changing:
Methods for constructing these general chemical structure representations can be divided into two groups:
1. those that are based on the reactions and precursors used to synthesize chemical library constituents, and
2. those that describe the structures of reaction products.
These general chemical structure representations usually consist of instructions for:
(a) attaching various substituent groups (also called R-groups) with different secondary structure elements to core scaffolds;
(b) allowing or restricting variations in the atom composition of certain constituents of either R-group collections or generic structure cores;
(c) specifying attachment points in structure fragments; and
(d) varying the composition of matter by attaching substituent groupings to these attachment points in a combinatorial manner.
Since the purpose of chemical composition of matter patents is to disclose information on the utilities associated with compositions having a particular chemical structure design, and to prevent others from making, using or selling products with the same or similar molecular architecture, these general chemical structure representations are also frequently used for claiming compositions of matter and/or their utility. See, e.g., Markush, E. A., U.S. Pat. No. 1,506,316.
In case law, the term “Markush structure” is frequently used for chemical structure representations describing the content of claims in composition of matter patent applications. The term “Markush structure” is also frequently used for generic chemical structure representations defining the content of combinatorial chemical libraries and of libraries containing collections of protein, carbohydrate, DNA and RNA sequences. However, there are fundamental differences between these two concepts.
For example, the content of combinatorial libraries, and of libraries containing collections of protein, carbohydrate, DNA and RNA sequences, encoded by generic chemical structure representations can be approximated by:
1. selecting a random combination of structure fragments (substituent groups), and
2. randomly attaching these structure fragments at various attachment points of a common structure core.
By following this “Chemical Structure Space” filling strategy, such random enumeration processes create a homogeneous distribution of matter around a common structure core.
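The random approximation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Java enumerator of the Program Listing Appendix: the core scaffold is written as a SMILES-like string with hypothetical {R1}/{R2} placeholders marking attachment points, the substituent lists are invented for the example, and plain string substitution stands in for real bond formation (no valence checks are performed).

```python
import random

# Hypothetical core scaffold: a SMILES-like template in which each {R...}
# placeholder marks an attachment point. Plain string substitution is used
# for illustration only; no valence or bonding rules are checked.
CORE = "c1ccc({R1})cc1{R2}"

# Hypothetical substituent (R-group) lists for each attachment point.
R_GROUPS = {
    "R1": ["C", "CC", "OC", "Cl", "F"],
    "R2": ["N", "O", "C(=O)O"],
}


def random_library(core, r_groups, n, seed=0):
    """Approximate a library by n random draws: for each draw, pick one
    substituent per attachment point at random and attach it to the core.
    Duplicate products are collapsed, so at most n species are returned."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    products = set()
    for _ in range(n):
        choice = {site: rng.choice(frags) for site, frags in r_groups.items()}
        products.add(core.format(**choice))
    return products


sample = random_library(CORE, R_GROUPS, n=10)
print(len(sample), "distinct species from 10 random draws")
```

Because every draw is independent and uniform, repeated sampling spreads products evenly over the combinations around the shared core; nothing in this procedure reflects which combinations a particular patent claim actually covers.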
In contrast, Markush structure definitions in patent claims reflect knowledge associated with specific structure-property relationships, and claims therefore define an inhomogeneous distribution of matter in chemical structure space. Thus, while two inventions may share common structure cores, the claims in the respective patents may specify the production of entirely different molecules and produce non-overlapping distributions of matter in chemical structure space. Accordingly, inventors may be entitled to a patent for particular compositions of matter even though two different inventions operate in the same section of Markush structure space. This situation arises when the claims in different patent applications are drafted so that they define non-overlapping compositions of matter. Detecting whether the claim language in different patents produces overlapping inventions is one of the key objectives of chemical patent examination.
Accordingly, the examination of composition of matter patent applications devotes a great deal of effort to identifying and examining patents with similar Markush structure contents. The same is true for inventors and patent applicants, who need to determine whether issued patents leave enough freedom to claim non-overlapping compositions. Since the likelihood that a new application describes matter encroaching on already patented chemical structure space is highest for compositions of matter exhibiting similar Markush structure cores, machine methods have been developed for identifying prior art documents that exhibit similar Markush structure cores. Currently, two machine-readable data sources are available for conducting these prior art Markush structure searches: the Marpat database (see, e.g., U.S. Pat. No. 4,642,762) and the MMS database as described in EP 0451049. It is not uncommon for Markush structure similarity searches conducted in these two databases to identify hundreds of documents. Since the scope of intellectual property is defined by the Markush structure claims in each of these documents, determining whether any of the Markush structure claims in these prior art document collections defines chemical structure space that overlaps with a new invention requires the examination of hundreds of documents. Since no machine methods are available for this examination, each Markush structure in each document must be scrutinized manually, and this process currently relies entirely on mental enumeration. The term “enumeration” as used hereinafter refers to a process for constructing individual chemical structures (compounds) by connecting the structure fragments defined in Markush structure patent claims according to known chemical bonding principles.
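To make the definition concrete: exhaustive enumeration of a simple Markush structure, one with independent attachment points and explicit substituent lists, amounts to a Cartesian product over the attachment points. The sketch below assumes a SMILES-like core template with hypothetical {R1}, {R2} placeholders and uses plain string substitution in place of real bond formation, so no bonding principles are actually checked.

```python
from itertools import product


def enumerate_markush(core, r_groups):
    """Yield every species encoded by a simple Markush structure: one
    substituent is chosen for each attachment point, and all choices are
    combined exhaustively (Cartesian product over attachment points)."""
    sites = sorted(r_groups)  # fixed order of attachment points
    for combo in product(*(r_groups[site] for site in sites)):
        yield core.format(**dict(zip(sites, combo)))


# Hypothetical example: a benzene core with two attachment points.
species = list(enumerate_markush("c1ccc({R1})cc1{R2}",
                                 {"R1": ["C", "Cl"], "R2": ["N", "O"]}))
print(species)
# ['c1ccc(C)cc1N', 'c1ccc(C)cc1O', 'c1ccc(Cl)cc1N', 'c1ccc(Cl)cc1O']
```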
With reference to
However, no matter how much time is devoted to this process, all such mental enumeration processes are incomplete and subjective, because generic chemical structure descriptions can generally encode a near-infinite number of compositions. Because of this open-ended nature, it is impossible to list every possible enumeration product for the hundreds of patent documents examined.
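The combinatorial growth behind this statement can be made concrete with a simple count. Assuming, for illustration only, a Markush structure whose k attachment points are filled independently (no provisos and no nested generic terms), the number of encoded species N is the product of the sizes of the substituent lists R_i; the list sizes in the example are hypothetical:

```latex
N = \prod_{i=1}^{k} \lvert R_i \rvert,
\qquad \text{e.g.}\quad
\lvert R_1 \rvert = 50,\ \lvert R_2 \rvert = 200,\ \lvert R_3 \rvert = 1000
\;\Rightarrow\; N = 50 \cdot 200 \cdot 1000 = 10^{7}.
```

When any R_i is itself an open-ended term such as “alkyl”, its size is unbounded, and so is N.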
Accordingly, all known methods for analyzing the content encoded by Markush structures rely on partial enumeration. (Anton Fliri, Discovery Knowledge & Informatics 2007, presentation, Apr. 24, 2007; Szabolcs Csepregi et al., UGM 2007 presentations, Jun. 21, 2007.)
A further limitation of manual patent examination arises from the complexity of Markush structure definitions in patent claims. Because there are no standards defining the nomenclature for Markush structure definitions in patent claims, comparing documents of different origins requires translating the terminology used in each document into a common format. This translation step requires expert knowledge, because determining structural equivalencies between different terminologies requires assessing the topological relationships between different Markush structures. The evaluation is further complicated because the analysis may encounter open-ended and indefinite terminology describing collections of chemical structure fragments with similar physicochemical properties. For example, the generic term “alkyl” is frequently used to describe an unlimited number of arrangements of an unlimited number of carbon atoms, each carbon atom potentially bonded to up to four other carbon atoms, which gives rise to many different chain lengths and branching arrangements. Likewise, the generic term “heteroaryl” is used to encode a near-infinite number of aromatic carbon-based ring systems, each containing one or more heteroatoms. (See, e.g., Burton A. Leland et al., J. Chem. Inf. Comput. Sci., Volume 3, 1997, pages 62-70.) Adding to the complexity of these chemical topology descriptions, the claim text in patents frequently restricts the scope of these indefinite terms in a non-standardized way by defining discrete subsets of them. The definitions of these subsets, in turn, may be influenced by an inventor's motive to identify specific structure-property relationships in the claim language, or may reflect requirements imposed by patent examiners. In this respect, the precise identification of these open-ended and indefinite definitions may take the form of an independent Markush structure analysis.
Because of the complexities involved, identifying and comparing the chemical matter defined by different Markush structure claims is one of the most resource-consuming activities in chemical patent examination. Equally complex and time-consuming are the analysis of freedom to operate and the interpretation of structure-function information encoded by general chemical structure representations. Moreover, since producing mental enumeration results is a tiring, time-consuming and error-prone process, it is well recognized that mistakes made during the examination of chemical composition of matter patents affect the quality and value of the claimed intellectual property.
Thus, despite the immense value of the intellectual property encoded in the form of Markush structures, access to this information is currently limited. Making matters worse, modern manufacturing methods and processes rapidly increase the volume of new information encoded by generic chemical structure representations. Accordingly, there is a need for developing machine methods that could assist in Markush structure claim analysis.
Since the current processes employed for this purpose are based on the results of mental enumeration, machine methods enabling Markush structure enumeration would reduce the uncertainty associated with mental analysis of Markush structure claims and hence decrease the concomitant risk of patent litigation. Likewise, methods that create machine-processable renderings of Markush structure descriptions in different patent documents would reduce the time required to compare Markush structure claim descriptions across documents. Machine methods for enumerating Markush structures would also be useful for identifying and comparing the distribution characteristics of the compositions defined by Markush structure patent claims, and hence would make patent examination more precise. Such methods would further be useful for crafting the claim language of new patent applications, for anyone desiring a thorough analysis of freedom to operate, and for identifying structure-function information encoded in generic chemical structure representations such as combinatorial libraries. Finally, methods for rendering the indefinite and open-ended semantic terminology in Markush structure text instructions into machine-readable form can be used not only for enumerating general chemical structure representations, but also as machine-processable standards that enable the comparison of the contents of general chemical structure representations. The term “content” in this respect refers to the sum of all of the individual chemical structures that could be made following the descriptions and text instructions associated with general chemical structure descriptions.
FIGS. 5a-5m show examples of libraries containing enumerable Markush structure topology descriptors (structure fragment libraries) used for translating Markush structure topology information, such as the MKST topology information used in the MMS database, into enumerable form.
FIGS. 6a and 6b illustrate an example of a descriptor as rendered by the definer.
FIGS. 8a and 8b illustrate examples of user-generated claim rules for enumerating a commercially available database.
FIGS. 9a-9n illustrate examples of structure fragment libraries for translating a semantic term into enumerable form.
Since these processes create machine-readable renderings of Markush structure representations from different documents, they are useful for the machine-assisted translation of different semantic expressions of Markush structure claims into common formats.
Another aspect of the invention, provided by process 5 of the general schema, is the reduction of uncertainties associated with the outcome of mental analysis of complex Markush structure information, achieved by precisely determining the scope of Markush structure claims in patent applications through claim-specific Markush structure enumeration. This aspect of the invention therefore has implicit utility for determining risks associated with patent litigation and for estimating the value of intellectual property.
A further aspect of the invention, provided by process 6 of the general schema, is utility in data mining applications through the extraction of structure-property relationship information encoded in Markush structure representations, which may involve, for example, calculating molecular properties associated with enumerated species and identifying structure-property similarity relationships.
Yet another aspect of the invention, provided by process 2 of the general schema, is the ability to render the generic, indefinite and open-ended terminology used in formulating Markush structure claim descriptions in documents and in the MMS and Marpat databases into machine-readable form. Accordingly, the invention has utility for creating chemical structure fragment topology descriptors and databases containing these structure fragment collections, and implicitly also has utility in processes for enumerating general chemical structure representations.
Another aspect of the invention pertains to the use of processes 5b and 4c in analyzing complex structure-utility and structure-property relationships through the creation of characteristic fingerprints of the text-based and chemical structure fragment-based instructions used to construct Markush structure claims in chemical composition of matter patents, or of general chemical structure representations describing general structure-function observations in documents. Process 5b of the general schema effects the comparison of information associated with enumerated species by using fingerprints of structure fragments and fingerprints of text-mining-derived information, such as information associated with claim origin and with properties or utilities of claimed inventions, expressed as co-invention frequencies of utilities in composition of matter patents. Hence, process 5b has utility for determining associations between utilities and chemical structure designs in certain technology areas. Therefore, the invention has utility for characterizing the landscape and scope of innovations disclosed in patent documents.
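The phrase “co-invention frequencies of utilities” can be read as pair counts over documents. The sketch below adopts that reading as an assumption: each document is reduced to a set of text-mined utility terms, the frequency of a utility pair is the number of documents disclosing both, and a document's text fingerprint lists those frequencies for the pairs it discloses. The document ids, utility terms and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_invention_frequencies(doc_utilities):
    """For every pair of utility terms, count the documents that disclose
    both. `doc_utilities` maps a document id to its set of text-mined
    utility terms."""
    counts = Counter()
    for utilities in doc_utilities.values():
        for pair in combinations(sorted(utilities), 2):
            counts[pair] += 1
    return counts


def utility_fingerprint(utilities, counts, vocabulary):
    """Text fingerprint of one document: for each utility pair in the
    vocabulary, its co-invention frequency if the document discloses both
    terms, otherwise 0."""
    return [counts[pair] if set(pair) <= utilities else 0 for pair in vocabulary]


# Hypothetical documents and utility terms, for illustration only.
docs = {
    "D1": {"analgesic", "anti-inflammatory"},
    "D2": {"analgesic", "anti-inflammatory", "antipyretic"},
    "D3": {"antiviral"},
}
counts = co_invention_frequencies(docs)
vocabulary = sorted(counts)
print(utility_fingerprint(docs["D1"], counts, vocabulary))  # [2, 0, 0]
```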
A further aspect of process 6 of the general schema is its utility for effecting the comparison of structure fragment fingerprints of enumerated species with structure fragment fingerprints specified by the claim text of different patent documents. This aspect enables the simultaneous consideration of a broad spectrum of claim relationships defined by a plurality of Markush structures. This aspect of the invention is depicted in
Accordingly, the invention has utility in the comparative analysis of patent claims defining compositions of matter in various areas of innovation. The general schema comprises a combination of processes 1-6. Process 1 consists of a combination of steps 1a-1d, to generate and store Markush structure descriptors. Step 1a of the general schema effects the sending of Markush structure-related search results, originating from either structure or text queries entered via a user interface, to Markush structure databases containing Markush structure topology information for the search results, such as, for example, the MMS or Marpat Markush structure databases, or equivalents thereof. Step 1b of the general schema effects the importing of the Markush structure topology information from the Markush structure databases into a Markush structure topology definer. The Markush structure topology definer effects the rendering of the Markush structure topology information into enumerable Markush structure topology descriptors. An example of such descriptors is illustrated in
Process 2 of the general schema consists of a combination of steps 2a-2d, to create and store substituent fragment topology descriptors. Step 2a effects the recognition of generic, indefinite and open-ended terminology in substituent definitions that have been imported by a Markush structure topology definer from a Markush structure database, such as, for example, the “MMS” or “Marpat” Markush structure databases. Alternatively, step 2a may effect the recognition of generic, indefinite and open-ended terminology found in the claim text of patent documents, in the claim text of patent applications, and in user-rendered descriptions of general chemical structures. Step 2b of the general schema constitutes a “superatom definer” for effecting the automatic or user-guided rendering of the generic, open-ended and indefinite terminology into enumerable substituent fragment topology descriptors. This is done by replacing each generic, open-ended or indefinite substituent definition with a list of substituent fragment topology descriptors consisting of structure fragments that are described in prior art patent applications and fall within the scope of that definition, or that fall within the scope of the composition of matter inventions a user is considering for analysis.
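Step 2b can be pictured as a lookup-and-replace over a superatom library. The sketch below is illustrative only: the <term> notation for generic terms, the library contents and the function name are hypothetical, and real superatom libraries hold far larger fragment collections. Its C1-C4 alkyl entry lists the eight alkyl groups with one to four carbon atoms (methyl, ethyl, n-propyl, isopropyl and the four butyl isomers), written so that the first atom is the attachment atom; a generic term with no library entry is returned separately so that it can be rendered in the user-guided mode.

```python
# Hypothetical superatom library: each generic claim term maps to an explicit,
# finite list of fragments (SMILES with the attachment atom written first).
SUPERATOMS = {
    "halogen": ["F", "Cl", "Br", "I"],
    "C1-C4 alkyl": ["C", "CC", "CCC", "C(C)C",                # methyl .. isopropyl
                    "CCCC", "CC(C)C", "C(C)CC", "C(C)(C)C"],  # butyl isomers
}


def render_substituent(definition, superatoms=SUPERATOMS):
    """Replace generic terms (written as <term>) in a substituent definition
    with their fragment lists; explicit fragments are kept unchanged. Generic
    terms missing from the library are collected for user-guided rendering."""
    fragments, unresolved = [], []
    for term in definition:
        if term.startswith("<") and term.endswith(">"):  # generic term
            name = term[1:-1]
            if name in superatoms:
                fragments.extend(superatoms[name])
            else:
                unresolved.append(name)                    # needs user input
        else:
            fragments.append(term)                         # explicit fragment
    return fragments, unresolved


frags, todo = render_substituent(["<halogen>", "<C1-C4 alkyl>", "OC", "<heteroaryl>"])
print(len(frags), todo)  # 13 ['heteroaryl']
```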
Step 2c of the general schema effects the exporting of the substituent fragment topology descriptors into one or multiple databases effecting the storing, retrieving and processing of the structure fragment topology descriptors. Step 2d of the general schema effects the importing of the substituent structure fragment topology descriptors from the databases by Markush Structure Enumerators.
Process 3 of the general schema consists of a combination of steps 3a-3d, for the user-guided creation and storage of enumeration-ready topology descriptors. Step 3a effects the rendering of general chemical structure topology descriptions in enumeration-ready form by using commercially available software for drawing chemical structure representations, such as, for example, ChemDraw, ISIS or Marvin. Step 3a also provides an alternative for creating the renderings: importing Markush structure topology information from the MMS or Marpat databases and translating the imported information into enumeration-ready form by creating Markush structure topology descriptors, using either machine-aided translation processes or user-guided means such as, for example, ChemDraw, MDL-derived tools, STN, DARC, the KMS indexing station or Marvin. Step 3b of the general schema effects the generation of enumeration rules by creating associations between enumeration-ready topology descriptors and Markush structure text information. For example, user-guided associations can be created for specific Markush structure core topology descriptors (genus). User-guided associations can also be created between attachment points in the core Markush structure topology descriptors and the topology descriptors of substituent groupings according to the text of chemical composition of matter patent claims.
Process 4 of the general schema consists of a combination of steps 4a-b, for the creation and storage of claim rule fingerprints. Step 4a of the general schema consists of processes for the automated construction of “enumeration rules” by identifying the core Markush structure topology descriptors (genus), the attachment points of the core Markush structure topology descriptors, and the combinations of substituent structure fragment descriptors at those attachment points, using machine-readable renderings of the text instructions provided by structure-function information associated with generic chemical structure representations or by the patent claims in chemical composition of matter patents. Step 4b of the general schema effects the storing and retrieving of the enumeration rules in enumeration rule databases. Step 4c of the general schema effects the generation of structure fragment topology fingerprints of the enumeration rules and the storing of these fingerprints in claim rule fingerprint databases. An example of a fingerprint is illustrated in
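One way to hold the output of step 4a and the fingerprint of step 4c in memory is sketched below. The class name, field names and the use of Python predicates for claim provisos are assumptions made for illustration; the fingerprint here is simply the set of the core and all fragments the rule can place, which is one possible reading of a structure fragment topology fingerprint and not necessarily the one used by the invention.

```python
from dataclasses import dataclass


@dataclass
class EnumerationRule:
    """Hypothetical enumeration rule: a core (genus) template, the fragments
    allowed at each attachment point, and optional provisos expressed as
    predicates over a complete site assignment."""
    core: str
    sites: dict            # attachment point -> list of allowed fragments
    provisos: tuple = ()   # callables; each must return True for a species

    def allows(self, assignment):
        """True if every attachment point is filled with an allowed fragment
        and every proviso is satisfied."""
        if set(assignment) != set(self.sites):
            return False
        if any(frag not in self.sites[s] for s, frag in assignment.items()):
            return False
        return all(proviso(assignment) for proviso in self.provisos)

    def fingerprint(self):
        """Structure-fragment fingerprint of the rule: the core plus every
        fragment the rule can place at any attachment point."""
        fp = {self.core}
        for frags in self.sites.values():
            fp.update(frags)
        return frozenset(fp)


# Hypothetical claim: R1 is C or Cl; R2 is N or O;
# provided that when R1 is Cl, R2 is not N.
rule = EnumerationRule(
    core="c1ccc({R1})cc1{R2}",
    sites={"R1": ["C", "Cl"], "R2": ["N", "O"]},
    provisos=(lambda a: not (a["R1"] == "Cl" and a["R2"] == "N"),),
)
print(rule.allows({"R1": "Cl", "R2": "N"}))  # False
print(sorted(rule.fingerprint()))
```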
Process 5 of the general schema consists of a combination of steps 5a-5b to create and name individual species. Step 5a of the general schema effects the importing of the enumeration-ready Markush structure topology descriptors from the database records produced by steps 1d and 3d into a Markush structure enumerator. It also effects the importing of the substituent structure fragment topology descriptors resulting from step 2d and of the enumeration rules resulting from step 4e. Step 5a further includes a method for iteratively attaching the enumeration-ready Markush structure topology descriptors to the substituent structure fragment topology descriptors, using a random selection of substituent groupings in the manner defined by the enumeration rules. Step 5b of the general schema effects the assignment of registration codes to enumerated species and the association of those registration codes with the enumeration rule information yielding each species, the topology descriptors defining its chemical structure, and the information defining the origin of the Markush structure descriptors yielding the species. Step 5b further includes a method for exporting the associated information into an enumerated compound database. Step 5b also effects the creation of chemical structure topology fingerprints of the enumerated species in the enumerated compound database. This feature comprises a method for associating the structure topology fingerprints of a species, and the enumeration rules yielding that species, with the information defining the origin of the Markush structure claiming the species. The associated information is stored in a database for fingerprint analysis.
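Steps 5a and 5b can be sketched together as follows. This is a minimal illustration, not the Java enumerator of the Program Listing Appendix: it assumes a string-template core with hypothetical fragment lists for a toy claim, a single proviso written as a Python predicate, and an invented ENUM-nnnnnn registration code format. Because substituents are drawn at random, a finite number of draws may miss some allowed species; each species that is found is registered exactly once, together with the assignment that produced it and its origin.

```python
import random
from itertools import count


def enumerate_species(core, sites, proviso, origin, n_draws, seed=0):
    """Sketch of steps 5a-5b: draw random substituent combinations, discard
    those excluded by the rule's proviso, and register each new species once
    with a code, the assignment that produced it, and its origin."""
    rng = random.Random(seed)          # seeded for reproducibility
    names = sorted(sites)
    codes = count(1)                   # running counter for registration codes
    registry = {}                      # species string -> registration record
    for _ in range(n_draws):
        assignment = {s: rng.choice(sites[s]) for s in names}
        if not proviso(assignment):
            continue                   # combination excluded by the claim
        species = core.format(**assignment)
        if species not in registry:    # register each species only once
            registry[species] = {
                "code": f"ENUM-{next(codes):06d}",  # hypothetical format
                "assignment": assignment,
                "origin": origin,
            }
    return registry


registry = enumerate_species(
    core="c1ccc({R1})cc1{R2}",
    sites={"R1": ["C", "Cl"], "R2": ["N", "O"]},
    proviso=lambda a: not (a["R1"] == "Cl" and a["R2"] == "N"),
    origin="hypothetical document, claim 1",
    n_draws=50,
)
for species, record in registry.items():
    print(record["code"], species, record["origin"])
```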
The Computer Program Listing Appendix that accompanies this disclosure contains program files for a Java implementation of one embodiment of an enumerator that performs the foregoing procedures.
Process 6 consists of a combination of steps 6a-6b to identify the relationships between enumerated structures and render the results for viewing. Step 6a of the general schema effects the importing of structure topology fingerprints from the database into a fingerprint and rule analyzer, and the importing of enumeration rule fingerprints readied for fingerprint analysis. Step 6a also comprises a method for identifying fingerprint similarities, such as, for example, fingerprint profile comparison or hierarchical clustering of fingerprints using commercially available clustering algorithms such as Ward's method or UPGMA, in combination with commercially available data analysis platforms such as, for example, Spotfire. Step 6b effects the rendering of the results of the fingerprint similarity analysis for visual display at the user interface, or makes them accessible to the end user through other reporting instruments.
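The similarity analysis of step 6a can be sketched with set-based fingerprints. The sketch assumes fingerprints are sets of fragment identifiers and uses the Tanimoto coefficient, a common choice that the text does not itself specify; the clustering is a naive average-linkage (UPGMA) loop written for illustration, standing in for the commercial clustering tools and platforms named above. The fingerprint contents are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two set-based fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def upgma(fps, threshold):
    """Naive average-linkage (UPGMA) agglomerative clustering. Clusters are
    merged, most similar pair first, while their mean pairwise Tanimoto
    similarity is at least `threshold`."""
    clusters = [[key] for key in fps]

    def avg_sim(c1, c2):
        total = sum(tanimoto(fps[x], fps[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        sim, i, j = max(
            (avg_sim(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if sim < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical fragment fingerprints of three enumeration rules or species.
fps = {
    "claim A": {"benzene", "Cl", "F", "methyl"},
    "claim B": {"benzene", "Cl", "F", "ethyl"},
    "claim C": {"pyridine", "OH", "NH2"},
}
print(tanimoto(fps["claim A"], fps["claim B"]))  # 0.6
print(upgma(fps, threshold=0.5))  # [['claim A', 'claim B'], ['claim C']]
```

Because every merge rescans all cluster pairs, this loop is only suitable for small examples; larger fingerprint collections call for a library implementation of hierarchical clustering.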
An example shown in
For purpose of file size limitation only, the examples shown in
Referring to
The first example of
The second example of
The last set of examples, starting with
In another part of the patent claims, similar heterorings may be defined that can carry a given number of optional substituents defined in the patent, in this case designated R13. The resulting graphs are then not the same as those of the previous figures, but similar, with the addition of this optional substituent. The examples of
Thus, there are possible gateways between the superatoms database and the patent database. The name GHR13 has been built dynamically so as to keep the superatoms database completely independent of the patent database. Note that the G13 graph contains references to alkyl and haloalkyl substructures, which are represented by other superatoms belonging to the chains segment of the superatoms database.
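A superatom whose fragments refer to other superatoms (here alkyl and haloalkyl from the chains segment) can be resolved by recursive expansion. This is an illustration under assumptions: references are written as <name> inside a fragment string (a convention invented here), the library contents are hypothetical and far smaller than real superatom libraries, and the key GHR13 merely mirrors the name used in the text rather than reproducing its actual graph.

```python
import re
from itertools import product

# A reference to another superatom is written as <name> inside a fragment.
REF = re.compile(r"<([^<>]+)>")

# Hypothetical superatom library; keys and fragments are illustrative only.
LIBRARY = {
    "halogen": ["F", "Cl"],
    "alkyl": ["C", "CC"],                        # chains segment
    "haloalkyl": ["C<halogen>", "CC<halogen>"],  # chains segment
    "R13": ["<alkyl>", "<haloalkyl>"],           # optional substituent
    "GHR13": ["c1ccncc1", "c1cc(<R13>)ncc1"],    # heteroring with optional R13
}


def expand(name, library, _path=()):
    """Recursively expand a superatom into explicit fragments, resolving any
    <name> references to other superatoms; circular references raise."""
    if name in _path:
        raise ValueError("circular reference: " + " -> ".join(_path + (name,)))
    results = []
    for template in library[name]:
        parts = REF.split(template)  # even indices: literal text; odd: names
        choices = [[part] if k % 2 == 0 else expand(part, library, _path + (name,))
                   for k, part in enumerate(parts)]
        results.extend("".join(combo) for combo in product(*choices))
    return results


print(expand("GHR13", LIBRARY))
```

Keeping references symbolic in the library, and resolving them only at expansion time, is what lets one superatom entry be reused by many patent records without copying its fragments.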
A fourth aspect of the invention is the creation of superatom libraries containing, for example, collections of structure fragments describing building blocks occurring in natural and unnatural amino acids or derivatives thereof, building blocks of protein sequences, building blocks of DNA sequences and derivatives thereof, building blocks of RNA sequences, and building blocks of carbohydrates or derivatives thereof, thereby enabling pertinent structure fragments or building blocks to be combined into specific protein, DNA, RNA or carbohydrate sequences if the pertinent Markush structure descriptions so require.
All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. While the invention has been described in terms of various preferred embodiments, the skilled artisan will appreciate that various modifications, substitutions, omissions, and changes may be made without departing from the spirit thereof. Accordingly, it is intended that the scope of the present invention be limited solely by the scope of the following claims, including equivalents thereof.
Number | Date | Country
---|---|---
60960835 | Oct 2007 | US