Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction and the understanding and interpretation of words, sentences, and grammars. Some of the challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation, such as for interactive voice response (IVR) systems.
One aspect of understanding and/or interpreting language involves the construction of a model or representation of a string of words, such as a sentence. The model or representation may be based on an underlying set of rules or relationships that define how communication is conducted using a language, such as a specific grammar. The model or representation may be constructed using a process or operation termed “parsing” or as the output of the operation of an element known as a parser. A natural language parser is a software program that may be used to analyze the grammatical structure of sentences, for instance, which groups of words go together (as “phrases”) and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed (and presumably correct) sentences to try to produce the most likely analysis of new sentences. This typically involves the development of a training set of sentences that have been correctly parsed and then used as examples of correct outputs for the parser to learn from.
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in a computer language, conforming to the rules of a formal grammar. The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It typically emphasizes the importance of grammatical elements such as subject and predicate. Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or string of words into its constituents, and may produce a parse tree or other structure showing their syntactic relation to each other, which may also contain semantic and other information. As a result, the efficient and accurate generation of a parse tree or other representational structure is an area of research, as it is a tool used in other aspects of NLP work.
A “treebank” is a parsed text corpus that annotates syntactic or semantic sentence structures. Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information. Treebanks can be created completely manually, where linguists annotate each sentence with syntactic structure, or semi-automatically, where a parser assigns some syntactic structure which linguists then check and, if necessary, correct. In practice, fully checking and completing the parsing of natural language corpora is a labor-intensive project that can take teams of graduate linguists several years. The level of annotation detail and the breadth of the linguistic sample determine the difficulty of the task and the length of time required to build an acceptable treebank. Treebanks can be used as training data for a parser and as a source of research data in their own right for purposes of linguistic analysis, etc.
Typically, a parser is a computer implemented process or set of operations that takes a string of words as an input and uses a selected grammar (which is represented by the specific operations, rules, etc. that are implemented by the process) to determine the relationships between the words and represent the string as a tree or other structure. The parser may function to select a specific operation on (or manipulation of) one or more of the words in the process of determining the relationship(s) that satisfy the definitions and requirements of the grammar. The selected operation or manipulation may be the result of applying a set of rules or conditions that satisfy or define the grammar, and represent allowable, required, or impermissible relationships between words or sequences of words or elements.
Parsers are typically “trained” using a set of input data that represent what are considered to be “correctly” parsed sentences or strings, such as the previously mentioned “treebank”. However, there are a limited number of sets of such correctly parsed sentences/strings, as it requires a substantial amount of work to create them. This has the unfortunate side effect that many parsers are optimized to produce correct outputs based on a set of inputs that is representative of a particular type or category of sentences or strings (and which may satisfy a specific grammar), but may not include sufficient examples of strings or relationships that occur in other areas (such as other forms of logical relationships, statements, questions, dependent phrases, grammars, etc.). The result is to produce a parser that is generally accurate for inputs that are sufficiently close to or related to the training set, but that may introduce errors for other types of input sentences, strings, grammars, or structures. Since a parser is used to generate the output data that serves as the basis for constructing a parse tree (and hence a treebank), this means that the parse trees created using parsers trained in such a manner will also have errors.
Conventional approaches to generating a parse tree or treebank typically rely on using a parser that was trained on one of a limited number of sets of training data. While useful, this approach is inherently limited as the parser becomes optimized for sentences or data strings that are closer to, or share certain characteristics with, the training set. This can result in errors in the parse trees constructed for the actual inputs, if those inputs differ in certain ways from the training set. As a result, a treebank built from a specific corpus may also contain errors, or at least be sub-optimal in terms of its accuracy and utility. Thus, systems and methods are needed for more efficiently and correctly generating training data, parse trees, and a treebank from a corpus of text that differs from the data used to train an existing parser. Embodiments of the invention are directed toward solving these and other problems individually and collectively.
The terms “invention,” “the invention,” “this invention” and “the present invention” as used herein are intended to refer broadly to all of the subject matter described in this document and to the claims. Statements containing these terms should be understood not to limit the subject matter described herein or to limit the meaning or scope of the claims. Embodiments of the invention covered by this patent are defined by the claims and not by this summary. This summary is a high-level overview of various aspects of the invention and introduces some of the concepts that are further described in the Detailed Description section below. This summary is not intended to identify key, required, or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, to any or all drawings, and to each claim.
Embodiments of the invention are directed to systems, apparatuses, and methods for generating a parser training set and ultimately a correct treebank for a corpus of text, based on using an existing parser that was trained on a different corpus, and in some cases, a corpus of a different type or character (e.g., using a parser initially trained on speeches to parse a corpus comprised of hypothetical questions). In some embodiments this is achieved by modifying the operation of the previously trained parser through the introduction of one or more constraints on the output parse tree it creates, and then performing one or more additional iterations of the parsing operation. This causes the parser to be re-trained on samples of the new corpus in a more efficient manner than by use of conventional approaches (which are typically very labor intensive).
Embodiments of the invention are also directed to systems, apparatuses, and methods for improving the operation of a parser in the situation of using a less familiar set of training data than is typically used to train a conventional parser. These implementations of the invention can be used to generate a more effective and accurate parser for a new corpus of inputs (and hence produce more accurate parse trees) with significantly less effort than would be required if it was necessary to generate a standard size training set.
In one embodiment, the invention enables the input of an instruction, signal, or command that operates to cause the parser to prevent the formation of a specified connection between inputs. In one embodiment, the invention enables the input of an instruction, signal, or command that operates to cause the parser to require a certain connection between inputs. As a result of the instruction, signal, or command, when the parser “re-parses” the input it generates a more accurate representation of an input sentence with less reliance on a typical sized training set. In some embodiments the invention may be used to generate a treebank based on a new corpus of text in a more efficient manner than by use of conventional approaches to constructing a treebank.
In one embodiment, the invention is directed to a method for modifying the operation of a parser, where the method includes:
receiving data representing an input sentence;
generating a display of a structure representing the input sentence based on a specific parsing process;
receiving one or more inputs representing changes to the displayed structure;
generating a corrected structure representing the input sentence based on the specific parsing process as modified by the received inputs; and
training a parser to reliably learn a parsing process based on the specific parsing process as modified by the one or more received inputs.
In another embodiment, the invention is directed to an apparatus comprising:
an electronic data processing element;
a set of instructions stored on a non-transient medium and executable by the electronic data processing element, which when executed cause the apparatus to
In yet another embodiment, the invention is directed to a system comprising:
a data storage element containing data representing one or more sentences or strings of characters;
an electronic data processing element;
a set of instructions stored on a non-transient medium and executable by the electronic data processing element, which when executed cause the system to
Other objects and advantages of the present invention will be apparent to one of ordinary skill in the art upon review of the detailed description of the present invention and the included figures.
Embodiments of the invention in accordance with the present disclosure will be described with reference to the drawings, in which:
Note that the same numbers are used throughout the disclosure and figures to reference like components and features.
The subject matter of embodiments of the present invention is described here with specificity to meet statutory requirements, but this description is not necessarily intended to limit the scope of the claims. The claimed subject matter may be embodied in other ways, may include different elements or steps, and may be used in conjunction with other existing or future technologies. This description should not be interpreted as implying any particular order or arrangement among or between various steps or elements except when the order of individual steps or arrangement of elements is explicitly described.
Embodiments of the invention will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, exemplary embodiments by which the invention may be practiced. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy the statutory requirements and convey the scope of the invention to those skilled in the art.
Among other things, the present invention may be embodied in whole or in part as a system, as one or more methods, or as one or more devices. Embodiments of the invention may take the form of a hardware implemented embodiment, a software implemented embodiment, or an embodiment combining software and hardware aspects. For example, in some embodiments, one or more of the operations, functions, processes, or methods described herein may be implemented by one or more suitable processing elements (such as a processor, microprocessor, CPU, controller, etc.) that is part of a client device, server, network element, or other form of computing or data processing device/platform and that is programmed with a set of executable instructions (e.g., software instructions), where the instructions may be stored in a suitable data storage element. In some embodiments, one or more of the operations, functions, processes, or methods described herein may be implemented by a specialized form of hardware, such as a programmable gate array, application specific integrated circuit (ASIC), or the like. The following detailed description is, therefore, not to be taken in a limiting sense.
Embodiments of the present invention are directed to systems, apparatuses, and methods for more efficiently generating a set of parse trees or a treebank from a corpus of text, by modifying the operation of a parser that has previously been trained on a different corpus. In some embodiments, a human annotator may provide a correction or instruction that is used by the parser to modify/correct a parsing operation when it re-parses a previously input and parsed string of characters or elements. The correction or instruction may be in the form of a requirement that the output parse tree(s) contain a specific connection (or arc) between input elements (such as words) or that the output parse tree(s) not contain a certain connection between input elements (i.e., such a connection or relationship is forbidden). Other forms of correction, modification, conditions, or instruction are also possible (such as those mentioned later herein). The information provided by the annotator assists in training the parser more quickly on the new corpus of text, and hence in producing a correct set of parse trees and treebank based on the new corpus.
In some embodiments, the correction, modification, or instruction may be provided to the parser in the form of a control signal that is generated by a process that applies one or more rules or evaluations (such as by a cost or value function, or by a machine learning technique) to the parser output. The control signal may be part of an adaptive feedback process that causes the parser to converge towards correct operation on inputs representing the new corpus. In such embodiments, the parser operation may be modified by a process that may rely less on human inputs, if at all.
As shown in the
As mentioned,
A natural language parser is a software/computer implemented process or component that takes natural language text as an input and produces a hierarchical data structure as an output. Typically, this data structure represents syntactic or semantic information that is conveyed implicitly by the input text (based on its arrangement and the assumed underlying grammar). The parsing operation may be preceded by a separate lexical analyzer (sometimes referred to as a “tokenizer”), which creates “tokens” from the sequence of input characters.
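As an illustration of the tokenization step that precedes parsing, the following is a minimal sketch in Python. The function name and regular expression are illustrative assumptions for this sketch, not the lexical analyzer contemplated by the specification:

```python
import re

def tokenize(text):
    # Split raw input text into tokens: runs of word characters (allowing an
    # internal apostrophe, as in "parser's") and single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenize("The parser's output, a tree.")
# tokens == ['The', "parser's", 'output', ',', 'a', 'tree', '.']
```

In practice a lexical analyzer may also record token positions and types for use by later parsing stages; this sketch shows only the segmentation itself.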
The final phase of the illustrated process is semantic parsing or analysis 310, which involves determining the implications of the expression that was reconstructed/validated, and taking the appropriate action. In the case of a calculator or interpreter, the action is typically to evaluate the expression or program; a compiler, on the other hand, would be expected to generate some kind of instruction set or code. Note that attribute grammars can also be used to define these actions.
In some sense, the operation of a generic parser may be described (at a high level) as implementing a sequence of data processing steps, functions or operations. These may include one or more of:
Note that the description of the operation of a generic parser relies on a set of rules, constraints, functions, etc. that may not be optimal or even suitable for certain grammars and/or domains. The parser's “learning” of the grammar and ability to construct an accurate network representation of a new string/sentence after being trained on a set of correctly parsed strings/sentences means that the trained parser operates in accordance with (i.e., makes decisions based on) the rules/patterns of the specific grammar and/or acceptable practices of a certain domain. However, those rules, patterns and/or practices may not be optimal, relevant, or applicable for a different domain (such as text that represents a different category of information or has a different sentence structure). This is one reason why a parser that is trained on a specific domain may not produce sufficiently accurate results when used to evaluate a string/sentence from another domain.
To resolve this problem, when attempting to construct a set of treebanks or network diagrams for a specific domain, in one embodiment, the invention permits the introduction of a new rule or constraint based on an input provided by a person or one generated by an automated learning process. The new rule or constraint causes a change in the operation of the process that evaluates the “value” of a specific arrangement of “units”/nodes and connections. This typically alters the final structure of the network or “tree” that is determined to maximize/minimize/optimize the cost or value function for that arrangement of “units”/nodes. As will be described in greater detail, in some embodiments, the constraint may prevent a certain connection, require a certain connection, set a certain fixed or variable value for a certain connection, place a minimum or maximum threshold value on a certain connection, or apply other suitable constraint, rule, requirement or condition.
This approach permits the parser to adaptively and efficiently alter/modify its operation to take into account the new rule or constraint, and as a result, to generate a new parse tree or other representative structure (such as a network diagram, etc.) for a string/sentence from a different domain. Embodiments of the inventive system and methods utilize a user/person/annotator (and/or a machine learning process that functions in a similar manner) to more efficiently (as compared to building a new parser) train the parser on the new domain.
This is of great value when applying a previously trained parser to a new type of input, such as one from a different category than was used to initially train the parser (e.g., input governed by a different grammar or set of controlling rules than the training set). As a result, a treebank or other form of output may be generated more quickly than by use of conventional approaches to building and training natural language parsers.
In natural language parsing, one task of a parser is to recover the most probable latent hierarchical structure from a flat representation of a sentence or string of characters (i.e., to construct a parse tree or other representation of nodes and connections from the flat structure). There are at least two ways in which this is conventionally done:
Current state-of-the-art dependency parsers are statistical in nature. That is, they learn how to parse from examples. Specifically, given a treebank (i.e., a database of sentences and their correct parsing), these systems train statistical models that are then used to parse new sentences, often with a relatively high degree of accuracy. However, one of the disadvantages of current statistical parsers is their reliance on the Penn Treebank, a database of roughly 40,000 hand-parsed sentences from the Wall Street Journal. Since most freely available parsers train nearly exclusively on this treebank, they tend to be good at parsing news articles (as would be expected, given the source), but poorer at operating in other domains in which different grammars or terminology may apply.
Although there have been smaller-scale efforts to build more hand-parsed treebanks to be used for training a parser, the total number of publicly available hand-parsed trees remains relatively small (on the order of tens of thousands). Unfortunately, people have not developed more varied and larger treebanks because constructing parse trees can be difficult and time-consuming. As noted, embodiments of the inventive system and methods described herein are intended to address this problem by, among other things, providing a way to adaptively modify a parser trained on one set of correctly parsed inputs so that it may operate more effectively and accurately on a set of inputs from a different domain and/or that follow a different grammar.
“Oil lamp”=[S, R, L, S].
A transition-based dependency parser parses a sentence by finding the most likely sequence of transition operators, according to its trained statistical models. In a sense the parser is attempting to find that sequence of operators (where application of an operator enables a transition from a first node/token to a second node/token) that results in what it has “learned” to be the optimum or “best” parsing of the input string (based on evaluating a correctly parsed training set, and typically a comparison control set). Note that because of the ability to express the dependency parse of an N-token sentence/string as a 2N-length sequence of transition operators, the total number of possible operations can be determined based on the number of tokens in the input. This provides guidance on the estimated computational resources needed to parse a set of inputs and to correctly construct a treebank (and may be compared to the results provided by alternative approaches, such as an implementation of the inventive system and methods).
In a simplified form, a transition-based parser might implement a form of the following algorithm or process:
Parse (n-length sentence):
    transitions = [ ]
    For i = 1 to 2n:
        choose the most likely transition operator, t, given the current stack and buffer state
        apply t to the stack and buffer
        append t to transitions
    Return transitions
The operators modify a stack-and-buffer until a single parse tree is formed on the stack. For instance, in the example given:
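The stack-and-buffer mechanics can be sketched as follows. This is an illustrative arc-standard variant; the single-letter operator codes ('S' for Shift, 'L' for Left Arc, 'R' for Right Arc) and the (head, dependent) arc convention are assumptions made for this sketch:

```python
def apply_transitions(tokens, transitions):
    # Simulate a transition-based dependency parser's stack-and-buffer:
    # each operator mutates the stack/buffer until the parse is built.
    stack, buf, arcs = [], list(tokens), []
    for op in transitions:
        if op == 'S':                        # Shift: move next buffer token onto the stack
            stack.append(buf.pop(0))
        elif op == 'L':                      # Left Arc: second-from-top becomes a
            dep = stack.pop(-2)              # dependent of the stack top
            arcs.append((stack[-1], dep))    # record arc as (head, dependent)
        elif op == 'R':                      # Right Arc: stack top becomes a
            dep = stack.pop()                # dependent of the item beneath it
            arcs.append((stack[-1], dep))
    return stack, arcs

stack, arcs = apply_transitions(["Oil", "lamp"], ['S', 'S', 'L'])
# stack == ['lamp'], arcs == [('lamp', 'Oil')]
```

Here "Oil" is attached as a dependent of "lamp", leaving a single head on the stack; a trained classifier would choose each operator rather than it being supplied in advance.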
Note that one step in the algorithm or heuristic is to select or choose the “correct” transition operator from a set of allowable operators, as governed by one or more rules or constraints, and where the “correct” choice may depend on determination of an associated cost or value (such as the parsing being correct or incorrect). Thus, a separate concern is that of how to train or configure a parsing system that implements the algorithm to choose the correct transition operator. This aspect (that of training a classifier to identify the “best” or “correct” decision with regards to the appropriate transition operator) is typically addressed by some form of adaptive feedback system, an example of which will be described in greater detail herein.
As shown in the figure, a base parser 502 (that is, a parser or parsing engine previously trained on a different corpus of documents) is used to parse a set of sentences derived from a new corpus (contained in the “unparsed sentences” data storage element 504). An input, such as a control signal or instruction 505 (e.g., one generated by a human annotator or a control signal generated by an automated machine-learning or decision process) is provided to the “banker” 506 which operates to generate the parse trees and the resultant treebank by controlling, modifying, or instructing the operation of parser 502. The outputs of banker 506 are a set of parse trees (i.e., a treebank) that represent better or more correct parsing of the input strings 504, as denoted by “gold parses” 508 in the figure.
In a general sense, the banker 506 is receiving information from a user or model (in the form of an instruction 505) that causes the base parser 502 to more accurately parse inputs from a domain 504 that was not previously used to train the parser. The output (508) represents a more correctly parsed set of inputs (504) than would be obtained by the action of parser 502 in the absence of input 505. This is a form of re-training or adaptively modifying the behavior of parser 502 by providing it with incremental changes to its operation, rather than requiring a more extensive training set for the new domain (which, as noted, may not exist or be reliable enough for these purposes). One result of the inventive methods is thus to generate a set of correctly parsed input strings (a treebank) for the domain.
In one embodiment, the action of the annotator or control signal 505 may cause banker 506 to modify the operation of parser 502 by implementing one or more constraints, modifications, conditions, requirements, exclusions, or rules on the operation of the parser, such as the following examples:
ForbiddenArc(W,X): this means that in the final tree, do not permit an arc between words W and X; or
RequestedArc(W,X): this means that in the final tree, guarantee an arc between words W and X.
The ForbiddenArc and RequestedArc constraints operate to force the parser to exclude or include a particular connection between “units”, nodes, tokens, words, or elements in the output of the parser, which is a representation of the parsed input string or sentence. This may produce a different network/tree structure than would occur without introduction of the constraint. Thus, in some embodiments, the new condition or constraint functions to introduce knowledge from an “expert” (such as the annotator or a machine learning output) into the operation of the parser (via the interpretive or other operations performed by the banker), and thereby modify its behavior. As mentioned, the knowledge may be an input provided by a person (who is in effect using their expert knowledge/learning about grammar and sentence structure to indicate errors in the parser's operation on the input string) or by a machine learning, neural network, statistical analysis, or other automated decision process.
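One way the ForbiddenArc and RequestedArc constraints might be enforced is by filtering candidate parses. In this sketch an arc is a (head, dependent) pair, and, consistent with the constraint definitions above (an arc "between words W and X"), the check treats arcs as undirected; the candidate parses are illustrative:

```python
def satisfies(arcs, forbidden=(), required=()):
    # Treat each arc as an unordered word pair for constraint checking.
    arcset = {frozenset(a) for a in arcs}
    if any(frozenset(f) in arcset for f in forbidden):
        return False                     # a ForbiddenArc(W, X) is present
    return all(frozenset(r) in arcset for r in required)  # every RequestedArc(W, X) is present

candidates = [
    [("lamp", "Oil"), ("ROOT", "lamp")],
    [("Oil", "lamp"), ("ROOT", "Oil")],
]
ok = [c for c in candidates
      if satisfies(c, forbidden=[("ROOT", "Oil")], required=[("lamp", "Oil")])]
# ok == [[("lamp", "Oil"), ("ROOT", "lamp")]]
```

In a full system the constraints would more likely be applied during search (pruning transitions that would create a forbidden arc) rather than by filtering complete parses, but the filtering form shows the intended effect compactly.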
In some embodiments, a set of correctly parsed sentences (commonly termed “gold parses”) may be constructed using the inputs of an annotator. In other embodiments, a set of correctly (or in some instances, more correctly) parsed sentences may be constructed using the inputs of an automated decision process and/or annotator. Note that if an automated decision process is used, it will base its evaluation of whether a sentence parsing is correct (or more nearly correct) based on the value of a metric, goal function, rating, etc. Thus, the accuracy and predictive value of the decision process will depend to some extent upon how the metric or goal function is defined and constructed.
Given a set of correctly parsed sentences, this set may be provided as training examples to a machine learning or other automated process. This can be used to enable the parser to "learn" from the correct parsing(s) in order to intelligently adapt its operation, and become capable of efficiently constructing correct parses of sentences with little or no input from an annotator or automated evaluating process. A large enough set of such correctly parsed sentences may then be used as a treebank. This learning capability of the parser may be introduced through use of an adaptive feedback loop or "on-line learner" (e.g., perceptron or MIRA, two techniques that adapt the weights of a log-linear model in response to new training data).
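The on-line learner mentioned above can be sketched with a plain perceptron that adjusts feature weights whenever its predicted transition operator disagrees with the gold one. The feature names and four-operator inventory below are illustrative assumptions:

```python
from collections import defaultdict

OPS = ["Shift", "Reduce", "LeftArc", "RightArc"]

class PerceptronLearner:
    def __init__(self):
        # One sparse weight vector per transition operator.
        self.w = {op: defaultdict(float) for op in OPS}

    def score(self, op, feats):
        return sum(self.w[op][f] for f in feats)

    def predict(self, feats):
        return max(OPS, key=lambda op: self.score(op, feats))

    def update(self, feats, gold_op):
        # Perceptron rule: on a mistake, reward the gold operator's
        # weights and penalize the mispredicted operator's weights.
        pred = self.predict(feats)
        if pred != gold_op:
            for f in feats:
                self.w[gold_op][f] += 1.0
                self.w[pred][f] -= 1.0
        return pred

learner = PerceptronLearner()
learner.update(["stack_top=lamp", "buf_head=Oil"], "LeftArc")
# After one correction, these features now predict "LeftArc".
```

MIRA differs in computing a margin-based update size rather than a fixed step, but the feedback structure is the same: parse, compare to the gold parse, and adjust.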
As mentioned, in some embodiments, an automated decision process may be used by itself or in conjunction with the inputs of an annotator to construct a set of correctly parsed sentences. In one embodiment, the automated decision process may be an adaptive feedback process that is used to replace or partially replace the inputs provided by the annotator. This can be an effective method of generating a larger set of correctly (or generally correctly) parsed sentences in situations where the reasoning of the annotator can be encapsulated in one or more explicit metrics, goal functions, rules, or other forms of evaluation. For example,
As shown in
The output 806 may be sampled, interpreted, modelled, evaluated, etc. and compared in some manner to a correctly parsed version of the input (as suggested by element or process 808 and 812 in the figure). In some embodiments, this may be done by scoring or otherwise quantifying how the parsed input 806 compares to a known correctly parsed version 812 of that same input. This may be accomplished by generating a “score” or other metric that represents the result of comparing the parsed input to its known correct parsing, using a suitable scoring method, algorithm, heuristic, rule, condition, etc. that is implemented by element or process 808.
As one example, such a scoring method may be what is known as the “Unlabeled Attachment Score” (UAS). This method takes advantage of the property that every node of a rooted directed tree (except for the root) has exactly one parent. This permits re-expressing a parse tree in terms of a set of node-parent relationships. The UAS method constructs the node-parent relationships for both the parsed output and the known correct parsing, and then compares the two sets of relationships to generate a score (which may be the percentage of correct relationships that the parsed output contains).
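The UAS computation described above might be sketched as follows, with each parse expressed as a mapping from token index to parent index (an assumed representation, exploiting the property that every non-root node has exactly one parent):

```python
def uas(predicted, gold):
    # predicted/gold: dicts mapping each non-root token index to its
    # parent's index (0 denoting the root). UAS is the fraction of
    # tokens whose predicted parent matches the gold parent.
    correct = sum(1 for tok, head in gold.items() if predicted.get(tok) == head)
    return correct / len(gold)

gold = {1: 2, 2: 0, 3: 2}   # token -> parent; token 2 attaches to the root
pred = {1: 2, 2: 0, 3: 1}   # token 3's parent is wrong
# uas(pred, gold) == 2/3: two of three parents are correct
```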
In the situation in which the parse trees include labels (such as grammar parts), a scoring method known as “Labeled Attachment Score” (LAS) may be used. This method operates in a similar manner to UAS, but is able to take labels on arcs that connect nodes/tokens into consideration. In some sense, it evaluates the accuracy of the parser in identifying the correct label for a token or string element.
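The LAS variant extends the same node-parent comparison to include the arc label; the dependency labels used below are illustrative:

```python
def las(predicted, gold):
    # predicted/gold: dicts mapping each non-root token index to a
    # (parent, label) pair. A token counts as correct only when both
    # its parent and the arc label match the gold parse.
    correct = sum(1 for tok, head_and_label in gold.items()
                  if predicted.get(tok) == head_and_label)
    return correct / len(gold)

gold = {1: (2, "amod"), 2: (0, "root")}
pred = {1: (2, "nsubj"), 2: (0, "root")}
# las(pred, gold) == 0.5: token 1 has the right head but the wrong label
```

Because LAS imposes the stricter condition, a parse's LAS is never higher than its UAS on the same input.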
Given the comparison score or metric generated by element or process 808, adaptive feedback control loop 800 then generates a control signal or modified instruction for parser 804 using a suitable element or process 810 (e.g., a condition, constraint, rule, requirement, threshold, etc.). This control signal or modified instruction alters the operation of parser 804 (and in some embodiments, may implement certain of the same functions or processes as banker 506 in
As mentioned, after parser 804 is able to generate sufficiently accurate parsing(s) of a set of inputs (based on inputs of an annotator and/or an automated decision process), a set of correctly labeled parse trees (which form the contents of a new treebank) may be used to train a classifier. The classifier (in this example, a 4-way classifier) operates to select the “best” transition operator (e.g., Shift, Reduce, Left Arc, Right Arc) given appropriate input data representing a characteristic of a node/label combination.
Below are additional examples of possible constraints, rules, conditions or instructions that may be applied to the operation of a base parser to improve its operation on example inputs from a new corpus. Certain of these possible constraints, rules, conditions or instructions may be relevant or most applicable for specific types of domains, grammars, sentences, sentence structures, sentence elements, characters, etc.:
Use of an embodiment of the invention can significantly expedite the improvement of a parser's operation on a new input type, category, or grammar that it was not previously or fully trained on. In this way, a parser that was trained on a standard training set (such as the Penn Treebank) may be modified or adapted to operate correctly and effectively on a new corpus of inputs (that may differ from those used to generate the Penn Treebank in terms of domain, category, type, grammar, input element characteristics, etc.) much more quickly than by starting with an untrained parser and trying to create a sufficiently large set of input data to properly and reliably train it.
In general, embodiments of the inventive system and methods relate to introducing constraints/controls into the operation of a parser that has previously been trained on a corpus of documents in order to more efficiently train the parser on a new and different corpus of documents. Note that a constraint or condition placed on the operation of the parser may depend, in part or in whole and directly or indirectly, on a cost, value, parameter, a result of evaluating a function or process, a combination of parameters or variables and one or more logical operations (e.g., Boolean), etc.
In some embodiments, the value may be a cost or value as determined by a cost or value function that is used to determine the value of a connection between nodes in a network structure and/or the overall arrangement of the structure. The cost or value function may depend on one or more of context, implied meanings, domain type, etc. For example, when constructing a parse tree, the presence or absence of a connection between two words/nodes may depend on the value for the connection as determined by an applicable cost/value function for the network. This might be used to train a parser to avoid connections that are considered “weak” or “possible but considered improper” (e.g., slang, colloquial terms, etc.).
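One way such a cost/value function might gate the construction of a parse tree is sketched below. The pairing list, cost values, and threshold are invented for illustration; an actual cost function would draw on lexical statistics, context, and domain type as described above:

```python
# Hypothetical set of "weak" head/dependent pairings (e.g., slang or
# colloquial attachments the parser should be trained to avoid).
WEAK_PAIRS = {("gonna", "go"), ("ain't", "is")}

COST_THRESHOLD = 5.0  # illustrative cutoff for accepting a connection

def arc_cost(head_word, dep_word):
    """Toy cost function for a candidate head -> dependent connection.

    Weak or improper pairings carry a high cost; ordinary pairings a low one.
    """
    if (head_word.lower(), dep_word.lower()) in WEAK_PAIRS:
        return 10.0
    return 1.0

def allow_arc(head_word, dep_word):
    # Constraint applied during parsing: reject connections whose cost
    # exceeds the threshold, so "weak" links never enter the parse tree.
    return arc_cost(head_word, dep_word) < COST_THRESHOLD
```

Under this scheme, a connection such as ("gonna", "go") would be rejected during tree construction, while an ordinary pairing would be kept.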
As shown in
Next, the annotator is asked to select/click on any incorrect link that may exist in the automatically generated parse, as shown in
As will be described with further reference to
As shown in
In an embodiment in which machine learning or another automated process is used to evaluate the correctness of a proposed parsing, in place of (or in conjunction with) the actions of an annotator (e.g., to create or generate the new rule, condition, or constraint to apply to the operation of the parser), this may be accomplished by a process such as the following:
As mentioned, once a human annotator and/or automated learning process decides on how to correct an input, the banker module of
The trained classifier may then be used to modify the parsing algorithm discussed as follows:
Parse (n-length sentence):
Transitions=[ ]
For i=1 to 2n:
v=feature vector for the current parser state
t=classifier(v), i.e., one of Shift, Reduce, Left Arc, or Right Arc
Apply transition t and append t to Transitions
Return Transitions
(where a feature vector is a multidimensional numeric encoding of the current state of the parser).
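The modified parsing loop above can be sketched in executable form as follows. This is an arc-standard style sketch that uses three of the four transition operators (the Reduce operator of an arc-eager system is omitted for brevity), the feature encoding is deliberately minimal, and `predict` stands in for the trained classifier; the loop runs until the buffer is empty and the stack is reduced, which corresponds to the bounded iteration count of the pseudocode:

```python
def parse(words, predict):
    """Transition-based parsing loop (simplified arc-standard sketch).

    `predict` maps a feature vector (here, a tuple crudely encoding the
    current parser state) to "SHIFT", "LEFT_ARC", or "RIGHT_ARC".
    Returns the transition sequence and the head -> dependent arcs.
    """
    stack, buffer = [], list(range(len(words)))
    arcs, transitions = [], []
    while buffer or len(stack) > 1:
        # Minimal numeric encoding of the parser state.
        features = (len(stack), len(buffer),
                    stack[-1] if stack else -1,
                    buffer[0] if buffer else -1)
        t = predict(features)
        # Fall back to a legal transition if the prediction is inapplicable.
        if t == "SHIFT" and not buffer:
            t = "RIGHT_ARC"
        if t in ("LEFT_ARC", "RIGHT_ARC") and len(stack) < 2:
            t = "SHIFT"
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT_ARC":   # second-from-top depends on top of stack
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                   # RIGHT_ARC: top depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
        transitions.append(t)
    return transitions, arcs

# Trivial stand-in classifier: shift while the buffer is non-empty, then reduce.
transitions, arcs = parse(["The", "dog", "barks"],
                          lambda f: "SHIFT" if f[1] else "RIGHT_ARC")
# transitions == ["SHIFT", "SHIFT", "SHIFT", "RIGHT_ARC", "RIGHT_ARC"]
```

The stand-in lambda merely demonstrates the control flow; in the inventive system, the trained 4-way classifier supplies each transition decision.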
Note that the inventive system and methods provide one or more of the following benefits or advantages, and may be used in one or more of the indicated contexts or use cases:
Note further that for each unbanked sentence, there are two basic phases of operation of the inventive system and methods:
Each application module or sub-module may correspond to a particular function, method, process, or operation that is implemented by the module or sub-module; for example, a function or process related to pre-processing input data (a sentence or string) for use by the parser, applying one or more rules or conditions based on the applicable grammar, identifying the role or purpose of certain input elements (such as words), identifying the relationship between certain input elements, generating a representation of the parser output, etc. Such function, method, process, or operation may also include those used to implement one or more aspects of the inventive system and methods, such as for:
The application modules and/or sub-modules may include any suitable computer-executable code or set of instructions (e.g., as would be executed by a suitably programmed processor, microprocessor, or CPU), such as computer-executable code corresponding to a programming language. For example, programming language source code may be compiled into computer-executable code. Alternatively, or in addition, the programming language may be an interpreted programming language such as a scripting language.
As described, the system, apparatus, methods, processes, functions, and/or operations for implementing an embodiment of the invention may be wholly or partially implemented in the form of a set of instructions executed by one or more programmed computer processors such as a central processing unit (CPU) or microprocessor. Such processors may be incorporated in an apparatus, server, client or other computing or data processing device operated by, or in communication with, other components of the system. As an example,
It should be understood that the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present invention using hardware and a combination of hardware and software.
Any of the software components, processes or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, Javascript, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions, or commands on a computer readable medium, such as a random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a CD-ROM. Any such computer readable medium may reside on or within a single computational apparatus, and may be present on or within different computational apparatuses within a system or network.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and/or were set forth in its entirety herein.
The use of the terms “a” and “an” and “the” and similar referents in the specification and in the following claims are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “having,” “including,” “containing” and similar referents in the specification and in the following claims are to be construed as open-ended terms (e.g., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value inclusively falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation to the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to each embodiment of the present invention.
Different arrangements of the components depicted in the drawings or described above, as well as components and steps not shown or described are possible. Similarly, some features and sub-combinations are useful and may be employed without reference to other features and sub-combinations. Embodiments of the invention have been described for illustrative and not restrictive purposes, and alternative embodiments will become apparent to readers of this patent. Accordingly, the present invention is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the claims below.
This application claims the benefit of U.S. Provisional Application No. 62/128,275, entitled “System and Methods for Generating Treebanks for Natural Language Processing by Modifying Parser Operation through Introduction of Constraints on Parse Tree Structure,” filed Mar. 4, 2015, which is incorporated by reference herein in its entirety (including the Appendix) for all purposes.