A programmer utilizing a programming language creates the instructions comprising a computer program. Typically, source code is specified or edited by a programmer manually and/or with help of an integrated development environment (IDE) comprising numerous development services (e.g., editor, debugger, auto fill, intelligent assistance . . . ). By way of example, a programmer may choose to implement source code utilizing an object-oriented programming language (e.g., C#, VB, Java . . . ) where programmatic logic is specified as interactions between instances of classes or objects, among other things. Subsequently, the source code can be compiled or otherwise transformed to another form to facilitate execution by a computer or like device.
A compiler conventionally produces code for a specific target from source code. For example, some compilers transform source code into native code for execution by a specific machine. Other compilers generate intermediate code from source code, where this intermediate code is subsequently interpreted dynamically at run time or compiled just-in-time (JIT) to facilitate execution across computer platforms, for instance. Further yet, some compilers are utilized by IDEs to perform background compilation to aid programmers by identifying actual or potential problems, among other things.
In general, compilers perform syntactic and semantic program analysis. Syntactic analysis involves verification of program syntax. In particular, a program is lexically analyzed to produce tokens, and these tokens are parsed into syntax trees (or some other representation internal to the compiler) as a function of a programming language grammar. Typically, a parse tree is constructed during this compilation phase. A parse tree is made up of several nodes and branches where interior nodes correspond to non-terminals of the grammar and leaves correspond to terminals. The parse tree is subsequently employed to perform semantic analysis, which concerns determining and analyzing the meaning of a program.
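By way of example and not limitation, the two stages of syntactic analysis described above can be sketched as follows. The sketch assumes a toy grammar, expr → expr PLUS term | term and term → NUMBER, that is not part of the disclosure; the names `tokenize` and `parse_expr` are hypothetical. Interior nodes of the resulting tree are labeled with non-terminals and leaves are the tokens (terminals).

```python
import re

# One token per match: either a run of digits or a plus sign.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\+))")


def tokenize(text):
    """Lexical analysis: turn the input text into a list of (kind, lexeme) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if match.group(1):
            tokens.append(("NUMBER", match.group(1)))
        else:
            tokens.append(("PLUS", "+"))
        pos = match.end()
    return tokens


def parse_expr(tokens):
    """Parsing: build a parse tree for expr -> expr PLUS term | term."""

    def term(i):
        if i >= len(tokens) or tokens[i][0] != "NUMBER":
            raise SyntaxError(f"expected NUMBER at token {i}")
        return ("term", tokens[i])

    tree = ("expr", term(0))
    i = 1
    while i < len(tokens):
        if tokens[i][0] != "PLUS":
            raise SyntaxError(f"expected PLUS at token {i}")
        # Left-recursive rule: the tree built so far becomes the left child.
        tree = ("expr", tree, tokens[i], term(i + 1))
        i += 2
    return tree


tree = parse_expr(tokenize("1 + 2"))
```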
Syntactic analysis or tree generation is performed by a parser or parse system. Parsers enable programs to either recognize or transcribe patterns matching formal grammars. A parser can be written by hand or by feeding a formal specification of a language grammar into a parser generator, which in turn produces necessary code.
It is desirable to write language grammars in a way that is natural for humans to read. Unfortunately, this means there are often ambiguities in the grammar or places where the generated parser cannot tell, based on the grammar alone, which grammar rule should be processed. Consider the following classic example of an ambiguous grammar: “S→if E then S else S|if E then S”. The parser generated from this grammar will not be able to process the input “if a then if b then s1 else s2”, because that parser cannot determine based on the grammar alone if the “else” belongs to the first “if” or the second. Furthermore, it might even be the case that a grammar is inherently ambiguous, for instance determining in certain situations if an identifier denotes a type or a variable.
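The ambiguity can be made concrete with a minimal sketch that enumerates every parse tree the example grammar admits for the example input. The sketch assumes, for illustration only, that E and the innermost statement are single identifiers; the function names are hypothetical.

```python
KEYWORDS = {"if", "then", "else"}


def parse_s(tokens, i):
    """Return every (tree, next_index) pair for an S starting at index i.

    Grammar from the text: S -> if E then S else S | if E then S
    Assumption: E and the base statement are single identifiers.
    """
    results = []
    if i < len(tokens) and tokens[i] == "if":
        has_header = (
            i + 2 < len(tokens)
            and tokens[i + 1] not in KEYWORDS
            and tokens[i + 2] == "then"
        )
        if has_header:
            cond = tokens[i + 1]
            for then_tree, j in parse_s(tokens, i + 3):
                # Alternative: if E then S
                results.append((("if", cond, then_tree), j))
                # Alternative: if E then S else S
                if j < len(tokens) and tokens[j] == "else":
                    for else_tree, k in parse_s(tokens, j + 1):
                        results.append((("if", cond, then_tree, else_tree), k))
    elif i < len(tokens) and tokens[i] not in KEYWORDS:
        results.append((tokens[i], i + 1))
    return results


def all_parses(text):
    """Keep only the parses that consume the entire input."""
    tokens = text.split()
    return [tree for tree, end in parse_s(tokens, 0) if end == len(tokens)]


trees = all_parses("if a then if b then s1 else s2")
for tree in trees:
    print(tree)
```

Two distinct trees result: one attaches the “else” to the outer “if” and the other attaches it to the inner “if”, which is precisely the choice the grammar alone cannot make.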
Conventional systems require a user to rewrite the grammar to eliminate the ambiguity thereby producing a grammar that is harder to read than the original. Alternatively, a fixed set of static strategies can be employed that handle ambiguities in a pre-determined manner. For instance, a notation can exist with respect to the grammar to indicate that the “else” in the previous example is always associated with either the first or second “if.”
Error recovery in existing systems operates similarly. Either the system employs no error recovery, employs a fixed set of strategies that handle errors in a pre-determined manner, or requires changes to the grammar specification that alter the language understood by the resulting parser. Further, sometimes programmers may need to tweak the generated parse code by hand.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject disclosure pertains to language and/or data processing in light of various conflicts, ambiguities, and/or errors. Although not limited thereto, in one embodiment, processing refers to parsing and parser system generation. A variety of strategies or options are available to grammar authors for handling conflicts or ambiguities and errors including conventional static strategies as well as strategies that are more dynamic. In accordance with one aspect of the disclosure, a strategy can invoke code, a service, or a process external to a parsing system, for example. In this manner, an implementation of a conflict resolution or error recovery strategy can be changed without altering the parser or parser specification (e.g., the grammar). Further, various mechanisms can be employed to control the interaction between a system and external or outside code to ensure general type safety. Other strategies can also employ similar mechanisms with like results including one that employs a parser itself to explore potential actions and/or one that swaps parsers to resolve conflicts or ambiguities and/or recover from errors.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
a is a block diagram of a representative static strategy component according to a disclosed aspect.
b is a block diagram of a representative dynamic strategy component in accordance with an aspect of the disclosed subject matter.
Systems and methods pertaining to conflict resolution and error recovery, among other things, are described in detail hereinafter. Numerous static and dynamic strategies are available for handling conflicts, ambiguities, errors, and the like. Grammar authors can specify such strategies with respect to a grammar rather than constructing a more convoluted grammar addressing issues such as conflicts and errors. In accordance with one aspect, a generated parser can be directed to external resolution or recovery code that enables a change of strategy or implementation thereof without altering the parser or grammar from which the parser is generated. Further, interactions between the parser and code can be formalized to prevent undesirable or erroneous parser behavior.
Various aspects of the subject disclosure are now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.
Referring initially to
It is desirable to specify grammars in a natural, easy-to-read form. This allows interested individuals to read a specification and understand a described language, which is particularly helpful for specification drafting and testing, for example. Otherwise, it is unclear what programs a parser and its associated grammar recognize. Conventional hand-written parsers, including most industrial compilers, are a classic example of a lack of clarity since a variety of other code is included and added in an ad hoc manner resulting in a very convoluted code base. Parser generation systems are a bit better in this regard, as they force specification of a formal grammar. Nevertheless, the most natural way of describing languages often includes grammatical ambiguities or conflicts. While grammars can sometimes be rewritten to exclude the ambiguities, this generally makes grammars harder to read. Alternatively, a set of one or more strategies 124 (a component as described herein) can be coupled with a grammar 122 (also a component as described herein) to resolve conflicts.
Conventionally, a set of fixed and/or static strategies is available to address ambiguities in a pre-determined manner. Consider, again, the standard example of an ambiguous grammar: “S→if E then S else S|if E then S”. The parser 140 generated from this grammar will not be able to process the input “if a then if b then s1 else s2”, because that parser cannot determine based on the grammar alone if the “else” belongs to the first “if” or the second. This classic shift-reduce conflict can be addressed by a strategy that indicates that the “else” is always associated with the first or second “if.” This corresponds to a static rule that once captured in a parser 140 via parser generator 130 cannot be changed without altering the parser 140. Furthermore, the limited set of strategies 124 is not necessarily effective across all languages and language constructs.
In accordance with one aspect of the claimed subject matter, a wide range of strategies or options can be provided and made available to grammar authors to facilitate conflict and/or ambiguity resolution, among other things. Further yet, multiple strategies can be associated with a particular conflict or ambiguity to ensure resolution, to an extent, should other strategies fail. In one implementation, a unique name can be generated for each conflict as a function of grammar rules involved, and the author can specify in the grammar file 120 what strategy to use for each conflict. With respect to the example above, an author can specify that a “shift” should be chosen rather than a “reduce” by adding a command such as the following: “% OnConflict ShiftElseKeyword, ReduceIfStatement Prefer ShiftElseKeyword”. Here, the command identifies the two available conflicting options and indicates a preference for one.
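The effect of such a static preference can be illustrated with a minimal sketch. It is not the generated parser itself: it is a hand-written recursive-descent parser that consults a fixed table at the point where the conflict arises. The conflict key, the table names, and `parse_statement` are hypothetical, and E and the base statement are assumed to be single identifiers.

```python
# Hypothetical conflict name built from the two options named in the command.
CONFLICT = "ShiftElseKeyword/ReduceIfStatement"
PREFER_SHIFT = {CONFLICT: "ShiftElseKeyword"}
PREFER_REDUCE = {CONFLICT: "ReduceIfStatement"}


def parse_statement(tokens, pos=0, nested=False, strategies=PREFER_SHIFT):
    """Parse S -> if E then S [else S] | identifier; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise ValueError(f"expected 'then' after condition at token {pos + 1}")
    cond = tokens[pos + 1]
    then_tree, pos = parse_statement(tokens, pos + 3, True, strategies)
    if pos < len(tokens) and tokens[pos] == "else":
        # A conflict exists only when an enclosing "if" could also claim
        # this "else"; otherwise shifting is the only available action.
        choice = strategies[CONFLICT] if nested else "ShiftElseKeyword"
        if choice == "ShiftElseKeyword":
            else_tree, pos = parse_statement(tokens, pos + 1, nested, strategies)
            return ("if", cond, then_tree, else_tree), pos
    # Reduce: finish this "if" and leave any "else" for the enclosing one.
    return ("if", cond, then_tree), pos


tokens = "if a then if b then s1 else s2".split()
shift_tree, _ = parse_statement(tokens, strategies=PREFER_SHIFT)
reduce_tree, _ = parse_statement(tokens, strategies=PREFER_REDUCE)
```

Under the shift preference the “else” binds to the second “if”; under the reduce preference it binds to the first. Either way the decision is fixed once the table is chosen, which is the limitation of static strategies noted above.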
Turning attention to
As shown in
b depicts exemplary dynamic strategies 220 in accordance with an aspect of the claimed subject matter including parse resolution component 320, code invocation component 322, and parser swap component 324. The parse resolution component 320 employs a strategy that in essence employs parsing system functionality to resolve a conflict. More specifically, when a conflict is encountered, a technique such as GLR (generalized LR) parsing can be employed to parse each option and identify the best. Usually only one option is correct, namely the one that allows parsing to continue all the way through. This approach usually works but can be inefficient. In simple cases, a static strategy might be more appropriate. By way of example, this strategy can be employed with respect to a comma ambiguity using a command such as but not limited to “% OnConflict ShiftComma, ReduceSomeEnumMemberDeclaration List, Resolve”.
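A minimal sketch of resolution by trial parsing follows. It is not a GLR implementation; it simply tries each option on a speculative basis and keeps the one that lets parsing run to completion. The sketch assumes a toy enumeration grammar with an optional trailing comma (enum → '{' list '}' | '{' list ',' '}' and list → member | member ',' list), and the option name `ReduceMemberList` and all function names are hypothetical.

```python
class ParseError(Exception):
    pass


def finish(tokens, pos):
    """Check the tail of an enum: an optional trailing comma, then a final '}'."""
    if pos < len(tokens) and tokens[pos] == ",":
        pos += 1
    if pos != len(tokens) - 1 or tokens[pos] != "}":
        raise ParseError(f"expected closing brace at token {pos}")


def survives(tokens, pos, action):
    """Speculatively apply one option and report whether parsing completes."""
    try:
        if action == "ShiftComma":
            _, pos = parse_list(tokens, pos + 1)
        finish(tokens, pos)
        return True
    except ParseError:
        return False


def resolve(tokens, pos):
    """Try every option at the conflict; exactly one is expected to survive."""
    options = ("ShiftComma", "ReduceMemberList")
    viable = [action for action in options if survives(tokens, pos, action)]
    if len(viable) != 1:
        raise ParseError(f"cannot resolve conflict at token {pos}")
    return viable[0]


def parse_list(tokens, pos):
    """list -> member | member ',' list"""
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise ParseError(f"expected member at token {pos}")
    members, pos = [tokens[pos]], pos + 1
    if pos < len(tokens) and tokens[pos] == ",":
        # Conflict: the comma either continues the list or is the trailing comma.
        if resolve(tokens, pos) == "ShiftComma":
            rest, pos = parse_list(tokens, pos + 1)
            members += rest
    return members, pos


def parse_enum(tokens):
    if not tokens or tokens[0] != "{":
        raise ParseError("expected opening brace")
    members, pos = parse_list(tokens, 1)
    finish(tokens, pos)
    return members


members = parse_enum("{ A , B , }".split())
```

Because each nested conflict re-parses the remaining input, the cost can grow quickly, which is consistent with the observation above that a static strategy may be more appropriate in simple cases.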
The code invocation component 322 enables execution of some external or outside code, process, or service that can resolve a conflict. In other words, the code provides implementation of the strategy, which can be altered without requiring a change in a parser itself or the original grammar. Furthermore, the strategy can actually correspond to another strategy. Accordingly, the strategies are composable.
Turning briefly to
It is noted that by allowing a parser to employ arbitrary code to resolve conflicts, opportunity exists for compromising a parser. To address this issue various interaction protocols can be employed. For instance, the external code can be allowed to return what amounts to a suggestion. In other words, it can simply identify an action to be taken such as a path with which to continue processing. In this manner, control need not be relinquished to the external code thereby reducing the likelihood that the arbitrary code will break or corrupt a parser. Additionally, relevant context information required to resolve a conflict can be provided to external code as a copy or immutable version so that state is not unexpectedly or undesirably altered by the code. Still further yet, the parser can determine acceptable results and compare them to results provided by the arbitrary code to further ensure the code does not misguide the parser. These formalized communication protocols ensure that the parser is generally type safe.
In one implementation, where outside code is to be called to resolve a conflict, the following non-limiting command can be provided in the grammar file: “% OnConflict ShiftComa, ReduceVariableInitList Run ShiftCommaOrReduceVAriableInitList”. In this case, the generated parser can have an abstract method that is subsequently overridden and implemented:
Here, the parsing system also generates an enumerator list or set of named constants specifying only meaningful choices for the conflict in question.
Further, the outside code does not have access to the internals of the parsing system, but rather is passed copies or immutable versions of relevant state. Together these mechanisms guarantee the outside code cannot break the parser as a whole and must make a valid choice for handling the conflict or ambiguity.
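By way of example and not limitation, the interaction protocol described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the output of any particular parser generator: the enumeration `CommaConflictChoice`, the immutable `ConflictContext`, the class names, and the lookahead rule used by `LookaheadResolver` are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class CommaConflictChoice(Enum):
    """Hypothetical generated constants: only the meaningful choices exist."""
    SHIFT_COMMA = "ShiftComma"
    REDUCE_VARIABLE_INIT_LIST = "ReduceVariableInitList"


@dataclass(frozen=True)
class ConflictContext:
    """Immutable copy of the state that outside code is allowed to see."""
    lookahead: Tuple[str, ...]
    position: int


class GeneratedParser(ABC):
    """Stand-in for a generated parser with one abstract resolution hook."""

    @abstractmethod
    def shift_comma_or_reduce_variable_init_list(
        self, context: ConflictContext
    ) -> CommaConflictChoice:
        """Outside code overrides this and returns a suggestion."""

    def resolve_comma_conflict(self, tokens, pos):
        # Pass a copy of the relevant state, never the parser internals.
        context = ConflictContext(tuple(tokens[pos:pos + 2]), pos)
        choice = self.shift_comma_or_reduce_variable_init_list(context)
        # The parser keeps control and validates the suggestion.
        if not isinstance(choice, CommaConflictChoice):
            raise TypeError(f"invalid conflict resolution: {choice!r}")
        return choice


class LookaheadResolver(GeneratedParser):
    """Hypothetical outside code: shift when an identifier follows the comma."""

    def shift_comma_or_reduce_variable_init_list(self, context):
        follows = context.lookahead[1:] or ("",)
        if follows[0].isidentifier():
            return CommaConflictChoice.SHIFT_COMMA
        return CommaConflictChoice.REDUCE_VARIABLE_INIT_LIST


choice = LookaheadResolver().resolve_comma_conflict(["x", ",", "y"], 1)
```

Changing the resolution policy then amounts to supplying a different subclass; neither the base class nor the grammar from which it would be generated needs to change.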
Returning to
Turning attention to
It should be appreciated that the aforementioned aspects can be employed in different contexts. For example, the aspects can be employed in furtherance of recovering from errors. Fundamentally, a conflict corresponds to an inability to parse due to the lack of a unique processing path. When an error occurs, parsing likewise cannot complete for the same reason. The difference is that with respect to conflicts or ambiguities there is more than one available processing path, while with errors there is none. However, recovery from an error can correspond to returning from an erroneous path and/or selecting another path that allows parsing, for example, to continue.
Referring to
Error recovery refers to the ability to change the state of a parser in order to continue parsing. In other words, error recovery enables output to be produced, such as a parse tree, among other things, despite the fact that there are errors in a program or program input. A closely related concept is error diagnosis or reporting, which concerns production of good error messages for a user when errors do occur. Accordingly, error reporting can be coupled to error recovery such that upon detection of and/or recovery from an error a message can be produced identifying the error.
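The coupling of error recovery and error reporting can be illustrated with a minimal sketch. It uses one conventional fixed strategy, skipping ahead to a synchronizing token, purely for illustration; the toy statement form `name = number ;` and all names in the sketch are assumptions and not part of the disclosure.

```python
class ParseError(Exception):
    def __init__(self, pos, message):
        super().__init__(message)
        self.pos = pos


def parse_assignment(tokens, pos):
    """Parse `name = number ;` and return (tree, next_pos)."""

    def expect(is_ok, what):
        nonlocal pos
        if pos >= len(tokens) or not is_ok(tokens[pos]):
            found = repr(tokens[pos]) if pos < len(tokens) else "end of input"
            raise ParseError(pos, f"token {pos}: expected {what}, found {found}")
        pos += 1
        return tokens[pos - 1]

    name = expect(str.isidentifier, "identifier")
    expect(lambda t: t == "=", "'='")
    value = expect(str.isdigit, "number")
    expect(lambda t: t == ";", "';'")
    return ("assign", name, int(value)), pos


def parse_program(tokens):
    """Parse every statement, recovering from errors instead of stopping."""
    statements, errors, pos = [], [], 0
    while pos < len(tokens):
        try:
            statement, pos = parse_assignment(tokens, pos)
            statements.append(statement)
        except ParseError as error:
            errors.append(str(error))  # error reporting
            pos = error.pos
            # Error recovery: skip to just past the next ';' and carry on.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return statements, errors


statements, errors = parse_program("x = 1 ; y = = 2 ; z = 3 ;".split())
```

A partial result is still produced for the well-formed statements, and one message is recorded for the malformed one.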
Further, the disclosed subject matter is not limited to conflict resolution and error recovery. In fact, aspects are applicable to any automaton, state machine, or like scenario. By way of example and not limitation, consider workflow systems, which are basically state machines. Conflicts can exist regarding which of several actions to take, and strategies can be employed to resolve such conflicts including external code invocation, which can allow dynamic selection of a resolution strategy.
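As a minimal sketch of the workflow scenario, assuming a hypothetical approval workflow whose states, events, and `Workflow` class are invented for illustration, a state machine can delegate the choice among several candidate transitions to replaceable outside code and validate the returned suggestion:

```python
class Workflow:
    """A small state machine whose conflicts are settled by outside code."""

    def __init__(self, transitions, resolver, state):
        self.transitions = transitions  # (state, event) -> candidate next states
        self.resolver = resolver        # outside code, replaceable at any time
        self.state = state

    def step(self, event):
        candidates = tuple(self.transitions.get((self.state, event), ()))
        if not candidates:
            raise ValueError(f"no transition from {self.state!r} on {event!r}")
        if len(candidates) == 1:
            chosen = candidates[0]
        else:
            # Conflict: more than one action is available, so ask outside code.
            chosen = self.resolver(self.state, event, candidates)
            if chosen not in candidates:
                raise ValueError(f"resolver returned invalid state {chosen!r}")
        self.state = chosen
        return chosen


TRANSITIONS = {
    ("submitted", "review"): ["approved", "escalated"],
    ("escalated", "review"): ["approved"],
}

workflow = Workflow(TRANSITIONS, lambda state, event, options: "escalated", "submitted")
first = workflow.step("review")
second = workflow.step("review")
```

Swapping in a different resolver changes how the conflict is settled without touching the transition table, mirroring the separation between a grammar and its resolution strategy.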
Referring to
The aforementioned systems, architectures, and the like have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component to provide aggregate functionality. Communication between systems, components and/or sub-components can be accomplished in accordance with either a push and/or pull model. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
Furthermore, as will be appreciated, various portions of the disclosed systems above and methods below can include or consist of artificial intelligence, machine learning, or knowledge or rule based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent. By way of example and not limitation, various strategies can employ such mechanisms to facilitate conflict resolution and/or error recovery, among other things. Further, the inference component 740 can employ this type of technology to aid a user in identifying a strategy to address some issue such as a conflict or error.
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
Referring to
Turning attention to
It is to be appreciated that various examples and discussion supra focus on programmatic code solely for purpose of clarity and understanding. Various systems and methods associated with parsing, conflict resolution, and error recovery can be employed with respect not only to computer or programmatic code but also to natural languages as well as data (e.g., XML (eXtensible Markup Language), JSON (JavaScript Object Notation), comma-separated values . . . ).
The word “exemplary” or various forms thereof are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, examples are provided solely for purposes of clarity and understanding and are not meant to limit or restrict the claimed subject matter or relevant portions of this disclosure in any manner. It is to be appreciated that a myriad of additional or alternate examples of varying scope could have been presented, but have been omitted for purposes of brevity.
As used herein, the term “inference” or “infer” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines . . . ) can be employed in connection with performing automatic and/or inferred action in connection with the subject innovation.
Furthermore, all or portions of the subject innovation may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed innovation. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
In order to provide a context for the various aspects of the disclosed subject matter,
With reference to
The system memory 1416 includes volatile and nonvolatile memory. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1412, such as during start-up, is stored in nonvolatile memory. By way of illustration, and not limitation, nonvolatile memory can include read only memory (ROM). Volatile memory includes random access memory (RAM), which can act as external cache memory to facilitate processing.
Computer 1412 also includes removable/non-removable, volatile/non-volatile computer storage media.
The computer 1412 also includes one or more interface components 1426 that are communicatively coupled to the bus 1418 and facilitate interaction with the computer 1412. By way of example, the interface component 1426 can be a port (e.g., serial, parallel, PCMCIA, USB, FireWire . . . ) or an interface card (e.g., sound, video, network . . . ) or the like. The interface component 1426 can receive input and provide output (wired or wirelessly). For instance, input can be received from devices including but not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, other computer and the like. Output can also be supplied by the computer 1412 to output device(s) via interface component 1426. Output devices can include displays (e.g., CRT, LCD, plasma . . . ), speakers, printers and other computers, among other things.
The system 1500 includes a communication framework 1550 that can be employed to facilitate communications between the client(s) 1510 and the server(s) 1530. The client(s) 1510 are operatively connected to one or more client data store(s) 1560 that can be employed to store information local to the client(s) 1510. Similarly, the server(s) 1530 are operatively connected to one or more server data store(s) 1540 that can be employed to store information local to the servers 1530.
Client/server interactions can be utilized with respect to various aspects of the claimed subject matter. By way of example and not limitation, the code invocation strategy can invoke code associated with a network service. For instance, a parser executing on a client 1510 can access a conflict resolution and/or error recovery implementation resident on a server 1530 or another client 1510 across the communication framework 1550. Further, all or a portion of that implementation can delegate functions to other processes or services on yet other clients 1510 and/or servers 1530. Further, components such as the parser generator 130 or parser 140 can be network services.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the terms “includes,” “contains,” “has,” “having” or variations in form thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.