An automaton is an abstract model for a finite state machine (FSM), or simply a state machine. A state machine consists of a finite number of states, transitions between those states, and actions. A state defines a unique condition, status, configuration, mode, or the like at a given time. A transition function identifies the next state and any corresponding action given the current state and an input. In other words, upon receipt of input, a state machine can transition from a first state to a second state, and an action or output event can be performed as a function of the new state. A state machine is typically represented as a graph whose nodes correspond to states (and optional actions) and whose arrows, or edges, identify transitions between states.
A pushdown automaton extends a regular automaton with the ability to use memory in the form of a stack. Whereas a regular automaton transitions as a function of the input and the current state, a pushdown automaton transitions based on the input, the current state, and the value on top of the stack. Furthermore, a pushdown automaton can manipulate the stack. For example, as part of a transition, a value can be pushed onto or popped off the stack. Alternatively, the stack can simply be ignored or left unaltered.
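To make the stack-sensitive transition concrete, here is a minimal Python sketch (the document itself specifies no implementation language) of a pushdown automaton recognizing a^n b^n, a language no finite automaton accepts. The state names, the `$` bottom marker, and the `accepts` function are illustrative assumptions. Each transition is selected by the current state, the input symbol, and the top of the stack, and it pushes or pops as part of the move; a missing transition is the error case discussed below.

```python
from typing import Dict, List, Tuple

BOTTOM = "$"  # stack-bottom marker (an assumption of this sketch)

# (state, input symbol, top of stack) -> (next state, symbols replacing the popped top)
# The first symbol of the replacement ends up on top of the stack.
TRANSITIONS: Dict[Tuple[str, str, str], Tuple[str, List[str]]] = {
    ("push", "a", BOTTOM): ("push", ["A", BOTTOM]),  # push a marker for each 'a'
    ("push", "a", "A"): ("push", ["A", "A"]),
    ("push", "b", "A"): ("pop", []),                 # first 'b': pop one marker
    ("pop", "b", "A"): ("pop", []),                  # later 'b's keep popping
}


def accepts(word: str) -> bool:
    """Run the pushdown automaton; a missing transition means the input is rejected."""
    state = "push"
    stack = [BOTTOM]  # top of stack is the last list element
    for symbol in word:
        key = (state, symbol, stack[-1])
        if key not in TRANSITIONS:
            return False  # no applicable transition: this is the error case
        state, replacement = TRANSITIONS[key]
        stack.pop()
        stack.extend(reversed(replacement))
    # Accept only if every 'a' was matched by a 'b'.
    return stack == [BOTTOM]


print(accepts("aabb"), accepts("aab"), accepts("abab"))  # True False False
```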
Automata can, for instance, provide the basis for various compiler components such as parsers. Parsers include scanners, or lexers, that first perform lexical analysis on a program to identify language tokens. Subsequently or concurrently, parsers can perform syntactic analysis of the tokens. Parsers can be implemented utilizing an automaton that accepts only strings conforming to a language grammar. Input is accepted or rejected based on the state in which the automaton stops; in other words, the input is either recognized or unrecognized. In many cases, the parser employs recognized input to generate a parse tree of tokens to enable subsequent processing (e.g., code generation, programmatic assistance, versioning . . . ).
Errors can occur during execution of a pushdown automaton. Typically, error recovery is handled by inserting error productions into a grammar. These productions essentially work as exceptions: an error is the exception, and the exception is handled by the presence of an error non-terminal production or an ancestor production, for example.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject disclosure pertains to error recovery and diagnosis for pushdown automata, among other things. Upon detection of an error associated with execution of an automaton, a recovery strategy can be selected and executed to recover from the error. In accordance with one aspect of the disclosure, recovery strategies can include a configuration defining applicability and an action. Where a configuration associated with an error matches a strategy configuration, the strategy can be employed to recover from the error by modifying or replacing an error configuration with a recovery configuration. According to another aspect, error diagnosis can be computed as a function of the difference between error and recovery configurations.
A number of recovery strategies are available to recover from errors. In accordance with yet another aspect of the disclosure, recovery strategies are pluggable, enabling users to fine-tune error recovery easily by way of self-defined or third-party strategies. Various mechanisms are also provided to facilitate selection of an appropriate recovery strategy. According to one aspect, strategy selection can be learned.
Further yet, recovery is not limited to pushdown automaton errors. According to an aspect, similar functionality can also be applied with respect to runtime exception handling and ambiguity resolution.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
a illustrates an exemplary state stack and input buffer associated with a parsing error.
b depicts an exemplary state stack and input buffer associated with an error recovery strategy.
c illustrates an exemplary state stack and input buffer associated with a recovery.
Systems and methods pertaining to error recovery and diagnosis, among other things, are described in detail hereinafter. Upon detection of an error in pushdown automaton execution, a recovery strategy can be selected from amongst a plurality of strategies and dispatched to recover from an error by modifying or replacing an error configuration with a recovered configuration. In fact, in one embodiment, strategies can be specified with applicable configurations, matched to a current configuration, and executed to produce a new configuration. Error diagnosis or identification can be accomplished by computing the difference between the error configuration and the recovered configuration. Error messages can be presented based on an identified error to provide meaningful feedback to enable the error to be fixed. In accordance with one aspect, the error recovery and diagnosis can be separate from or independent of an associated pushdown automaton to provide flexibility in applicability of recovery strategies and diagnosis functionality without altering the automaton.
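Diagnosis by difference can be illustrated with a short Python sketch. It assumes, for simplicity, that a configuration is a stack of grammar symbols (rather than parser states) plus an input buffer of tokens, and it uses the standard library's `difflib.SequenceMatcher` as one possible way to compute the difference. The `Configuration` class and `diagnose` function are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass(frozen=True)
class Configuration:
    stack: Tuple[str, ...]   # symbols on the parse stack, bottom first
    buffer: Tuple[str, ...]  # remaining input, next token first


def diagnose(error: Configuration, recovered: Configuration) -> List[str]:
    """Describe the edits that turn the error configuration into the recovered one."""
    before = error.stack + error.buffer
    after = recovered.stack + recovered.buffer
    messages = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, before, after).get_opcodes():
        old, new = " ".join(before[i1:i2]), " ".join(after[j1:j2])
        if tag == "replace":
            messages.append(f"replace '{old}' with '{new}'")
        elif tag == "delete":
            messages.append(f"delete '{old}'")
        elif tag == "insert":
            messages.append(f"insert '{new}'")
    return messages


error = Configuration(("for", "(", "var", "x"), ("in", "xs", ")"))
recovered = Configuration(("foreach", "(", "var", "x"), ("in", "xs", ")"))
print(diagnose(error, recovered))  # ["replace 'for' with 'foreach'"]
```

Because the diagnosis only compares two configurations, it works the same way regardless of which strategy produced the recovered configuration.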
Various aspects of the subject disclosure are now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.
Referring initially to
The automaton component 110 is communicatively coupled to an error detection component 120 and an error recovery component 130. The detection component 120 is a mechanism that identifies errors or error states in automaton component processing. Upon error detection via one or more known or novel means, the automaton configuration, such as a stack and input buffer, is made available for inspection and modification by the error recovery component 130. The error recovery component 130 recovers from an error by modifying or replacing the current configuration associated with the error with a new, error-free configuration. As will be described further infra, the recovery component 130 can resolve errors by utilizing one or more standard and/or custom error recovery strategies.
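The separation between automaton, detection, and recovery can be sketched in Python on a deliberately tiny automaton that matches brackets. All names (`Config`, `step`, `recover`, `run`) are hypothetical. The automaton (`step`) knows nothing about recovery; when it has no move, the driver exposes the current configuration (stack and input buffer) to a separate recovery function, which returns a replacement configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> opener


@dataclass(frozen=True)
class Config:
    stack: Tuple[str, ...]   # open brackets, top of stack last
    buffer: Tuple[str, ...]  # unread input, next token first


def step(c: Config) -> Optional[Config]:
    """One move of the bracket-matching automaton; None signals an error."""
    tok = c.buffer[0]
    if tok in PAIRS.values():
        return Config(c.stack + (tok,), c.buffer[1:])
    if tok in PAIRS and c.stack and c.stack[-1] == PAIRS[tok]:
        return Config(c.stack[:-1], c.buffer[1:])
    return None


def recover(c: Config) -> Config:
    """Recovery kept outside the automaton: it only rewrites configurations."""
    if not c.buffer:  # input exhausted with unclosed brackets: insert a closer
        closer = next(k for k, v in PAIRS.items() if v == c.stack[-1])
        return Config(c.stack, (closer,))
    return Config(c.stack, c.buffer[1:])  # otherwise discard the offending token


def run(text: str, on_error: Callable[[Config], Config]) -> List[Config]:
    """Drive the automaton, handing each error configuration to on_error."""
    c = Config((), tuple(text))
    errors: List[Config] = []
    while c.buffer or c.stack:
        nxt = step(c) if c.buffer else None  # error detection
        if nxt is None:
            errors.append(c)
            c = on_error(c)
        else:
            c = nxt
    return errors


print(len(run("([])", recover)), len(run("(]", recover)))  # 0 2
```

Swapping in a different `recover` changes the recovery behavior without touching `step`, which mirrors keeping the recovery component 130 outside the automaton component 110.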
Pushdown automata such as parsers often need to deal with, and recover from, errors. When parsing for a compiler and/or integrated development environment (IDE), for instance, broken code is not only encountered often, it is the norm. While a user is typing, the code is almost always in a state in which it cannot be compiled and run. Many error states are encountered while a user authors code, and even when a correct state is reached, it is likely to be followed by more error states. Accordingly, error recovery is important for supporting work in such an environment.
Conventionally, error recovery for auto-generated parsers, for instance, is handled by inserting error productions into a grammar. These productions essentially function as exceptions that are handled by the presence of an error non-terminal production or an ancestor production. This approach is very inflexible and, for this and other reasons, has led to a trend away from the use of automatic parser generators. In particular, industrial compilers and/or IDEs generally require robust error recovery that is typically fine-tuned for specific scenarios. By way of example, suppose a user types in the following erroneous code snippet in C#:
The problem is not discovered until the “in” keyword is reached, but it actually originates with the “for” keyword, which should be a “foreach” keyword. This is very difficult to handle with traditional error recovery strategies such as error productions. The system 100 can address it by way of the recovery component 130, which, as previously described, functions externally to the automaton component 110 and therefore does not require modification of the automaton component 110.
The recovery component 130 can further comprise a registration component 220 coupled to the strategy store 210. The registration component 220 registers a strategy 212 with the recovery component 130 and saves the strategy, or a reference to it, in the store 210. Consequently, users can plug in their own or third-party strategies to fine-tune recovery. In accordance with one aspect, an arbitrarily rich programming language can be provided to facilitate strategy specification. For example, a pattern matching language can be developed to aid specification of strategies whose applicability is expressed as patterns to be matched. Of course, generalized recovery patterns that rely on search, for instance, rather than pattern matching can be developed and registered with the recovery component 130 as well.
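Pluggability might look like the following Python sketch, in which the store, the `register` decorator, and the `skip-token` strategy are all hypothetical. A strategy is a name, an applicability test over a configuration, and an action that returns a new configuration; registering one requires no change to the parser.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Config = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (stack, input buffer)


@dataclass
class Strategy:
    name: str
    applies: Callable[[Config], bool]   # applicability test
    action: Callable[[Config], Config]  # produces the recovered configuration


@dataclass
class StrategyStore:
    strategies: List[Strategy] = field(default_factory=list)

    def register(self, name: str, applies: Callable[[Config], bool]):
        """Decorator used by users or third parties to plug in a strategy."""
        def wrap(action: Callable[[Config], Config]) -> Callable[[Config], Config]:
            self.strategies.append(Strategy(name, applies, action))
            return action
        return wrap

    def applicable(self, config: Config) -> List[Strategy]:
        return [s for s in self.strategies if s.applies(config)]


store = StrategyStore()


@store.register("skip-token", applies=lambda cfg: len(cfg[1]) > 0)
def skip_token(cfg: Config) -> Config:
    stack, buffer = cfg
    return stack, buffer[1:]


print([s.name for s in store.applicable(((), ("x", ";")))])  # ['skip-token']
```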
Furthermore, the representative recovery component 130 includes a dispatch component 230 communicatively coupled to the strategy store 210. The dispatch component 230 is a mechanism that dispatches, or invokes, a particular recovery strategy to deal with an error. Accordingly, the dispatch component 230 can include a mechanism that facilitates selection of the appropriate strategy from amongst a plurality of strategies as a function of a configuration, for example.
Turning attention to
The match component 320 performs pattern matching to identify applicable strategies. As previously mentioned, strategies can specify the configurations to which they apply. A configuration can refer to the state of a stack, an input buffer, and/or variables, among other things. The match component 320 seeks to match a current, or error, configuration with a specified strategy configuration. The match need not be exact, since a strategy configuration may be intended to match multiple scenarios. As such, the strategy configuration can include a limited amount of information and/or wildcards, among other things.
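A match component of this kind could, for example, compare a strategy configuration that constrains only the top of the stack and a few lookahead tokens, with a wildcard standing for any symbol. The Python sketch below uses flat token sequences for brevity instead of the tree patterns discussed later; the `_` wildcard and the function names are assumptions.

```python
from typing import Sequence

WILDCARD = "_"  # hypothetical wildcard matching any single symbol


def match_seq(pattern: Sequence[str], items: Sequence[str]) -> bool:
    return len(pattern) == len(items) and all(
        p == WILDCARD or p == x for p, x in zip(pattern, items)
    )


def matches(stack_pat: Sequence[str], buffer_pat: Sequence[str],
            stack: Sequence[str], buffer: Sequence[str]) -> bool:
    """Partial match: only the top of the stack and the first lookahead tokens are constrained."""
    if len(stack_pat) > len(stack) or len(buffer_pat) > len(buffer):
        return False
    top = stack[len(stack) - len(stack_pat):]  # avoids stack[-0:], which would be the whole stack
    return match_seq(stack_pat, top) and match_seq(buffer_pat, buffer[:len(buffer_pat)])


# Strategy configuration for "for (<type> <name> in": the type and name are wildcards.
FOR_IN = (("for", "(", WILDCARD, WILDCARD), ("in",))

print(matches(*FOR_IN, ("{", "for", "(", "var", "x"), ("in", "xs", ")")))    # True
print(matches(*FOR_IN, ("{", "while", "(", "var", "x"), ("in", "xs", ")")))  # False
```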
The evaluation component 330 evaluates execution of one or more strategies. Before committing to a strategy, a number of candidate strategies can be executed to determine their value relative to one another. For example, new configurations can be produced by all candidate strategies and evaluated. Each strategy will likely remedy the particular error for which it was designed and allow processing to proceed, but some strategies may cause other errors. For example, where code is parsed and a “foreach” loop is incorrectly specified as a “for” loop, there are at least two strategies for resolving the error. One way to recover is to remove the entire loop structure. Another is to replace the “for” keyword with a “foreach” keyword. The former is quite a naïve and unsophisticated approach and, as such, might result in further errors down the line. Accordingly, the evaluation component 330 would score the former strategy below the latter.
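Evaluating candidates before committing can be sketched by applying each strategy to the input and scoring the result. In the Python sketch below, the scoring rule (errors remaining after recovery, then the number of tokens changed), the toy checker, and both strategy functions are assumptions. In this toy, deleting the loop happens to leave no further errors, so it loses only on edit size; a richer checker that tracked code depending on the deleted loop would penalize it further.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Tokens = List[str]
OPEN, CLOSE = {"(", "{"}, {")", "}"}


def count_errors(tokens: Tokens) -> int:
    """Toy checker: unbalanced brackets plus any 'for' header that contains 'in'."""
    errors = depth = 0
    for i, tok in enumerate(tokens):
        if tok in OPEN:
            depth += 1
        elif tok in CLOSE:
            if depth == 0:
                errors += 1  # closer without an opener
            else:
                depth -= 1
        elif tok == "for":
            end = next((j for j in range(i, len(tokens)) if tokens[j] == ")"), len(tokens))
            if "in" in tokens[i:end]:
                errors += 1  # in C#, 'in' belongs to a foreach header
    return errors + depth  # unclosed openers are errors too


def edit_size(before: Tokens, after: Tokens) -> int:
    """Tokens changed, approximated as the longer length minus the matched tokens."""
    matched = sum(b.size for b in SequenceMatcher(None, before, after).get_matching_blocks())
    return max(len(before), len(after)) - matched


def replace_keyword(tokens: Tokens) -> Tokens:
    i = tokens.index("for")
    return tokens[:i] + ["foreach"] + tokens[i + 1:]


def delete_loop(tokens: Tokens) -> Tokens:
    i = tokens.index("for")
    j = tokens.index("}", i)  # toy assumption: the loop body has no nested braces
    return tokens[:i] + tokens[j + 1:]


def evaluate(tokens: Tokens, candidates: Dict[str, Callable[[Tokens], Tokens]]) -> List[str]:
    """Run every candidate and rank by (remaining errors, size of the edit), best first."""
    scored = []
    for name, strategy in candidates.items():
        fixed = strategy(tokens)
        scored.append(((count_errors(fixed), edit_size(tokens, fixed)), name))
    return [name for _, name in sorted(scored)]


program = "for ( var x in xs ) { use ( x ) ; }".split()
print(evaluate(program, {"delete-loop": delete_loop, "replace-keyword": replace_keyword}))
# ['replace-keyword', 'delete-loop']
```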
The ranking component 340 can ascribe a ranking to a plurality of strategies as a function of a variety of factors. In particular, the ranking can pertain to relevance of a strategy to a particular error as well as the “goodness” of the resolution. Information utilized to produce a ranking can come from internal or external sources. For example, the ranking component can receive input from the evaluation component 330. Additionally or alternatively, the ranking component 340 can receive or retrieve information regarding ranking from external ranking services, social networks, or the like.
The dispatch component 230 can also include a learning component 350 that can influence strategy selection by learning which strategies are most applicable based on user programming habits, preference, and/or historical information, among other things. In accordance with one embodiment, applicability of an error recovery strategy can be learned by recording error configurations and matching recovery strategies. Over time, the component 350 can learn to choose recovery strategies automatically for future unseen error configurations.
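One simple way to learn strategy selection, sketched below in Python, is to count how often each strategy resolved errors with a given signature and suggest the most frequent one. The signature (top of stack plus lookahead), the class, and the method names are assumptions. Because the signature ignores the rest of the configuration, a suggestion can also be made for a full configuration that was never seen before, and keeping one learner per user would give the personalized behavior described next.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional, Tuple

# A deliberately coarse error signature: (top of stack, lookahead token).
Signature = Tuple[str, str]


class StrategyLearner:
    """Records which strategy resolved which kind of error and suggests the most frequent one."""

    def __init__(self) -> None:
        self.history: DefaultDict[Signature, Counter] = defaultdict(Counter)

    def record(self, signature: Signature, strategy_name: str) -> None:
        self.history[signature][strategy_name] += 1

    def suggest(self, signature: Signature) -> Optional[str]:
        counts = self.history.get(signature)  # .get does not create an empty entry
        return counts.most_common(1)[0][0] if counts else None


learner = StrategyLearner()
learner.record(("x", "in"), "for-to-foreach")
learner.record(("x", "in"), "for-to-foreach")
learner.record(("x", "in"), "skip-token")
print(learner.suggest(("x", "in")), learner.suggest(("y", ";")))  # for-to-foreach None
```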
Furthermore, it is to be noted that errors can be personal in nature. Accordingly, information can be utilized to infer appropriate strategies for a particular user or entity. By way of example, consider a user who cannot remember how to specify a “for” loop condition in a particular language. Once an error has been observed and resolved for this user many times, the component 350 can learn that the user has trouble encoding this particular construct, as well as which recovery strategies were previously employed. Subsequently, if the same or a similar error appears, a recovery strategy can be easily identified and/or preferred over others.
Referring to
It is to be appreciated that error messages and/or suggestions are easily modifiable by users. Accordingly, should a developer feel that an error message is not as helpful as first thought, it can be modified. As with error recovery, the error diagnosis component 530 can also be pluggable, such that it can receive specifications of errors and messages and employ them where appropriate.
It is to be noted that error diagnosis and error recovery are related yet distinct. Error diagnosis concerns providing users with good error messages about what they should probably do to fix their code so that it is in a working state. Error recovery, on the other hand, pertains to placing code in a condition that allows analysis to proceed, for example when parsing code as it is entered into an IDE. In particular, error recovery must make assumptions. For example, code might not be structurally correct or might not have any structure at all. Accordingly, structure is forced on it to a degree that permits further analysis, so that a user can obtain meaningful information despite the broken state.
What follows is a brief example of how aspects of the claimed invention can be employed with respect to recovery and diagnosis. The sole purpose is to aid understanding of aspects of the claims. The example is not intended to limit the scope or spirit of the claims in any manner.
Consider an automatically generated left-to-right (LR) parser. When an error occurs, a call is made to a recovery mechanism and the current parsing configuration is passed. After recovery, diagnosis of the error is computed from the difference between the erroneous configuration and the recovered configuration. Below is an exemplary code snippet capturing the above:
A parsing configuration includes a state stack and an input/lookahead buffer. The recovery method dispatches a particular recovery strategy based on the current parser configuration. Such dispatch can employ tree pattern matching to find the best applicable recovery strategy for a given error, among other things. For instance, reconsider the error previously given when a user types:
Turning attention to
A recovery strategy configuration that matches the error is depicted in
Since the strategy configuration matches the current error configuration, the recovery strategy is dispatched, which will return a new or modified configuration as illustrated in
Users can easily specify tree patterns that correspond to errors and recover from those errors. This enables users to fine-tune their error recovery instead of only specifying broad patterns. Such a system can include a number of specific recovery cases as well as general recovery strategies, which are all based on prioritized tree matching.
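Prioritized tree matching of specific and general strategies can be sketched in Python with trees encoded as nested tuples. The node labels (`ForHeader`, `Decl`), the numeric priorities, and the convention that the last child of the error tree is the offending lookahead token are assumptions for illustration; the disclosure does not fix a representation.

```python
from typing import Any, Callable, List, Optional, Tuple

ANY = "_"  # wildcard matching any subtree (an assumption of this sketch)


def tree_match(pattern: Any, tree: Any) -> bool:
    """Structural match of a tuple-encoded tree against a pattern with wildcards."""
    if pattern == ANY:
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return len(pattern) == len(tree) and all(
            tree_match(p, t) for p, t in zip(pattern, tree))
    return pattern == tree


# (priority, name, pattern, action); higher priority is tried first.
Strategy = Tuple[int, str, Any, Callable[[Any], Any]]
STRATEGIES: List[Strategy] = [
    # General fallback: drop the offending lookahead (last child).
    (0, "skip-lookahead", ANY, lambda t: t[:-1] if isinstance(t, tuple) else t),
    # Specific case: "for (<decl> in" becomes "foreach (<decl> in".
    (10, "for-in-to-foreach", ("ForHeader", "for", ANY, "in"),
     lambda t: ("ForeachHeader", "foreach", t[2], "in")),
]


def dispatch(tree: Any) -> Optional[Tuple[str, Any]]:
    for _, name, pattern, action in sorted(STRATEGIES, key=lambda s: -s[0]):
        if tree_match(pattern, tree):
            return name, action(tree)
    return None


error = ("ForHeader", "for", ("Decl", "var", "x"), "in")
print(dispatch(error))
# ('for-in-to-foreach', ('ForeachHeader', 'foreach', ('Decl', 'var', 'x'), 'in'))
```

Because the general fallback matches everything, ordering by priority is what lets the specific case win when both apply.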
The aforementioned systems, architectures, and the like have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component to provide aggregate functionality. Communication between systems, components and/or sub-components can be accomplished in accordance with either a push and/or pull model. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
Furthermore, as will be appreciated, various portions of the disclosed systems above and methods below can include or consist of artificial intelligence, machine learning, or knowledge or rule based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent. By way of example and not limitation, such a mechanism can be employed to learn when a recovery strategy is applicable by saving the configuration each time an error occurs and then detecting the parts common to the configurations of several such errors.
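The idea of detecting common parts over several saved error configurations can be illustrated, in its simplest form, by position-wise intersection: positions on which all observations agree are kept, and the rest become wildcards. This Python sketch assumes configurations have already been aligned into equal-length windows (for example, the top of the stack followed by the lookahead); that alignment, the `_` wildcard, and the function name are assumptions, and a practical learner would need something more robust.

```python
from typing import Sequence, Tuple

WILDCARD = "_"  # hypothetical wildcard produced where observations disagree


def generalize(configs: Sequence[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Keep positions on which every observed configuration agrees; wildcard the rest."""
    if not configs:
        raise ValueError("need at least one observed configuration")
    width = len(configs[0])
    if any(len(c) != width for c in configs):
        raise ValueError("configurations must be aligned windows of equal length")
    # zip(*configs) yields one column per position across all observations.
    return tuple(col[0] if all(x == col[0] for x in col) else WILDCARD
                 for col in zip(*configs))


observed = [
    ("for", "(", "var", "x", "in"),
    ("for", "(", "int", "i", "in"),
    ("for", "(", "var", "item", "in"),
]
print(generalize(observed))  # ('for', '(', '_', '_', 'in')
```

The resulting pattern can then serve as the applicability configuration of the strategy that resolved those errors.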
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
Referring to
By way of example, consider a parser associated with an integrated development environment (IDE). Here, parsing is performed to generate parse trees that can be employed to assist programmers by way of automatic statement completion, intelligent suggestion, formatting, and/or colorization, among other things. Parsing is performed as the user types. However, at any given moment, the code is often in a broken state because it is under development. Further yet, users often make mistakes authoring code. Accordingly, the parser needs to be able to operate in the presence of errors. To accomplish this, error recovery can be employed. In this instance, a plurality of recovery strategies can be specified, each identifying the configurations to which it applies. Upon identification of an error, a recovery strategy can be identified that matches the current erroneous configuration. The identified strategy can then be executed, providing a new error-free configuration that allows parsing to continue. In this manner, programming assistance can be afforded for code that follows the erroneous code.
Referring to
As previously mentioned, aspects of the claimed subject matter can be applied outside a parser setting. For example, such configurable error recovery can be applied to runtime exceptions. A call stack of instructions not yet executed corresponds to a stack of tokens not yet reduced. Further, a parser lookahead buffer can correspond to an instruction pointer indicating where execution stands with respect to the instructions not yet executed. In these terms, if an exception occurs, recovery from the exception can be attempted based on pattern matching over the call stack and instruction pointer.
Conventionally, exceptions are handled in a completely context-independent manner. The same exception can be thrown in the context of very different call stacks, and there is no way to make the handling depend on that context: the handler receives only the exception, and dispatch is based solely on the exception type. With the approach described here, exception handling can instead be context dependent.
Further yet, it is to be appreciated that error recovery can be implemented on a deployed program without altering its code. For example, error recovery strategies can be specified after the fact to address unhandled exceptions. When such an error is identified utilizing a specified configuration, a corresponding recovery strategy can be dispatched. By way of example, consider an application programming interface (API) designed to perform file access that throws file-open and file-does-not-exist exceptions, among others, and suppose that over time the underlying storage changes to a network file system. Suddenly, a timeout exception can occur that was not present in the original API. Handling exceptions outside a program can take care of such new exceptions, as well as cases in which a bug exists or handling for certain exceptions was never implemented.
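Context-dependent recovery for runtime exceptions, applied from outside unmodified code, might look like the following Python sketch. The rule format, the `with_recovery` wrapper, and the example functions are hypothetical. The call stack recovered from the traceback plays the role of the parser stack: the same `TimeoutError` is recovered from when it arises under `load_cache` but propagates when it arises under `load_config`.

```python
import traceback
from typing import Callable, List, Tuple, Type

# (exception type, function names that must appear on the unwound call stack, handler)
Rule = Tuple[Type[BaseException], Tuple[str, ...], Callable[[BaseException], object]]


def with_recovery(func: Callable, rules: List[Rule]) -> Callable:
    """Wrap func without editing it; choose a handler by exception type and call stack."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            frames = {f.name for f in traceback.extract_tb(exc.__traceback__)}
            for exc_type, callers, handler in rules:
                if isinstance(exc, exc_type) and all(name in frames for name in callers):
                    return handler(exc)
            raise  # no rule for this context: propagate unchanged
    return wrapper


# Existing, unmodified code that now runs against a network share.
def open_remote(path: str) -> str:
    raise TimeoutError(f"network share timed out: {path}")


def load_cache(path: str) -> str:
    return open_remote(path)


def load_config(path: str) -> str:
    return open_remote(path)


# A timeout is tolerable while loading a cache (treat it as empty), but not elsewhere.
rules: List[Rule] = [(TimeoutError, ("load_cache",), lambda exc: "")]
safe_cache = with_recovery(load_cache, rules)
safe_config = with_recovery(load_config, rules)
print(repr(safe_cache("//share/cache")))  # ''
# safe_config("//share/app.cfg") still raises TimeoutError.
```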
Still further yet, it is to be noted that the above-described error recovery and diagnosis systems and methods need not be limited to errors or exceptions. The same or similar mechanisms can be employed to deal with ambiguities. Whereas an error provides no option to proceed, an ambiguity provides more than one. Accordingly, an ambiguity strategy can be specified as a function of contextual configuration information and can select a particular path. As for diagnosis, instead of an error message, an explanation can be provided as to how an ambiguity was resolved.
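An ambiguity strategy can be sketched the same way as a recovery strategy: a contextual test selects one of several viable interpretations, and the explanation it carries takes the place of an error message. The Python sketch below uses the classic dangling-else ambiguity; the rule structure, option names, and `resolve` function are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class AmbiguityRule:
    applies: Callable[[Sequence[str]], bool]  # test on the surrounding tokens
    prefer: str                               # interpretation chosen when the test holds
    reason: str                               # explanation reported instead of an error


def resolve(options: List[str], context: Sequence[str],
            rules: List[AmbiguityRule]) -> Tuple[str, str]:
    for rule in rules:
        if rule.prefer in options and rule.applies(context):
            return rule.prefer, rule.reason
    return options[0], "no rule applied; chose the first alternative"


# Dangling else: "if a if b s1 else s2" can attach the else to either if.
rules = [AmbiguityRule(
    applies=lambda ctx: ctx.count("if") > ctx.count("else"),
    prefer="inner-if",
    reason="'else' binds to the nearest unmatched 'if'",
)]
print(resolve(["outer-if", "inner-if"], "if a if b s1 else s2".split(), rules))
# ('inner-if', "'else' binds to the nearest unmatched 'if'")
```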
The word “exemplary” or various forms thereof are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, examples are provided solely for purposes of clarity and understanding and are not meant to limit or restrict the claimed subject matter or relevant portions of this disclosure in any manner. It is to be appreciated that a myriad of additional or alternate examples of varying scope could have been presented, but have been omitted for purposes of brevity.
As used herein, the term “inference” or “infer” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines . . . ) can be employed in connection with performing automatic and/or inferred action in connection with the subject innovation.
Furthermore, all or portions of the subject innovation may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed innovation. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
In order to provide a context for the various aspects of the disclosed subject matter,
With reference to
The system memory 1116 includes volatile and nonvolatile memory. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1112, such as during start-up, is stored in nonvolatile memory. By way of illustration, and not limitation, nonvolatile memory can include read only memory (ROM). Volatile memory includes random access memory (RAM), which can act as external cache memory to facilitate processing.
Computer 1112 also includes removable/non-removable, volatile/non-volatile computer storage media.
The computer 1112 also includes one or more interface components 1126 that are communicatively coupled to the bus 1118 and facilitate interaction with the computer 1112. By way of example, the interface component 1126 can be a port (e.g., serial, parallel, PCMCIA, USB, FireWire . . . ) or an interface card (e.g., sound, video, network . . . ) or the like. The interface component 1126 can receive input and provide output (wired or wirelessly). For instance, input can be received from devices including but not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, other computer and the like. Output can also be supplied by the computer 1112 to output device(s) via interface component 1126. Output devices can include displays (e.g., CRT, LCD, plasma . . . ), speakers, printers and other computers, among other things.
The system 1200 includes a communication framework 1250 that can be employed to facilitate communications between the client(s) 1210 and the server(s) 1230. The client(s) 1210 are operatively connected to one or more client data store(s) 1260 that can be employed to store information local to the client(s) 1210. Similarly, the server(s) 1230 are operatively connected to one or more server data store(s) 1240 that can be employed to store information local to the servers 1230.
Client/server interactions can be utilized with respect to various aspects of the claimed subject matter. By way of example and not limitation, various components can be embodied as network services wherein one or more servers 1230 provide functionality to one or more clients 1210 across the communication framework 1250. In one particular instance, one or more error recovery strategies can be provided by a server 1230 for download and employment with respect to a client 1210 based error recovery system or component. Further yet, information can be acquired from one or more clients 1210 and servers 1230 with respect to evaluating, prioritizing, and/or ranking strategies, among other things.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the terms “includes,” “contains,” “has,” “having” or variations in form thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.