The present invention relates to optimising a grammar and compilation parser for parsing a computer programming language used in arbitrary applications.
Computer programs are written in a programming language, and the resulting program text is often referred to as source code. The source code for a given application program is then compiled or interpreted so that it can run on a given computer processor. The compilation or interpretation process, performed by a compiler or interpreter application program, commonly comprises a parsing process in which a body of program code in a given programming language is checked for compliance with the grammar of that language. In other words, the body of code is analysed to ensure that it conforms to the grammar rules, or productions, of the relevant programming language. If the body of program code complies with the grammar, processing can proceed to the next stage of compilation or interpretation. If it does not, a parsing error can be signalled.
One problem is that the grammars for some programming languages are large and complex and thus result in correspondingly large and complex parser functionality either as stand-alone parser programs or within compiler or interpreter programs.
Therefore, there is a need in the art to address the aforementioned problem.
In an illustrative embodiment, an apparatus is provided for optimising a compilation parser for parsing arbitrary application code. The apparatus comprises a first generate component for generating a first parser for parsing a programming language in accordance with a first grammar comprising a first set of grammar productions; a run component for running the first parser against a first sample of the programming language; an identify component for identifying the subset of the first set of grammar productions used for parsing the first sample of the programming language; and a second generate component for generating a second parser for parsing the programming language in accordance with a second grammar, of reduced scope relative to the first grammar, comprising the identified subset of the first set of grammar productions.
In another illustrative embodiment, a computer implemented method is provided for optimising a compilation parser for parsing computer program code. The method comprises creating a first parser for parsing a programming language in accordance with a first grammar comprising a first set of grammar productions; running the first parser against a first sample of the programming language; identifying the subset of the first set of grammar productions used for parsing the first sample of the programming language; and creating a second parser for parsing the programming language in accordance with a second grammar, of reduced scope relative to the first grammar, comprising the identified subset of the first set of grammar productions.
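The four steps of the method can be illustrated with a minimal Python sketch. It is not the implementation of the embodiments: the toy grammar `GRAMMAR`, the helper names `make_parser` and `reduce_grammar`, and the choice of a backtracking recogniser over pre-tokenised input are all assumptions made for illustration.

```python
# Hypothetical toy grammar: non-terminal -> list of alternatives (tuples of symbols).
# Any symbol that is not a key of the dict is treated as a terminal token.
GRAMMAR = {
    "stmt": [("print", "expr"),
             ("let", "NAME", "=", "expr"),
             ("while", "expr", "do", "stmt")],
    "expr": [("term", "+", "expr"), ("term",)],
    "term": [("NUM",), ("NAME",), ("(", "expr", ")")],
}


def make_parser(grammar, start):
    """Generate a backtracking recogniser for a grammar without left recursion.

    The returned function returns the set of (non-terminal, alternative index)
    pairs used in a successful parse, or raises SyntaxError.
    """
    def expand(symbol, tokens, pos):
        if symbol not in grammar:  # terminal: must match the next token
            if pos < len(tokens) and tokens[pos] == symbol:
                yield pos + 1, frozenset()
            return
        for index, alternative in enumerate(grammar[symbol]):
            for end, used in sequence(alternative, tokens, pos):
                yield end, used | {(symbol, index)}

    def sequence(symbols, tokens, pos):
        if not symbols:
            yield pos, frozenset()
            return
        for middle, used_head in expand(symbols[0], tokens, pos):
            for end, used_tail in sequence(symbols[1:], tokens, middle):
                yield end, used_head | used_tail

    def parse(tokens):
        for end, used in expand(start, tokens, 0):
            if end == len(tokens):
                return set(used)
        raise SyntaxError("input does not conform to the grammar")

    return parse


def reduce_grammar(grammar, used):
    """Keep only the alternatives that were used; drop emptied non-terminals."""
    reduced = {}
    for symbol, alternatives in grammar.items():
        kept = [alt for i, alt in enumerate(alternatives) if (symbol, i) in used]
        if kept:
            reduced[symbol] = kept
    return reduced


first_parser = make_parser(GRAMMAR, "stmt")           # create the first parser
used = first_parser("print NUM + NUM".split())        # run it and identify the subset
second_grammar = reduce_grammar(GRAMMAR, used)        # grammar of reduced scope
second_parser = make_parser(second_grammar, "stmt")   # create the second parser
```

In this sketch the sample `print NUM + NUM` exercises four of the eight productions, so the second parser still accepts `print` statements over numbers but rejects, for example, a `let` statement.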
In another illustrative embodiment, a computer program product is provided for optimising a compilation parser for parsing computer program code, the computer program product comprising a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing the steps of the method of the invention.
Preferred embodiments of the invention will now be described, by way of example only, with reference to the following drawings in which:
The parser optimisation application program 204 is arranged to run the input parser 107 against the code sample 206 and to collect the data 208 generated by the instrumentation code 109 during this parsing process. The data 208 generated by the instrumentation code 109 indicates the subset of production rules 108′ that were exercised or used by the running of the parser 107 against the code sample 206. In the illustrative embodiment, the parser optimisation application program 204 uses the data 208 from the instrumentation code 109 to identify the grammatical constructs 106 of the grammar 105 which were not exercised and then removes these unused grammatical constructs 106 from the grammar to create an optimised grammar 105′. The parser optimisation application program 204 then generates an optimised parser 107′ by inputting the optimised grammar 105′ to the compilation parser generation application program 104. The optimised parser 107′ is thus optimised to operate in accordance with the optimised grammar 105′ as represented by the code sample 206. In the illustrative embodiment, the optimised parser 107′ is produced without any added instrumentation code.
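One way to picture the instrumentation code 109 and the data 208 is as a wrapper around each production routine of a hand-written recursive-descent parser. The following Python sketch is an illustration under assumptions, not the mechanism of the embodiment: the names `instrument` and `build_parser`, the four toy productions and the use of a set as the data store are all hypothetical.

```python
import functools


def instrument(log):
    """Return a decorator that records a production's name each time it runs."""
    def decorator(production):
        @functools.wraps(production)
        def wrapper(tokens, pos):
            log.add(production.__name__)
            return production(tokens, pos)
        return wrapper
    return decorator


def build_parser(log=None):
    """Build a toy parser; productions are instrumented only if a log is given."""
    mark = instrument(log) if log is not None else (lambda production: production)

    @mark
    def number(tokens, pos):
        if pos < len(tokens) and tokens[pos].isdigit():
            return pos + 1
        raise SyntaxError("number expected at token %d" % pos)

    @mark
    def group(tokens, pos):
        pos = expr(tokens, pos + 1)            # skip "(" and parse the contents
        if pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
        raise SyntaxError("')' expected at token %d" % pos)

    @mark
    def term(tokens, pos):
        if pos < len(tokens) and tokens[pos] == "(":
            return group(tokens, pos)
        return number(tokens, pos)

    @mark
    def expr(tokens, pos):
        pos = term(tokens, pos)
        while pos < len(tokens) and tokens[pos] == "+":
            pos = term(tokens, pos + 1)
        return pos

    return expr, {"number", "group", "term", "expr"}


data = set()                                   # plays the role of the data 208
instrumented_parser, all_productions = build_parser(data)
instrumented_parser("1 + 2".split(), 0)        # run against a code sample
unused = all_productions - data                # candidates for removal

plain_parser, _ = build_parser()               # same parser, no instrumentation
```

After the run, `unused` contains only `group`, because the sample has no parenthesised expression. An optimiser following the text would drop that production, and `plain_parser` shows the same parser built without any instrumentation.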
In some cases the optimised parser 107′ may not be able to parse a given body of the programming language due to one or more grammar productions 106 present in the given body of the programming language having been optimised out of the optimised parser 107′. In the illustrative embodiment, a reversion process is provided for reverting from the use of the optimised parser 107′ to the non-optimised parser 107. In the illustrative embodiment, the compilation parser generation application program 104 is thus arranged to produce a non-instrumented version of the non-optimised parser 107 so as to enable the parsing of the given body of the programming language.
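The reversion idea can be sketched in Python as a wrapper that tries the optimised parser first and falls back to the non-optimised parser. This is only an illustration: it assumes that reversion is triggered by a parsing error from the optimised parser, and the names `make_statement_parser` and `parse_with_reversion`, together with the keyword-based stand-in parsers, are hypothetical.

```python
def make_statement_parser(supported):
    """Hypothetical stand-in parser: accepts a statement by its leading keyword."""
    def parse(tokens):
        if not tokens or tokens[0] not in supported:
            raise SyntaxError("unsupported statement: %r" % (tokens[:1],))
        return tokens[0]
    return parse


def parse_with_reversion(tokens, optimised_parser, full_parser):
    """Try the optimised parser first and revert to the full parser on failure."""
    try:
        return optimised_parser(tokens), "optimised"
    except SyntaxError:
        # The construct may simply have been optimised out, so retry with the
        # non-optimised parser; a genuine error is raised again from there.
        return full_parser(tokens), "full"


full_parser = make_statement_parser({"print", "let", "while"})
optimised_parser = make_statement_parser({"print"})

first = parse_with_reversion(["print", "NUM"], optimised_parser, full_parser)
second = parse_with_reversion(["let", "NAME", "=", "NUM"],
                              optimised_parser, full_parser)
```

A body of code that is invalid under the full grammar as well still produces a `SyntaxError`, since the exception from the non-optimised parser is not caught.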
An apparatus for optimising a compilation parser for parsing arbitrary application code comprises various optional components: a first generate component; a run component; an identify component; a second generate component; a third generate component; a revert component; an instrumenting component; a de-instrumenting component; and a further run component.
The processing performed by the compilation parser generation application program 104 when producing a parser 107/107′ from a grammar 105/105′ will now be described further with reference to
The processing performed by the parser optimisation application program 204 when optimising a parser will now be described with reference to the flow chart of
The processing performed by the parser optimisation application program 204 when reverting the scope of an optimised parser will now be described with reference to the flow chart of
In another embodiment, the run component runs an instrumented master or instrumented optimised parser against further respective code samples in accordance with the method of
In a further embodiment, when a given parser is optimised, instead of removing the unused code elements, the unused code elements are maintained in the parser but disabled. The reversion process then comprises re-enabling one or more selected code elements in the parser.
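A minimal Python sketch of this disable-and-re-enable variant follows. The class name `SwitchableGrammar`, its methods and the production identifiers are assumptions chosen for illustration; the text does not prescribe a data structure.

```python
class SwitchableGrammar:
    """Grammar whose productions can be disabled and re-enabled without deletion."""

    def __init__(self, productions):
        self.productions = dict(productions)   # production id -> right-hand side
        self.disabled = set()

    def disable(self, production_ids):
        # Unknown ids are ignored so that the disabled set stays consistent.
        self.disabled.update(pid for pid in production_ids
                             if pid in self.productions)

    def enable(self, production_ids):
        self.disabled.difference_update(production_ids)

    def active(self):
        """Return only the productions a parser should currently honour."""
        return {pid: rhs for pid, rhs in self.productions.items()
                if pid not in self.disabled}


grammar = SwitchableGrammar({
    "stmt_print": ("print", "expr"),
    "stmt_let": ("let", "NAME", "=", "expr"),
    "stmt_while": ("while", "expr", "do", "stmt"),
})
grammar.disable(["stmt_let", "stmt_while"])    # optimisation: switch off unused
after_optimisation = sorted(grammar.active())
grammar.enable(["stmt_let"])                   # reversion: switch one back on
after_reversion = sorted(grammar.active())
```

Because nothing is deleted, reversion in this sketch is a cheap set update rather than a regeneration of the parser from the original grammar.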
In another embodiment, the optimisation process comprises removing only a selected subset of the unused code elements identified when running the parser against a given code sample.
In a further embodiment, the parser being optimised comprises an LL(*) or LL(k) parser and the unused grammar productions are used to identify unnecessary predicates and to reduce look-ahead in the optimised parser.
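The effect on look-ahead can be illustrated with a deliberately simplified Python sketch. It assumes that every alternative consists only of terminal tokens, so that the look-ahead needed is just the shortest prefix length that tells the alternatives apart; a real LL(k) analysis works with FIRST and FOLLOW sets over non-terminals. The function name `lookahead_needed` and the sample alternatives are hypothetical.

```python
def lookahead_needed(alternatives):
    """Smallest k such that the first k tokens of the alternatives all differ.

    Returns 0 when there is at most one alternative, since no decision is needed.
    """
    if len(alternatives) < 2:
        return 0
    longest = max(len(alternative) for alternative in alternatives)
    for k in range(1, longest + 1):
        prefixes = [tuple(alternative[:k]) for alternative in alternatives]
        if len(set(prefixes)) == len(prefixes):
            return k
    raise ValueError("alternatives cannot be told apart by any prefix")


alternatives = [
    ("let", "NAME", "=", "NUM"),   # assignment
    ("let", "NAME", "(", ")"),     # hypothetical call form, unused in the sample
    ("print", "NAME"),
]
k_before = lookahead_needed(alternatives)

# Remove the alternative that the code sample never exercised.
pruned = [alt for i, alt in enumerate(alternatives) if i != 1]
k_after = lookahead_needed(pruned)
```

With the unused alternative present, three tokens are needed to choose between the two `let` forms; once it is removed, a single token suffices.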
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, computer program product or computer program. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java®, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
For the avoidance of doubt, the term “comprising”, as used herein throughout the description and claims, is not to be construed as meaning “consisting only of”.
It will be understood by those skilled in the art that the apparatus that embodies a part or all of the present invention may be a general purpose device having software arranged to provide a part or all of an embodiment of the invention. The device could be a single device or a group of devices and the software could be a single program or a set of programs. Furthermore, any or all of the software used to implement the invention can be communicated via any suitable transmission or storage means so that the software can be loaded onto one or more devices.
While the present invention has been illustrated by the description of the embodiments thereof, and while the embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details of the representative apparatus and method, and illustrative examples shown and described. Accordingly, departures may be made from such details without departure from the scope of applicant's general inventive concept.
Number | Date | Country | Kind |
---|---|---|---|
1221449.0 | Nov 2012 | GB | national |