A programmer utilizing a programming language creates the instructions comprising a computer program. Typically, source code is specified or edited by a programmer manually and/or with the help of an integrated development environment (IDE) comprising numerous development services (e.g., editor, debugger, auto fill, intelligent assistance . . . ). By way of example, a programmer may choose to implement source code utilizing an object-oriented programming language (e.g., C#, VB, Java . . . ) where programmatic logic is specified as interactions between instances of classes or objects, among other things. Subsequently, the source code can be compiled or otherwise transformed to another form to facilitate execution by a computer or like device.
A compiler conventionally produces code for a specific target from source code. For example, some compilers transform source code into native code for execution by a specific machine. Other compilers generate intermediate code from source code, where this intermediate code is subsequently interpreted dynamically at run time or compiled just in time (JIT) to facilitate execution across computer platforms, for instance. Further yet, some compilers are utilized by IDEs to perform background compilation to aid programmers by identifying actual or potential problems, among other things.
Compilers perform lexical, syntactic, and semantic analysis as well as code generation. A scanner or lexer performs lexical analysis to convert a sequence of characters into tokens based on a program language specification. A parser performs syntactic analysis of tokens provided by the lexer in an attempt to determine structure and often captures such structure in a parse tree in accordance with a formal language grammar. Subsequently, semantic analysis can be performed with respect to the parse tree to determine meaning associated with the code as well as perform type checking and binding, among other things. Finally, a code generator produces code in a target language as a function of the analysis performed.
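By way of illustration only, the phases described above can be sketched in Python for a toy arithmetic language. The function names "lex", "parse", and "generate" and the stack-machine instruction format are assumptions made for this sketch; semantic analysis such as type checking is omitted for brevity.

```python
import re


def lex(text):
    # Lexical analysis: split the character stream into tokens.
    return re.findall(r"\d+|[+*()]", text)


def parse(tokens):
    # Syntactic analysis: build a parse tree for the grammar
    #   expr := term ('+' term)*
    #   term := atom ('*' atom)*
    #   atom := NUMBER | '(' expr ')'
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            take()  # consume the closing ')'
            return node
        return ("num", int(tok))

    def term():
        node = atom()
        while peek() == "*":
            take()
            node = ("mul", node, atom())
        return node

    def expr():
        node = term()
        while peek() == "+":
            take()
            node = ("add", node, term())
        return node

    return expr()


def generate(node):
    # Code generation: emit instructions for a simple stack machine.
    if node[0] == "num":
        return [("push", node[1])]
    return generate(node[1]) + generate(node[2]) + [(node[0],)]


code = generate(parse(lex("1 + 2 * 3")))
```

Here the parse tree records that multiplication binds tighter than addition, and the generator walks that tree to produce target instructions.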
Program languages can support language-integrated queries and literals, among other things, as native or first-class constructs. For example, an object-oriented language can host a markup language expression. A compiler associated with the host language can ensure that the foreign constructs are transformed into host language constructs at compile time.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject disclosure pertains to optimization of an imperative implementation of compositional content. More specifically, a compiler transforms compositionally specified input into statements associated with an imperative application-programming interface in a manner that not only captures the functionality and/or structure specified thereby but also optimizes performance in terms of execution, resource utilization, and the like. In accordance with one aspect of the disclosure, the input elements are constructed in a manner that avoids unnecessary memory allocation and copying. According to another aspect, namespace and/or names are constructed as reusable variable objects that are cached to avoid at least expensive table lookups and string comparisons. In accordance with yet another aspect, redundant namespace declarations are removed statically at compile time and/or dynamically at runtime.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
Systems and methods pertaining to program optimization are described in detail hereinafter. Code and/or data constructed in a compositional manner is mapped to an imperative implementation in a manner that optimizes performance and resource utilization, among other things. In one instance, this can be accomplished by employing a constructor that does not utilize a parameter array, and calling add methods in conjunction with child elements and/or attributes. In addition, namespaces and/or names can be embodied as objects and cached to facilitate reuse. Further yet, redundant namespace declarations can be removed at compile time and/or runtime.
Various aspects of the subject disclosure are now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the claimed subject matter.
Referring initially to
It is to be appreciated that the system 100 can be employed in many different scenarios and/or embodiments. However, solely for purposes of clarity and understanding, this detailed description will focus on one particular context, namely integration of an extensible markup language in general and, more specifically, the extensible markup language (XML) open standard that provides a general-purpose specification for custom markup languages. Of course, integration of other languages and different embodiments are also possible and contemplated. For example, in addition to XML, the system 100 and aspects described hereinafter can operate similarly or analogously with respect to object initializers and strings.
In this context, the input to the system 100 can be markup code integrated within a host language. The imperative API 140 can provide functionality for implementing the markup code or XML in the context of a host language such as Visual Basic, for example. This is advantageous at least because the API is composable, such that simple expressions can be combined to create complex XML documents, and the code is easily readable and maintainable in that expressions are structurally similar to XML. By way of example and not limitation, consider the following code sample that illustrates how XML can be created within a host language utilizing functionality provided by the imperative API 140.
This program creates the following structure:
The input received by the system 100 need not be in this imperative form; rather, it can be compositional. More specifically, XML literals can be employed that hide calls to the API 140 and let a programmer use XML as a built-in data type much like the way programmers use numbers, strings, and other intrinsic data types. Using XML literals, the above function can be written as follows:
A naive implementation of XML literals transforms the code in function “f2” directly into the code in function “f1”. However, this implementation is not optimal in terms of performance and resource utilization. For example, a conventional constructor uses a parameter array to handle an optional number of parameters that are passed to the constructor. In this case, the compiler allocates an object array for each call and initializes it with a constructed element's content. This array is then passed to the constructor, which immediately removes the parameters from the array. Using the parameter array is convenient for programmers, but it adds the cost of an extra memory allocation and parameter copies. Instead of using such a constructor, an alternate constructor is provided as part of the imperative interface or API component 140, for instance, that does not include the parameter array but solely a name of the element as an argument, for example. Employment of this constructor is described in the next section.
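By way of example and not limitation, the difference between the two construction styles can be sketched in Python. The "Element" class, its "add" method, and the sample data below are hypothetical stand-ins rather than the actual API; Python's variadic "*content" only approximates a parameter array.

```python
class Element:
    """Hypothetical stand-in for an element type in an imperative XML API."""

    def __init__(self, name, *content):
        # *content plays the role of the parameter array: the caller's
        # arguments are packed into a sequence, then copied out one by one.
        self.name = name
        self.children = []
        for item in content:
            self.add(item)

    def add(self, item):
        # Append one child (element or text) directly, with no intermediate array.
        self.children.append(item)

    def to_xml(self):
        # Minimal serializer for illustration; text is not escaped.
        inner = "".join(
            c.to_xml() if isinstance(c, Element) else str(c) for c in self.children
        )
        return f"<{self.name}>{inner}</{self.name}>"


def build_naive():
    # Direct translation: nested constructor calls with variadic content.
    return Element("book", Element("title", "Example"), Element("pages", "10"))


def build_optimized():
    # What a compiler might emit instead: name-only constructor plus add calls.
    book = Element("book")
    title = Element("title")
    title.add("Example")
    book.add(title)
    pages = Element("pages")
    pages.add("10")
    book.add(pages)
    return book
```

Both functions yield the same structure; the second form simply avoids packing content into an intermediate array on each constructor call.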
Turning attention to
Typically, names of content elements or attributes are atomized in a name table including instances of the name objects and namespace objects in order to reduce the number of string comparisons. By computing a unique object for each name string, the names can be compared using object identity, which is a constant time operation, rather than string comparison, which is a linear time operation.
More specifically, strings can be used to create and refer to names as well as namespace objects. The following code snippet shows a simple way to create an XML element:
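A minimal Python sketch of atomization follows. The "AtomizedName" class and its "get" method are hypothetical names chosen for illustration and do not reproduce any particular name table implementation.

```python
class AtomizedName:
    """Hypothetical atomized name: one object per (namespace, local name)."""

    _table = {}  # the name table, keyed by (namespace URI, local name)

    def __init__(self, namespace, local):
        self.namespace = namespace
        self.local = local

    @classmethod
    def get(cls, namespace, local):
        # Return the unique object for this name, creating it on first use.
        key = (namespace, local)
        name = cls._table.get(key)
        if name is None:
            name = cls._table[key] = cls(namespace, local)
        return name


a = AtomizedName.get("http://mynamespace", "Name")
b = AtomizedName.get("http://mynamespace", "Name")
c = AtomizedName.get("http://mynamespace", "Other")

same = a is b        # identity check, independent of string length
different = a is c   # distinct names map to distinct objects
```

Once two names have been atomized, comparing them requires only the identity test shown above rather than a character-by-character comparison.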
One way to improve performance is to create variables that hold references to actual “XName” and “XNamespace” objects. Using such variables instead of strings has substantial performance improvement benefits. For example, the above code can be transformed into the following:
Here, the string “http://mynamespace” is first converted to an “XNamespace” object and the string “Name” is looked up in the “XNamespace” to return the “XName” object.
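The benefit of holding such objects in variables can be sketched in Python as follows. The "Namespace" class, its "get" and "get_name" methods, and the lookup counter are hypothetical and exist only to make the number of table lookups visible.

```python
class Namespace:
    """Hypothetical namespace object that owns its atomized names."""

    _by_uri = {}
    lookups = 0  # counts table lookups, for illustration only

    def __init__(self, uri):
        self.uri = uri
        self._names = {}

    @classmethod
    def get(cls, uri):
        # Convert a namespace string into its unique namespace object.
        Namespace.lookups += 1
        if uri not in cls._by_uri:
            cls._by_uri[uri] = cls(uri)
        return cls._by_uri[uri]

    def get_name(self, local):
        # Look up a local name within this namespace.
        Namespace.lookups += 1
        return self._names.setdefault(local, (self.uri, local))


def build_with_strings(count):
    # Each element re-resolves the namespace and the name from strings.
    return [Namespace.get("http://mynamespace").get_name("Name") for _ in range(count)]


def build_with_variables(count):
    # Resolve once, keep the objects in variables, and reuse them.
    ns = Namespace.get("http://mynamespace")
    name = ns.get_name("Name")
    return [name for _ in range(count)]
```

Under these assumptions, building N elements from strings costs 2N lookups, whereas the variable-based form costs two lookups regardless of N.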
In an XML document, namespaces and names are often used multiple times. For example, it is common for all names to be in one or two namespaces. Rather than look up the name and namespace on each constructor call, it is advantageous to hoist the namespace and names into variables so that they can be reused. While name reuse is less common than namespace reuse, it still occurs frequently in sections of repeating elements such as:
The example above can then be compiled as follows to leverage caching of both namespaces and names:
Referring to
The namespace component 230 includes an identifier component 510 that identifies namespaces in an input. Identified namespaces are provided or made available to redundancy checker component 520, which analyzes the namespaces in an attempt to identify redundant or otherwise unnecessary namespace declarations. For example, if a parent element and a child element both include the same namespace declaration, one of them is unnecessary. Removal component 530 is communicatively coupled to the redundancy checker component 520 and is able to remove identified redundant or unnecessary namespace declarations. In some cases, the removal component 530 can simply remove a declaration such as in the previous example where the child element declaration can be eliminated. However, in some other cases, the redundancy checker component 520 can employ injection component 540 alone or in conjunction with removal component 530. The injection component 540 adds a namespace declaration to a location. Together with the removal component 530, a movement of a namespace declaration from a first location to a second location can be effected. For instance, consider a scenario in which an identical namespace declaration is present on multiple child elements. In this case, the namespace declaration can be injected into a parent element and removed from the children.
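The removal and injection behavior described above can be sketched in Python. The "Node" class and the functions "remove_redundant" and "hoist_shared" are hypothetical simplifications; in particular, the sketch does not consider prefixes bound differently by more distant ancestors.

```python
class Node:
    """Hypothetical element: a tag, namespace declarations, and children."""

    def __init__(self, tag, ns=None, children=()):
        self.tag = tag
        self.ns = dict(ns or {})        # prefix -> namespace URI
        self.children = list(children)


def remove_redundant(node, in_scope=None):
    # Removal: drop declarations that repeat one already in scope.
    in_scope = dict(in_scope or {})
    for prefix, uri in list(node.ns.items()):
        if in_scope.get(prefix) == uri:
            del node.ns[prefix]
    in_scope.update(node.ns)
    for child in node.children:
        remove_redundant(child, in_scope)


def hoist_shared(parent):
    # Injection plus removal: move a declaration found on every child
    # to the parent, provided the parent does not already bind that prefix.
    if not parent.children:
        return
    shared = set(parent.children[0].ns.items())
    for child in parent.children[1:]:
        shared &= set(child.ns.items())
    for prefix, uri in shared:
        if prefix not in parent.ns:
            parent.ns[prefix] = uri
            for child in parent.children:
                del child.ns[prefix]
```

For instance, calling "hoist_shared" on a parent whose two children both declare the same prefix and namespace leaves a single declaration on the parent, while "remove_redundant" eliminates a child declaration that merely repeats its parent's.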
In accordance with one aspect, various rules can be provided to configure namespace removal and/or movement. By way of example and not limitation, consider the following rules: 1) Only move global namespaces that are defined using an imports statement, and local declarations that match global declarations; 2) When matching namespaces, match both the prefix and the namespace itself; 3) Within a literal, move the namespaces at compile time when the compiler can statically determine that a declaration can be moved; and 4) If the compiler cannot statically determine whether a declaration can be moved, generate a call to a runtime function to perform the same check at runtime. Of course, many other and/or different rules can be utilized to tailor performance to different systems and/or scenarios.
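One possible reading of these rules is sketched below in Python. The "classify" function, its parameters, and the returned action labels are assumptions made for illustration; how a compiler decides that a declaration is statically known is outside the scope of the sketch.

```python
def classify(prefix, uri, global_imports, statically_known):
    """Return the action for one local namespace declaration (hypothetical)."""
    # Rules 1 and 2: only a declaration that matches a global import on
    # both prefix and namespace is a candidate for movement.
    if global_imports.get(prefix) != uri:
        return "keep"
    # Rule 3: move at compile time when the compiler can decide statically.
    if statically_known:
        return "move-at-compile-time"
    # Rule 4: otherwise defer the same check to a runtime function call.
    return "check-at-runtime"


imports = {"a": "http://a.example"}  # hypothetical global imports
r_global = classify("a", "http://a.example", imports, statically_known=True)
r_local = classify("b", "http://b.example", imports, statically_known=True)
r_prefix = classify("x", "http://a.example", imports, statically_known=True)
r_dynamic = classify("a", "http://a.example", imports, statically_known=False)
```

In this sketch a declaration with a matching namespace but a different prefix is simply kept, which is a simplification of the rule set.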
What follows is a set of examples to aid clarity in understanding how namespace removal and/or movement can be performed. Of course, these are only examples and as such are not intended to limit the scope of the claimed subject matter in any manner but merely to aid understanding through example.
The first example illustrates a simple movement or injection of the declaration into the top level. Consider the following code snippet:
The output shows injection of the namespace declaration at the top level “level0”:
The second example illustrates implementation of a rule that only globally declared namespace declarations will be bubbled up. As shown below, there is a local declaration “b” and a global declaration “a”:
The output indicates that only the global declaration “a” is bubbled to the top, “level0”, while the local declaration “b” remains at “level1”.
The third example illustrates how different namespace prefixes affect a removal/move process. As specified below, the global namespace and a local namespace are equivalent but include different prefixes:
The output shows that where namespace prefixes do not match, but an alternate prefix is in scope on the element, then the local declaration is removed and the alternate prefix is employed.
The next example includes embedded expressions. Consider the following code snippet:
The output shows movement of the namespace declaration to the top level “level0”:
The following example illustrates a case where the namespace is removed at runtime since it cannot be removed at compile time:
The output is as follows:
In accordance with an aspect of the claimed subject matter, a remove namespace method can be invoked by a compiler system to remove redundant namespace declarations. The following is a non-limiting exemplary method signature that can be employed with respect to XML.
The calling code passes in a list of all in scope namespace attributes, an empty list of attributes and an element to process. The “RemoveNamespace” method will look at each namespace attribute on “e” and process it according to the table below. During the processing the “RemoveNamespace” method will remove namespace attributes from “e” and possibly add them to the in scope namespace list and the “addtoroot” list. All namespaces in the “addtoroot” list will be added to the root of the XML fragment once the fragment is constructed.
As mentioned previously, the transform component 130 is not limited to the optimizations described above as well as below. Various other tweaks can be made to the manner in which input is transformed. By way of example and not limitation, parallelism or more parallelism can be introduced with respect to input processing.
In addition to optimization, the subject techniques can guard against changes in an underlying API. Suppose, for instance, that a different API is developed or changes like bug fixes occur. These changes can be incorporated into a compiler so as to effect the changes for all programs by changing the code generation. Alternatively, if code were hand-written, every programmer would have to update his/her code. This is not so much an optimization, but it leverages the fact that the compiler is generating actual calls to the underlying API.
Further yet, functionality associated with or described with respect to a particular context can be abstracted and employed for alternate and even disparate use. For instance, the system 600 of
The aforementioned systems, architectures, and the like have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component to provide aggregate functionality. Communication between systems, components and/or sub-components can be accomplished in accordance with either a push and/or pull model. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
Furthermore, as will be appreciated, various portions of the disclosed systems above and methods below can include or consist of artificial intelligence, machine learning, or knowledge or rule based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent. By way of example and not limitation, a compiler system can utilize such mechanisms to infer application of appropriate optimization. In one particular instance, inferences can be made at compile time to aid identification and removal of redundant namespace declarations that otherwise could not be identified statically.
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
Referring to
Referring to
The word “exemplary” or various forms thereof are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, examples are provided solely for purposes of clarity and understanding and are not meant to limit or restrict the claimed subject matter or relevant portions of this disclosure in any manner. It is to be appreciated that a myriad of additional or alternate examples of varying scope could have been presented, but have been omitted for purposes of brevity.
As used herein, the term “inference” or “infer” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines . . . ) can be employed in connection with performing automatic and/or inferred action in connection with the subject innovation.
Furthermore, all or portions of the subject innovation may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed innovation. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
In order to provide a context for the various aspects of the disclosed subject matter,
With reference to
The system memory 1316 includes volatile and nonvolatile memory. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1312, such as during start-up, is stored in nonvolatile memory. By way of illustration, and not limitation, nonvolatile memory can include read only memory (ROM). Volatile memory includes random access memory (RAM), which can act as external cache memory to facilitate processing.
Computer 1312 also includes removable/non-removable, volatile/non-volatile computer storage media.
The computer 1312 also includes one or more interface components 1326 that are communicatively coupled to the bus 1318 and facilitate interaction with the computer 1312. By way of example, the interface component 1326 can be a port (e.g., serial, parallel, PCMCIA, USB, FireWire . . . ) or an interface card (e.g., sound, video, network . . . ) or the like. The interface component 1326 can receive input and provide output (wired or wirelessly). For instance, input can be received from devices including but not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, other computer and the like. Output can also be supplied by the computer 1312 to output device(s) via interface component 1326. Output devices can include displays (e.g., CRT, LCD, plasma . . . ), speakers, printers and other computers, among other things.
The system 1400 includes a communication framework 1450 that can be employed to facilitate communications between the client(s) 1410 and the server(s) 1430. The client(s) 1410 are operatively connected to one or more client data store(s) 1460 that can be employed to store information local to the client(s) 1410. Similarly, the server(s) 1430 are operatively connected to one or more server data store(s) 1440 that can be employed to store information local to the server(s) 1430.
Client/server interactions can be utilized with respect to various aspects of the claimed subject matter. By way of example and not limitation, one or more of the components or systems can be employed as a network or web service provided by one or more servers 1430 to one or more clients 1410 over the communication framework 1450. In particular, the entire compilation system 100 of
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the terms “includes,” “contains,” “has,” “having” or variations in form thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.