In general, a computer software compiler (or “compiler”) transforms source code written in a source programming language into a target computer language, often in a binary form known as object code. A common objective in transforming source code into object code is to create an executable program for a specific computer processor. A compiler may perform many different operations, such as lexical analysis, preprocessing, parsing, semantic analysis, code generation, and/or code optimization. Program faults caused by incorrect compiler behavior, however, can be very difficult to track down and work around, and thus well-founded compiler implementations may comprise features for ensuring the correctness of the compilation.
Distributed computing is a form of computing—generally by operation of application programs—in which many calculations are carried out simultaneously on the premise that large problems can often be divided into smaller problems which can be solved concurrently (“in parallel”) for efficiency. To accomplish this parallelism, distributed computing makes use of multiple autonomous computers (or processors) to solve computational problems by dividing the problem into many sub-problems that are then solved by one or more of the autonomous computers (or nodes) in a cluster of computers. To perform computations on very large problems or datasets, distributed computing clusters (comprising tens, hundreds, or even thousands of autonomous computers) may be utilized.
Many modern computing systems comprise multiple separate, complex execution subsystems. For example, a computer may be endowed with multiple processor cores, and/or one or several graphics cards, each having many internal execution cores. Computer clusters are composed of multiple computers. Computers may also provide execution services over a network, such as offering database queries or web services. Creating programs for such complex environments involves a complex set of interacting tools and, in particular, the cooperation of multiple compilers. Traditionally, the cooperation of the various compilers targeting all of these environments is arranged in an ad hoc way.
A modular compiler architecture uses partial compiler modules that cooperatively produce object code for operation on a complex execution infrastructure. In a modular compiler implementation, the component partial compilers may invoke the services of other partial compilers, wherein each partial compiler operates as a self-contained “black-box” module, sharing no code or state with the other partial compilers. This structure, in turn, may allow the partial compilers of such implementations to be arranged in modular hierarchies for multi-level compilation. Such implementations then produce compiled programs able to run correctly on complex execution environments, such as large computer clusters comprising a mix of computational resources (machines, multiple cores, graphics cards, SQL server engines, etc.).
Partial compilers could be seen as a generalization of traditional compilers. Traditional compilers may be used as components of modular compilers and in combination with other partial compilers. Modular compiler implementations may comprise a set of high-level operations that manipulate partial compilers. Certain implementations may also feature a software architecture for modular compiler construction to provide a large-scale query-processing system (compiler) implemented as a modular combination of many simple partial compilers.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
To facilitate an understanding of and for the purpose of illustrating the present disclosure and various implementations, exemplary features and implementations are disclosed in, and are better understood when read in conjunction with, the accompanying drawings—it being understood, however, that the present disclosure is not limited to the specific methods, precise arrangements, and instrumentalities disclosed. Similar reference characters denote similar elements throughout the several views. In the drawings:
A multicore processor is a processor that includes multiple execution units (“cores”) on the same chip, enabling it to simultaneously process instructions from multiple instruction streams. A multiprocessor computer, in comparison, is a stand-alone computer system (or “machine”) with multiple processors that share memory and may connect via a bus, point-to-point links, or other high-speed means; however, “bus contention” (where more than one processor attempts to use the bus at the same time) and similar limitations often prevent such computing systems from scaling beyond thirty-two (32) processors. As such, a multiprocessor computer may comprise one or more multicore processors for multiplying computational power.
A distributed computer (sometimes referred to as a distributed memory multiprocessor) is composed of multiple processors connected by a network (and thus is highly scalable) to solve computational problems using parallel computing (where a problem is divided into many sub-problems, each of which is solved by a different processor). For example, a massively parallel processor (MPP) is a single stand-alone computer with many networked processors using specialized high-speed interconnect networks, where generally each processor has its own memory, copy of the operating system, and copy of the application(s). In contrast, a cluster (or cluster computer system) is a distributed computer comprising multiple computer systems (each a “cluster computer,” “autonomous computer,” or “machine”) connected by a network, where each machine has its own processing elements, memory, operating system, and applications, and the network generally comprises commodity networking hardware. A grid computer system (or grid) is similar to a cluster except that the networked computers communicate over the Internet; because of the Internet's relatively low bandwidth and high latency, a grid is the most distributed form of parallel computing and typically deals only with “embarrassingly parallel” problems, that is, problems that are easily split into parallel tasks that require little or no communication between such tasks.
A distributed computer—whether an MPP, a cluster, or a grid—may comprise one or more multiprocessor computers and/or one or more multicore processors. There are also several specialized parallel/distributed computer systems based on reconfigurable computing systems with field-programmable gate arrays, general-purpose computing systems on graphics processing units, application-specific integrated circuits, and vector processors, to name a few.
Notwithstanding the foregoing, the terms “concurrent,” “parallel,” and “distributed” strongly overlap and are used interchangeably herein, such that the same system may be characterized as “parallel” and/or “distributed” without loss of generality; the processors in a distributed system run concurrently and in parallel. Where distinctions are necessary, and the terms are used disjunctively and are in obvious conflict to a person of ordinary skill in the relevant art, the term “parallel” as used in parallel computing shall refer to all processors having access to a shared memory that can be used to exchange information between processors, whereas the term “distributed” as used in distributed computing shall refer to each processor having its own private memory (a part of the “distributed memory”) where information is exchanged by passing messages between the processors (presumably through an intermediary of some kind).
While various implementations disclosed herein are described in terms of a distributed computing system and, more specifically, in terms of a cluster computer system (or “distributed computing cluster”), skilled artisans will recognize that such implementations can readily be implemented on other types of distributed computing systems. Nothing herein is intended to limit the implementations disclosed to any specific distributed computer type or to any specific configuration of processors (such as multiprocessors); instead, the implementations are intended to be given the widest interpretations possible.
As shown in
In some implementations, the client 110 may include a desktop personal computer, workstation, laptop, PDA, cell phone, smart phone, or any WAP-enabled device or any other computing device capable of interfacing directly or indirectly with the network 120, such as a computing device 500 illustrated in
For simplicity, although in no way limited to such, various implementations disclosed herein may be characterized in the context of an exemplary distributed computing compiler, specifically, the DryadLINQ compiler and its underlying distributed runtime system Dryad. DryadLINQ translates programs written in the “Language INtegrated Query” (LINQ) database programming language into large-scale distributed computations that run on shared-nothing computer clusters and, as such, the basic functionality provided by DryadLINQ includes compiling programs for execution on a computer cluster. In this sense, a cluster may be abstractly viewed as a single computational engine composed of many computers that perform the actual computational work. Moreover, although nothing herein is tied to any particular query language, some of the terminology used herein is similarly inherited from LINQ, which is itself derived from various database applications, and thus inputs for partial compilers are referred to as “queries” while outputs generated by partial compilers are referred to as “plans.”
At a lower level of abstraction, however, it should be noted that each such computer in the cluster may itself also have multiple processor cores (multiprocessors or multicores). Consequently, the DryadLINQ compilation process is correspondingly structured as a three-stage process: (1) translating a cluster-level computation (or program) into a set of interacting machine-level computations, (2) translating each machine-level computation into a set of processor-core-level computations, and (3) implementing each processor-core-level computation as machine-executable code. For DryadLINQ, the LINQ source-code language is essentially the same for all of these layers; however, the optimization strategies, program transformation rules, and runtime operations invoked by the compiler at each of these levels—cluster, machine, and core—may be very different. Thus, various implementations disclosed herein embody the decomposition of the DryadLINQ compiler into a hierarchy of compilers (one corresponding to each level) that cooperate to implement a single computation.
Of course, there is no language useful for all purposes and thus there is a need for multiple types of queries and plans. In some circumstances, the queries and plans may be written in the same language, or in different languages. Moreover, partial compilers can be used to handle mixed languages by translating between the various query and plan types.
As illustrated, the partial compiler, in turn, further comprises a reducer 236 and a generator 238, wherein (a) the reducer 236 reduces its received queries to produce one or more sub-queries 236′ which are then passed (via the operative coupling) to one or more child compilers 240 as input queries 242 for such child compilers 240, and (b) the generator 238 receives (via the operative coupling) one or more sub-plans 238′ from the one or more child compilers 240 as output plans 244 from such child compilers 240 (in response to the sub-queries 236′), from which the generator 238 generates and outputs its collective plan 234. As such, the reducer 236 and the generator 238 are separate but interdependent pieces that perform query reduction and plan generation, respectively. The generator 238 can also use the query 232 when constructing the collective plan 234.
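By way of a non-limiting illustration, the following sketch (in Python, with hypothetical names, and with queries and plans represented as plain strings purely for illustration) renders the reducer/generator structure described above: the reducer splits a received query into sub-queries for the child compilers, and the generator combines the resulting sub-plans (and, if desired, the original query) into the collective plan. It is not the implementation of any particular compiler described herein.

```python
# A minimal sketch of a partial compiler's reducer/generator structure.
# Queries and plans are plain strings here, and all names are hypothetical.
from typing import Callable, List


class Compiler:
    """A self-contained 'black-box' compiler: takes a query, returns a plan."""
    def compile(self, query: str) -> str:
        raise NotImplementedError


class PartialCompiler(Compiler):
    """A partial compiler built from a reducer, a generator, and child compilers."""

    def __init__(self,
                 reducer: Callable[[str], List[str]],
                 generator: Callable[[str, List[str]], str],
                 children: List[Compiler]):
        self.reducer = reducer        # query -> sub-queries (one per child)
        self.generator = generator    # (query, sub-plans) -> collective plan
        self.children = children

    def compile(self, query: str) -> str:
        sub_queries = self.reducer(query)                 # query reduction
        sub_plans = [child.compile(q)                     # children compile the sub-queries
                     for child, q in zip(self.children, sub_queries)]
        return self.generator(query, sub_plans)           # plan generation
```

In such a rendering, a hierarchy of compilers is obtained simply by supplying PartialCompiler instances as the children of other PartialCompiler instances.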
It should be noted that, in certain implementations, a child compiler may itself comprise one or more partial compilers and one or more child compilers. Moreover, in these and other such implementations, a child compiler may instead comprise one or more self-contained compilers 210. Thus, hierarchical modular constructions of partial compilers 230 and child compilers 240—wherein each child compiler 240 in turn comprises either partial compilers 230 or self-contained compilers 210—are readily possible in the form of tree structures of varying breadths and depths. Moreover, for some implementations, it may be said that an n-ary partial compiler requires n child compilers such that, given an input query, the partial compiler produces n sub-queries by “reducing” the original query to a set of n “simpler” (or sub-component) sub-queries; the partial compiler then uses the sub-plans resulting from these n sub-queries to generate an output plan for the original query. In addition, each compiler—partial or self-contained—may be presented as a “black-box” component with well-defined inputs, outputs, and functionality but without exposing the internal operation of such components. Lastly, a partial compiler may partially compile a received query into a plan in addition to utilizing the sub-plans provided by a child compiler, or the partial compiler may further comprise a feature by which it inspects the received query and elects to operate as either a self-contained compiler or as a partial compiler.
Consequently, certain such implementations comprising a modular compiler operate such that a first partial compiler requires help from a first child compiler, and the first child compiler may itself be a second partial compiler that requires help from a second child compiler, and so on, until the child compiler is a self-contained compiler or, for certain alternative implementations, a partial compiler that can act either as a partial compiler (and engage other compilers) or as a self-contained compiler (when no help is necessary). Various implementations disclosed herein also anticipate that a partial compiler may complete part of the compilation on the query itself to produce at least portions of the plan without assistance from any child compilers. In any event, the modular approach is intended to ease the complexity of compilation for complex source code, such as source code intended for execution on a cluster.
For various such implementations, partial compilers are a generalization of self-contained compilers that, on receipt of a query, enlist the help of child compilers to perform the compilation. For example, the functionality of the DryadLINQ compiler can be provided by using a cluster-level partial compiler to generate a plan to distribute sub-queries among many machines and instruct child compilers for such machines to perform a sub-compilation (a part of the larger whole). To generate a plan for each such child compiler, the cluster-level compiler creates a machine-level sub-query which is then handed as a query to a machine-level child compiler. The machine-level child compiler then generates a machine-level plan which is returned as a sub-plan to the cluster-level partial compiler, and the cluster-level partial compiler uses this sub-plan to generate its own plan. In addition, for certain implementations, the machine-level child compiler may itself be a partial compiler that generates and distributes sub-queries for a plurality of processor cores (e.g., when targeting machines containing multicore processors) and instructs child compilers (which may be self-contained compilers) for each such processor core to perform the sub-compilations.
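By way of example only, the following sketch (continuing the illustrative Python style used above, with hypothetical names and toy string-valued queries and plans) traces the cluster/machine/core delegation just described; it is a schematic of the control flow, not DryadLINQ itself.

```python
# A toy trace of hierarchical sub-compilation: a cluster-level partial compiler
# creates machine-level sub-queries, each machine-level compiler creates
# core-level sub-queries, and the sub-plans are combined back up the hierarchy.
# All names, formats, and the two-core/two-machine sizes are illustrative only.
from typing import List


def core_compiler(query: str) -> str:
    """Self-contained leaf compiler: emits executable code for one core."""
    return f"core-code[{query}]"


def machine_compiler(query: str, cores: int = 2) -> str:
    """Machine-level compiler (itself a partial compiler when the machine has
    multiple cores)."""
    sub_queries = [f"{query}/core{i}" for i in range(cores)]    # reducer
    sub_plans = [core_compiler(q) for q in sub_queries]         # child compilers
    return "machine-plan(" + " | ".join(sub_plans) + ")"        # generator


def cluster_compiler(query: str, machines: List[str]) -> str:
    """Cluster-level partial compiler: one machine-level sub-query per machine."""
    sub_queries = [f"{query}@{m}" for m in machines]            # reducer
    sub_plans = [machine_compiler(q) for q in sub_queries]      # child compilers
    return "cluster-plan(" + " ; ".join(sub_plans) + ")"        # generator


print(cluster_compiler("count-words", ["m0", "m1"]))
```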
For certain implementations using a hierarchical modular approach, different compilers could be used to generate code for different processors (having different processing capabilities, resources, etc.), operating as child compilers under the control of a partial compiler. For example, compilation might be performed for a machine having two processor cores (CPUa and CPUb) and a graphics card processor (GPU). In these implementations, two different partial compilers might be used: one which generates plans running on the GPU, and another which generates plans for the processor cores. If these compilers are organized as child compilers to a machine-level partial compiler, then, given a query, the machine-level partial compiler might choose to send parts of the query as sub-queries to the child compiler targeting the processor cores while other parts are sent as sub-queries to the child compiler targeting the GPU.
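A possible rendering of such routing, again as an illustrative Python sketch with hypothetical names, is the following, in which the machine-level partial compiler's reducer sends GPU-amenable parts of a query to the GPU child compiler and the remaining parts to the processor-core child compiler; the semicolon-separated query format and the is_gpu_friendly test are assumptions made purely for illustration.

```python
# A sketch of a machine-level partial compiler whose reducer routes parts of a
# query either to a GPU child compiler or to a processor-core child compiler.
# The query format and the is_gpu_friendly heuristic are hypothetical; they
# stand in for whatever analysis a real reducer would perform.
def cpu_compiler(part: str) -> str:
    return f"cpu-code[{part}]"


def gpu_compiler(part: str) -> str:
    return f"gpu-kernel[{part}]"


def is_gpu_friendly(part: str) -> bool:
    # Illustrative test only (e.g., data-parallel operators with no joins).
    return part.startswith("map")


def machine_compiler(query: str) -> str:
    parts = query.split(";")                                     # reducer: split the query
    sub_plans = [gpu_compiler(p) if is_gpu_friendly(p) else cpu_compiler(p)
                 for p in parts]                                 # route to child compilers
    return "machine-plan(" + " | ".join(sub_plans) + ")"         # generator


print(machine_compiler("map-scale;sort-results"))
# machine-plan(gpu-kernel[map-scale] | cpu-code[sort-results])
```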
To build hierarchical modular compilers for various implementations disclosed herein, one basic operation is “composition,” by which one or more partial compilers and self-contained compilers are combined to form the desired compiler functionality. As such, the composition defines the component compilers, comprising a “parent” partial compiler and at least one child compiler, wherein the parent partial compiler receives the sub-plan from the child compiler and incorporates the sub-plan directly into the plan that the parent partial compiler itself returns as output.
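One possible rendering of the composition operation, as an illustrative Python sketch with hypothetical names, is shown below: a parent partial compiler (represented here as a reducer/generator pair, which is an assumption made for illustration) is combined with its child compilers to yield a single query-to-plan compiler.

```python
# A sketch of the 'composition' operation: combining a parent partial compiler
# with its child compilers yields an ordinary compiler (a query -> plan function).
from typing import Callable, List, Tuple

Query, Plan = str, str
Child = Callable[[Query], Plan]
Partial = Tuple[Callable[[Query], List[Query]],          # reducer
                Callable[[Query, List[Plan]], Plan]]     # generator


def compose(parent: Partial, children: List[Child]) -> Child:
    """Reduce the query, hand one sub-query to each child, then generate the
    parent's output plan from the children's sub-plans."""
    reducer, generator = parent

    def compiled(query: Query) -> Plan:
        sub_queries = reducer(query)
        sub_plans = [child(q) for child, q in zip(children, sub_queries)]
        return generator(query, sub_plans)

    return compiled


# Example: a binary parent composed with two self-contained leaf compilers.
split = lambda q: [q + "/left", q + "/right"]
join = lambda q, plans: "plan(" + " & ".join(plans) + ")"
leaf = lambda q: f"code[{q}]"
print(compose((split, join), [leaf, leaf])("q0"))
# plan(code[q0/left] & code[q0/right])
```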
In addition, these implementations may further comprise a set of operators that can be used to manipulate partial compilers as objects. Programs built using these operators are a form of “structured programming” that manipulates compilers as values. Utilizing a small set of well-defined operations, in turn, can help ensure correctness for the resulting combination of modular compilers (which can be referred to as a “composite compiler”).
In addition, a “case” operation may also be provided that is an extension of the conditional operation.
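As a non-limiting illustration, the following Python sketch shows one possible rendering of a conditional operation (routing a query to one of two child compilers based on a predicate over the query, as in the examples discussed further below) and of a “case” operation that extends it to several predicate/compiler pairs; the representation of compilers as query-to-plan functions and the particular names are assumptions made for illustration.

```python
# Sketches of 'conditional' and 'case' operations over compilers, where a
# compiler is represented as a query -> plan function. The names and the idea
# of dispatching on a predicate over the query are illustrative assumptions.
from typing import Callable, List, Tuple

Query, Plan = str, str
Compiler = Callable[[Query], Plan]
Predicate = Callable[[Query], bool]


def conditional(pred: Predicate, if_true: Compiler, if_false: Compiler) -> Compiler:
    """Route each incoming query to one of two child compilers."""
    return lambda q: if_true(q) if pred(q) else if_false(q)


def case(cases: List[Tuple[Predicate, Compiler]], default: Compiler) -> Compiler:
    """Extension of the conditional: route to the first child compiler whose
    predicate matches the query, falling back to a default child compiler."""
    def compiled(q: Query) -> Plan:
        for pred, child in cases:
            if pred(q):
                return child(q)
        return default(q)
    return compiled
```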
In addition, a functor operation can be used to create a partial unary compiler given separate and independent reduction and construction functions operating on queries and plans, respectively. Whereas traditional compilers may include a sequence of optimizing passes (for example, passes which transform a query from a higher-level representation to a lower-level representation), a modular compiler may similarly comprise such optimizations constructed using a “functor” operation applied to the optimization functions. Of course, this and other operations in the group may also be combined to provide blended functionality.
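A possible rendering of the functor operation, continuing the illustrative Python sketches above, is shown below: given an independent query-rewriting function and plan-rewriting function, the functor produces a partial unary compiler that wraps a single child compiler, which is one way an optimization pass could be expressed. The lowering and optimizing functions shown are hypothetical.

```python
# A sketch of the 'functor' operation: independent reduction (query-rewriting)
# and construction (plan-rewriting) functions yield a partial unary compiler
# wrapping a single child compiler. All names here are illustrative.
from typing import Callable

Query, Plan = str, str
Compiler = Callable[[Query], Plan]


def functor(reduce_query: Callable[[Query], Query],
            construct_plan: Callable[[Plan], Plan]) -> Callable[[Compiler], Compiler]:
    """Rewrite the query, delegate it to the child, then rewrite the child's plan."""
    def partial(child: Compiler) -> Compiler:
        return lambda q: construct_plan(child(reduce_query(q)))
    return partial


# An optimization pass expressed as a functor applied around a leaf compiler.
lowering_pass = functor(lambda q: f"lowered({q})", lambda p: f"optimized({p})")
leaf: Compiler = lambda q: f"code[{q}]"
print(lowering_pass(leaf)("q0"))   # optimized(code[lowered(q0)])
```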
The use of partial compilers in building systems provides decreased design complexity by reducing the number of possible interactions between the components involved. It is also expected that a monolithic compiler applying a sequence of complex optimization passes to an internal representation should be less robust than a collection of partial compilers solving the same problem, since the partial compilers communicate through well-defined interfaces and maintain independent representations of their sub-problems. Moreover, the correctness of compilers and partial compilers can be treated modularly, as correctness is preserved by all of the operations, and correct compilers can even be assembled from partial compilers or compilers that are only correct for some queries. Restricted correctness can also be treated modularly, and a natural Hoare-type logic can be defined to reason about correctness in this circumstance. For example, a conditional partial compiler may be used to fix bugs in a child compiler by sending queries which the child cannot handle correctly to an alternative child. Alternatively, a conditional partial compiler can be used to route a query either to a GPU compiler or to a multi-core compiler, depending on whether the query can be compiled for a GPU or not.
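By way of example only, the following Python sketch (reusing the conditional operation sketched earlier, with hypothetical predicates and compilers) illustrates the second use just mentioned, routing a query either to a GPU compiler or to a multi-core compiler depending on whether it can be compiled for the GPU; the same construction could send queries that a buggy child cannot handle correctly to an alternative child.

```python
# A usage sketch of the conditional operation: route queries that can be
# compiled for a GPU to a GPU compiler and all other queries to a multi-core
# compiler. The can_target_gpu test and both compilers are hypothetical.
from typing import Callable

Query, Plan = str, str
Compiler = Callable[[Query], Plan]


def conditional(pred: Callable[[Query], bool],
                if_true: Compiler, if_false: Compiler) -> Compiler:
    return lambda q: if_true(q) if pred(q) else if_false(q)


gpu_compiler: Compiler = lambda q: f"gpu-kernel[{q}]"
multicore_compiler: Compiler = lambda q: f"cpu-code[{q}]"
can_target_gpu = lambda q: "join" not in q          # illustrative capability test

machine_compiler = conditional(can_target_gpu, gpu_compiler, multicore_compiler)
print(machine_compiler("map-scale"))                # gpu-kernel[map-scale]
print(machine_compiler("hash-join"))                # cpu-code[hash-join]
```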
Computer-executable instructions, such as program modules, being executed by a computer may be used. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Distributed computing environments may be used where tasks are performed by remote processing devices that are linked through a communications network or other data transmission medium. In a distributed computing environment, program modules and other data may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computing device 500 may have additional features/functionality. For example, computing device 500 may include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in
Computing device 500 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by device 500 and includes both volatile and non-volatile media, removable and non-removable media.
Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 504, removable storage 508, and non-removable storage 510 are all examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 500. Any such computer storage media may be part of computing device 500.
Computing device 500 may contain communication connection(s) 512 that allow the device to communicate with other devices. Computing device 500 may also have input device(s) 514 such as a keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 516 such as a display, speakers, printer, etc. may also be included. All these devices are well known in the art and need not be discussed at length here.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium where, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the presently disclosed subject matter.
Although exemplary implementations may refer to utilizing aspects of the presently disclosed subject matter in the context of one or more stand-alone computer systems, the subject matter is not so limited, but rather may be implemented in connection with any computing environment, such as a network or distributed computing environment. Still further, aspects of the presently disclosed subject matter may be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices might include personal computers, network servers, and handheld devices, for example.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.