This is a continuation of U.S. patent application No. ______, filed Jan. 13, 2006, having attorney docket no. 3382-71868-01, and entitled “TYPED INTERMEDIATE LANGUAGE SUPPORT FOR LANGUAGES WITH MULTIPLE INHERITANCE”. This application is incorporated herein in its entirety.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The field relates to compiling computer programs to intermediate language representations. More particularly, the field relates to compiling object-oriented source code to typed intermediate language representations.
Compilers transform computer source code from high-level programming languages to machine code. A common method of compiler construction compiles the source code into an intermediate language, which is then itself compiled into object or machine code. Early compilers type-checked the source language during the initial compilation, and then threw away the type information, leaving the intermediate language untyped. But, maintaining type information within the intermediate compiler representation has significant benefits.
For instance, a typed intermediate language allows intermediate program representations to be themselves type-checked. Errors in the types found at the intermediate level can often be traced to compiler errors, and, thus, can be used to debug the compilers, an otherwise arduous task. Furthermore, the typed intermediate representations can often be more effectively optimized at the machine code level, and safety proofs for the underlying programs can be more easily created. Moreover, typed intermediate representations can be used as a format for redistributing programs, and a user can (mechanically) check that the program redistributed in the intermediate form is safe to run, as opposed to relying on certificates or third party claims of trustworthiness.
One reason that compilers for object-oriented languages have failed to adopt compilation using typed intermediate representations is that traditional class and object encodings have been seen as too complex to type effectively at an intermediate level. Even though work has been done for developing typed intermediate languages for functional languages, much of this work does not support object-oriented programming languages.
Thus far, those typed intermediate languages that have been proposed for object-oriented languages are complicated, often inefficient, and do not allow compilers to use standard implementation techniques. In short, they are not suitable for practical compilers. Furthermore, those typed intermediate languages which do exist do not support more complex object oriented behaviors such as multiple inheritance, and even more complex systems such as multiple inheritance using virtual base classes. With virtual inheritance, a shared copy of a superclass is allowed in a subclass object when the subclass inherits the superclass more than once.
A practical compiler requires simple, general, and efficient type systems. First, compiler writers who are not type theorists should be able to understand the type system. Second, the type system needs to cover a large set of realistic object-oriented language features and compiler transformations. Third, the type system needs to express standard implementation techniques without introducing extra runtime overhead. Fourth, the type system needs to support complex and powerful object oriented language components such as multiple inheritance using virtual base classes. To enable any of the above at the intermediate language level, methods and systems are needed to maintain type information in the intermediate language compiled from a source code representation.
Described herein are methods and systems for generating typed intermediate representations of source code written in languages that allow multiple inheritance. In one aspect, the typed intermediate representations are generated using existential types to represent the possible paths from subclass objects to superclass objects within a class. In another aspect, the typed intermediate representations are generated using existential types to represent “this” pointers of methods and using special expressions to represent address arithmetic for necessary adjustment of “this” pointers in methods overridden by subclasses. In yet another aspect, at least one code portion of the typed intermediate representation comprises records which represent an object of a class when the runtime type of the object is known. These records, in an exemplary embodiment, reflect the actual layout of the object.
In another aspect, at least one code portion of the typed intermediate representation comprises records which represent an object of a class when the runtime type of the object is not known. These records, in an exemplary embodiment, reflect the inner non-virtual base objects of the enclosing object and include representations of offsets from subclass objects to virtual objects within their inheritance path.
In another aspect, at least one code portion of the typed intermediate representation can be checked for type correctness by allowing only objects whose paths are identical to the path of the original object to call the method.
In yet another aspect, the typed intermediate representation faithfully models standard untyped implementations of multiple inheritance. Subclass objects can be cast to superclass objects by adding the offset from the subclass object to superclass object. If the superclass object is a virtual base, then the displacement from the subclass object to the superclass object must be fetched from a vtable. Similarly, casting from a superclass object to a subclass object is performed by subtracting the offset from the subclass object to the superclass object, if we statically know that the cast is valid.
The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
1. Overall System for Type-checking Intermediate Representations
Alternatively, after an initial compilation from the original source code representation 105 to an intermediate representation 115, the compiler optimization processes 140 can be applied to the intermediate representation 115 to further streamline the original source code 105 according to particular target architectures, for instance. Nevertheless, applying the optimizations 140 results in an optimized form 145 of the intermediate representation 115, which can also be type checked by the type checker 120. In fact, without a typed intermediate representation 115, verifying the type safety of optimizations 140 would be difficult and, in some cases, would create unwanted overhead during runtime. The dashed lines connecting optimizations 140 and optimized form 145 of the intermediate representation 115 to the type checker 135 indicate that optimizations 140 are not required to be applied to the intermediate representation 115 prior to type checking.
Also,
2. Exemplary Method of Compiling a Source Code Representation of an Object-oriented Language to a Typed Intermediate Representation
One way to make typed intermediate representations for multiple inheritance, such as 115 (
One important feature of the typed intermediate representation is that it represents the displacement from a class to at least one virtual superclass of that class. With reference to
Each object of these classes has data stored within. A representation of the object is stored in some location on a resource such as on a file or in memory when the program is compiled. Data is then stored in this representation when the program is run. Typing protects data when it is initially compiled from being accessed incorrectly by only allowing objects with the correct class to access fields that represent the data. The intermediate representation represents the actual layout of the physical structure of the objects and prevents data fields from being incorrectly accessed by only allowing objects of the correct type to access a representation of the actual storage locations.
Virtual inheritance and multiple inheritance add complication to the process of typing objects in an intermediate representation. When a class inherits a number of super classes, the various members of the class are generally at known offsets. However, when classes include virtually inherited members, the offsets become much more complex because although, in our continuing example of class C, both A and B inherit E, the child E cannot keep the offsets from both parents A and B at the same time.
In object-oriented languages, subclass objects are allowed to use methods of their superclass objects without explicitly casting the type to the specific class type method that is used. To further complicate matters, because there is multiple inheritance, there may be more than one inheritance path from a subclass to a superclass. Furthermore, the actual inheritance path that an object will take may not be known until runtime. Each different inheritance path may require a different offset from the subclass object to the super class object at runtime. There is a set of valid subclass-to-superclass object casts. The typed intermediate language captures both the actual offset of virtual objects from their subclass objects and captures the valid set of subclass-to-superclass paths (representing offsets in the resource object state) from subclass objects to super class objects.
3. Object-Oriented Language Overview
A. Classes
In object-oriented programming, programs are written as a collection of classes. Classes are composed of their superclasses, their own data fields, their own methods, and the data and methods of their superclasses.
An instance of a class is an object. Each method in an object has an associated this argument. The this argument is a pointer to the individual object for which the method is being called.
The use of classes allows the underlying programs to encapsulate data, exhibit polymorphism, and allow inheritance. Data encapsulation can also be thought of as data hiding. Classes hide their internal composition, structure and operation. They expose their functionality to client programs that utilize the class only through one or more interfaces. An interface defines the actual methods with their data types that a class uses to interact with client programs. However, the implementation of the class is defined separately and generally cannot be modified by a client program through the interface.
The advantage of using data encapsulation comes when the implementation of the class changes but the interface remains the same. For example, a method to sort members of an array may be replaced internally with a different, more efficient sort. If the interface does not change, then code which utilizes the sort will still be valid.
Inheritance allows classes to easily share common features. A child class (also known as a derived class) inherits the properties of its parent (base) class, and is free to add features of its own. For example,
Polymorphism is the ability for different types of objects to invoke the same method and have the method produce the appropriate different results. The classic example is “area.” A base class “shape” can have different derived methods (such as “rotate”, “changecolor”, etc.) for different shape classes, such as “circle”, “square”, “triangle”, etc. The same rotate method can then be applied to any shape object and return the correct results. The base class does not need to know, and often does not know, the type of the object that invokes the method until runtime. This is called dynamic binding.
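By way of illustration only, dynamic binding can be sketched in C++ as follows. The class names Shape, Square, and Triangle and the helpers total_area and demo_total_area are hypothetical names chosen for this sketch and are not taken from the figures.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical base class: declares the interface shared by all shapes.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;  // resolved at runtime (dynamic binding)
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Triangle : Shape {
    double base, height;
    Triangle(double b, double h) : base(b), height(h) {}
    double area() const override { return 0.5 * base * height; }
};

// The caller only knows it holds Shape objects; the method body that runs
// is selected by the runtime type of each object.
double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}

// Builds one Square and one Triangle and sums their areas.
double demo_total_area() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Square(2.0)));
    shapes.push_back(std::unique_ptr<Shape>(new Triangle(3.0, 4.0)));
    return total_area(shapes);
}
```

In this sketch total_area never names Square or Triangle, which is the property the paragraph above calls dynamic binding.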
B. Multiple Inheritance
Multiple inheritance allows a class to inherit from more than one class and, as a consequence, to inherit the same class more than once. This is useful when a class needs two or more sets of behavior. For example, with reference to
To prevent this unnecessary duplication, some languages that have multiple inheritance allow two different types of inheritance, nonvirtual and virtual. With nonvirtual inheritance, the child class contains a copy of each of the members of the inherited class, as well as copies of its own members. Therefore, when a class inherits from the same nonvirtual base class more than once, objects of that class will contain multiple copies of the data members of the same multiply inherited base class, such as the example of Student Intern 205 inheriting two copies of the Person class 220, 225.
Alternatively, a virtual base class can be inherited more than once without duplication of its data members. Each object has only one copy of a virtual base.
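The difference between the two kinds of inheritance can be observed directly in C++. The following is a minimal sketch; the Student and Employee parent classes and the V-prefixed variants are assumed names for illustration, and only Person and Student Intern are named in the description above.

```cpp
#include <cassert>

struct Person { int id = 0; };

// Nonvirtual inheritance: each parent carries its own copy of Person.
struct Student : Person {};
struct Employee : Person {};
struct StudentIntern : Student, Employee {};

// Virtual inheritance: both parents share a single Person subobject.
struct VStudent : virtual Person {};
struct VEmployee : virtual Person {};
struct VStudentIntern : VStudent, VEmployee {};

// True when the two routes to Person reach two distinct subobjects.
bool has_two_person_copies() {
    StudentIntern s;
    Person* via_student = static_cast<Student*>(&s);
    Person* via_employee = static_cast<Employee*>(&s);
    return via_student != via_employee;
}

// True when the two routes to Person reach the same shared subobject.
bool has_one_shared_person() {
    VStudentIntern s;
    Person* via_student = static_cast<VStudent*>(&s);
    Person* via_employee = static_cast<VEmployee*>(&s);
    return via_student == via_employee;
}
```

The intermediate casts are required in the nonvirtual case because a direct conversion from StudentIntern to Person would be ambiguous, which is the same ambiguity that paths are later used to resolve.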
C. Inner, Outer, Innermost, and Outermost Objects
At runtime, instances of a class are known as objects. An object whose type is an inherited superclass of a class is referred to as an inner object or as an enclosed object. In
4. Exemplary Path Representation
Exemplary embodiments represent paths in a different way than the “path” in, for example, the C++ source language. This path notion is more explicit and complete, in the sense that it describes the whole path from the outermost object to each inner object. Each object is associated with a path (either concrete or abstract). Concrete paths are paths for those objects whose runtime type is known at compile time. For objects whose runtime type is known only at runtime, an abstract path is provided. Providing the path allows differentiation between different copies of the same superclass in a subclass object. C++ paths do not need to start from the outermost object. As another difference, C++ does not have abstract paths. Among other differences, C++ paths do not use an explicit identifier, such as “VB”, to identify virtual bases, and do not lift virtual bases to the top level.
A path, in an exemplary embodiment, is a sequence of class names that represent a way, at runtime, from an outer object to an inner object. This is called an inheritance path. In languages that allow multiple inheritance and virtual bases, the path that may be taken from an outer to an inner object may only be fixed at runtime; that is, it may be bound dynamically. For example, and with continuing reference to
Paths are defined as follows: a path is a sequence of class names that represents a way from an outer object to an inner object, as shown above. The letters P and Q will be used to range over paths. ε will be used to represent an empty path. P:(D, E] represents a path P of the form D:: . . . :: . . . Dn::E that starts at D exclusively (exclusivity is indicated by the “(”) and ends at E inclusively (inclusivity is indicated by the “]”), where all bases are non-virtual bases. Abstract paths are represented by path variables. A path variable ρ:(τ, A] abstracts all paths from τ to A, where τ represents the runtime type of an object which might be unknown at compile time.
The pseudo-class name VB is used to lead paths from an outermost object to an inner object of a virtual base. Paths for inner objects of a virtual base class E are compressed to VB::E. There is only one copy corresponding to the virtual base class E. Thus we do not need to track the intermediate classes. In the example given just above, with reference to
The type Path(P, τ) represents objects with actual runtime type τ and path P that starts from τ and leads to the object. Type Path(ε, C) represents complete C objects which are not embedded within any other object. The record type R(C) (discussed below) represents the layout of the complete objects.
More specifically, paths are sequences of class names that uniquely identify embedded objects. A path P lists all intermediate classes between two classes τs and τe where τs is a subclass of τe. P, specifically, can be thought of as a path from τs to τe.
The “this” pointer of a method has an associated path, as it points to an object. Calling a method m on an object o is implemented as first fetching m from o's vtable and then passing o to m as the first argument (the “this” pointer). Here, the “this” pointer of m is given the same path as o. The type checker requires that only objects that have the same path as o can be passed to m as the “this” pointer. This prevents unsafe dispatch where objects with incompatible types are passed as “this” pointers.
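The path-matching rule for “this” pointers can be approximated with C++ templates. This is only a simplified model under stated assumptions, not the type system of the intermediate representation: PathSeq, PathObj, Method, read_payload, and dispatch_demo are hypothetical names, and a path is modeled as a plain compile-time list of class-name tags.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class-name tags used to spell out paths.
struct A {};
struct B {};
struct C {};

// A path is modeled as a compile-time sequence of class names.
template <class... Names>
struct PathSeq {};

// Models Path(P, tau): an object reached by path P inside an outermost
// object whose runtime type is Tau.
template <class P, class Tau>
struct PathObj {
    int payload;
};

// A method fetched from the vtable of a PathObj<P, Tau> accepts only
// "this" pointers with exactly the same P and Tau.
template <class P, class Tau>
struct Method {
    int (*code)(PathObj<P, Tau>* self);
    int call(PathObj<P, Tau>* self) const { return code(self); }
};

using PathToB = PathSeq<C, B>;  // the B inner object of a C object
using PathToA = PathSeq<C, A>;  // the A inner object of a C object

int read_payload(PathObj<PathToB, C>* self) { return self->payload; }

// An object with a different path cannot be passed as "this": the two
// pointer types are unrelated, so the call would be rejected statically.
static_assert(!std::is_convertible<PathObj<PathToA, C>*,
                                   PathObj<PathToB, C>*>::value,
              "objects with different paths are not interchangeable");

// Dispatch on an object whose path matches the method's path.
int dispatch_demo() {
    PathObj<PathToB, C> o{7};
    Method<PathToB, C> m{&read_payload};
    return m.call(&o);
}
```

The static_assert plays the role of the type checker in this model: the mismatch is detected before the program runs, so no runtime check is needed at the call.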
5. Object Layout
In object-oriented languages, the objects of subclasses can be viewed as objects of the superclasses. So, Student intern object (205C of
Object layout for objects whose runtime types are classes with multiple inheritance is more complex than for objects whose runtime type are classes with only single inheritance. When classes inherit singly, then all inner objects can share the starting address and vtable with the outermost object.
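The address adjustment that multiple inheritance introduces can be observed with a small C++ sketch. The classes A, B, and C here are illustrative stand-ins with assumed fields; the exact byte offsets depend on the compiler, so the sketch relies only on properties that hold for any conforming layout.

```cpp
#include <cassert>

struct A { int a = 1; virtual ~A() = default; };
struct B { int b = 2; virtual ~B() = default; };
struct C : A, B { int c = 3; };

// The A and B inner objects are distinct, non-empty subobjects, so they
// cannot both start at the same address: at least one upcast must adjust
// the pointer.
bool bases_have_distinct_addresses() {
    C c;
    A* pa = &c;  // subclass-to-superclass cast (may add an offset)
    B* pb = &c;  // subclass-to-superclass cast (may add a different offset)
    return static_cast<void*>(pa) != static_cast<void*>(pb);
}

// Casting back from the inner B object subtracts the same offset, which is
// valid here because the object is statically known to be a C.
bool downcast_round_trips() {
    C c;
    B* pb = &c;
    return static_cast<C*>(pb) == &c;
}
```

With single inheritance on common implementations no such adjustment is needed, which is why the inner objects can share one starting address and one vtable.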
Objects whose type is a class with multiple inheritance need multiple vtables, as some inner objects must have different addresses than the outer ones that are not in a single direct line of inheritance. So, turning to
If a subclass C overrides a method in a super class B, such as a Student intern class (205B in
Virtual inheritance further complicates object layout. Consider the system of C, A, B, and E in
With virtual inheritance, the displacements from classes to their virtual bases may change in subclasses (the “B” class and A class each have different offsets to the “E” virtual class in
However, the inner objects for virtual bases (E 330) are floating—their locations may vary within C and each of C's subclasses, and therefore need to be calculated at runtime. Each inner object may have sub inner objects of its own if the corresponding class has non-virtual bases.
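The floating position of a virtual base can be measured in C++ by computing the byte displacement from an A inner object to its E virtual base. The following is a minimal sketch; the field names and the helper functions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct E { int e = 0; virtual ~E() = default; };
struct A : virtual E { int a = 0; };
struct B : virtual E { int b = 0; };
struct C : A, B { int c = 0; };

// Byte displacement from an A (sub)object to its E virtual base. The
// conversion to E* is resolved at runtime, because the location of E
// depends on the runtime type of the enclosing object.
std::ptrdiff_t disp_A_to_E(A* a) {
    E* e = a;
    return reinterpret_cast<char*>(e) - reinterpret_cast<char*>(a);
}

// The same static type A, measured inside two different runtime types.
std::ptrdiff_t disp_in_complete_A() { A a; return disp_A_to_E(&a); }
std::ptrdiff_t disp_in_C() { C c; return disp_A_to_E(&c); }

// Adding the displacement to the A inner object yields the E inner object.
bool displacement_locates_E() {
    C c;
    A* a = &c;
    E* e = &c;
    return reinterpret_cast<char*>(a) + disp_A_to_E(a) ==
           reinterpret_cast<char*>(e);
}
```

On common compilers disp_in_complete_A and disp_in_C return different values, although the language does not guarantee any particular layout, so the sketch does not depend on the difference.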
Only the innermost objects have vtables as their first fields. B is an innermost object. The outermost object (C) shares its vtable with its first non-virtual base (A), and so on.
Each inner object for C, that is, A, B, and E, has its own vtable 314, 324, 334. The C object 310 shares the starting address 305 and the vtable 310 with the A object 310. The vtable record also contains class-specific information used to identify the (outermost object) class, namely a runtime tag. For example, the vtable for A (an inner object of the C outer object) has a tag for C, “TAG(C)” 315. The other inner objects within C also have tags for C, 325, 335. Each vtable of class C 314, 324, 334 contains a runtime tag 316, 326, 336 that identifies C, and allows the inner object to call the overridden method in the outer object. Each inner object (such as B 320) needs its own vtable to use as an ordinary object. For example, an inner object may have its own methods 316, 326, 336, pointers to which are stored in the vtable.
The inheritance relationship for virtual bases is not mapped directly in the actual object layout for two reasons, one obvious, one not so. Because multiple objects inherit the same virtual base, and since there is only one virtual base, different objects must be at different locations (offsets) from the virtual base. Also, virtual superclass inheritance relationships are “flattened” in the sense that, if class C inherits A (virtually or not) and A has a virtual superclass E (as can be seen, for example, in
Because the VB classes are at the top level of an object, rather than in the position they would appear to be by looking at the actual inheritance, some paths that appear to exist actually do not. For example, although the paths to E from C in the source code object hierarchy shown in
Each object needs its own way to locate the inner objects for virtual bases, as offsets between subclass objects and virtual superclass objects may vary, depending on subclass objects and actual runtime types of the objects. If a subclass inherits a virtual superclass more than once, such as E, there are not multiple copies of E, rather, each inheritance of E shares the same object. With reference to
One standard solution is to store the displacements in connection with the vtable for each superclass object. Each class with virtual bases is given a “virtual base table pointer” (vbptr) in its vtable. The pointer points to a table that lists displacements from an object of the class to all of its direct virtual bases. So, if an object directly virtually inherits four classes, then there are four entries in the table to which vbptr points.
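The lookup through vbptr can be modeled by hand in C++. The sketch below is an illustrative model only and does not reproduce the layout of any real compiler: the struct names VBTable, VTable, APart, BPart, EPart, and CObject, the field order, and the helpers make_C and to_virtual_base_E are all assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Table of displacements to direct virtual bases (one entry here, for E).
struct VBTable { std::ptrdiff_t disp_to_E; };

// Vtable model: only the vbptr field is shown.
struct VTable { const VBTable* vbptr; };

struct EPart { int e; };
struct APart { const VTable* vptr; int a; };
struct BPart { const VTable* vptr; int b; };

// Assumed layout of a complete C object: A, then B, then the single E.
struct CObject {
    APart a_part;
    BPart b_part;
    int c;
    EPart e_part;
};

// Displacements valid only when the runtime type is C.
const VBTable kVbA_in_C{static_cast<std::ptrdiff_t>(offsetof(CObject, e_part)) -
                        static_cast<std::ptrdiff_t>(offsetof(CObject, a_part))};
const VBTable kVbB_in_C{static_cast<std::ptrdiff_t>(offsetof(CObject, e_part)) -
                        static_cast<std::ptrdiff_t>(offsetof(CObject, b_part))};
const VTable kVtA_in_C{&kVbA_in_C};
const VTable kVtB_in_C{&kVbB_in_C};

CObject make_C() {
    CObject c{{&kVtA_in_C, 1}, {&kVtB_in_C, 2}, 3, {4}};
    return c;
}

// Cast an inner object to its virtual base E: fetch the displacement
// through vptr -> vbptr, then add it to the inner object's address.
template <class Part>
EPart* to_virtual_base_E(Part* inner) {
    std::ptrdiff_t d = inner->vptr->vbptr->disp_to_E;
    return reinterpret_cast<EPart*>(reinterpret_cast<char*>(inner) + d);
}
```

Both inner objects reach the same E even though each uses a different displacement, which is why the displacement is stored per inner object rather than fixed per class.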
To return to
Not all offsets within vbptr are from non-virtual superclasses to virtual classes. The offsets can be from virtual bases to their virtual bases. For example, if A is a virtual super class of C, while all other relations remain unchanged, the vtable of A will have a vbptr that contains the offset from A to E.
In another approach, each object of a class which inherits virtual bases contains a vbptr which points to the vtable of the class. The vtable itself is a pointer to a set of offsets, each offset corresponding to a virtual base superclass object. To get the E virtual base object out of the C object, first the vbptr is fetched from the C object which points directly to the vtable. Once at the vtable, then the appropriate entry for E is located. This entry points to the offset between the E object and the C object. Once the offset is known, it is added to the address of the C object to obtain the address of the E object.
As another embodiment, rather than storing the offset from each virtual base object associated with an outer object in a vbtable, a pointer to the beginning address of the virtual base object can be stored directly in methods associated with the outer object.
6. Exemplary Record Types Related to Class Layout
Typical object-oriented languages allow classes and subclasses to be used interchangeably. Subclass objects can be used as superclass objects. A class name represents objects of itself and its subclasses. In fact, this is seen as one of the strengths of object-oriented code. However, in the exemplary typed intermediate representation, class names refer only to the classes themselves, and not to any subclasses, because the representation has notions to represent precise runtime types to guarantee safety of dynamic dispatch. Interestingly enough, this means that quite possibly the intermediate representation is typed with more precision than the original source code.
Runtime object layouts in a typed intermediate representation are expressed in two record types. These two record types are shown in
A. Record Types
The two record types, exact 900 (from
The exact record type 900 describes the layout of a complete object C, including, in an exemplary embodiment the inherited classes (both virtual and non-virtual) and their vtables, vptrs, methods, and fields. This layout at least partially maps to the actual layout of the object C in a compiled representation of the computer program. Many of the features within exact record type 900 (such as, for example, the vtable, vptrs, and methods, in an exemplary embodiment) are themselves typed. For example, the exact record type 900 (known as R(C)) for Class C as defined in
The labels A 902, B 906, and E 908 identify inner objects corresponding to the A 310, B 320, and E 330 objects shown in
The methods of each of the bases are placed within the record at the same location that they occur within the actual object structure. For example, the methods for A 914 occur after the vbptr table for A 913, just as in the actual structure, as shown in
All virtual bases must be top-level fields in R(C), as has been described above, representing the flattening of the virtual base superclasses. So, with reference to
An exemplary embodiment uses path types to represent objects. These explicit notions of runtime types and paths guarantee the soundness of dynamic dispatch, as the type checker guarantees that only objects with allowed runtime types and paths can be passed to methods as “this” pointers at runtime.
B. Approximate Record Types
Class types are imprecise, deliberately, in object oriented programs, due to the polymorphic nature of object oriented languages. An object with, for example, Class type C may be a C object or a C inner object in a subclass of C. For example, a student intern class 205C of
For most objects we know neither the runtime type nor the path, but we still need to describe the path that is taken to reach this object. Then, we can match the paths of “this” pointer in the method and of the object passed to the method.
An exemplary embodiment uses an existential type with path abstraction to represent the possible runtime types and paths for a specific object. Exemplary source languages generally allow classes and subclasses to be used interchangeably even though the precise type (the runtime type) depends on the execution path, which becomes evident only at runtime. In a typed intermediate representation provided with precise notions of class names, such as this one, the loose reference of source code class-names (allowing a subclass object to access a superclass object method using the subclass name) cannot be used to refer to the types of objects that are classes or their subclasses. Instead, in generating the intermediate representation, a bounded existential type ∃α<<C. ∃ρ:(α,C]. Path(ρ, α) binds a type variable α to abstract the runtime type of an object which is not known at compile time. It represents all C objects, complete or embedded. The type variable α identifies the runtime type of the outermost object, which must be C or a subclass of C. The “<<” indicates the subclass relationship, so that C<<B indicates that C is a subclass of B. The variable ρ abstracts the path from the outermost object (identified by α) to the C inner object. If we follow ρ in the α object we will get to a C object.
An exemplary embodiment of the approximated record type ApproxR(P, τ) is shown in
The ApproxR(P, τ) record 1000 contains descriptions of a class's non-virtual bases 1002, 1014, and excludes the class's virtual bases. Referring to the exemplary illustration
vbptr: Ptr{E: Disp(P::A, VB::E, τ)} 1008;
This displacement form, generalized as Disp(Path1, Path2, τ), represents the displacement from an inner object following Path1 (A) to another object following Path2 (the virtual base E) in the outer object (τ). The character τ is a stand-in for the actual runtime type—a type variable. This details the offset between the A inner object and the virtual base type E inner object in τ, the actual runtime type.
As with the exact record type 900, the ApproxR type also describes the actual layout of the object in question (in an exemplary embodiment, C) as can be seen by the location of the methods 1006, 1028, which are in the same location as the methods in the actual runtime representation of the object 316, 326. Similarly, A's fields 1012 and B's fields 1024 are in the same location as in the actual runtime object of C 311, 321 (in an exemplary embodiment).
Objects can be coerced to records, and vice versa. If an object has runtime type C it can be coerced to a record of the exact record type R(C) 900. Objects whose source type is C (complete C objects or embedded C objects in C's subclass objects) are translated to have type “∃α<<C.∃P:(α, C]. Path(P, α)” in the typed intermediate representation. These objects can then be opened to get objects of type “Path(P, α)”. Objects of type “Path(P, α)” can then be coerced to and from records of type ApproxR(P, α). These coercions are runtime no-ops, and thus, introduce no overhead at runtime. Creating new objects is done by first creating new records and then coercing the records to objects. Fetching fields out of objects is done by first coercing objects to records and fetching fields out of the records.
The two records, the exact record type 900 and approximated record type 1000 need to understand, and therefore map, the layout that the compiler chooses for the objects. Therefore, the exemplary embodiment above indicates only one possible implementation for the compiler, and, hence, the layout of the exact record type R(C) 900 and approximated record type 1000. It should be remembered, however, that not all typing rules in an exemplary embodiment need to use these two records. Thus, the rest of the type system can be independent of these aspects of the layout strategy.
C. “This” pointers
Each virtual method has a hidden parameter—the “this” pointer. The “this” pointer is automatically initialized to a pointer to the object for which a method is invoked. When a method returns a reference to its current object, that reference is a “this” pointer. In an exemplary embodiment, within the intermediate representation, the typed method has a typed “this” pointer. Explicit adjustments on “this” pointers, in an exemplary embodiment, are necessary when subclasses override or inherit methods from superclasses, and are included in the intermediate representation.
Multiple inheritance may require “this” pointer adjustment when a subclass method overrides a superclass virtual method. If class C overrides a method m introduced in a superclass B, the implementation in C expects a C object. However, m may also be called on the embedded B object in C. In an exemplary embodiment, an “adjuster thunk” for m is placed in the vtable of the embedded B object in C. This thunk converts a B object embedded in a C object into a C object, and calls the new implementation on the C object.
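The effect of an adjuster thunk can be written out by hand in C++. This is a sketch under stated assumptions, not compiler-generated code: C_m_impl stands for the new implementation of m in C, C_m_thunk stands for the thunk placed in the vtable of the embedded B object, and class A and the integer fields are added only so that the B inner object has a nonzero displacement on common layouts.

```cpp
#include <cassert>

struct A { int a = 1; virtual ~A() = default; };
struct B { int b = 2; virtual ~B() = default; };
struct C : A, B { int c = 3; };

// New implementation of m supplied by C: it expects a complete C object.
int C_m_impl(C* self) { return self->a + self->b + self->c; }

// Hand-written stand-in for the adjuster thunk. The static_cast converts
// the embedded B object back into the enclosing C object by subtracting
// the displacement from C to B, then the new implementation is called.
int C_m_thunk(B* self) { return C_m_impl(static_cast<C*>(self)); }

// Calls m on the embedded B object of a C object through the thunk.
int call_through_thunk() {
    C c;
    B* inner = &c;            // upcast: pointer now refers to the B inner object
    return C_m_thunk(inner);  // thunk adjusts "this" back to the C object
}
```

The downcast inside the thunk is only valid because the thunk is installed in the vtable of a B object that is known to be embedded in a C object, which is the invariant the typed “this” pointer is meant to capture.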
Each component in ApproxR(P, τ) 1000 can be identified by a path from a subclass to a superclass. If a B component exists within a C object (as is shown in
In an exemplary embodiment, each vtable corresponds to a path from the outermost object to a specific superclass inner object. Each virtual method contained in the vtable accepts only objects with the corresponding path as “this” pointers. The intermediate representation guarantees that if you get to an object by following a specific path and fetch a method from the object, you can only call the method on objects of the same path type. If the path type of an object is known, then the “this” pointer types of the virtual methods contained in the object are also known.
One complication with the “this” pointer is that if a superclass B defines a method, the subclass C should be able to reuse the method name and implementation. It doesn't need to redefine them. As an example consider a class B which defines a method m. If in C there is no method m defined, that is, there is no explicit overriding method m, then there is no actual overriding, either. The same method m, the one defined in B, is used. The same implementation must accept both complete B objects and inner B objects in C.
However, C can override the method m in B by giving a new body to method m. To guarantee soundness of dynamic dispatch, when a method o.m is called, the m should have a “this” pointer with the same path and the same runtime type of o. There is a mismatch of “this” pointer types between method implementations in the class and methods in the vtable.
To reconcile this problem, two different views for the “this” pointer exist. In a vtable, “this” pointer types of methods have the same path and runtime type as the object that corresponds to the vtable. A method implementation uses an existential type for the “this” pointer to abstract all possible paths.
Pseudo source code in an object-oriented programming language for the exemplary classes C and B shown in
With reference to
Both overriding and inheriting methods require “this” pointer adjustment. Furthermore, suppose C<<B′<<B and P (C,B]. B′ implements m with function ƒ and no other classes between B′ and C implement m. If B′=C it means that C overrides m, Otherwise, C inherits the implementation of ƒ from B′. If adjuster thunks are used, then the adjuster thunk for m in C's vtable has the “this” pointer of type Path(P, C). Function ƒ requires that “this” be a B′ object with existential types. The thunk calls ƒ after subtracting the displacement from B to B′ from the “this” pointer and packing the result, which is a B′ object, to a desired existential type. When B is a virtual base of C, the displacement from B to B′ must be statically known as no displacements from virtual bases to their subclasses are stored within the objects. Therefore, “this” pointers of adjuster thunks must have concrete runtime types (that is, offsets known at compile time) and cannot use existential types that hide (or give a range for) their runtime type.
As discussed, subclasses can inherit method implementations in superclasses. To represent all possible subclass objects, an existential path type is used. The “this” pointer of a method implementation in class C is given type ∃α<<C.∃ρ:(α, C]. Path(ρ, α), representing any C object, either complete or embedded in subclass objects. For a thunk in the vtable of an object with path P, the “this” pointer can only have the specified path P.
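The two views of the “this” pointer can be illustrated with a minimal sketch. It is written in Python rather than in the typed intermediate language, and it models only the typing discipline, not the pointer arithmetic. The names `ConcreteThis`, `PackedThis`, `pack`, `impl_m_in_b`, and `make_thunk` are hypothetical and are not part of the described embodiment. A path is modeled as a tuple of class names, with the empty tuple standing for the complete object.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Path = Tuple[str, ...]  # ("B",) models the path from an enclosing object to its B inner object


@dataclass(frozen=True)
class ConcreteThis:
    """View used by a vtable entry: runtime type and path are both known."""
    runtime_type: str
    path: Path


@dataclass(frozen=True)
class PackedThis:
    """View used by a method implementation: only the static type is visible."""
    static_type: str
    hidden: ConcreteThis  # stands in for the abstracted runtime type and path


def pack(this: ConcreteThis, static_type: str) -> PackedThis:
    # An empty path denotes the complete object, so it ends at the runtime type.
    end = this.path[-1] if this.path else this.runtime_type
    if end != static_type:
        raise TypeError(f"object does not end at {static_type}")
    return PackedThis(static_type, this)


def impl_m_in_b(this: PackedThis) -> str:
    # Written once against "some B object"; it never inspects this.hidden.
    return f"B.m on a {this.static_type} object"


def make_thunk(impl: Callable[[PackedThis], str],
               static_type: str) -> Callable[[ConcreteThis], str]:
    # The vtable entry receives the concrete view and packs it before the call.
    def thunk(this: ConcreteThis) -> str:
        return impl(pack(this, static_type))
    return thunk


thunk = make_thunk(impl_m_in_b, "B")
print(thunk(ConcreteThis("B", ())))       # complete B object
print(thunk(ConcreteThis("C", ("B",))))   # B inner object inside a C object
```

In this sketch the same implementation serves both a complete B object and a B inner object of C, while each vtable entry sees only its own concrete path.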
7. Exemplary Methods of Typing an Intermediate Representation of a Computer Program
At process block 710, compiler (e.g., 110 at
Two record types are generated for this exemplary class, an exact record, an exemplary generation of which is shown in
Turning to
The exact record also includes typed methods with typed “this” pointers. The intermediate representation generates “this” pointer types 735 as described above.
Another record type, the approximate record type, is generated and is used to represent objects when the actual runtime type is not known. An exemplary embodiment of the approximate record is ApproxR(P, τ) described above. In ApproxR(P, τ), types of both “this” pointers of methods in vtables 755 and offsets to virtual bases in vbptr 750 refer to P and τ.
Objects whose source type is C (either complete C objects or embedded C objects in C's subclass objects) are translated to have the existential type “∃α<<C.∃P:(α, C]. Path(P, α)” in the typed intermediate representation. These objects can be opened to objects of type “Path(P, α)”, and then coerced to records of type ApproxR(P, α).
8. Exemplary System Which Describes the Layout of a Class in a Typed Intermediate Representation of a Source Code Representation of an Object-oriented Language.
An approximate record type ApproxR(P, τ) 906 is used to describe the layout of an object when the runtime type is not known. It describes the statically known components of the object, referring to both the actual runtime type τ and the path P of the object.
The intermediate representation further comprises a representation of “this” pointer types for virtual methods, such that when a subclass overrides a virtual method in a superclass, the “this” pointer is adjusted appropriately to view the subclass object as a superclass object.
9. Exemplary Type Syntax in the Exemplary Intermediate Representation
Based on the descriptions above of a typed intermediate representation wherein class name-based information related to classes is retained, and where the underlying computer language has notions of multiple and virtual inheritance, at least some types for such a typed intermediate representation are as shown in Table 1, below.
The rules as expressed above and from hereon are just one set of embodiments of one set of representations of the actual rules that can be applied in a computer program implementing a typed intermediate language. Other embodiments and other representations of the rules that apply principles expressed with reference to these rules are also possible. For instance, notations, operands and operators of the rules may be changed in form without deviating from the principles expressed therein.
The first syntax category, kind, includes kind Ω for all types and kind Ωc, which classifies class names and type variables that will be instantiated with class names. Ωc is a subtype of Ω.
Path P represents paths, as described above. A path is either an empty path ε, a path from an object to a virtual base (with the flattening of the hierarchy already described taken into account) VB::C, a path variable ρ, or a path appended with a class name P::C. P::(C1:: . . . :: Cn) is equivalent to (P::C1):: . . . :: Cn. The keyword “VB” must be followed by a virtual base, as the path VB::C indicates the path to the virtual base C from the enclosing object. A non-empty path begins with one of: “VB” followed by a class name (which must be a virtual base), a class name, or a path variable. The rest of the path must be a sequence of class names.
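The shape constraints on paths described above can be sketched as a small checker. This is an illustrative Python model, not part of the described embodiment: a path is a list of its “::”-separated components, and the function names `is_well_formed` and `append` as well as the sets passed in are assumptions made for the example.

```python
from typing import List, Set

VB = "VB"  # keyword that marks a path to a virtual base


def is_well_formed(path: List[str], classes: Set[str],
                   virtual_bases: Set[str], path_vars: Set[str]) -> bool:
    """Check the shape of a path given as a list of '::'-separated components."""
    if not path:                      # the empty path
        return True
    head, rest = path[0], path[1:]
    if head == VB:
        # "VB" must be followed immediately by a virtual base.
        if not rest or rest[0] not in virtual_bases:
            return False
        rest = rest[1:]
    elif head not in classes and head not in path_vars:
        return False
    # Everything after the head must be a plain class name.
    return all(c in classes for c in rest)


def append(path: List[str], suffix: List[str]) -> List[str]:
    """P::(C1::...::Cn) denotes the same path as (P::C1)::...::Cn."""
    return list(path) + list(suffix)


classes = {"A", "B", "C"}
virtual_bases = {"A"}
path_vars = {"rho"}
print(is_well_formed(["VB", "A"], classes, virtual_bases, path_vars))
print(is_well_formed(["B", "rho"], classes, virtual_bases, path_vars))
```

The first call accepts a path to the virtual base A; the second rejects a path variable that appears after the head of the path.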
Standard types include integer type int, pointer type Ptr^φ τ, and record type {l1^φ1: τ1, . . . , ln^φn: τn}. Also included are function type (τ1, . . . , τn)→τ and type variable α. Existential type ∃α<<τ.τ and universal type ∀α<<τ.τ are subclassing-bounded quantified types, denoted by the subclassing constraints on type variables. Type variables are given bounds in terms of subclassing.
At least some types are defined that are used specifically to represent multiple inheritance. These include special types of path abstractions that take into account the multiple paths through an object that exist when multiple inheritance is allowed. For instance, ∃ρ:(τs, τe]. τ is an existential type that hides a path that starts with the class τs and ends with τe, where ρ is the path variable, τs represents the runtime type of an object, and τe represents the static type of the same object. Disp(P1, P2, τ) represents the displacement from an inner object following path P1 to another object, which must be a virtual base, following path P2 in the outer object, whose runtime type is τ. Both P1 and P2 must start at τ.
Type variables (tvars) are indicated by the string “tvs” which represents a sequence of type variables, each bounded by either a superclass or another type variable.
Mutability is indicated using φ. This annotation, φ, when used with pointers, indicates that the values that are pointed to by the pointer can be modified by assignment. The letter “M” indicates that an expression is modifiable (mutable), while an “I” indicates that the expression is not modifiable. Annotation with the “M” and the “I” on field labels indicates whether or not the corresponding fields are mutable. “I” can be omitted, as the default is “not mutable.”
10. Exemplary Syntax for Expressing Values and Expressions in the Exemplary Typed Intermediate Representation
Based on the syntax described above for the typed intermediate representation, at least some of the values and expressions in the typed intermediate representation are shown in Table 2, which follows:
The pointers in typed intermediate representation have two possible forms: l, and ptr.l. Pointer ptr can be a heap label l, indicating it is a pointer to a value on the heap. Pointer ptr can also be an interior pointer ptr.l which points to the field l of a record, the record pointed to by ptr.
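The two pointer forms can be modeled with a short sketch. It is a Python illustration only; the class names `Label` and `Interior`, the function `deref`, and the sample heap contents are assumptions made for the example, and a record is modeled as a dictionary from field labels to values.

```python
from dataclasses import dataclass
from typing import Dict, Union

Record = Dict[str, object]


@dataclass(frozen=True)
class Label:
    """A heap label l: a pointer to a value on the heap."""
    name: str


@dataclass(frozen=True)
class Interior:
    """An interior pointer ptr.l: field l of the record that ptr points to."""
    base: "Ptr"
    field: str


Ptr = Union[Label, Interior]


def deref(heap: Dict[str, Record], ptr: Ptr) -> object:
    # A label is looked up directly; an interior pointer first resolves its base.
    if isinstance(ptr, Label):
        return heap[ptr.name]
    record = deref(heap, ptr.base)
    return record[ptr.field]


heap = {"l1": {"vptr": "vtable_C", "B": {"x": 7}}}
print(deref(heap, Interior(Interior(Label("l1"), "B"), "x")))
```

Because an interior pointer is built from another pointer, nested fields are reached by chaining, as in the last line of the sketch.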
The typed intermediate representation also includes, but is not limited to, word-size values (represented by “v”), as discussed in this paragraph. These include the integer literal n. Expression C(v) coerces a record labeled by v to an object of the class name “C”.
The expression tag(C) represents a tag value, such as shown at 315, 325, and 335 (
“hv” defines heap-allocated values as records. The function fix g tvs(x1: τ1, . . . , xn: τn): τ=em defines a function g with type parameters tvs and formals x1 . . . xn, where the formals are of type τ1 . . . τn, respectively. The return type is τ, and the function body is em. Function body em may call g recursively.
“e” defines expressions of the typed intermediate language. C(e) and c2r(e) are coercions between objects and records. Expression C(e) coerces a record labeled by e to an object of the class name “C”. The expression c2r(e), the opposite expression, coerces an object e to a record C. The existential pack operation of “pack τ as α<<τu in (e: τ′) ” introduces an existential type comprising a type variable with sub-classing bounds. The expression “(α, x)=open (e1) in e2”, the opposite expression, eliminates the subclassing-bounded existential type.
Several expressions are used to handle the specific situations that arise when typing a language which allows multiple inheritance and virtual inheritance. These include, but are not limited to, the pack and open expressions pack P as ρ:(τs, τe] in (e: τ), and (ρ, x)=open(e1) in e2. The pack expression introduces the path abstraction. Specifically, it hides path P from the starting location τs to the ending location τe. This hidden path is represented by the path variable ρ used in the expression “e”. The opposite expression (ρ, x)=open(e1) in e2 eliminates a path abstraction. It opens the expression e1, which exposes the heretofore hidden path ρ, and a value variable x.
The displacement of superclass objects within a subclass object is not straightforward, as detailed with reference to
The opposite expression “e ⊖C” points to the enclosing outer object of the C inner object. In the expressions “e1e2” and “e1∪e2”, e1 is an object and e2 is a displacement from the e1 object to a virtual base. The expression “e1e2” adds to the object e1 the displacement to the virtual base (e2). Similarly, the expression “e1 ∪e2” subtracts from object e1 the displacement.
11. Dynamic Semantics
Expression “c2r(C(v))” coerces object C(v) to a record. Expression “c2r(C(v)•P)” coerces an inner object of C(v) following path P to an interior pointer v.P.
Expression “v⊕C” adds to record v the displacement to the inner object C and returns an interior pointer v•C, which points to the C inner object of v. This is used when C is a direct non-virtual super class of the runtime type of v. Expression “(v•C)⊖C” subtracts the displacement from the inner object C and returns the record v. The symbols and ∪ are used to represent pointer adjustment. Expression “(v•P1)v2” returns an interior pointer v•P2, if v2=disp(P1, P2, C) which represents the displacement from P1 to P2 in C. Expression “(v•P2)∪v2” subtracts the displacement v2 from the interior pointer v•P2 and returns the result v•P1.
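The reduction rules for pointer adjustment can be modeled symbolically. The following Python sketch is an illustration under stated assumptions, not the described implementation: an interior pointer v•P is modeled as a label plus a path, a displacement value disp(P1, P2, C) as a triple, and the names `Ref`, `Disp`, `plus_class`, `minus_class`, `add_disp`, and `sub_disp` are hypothetical. Allowing `plus_class` on an already nested path is a generalization made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Ref:
    """v•P: the inner object reached from record `label` by following `path`."""
    label: str
    path: Path = ()


@dataclass(frozen=True)
class Disp:
    """disp(P1, P2, C): the displacement from P1 to P2 inside a C object."""
    src: Path
    dst: Path
    cls: str


def plus_class(v: Ref, c: str) -> Ref:
    # v ⊕ C: step into the C inner object.
    return Ref(v.label, v.path + (c,))


def minus_class(v: Ref, c: str) -> Ref:
    # (v•C) ⊖ C: step back out to the enclosing object.
    if not v.path or v.path[-1] != c:
        raise ValueError(f"{v} is not a {c} inner object")
    return Ref(v.label, v.path[:-1])


def add_disp(v: Ref, d: Disp) -> Ref:
    # Adding disp(P1, P2, C) to v•P1 yields v•P2.
    if v.path != d.src:
        raise ValueError("displacement does not start at this inner object")
    return Ref(v.label, d.dst)


def sub_disp(v: Ref, d: Disp) -> Ref:
    # Subtracting disp(P1, P2, C) from v•P2 yields v•P1.
    if v.path != d.dst:
        raise ValueError("displacement does not end at this inner object")
    return Ref(v.label, d.src)


v = Ref("l1")
d = Disp(("B",), ("VB", "A"), "C")
inner = plus_class(v, "B")
print(add_disp(inner, d))
```

Each adjustment has an inverse in the sketch, which mirrors the pairing of the adding and subtracting expressions in the rules above.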
12. Static Semantics
The typed intermediate language is sound, and its type checking is decidable, as shown in Table 3, below. In an exemplary embodiment, the intermediate typed language has a class declaration table Θ that maps class names to declarations. A kind environment Δ tracks the type variables and path variables that are in scope. Each type variable has a subclassing upper bound, which is either a class name or another type variable which already exists within the kind environment Δ. A path variable has a constraint of the form (τs, τe], which defines a path that starts with the class τs and ends with τe. A heap environment E maps labels to types. A type environment Γ maps variables to types.
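A minimal sketch of how the class declaration table and the kind environment might be consulted to decide a subclassing query follows. It is written in Python for illustration; the dictionary encodings, the function name `is_subclass`, and the assumption that subclassing is reflexive are choices made for this example and are not taken from Table 3.

```python
from typing import Dict, List

# Hypothetical encoding of the class declaration table: class name -> direct bases.
ClassTable = Dict[str, List[str]]
# Hypothetical encoding of the kind environment: type variable -> upper bound.
KindEnv = Dict[str, str]


def is_subclass(table: ClassTable, delta: KindEnv, sub: str, sup: str) -> bool:
    """Decide sub << sup by walking variable bounds and declared bases."""
    seen = set()
    stack = [sub]
    while stack:
        t = stack.pop()
        if t == sup:                 # reflexivity is assumed in this sketch
            return True
        if t in seen:
            continue
        seen.add(t)
        if t in delta:               # type variable: continue from its bound
            stack.append(delta[t])
        else:                        # class name: continue from its bases
            stack.extend(table.get(t, []))
    return False


table = {"C": ["B"], "B": ["A"], "A": []}
delta = {"alpha": "C", "beta": "alpha"}
print(is_subclass(table, delta, "beta", "A"))
print(is_subclass(table, delta, "A", "C"))
```

Because every bound is either a class name or a variable already present in the environment, the walk terminates, which is consistent with the decidability claim for subtyping.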
In an exemplary embodiment, class names and type variables (which can be instantiated with a class name) only have kind Ωc. All of the following types must be of kind Ωc: the bounds in subclassing-bounded quantified types, the type in tag types, and the starting and ending types of paths.
Only records of type R(C) can be coerced to C outermost objects of type Path(ε, C). A record coerced from a C outermost object has type R(C). A record coerced from an inner object of type Path(P, τ) has type ApproxR(P, τ), which defines the layout of the inner object.
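The coercion rules in the preceding paragraph can be restated as two small typing functions. This Python sketch is illustrative only; the classes `PathType`, `ExactR`, and `ApproxR` and the function names are hypothetical stand-ins for Path(P, τ), R(C), and ApproxR(P, τ).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class PathType:
    """Path(P, tau): an object with path P inside an object of runtime type tau."""
    path: Tuple[str, ...]
    runtime_type: str


@dataclass(frozen=True)
class ExactR:
    """R(C): the exact record type of a complete C object."""
    cls: str


@dataclass(frozen=True)
class ApproxR:
    """ApproxR(P, tau): the layout of an inner object."""
    path: Tuple[str, ...]
    runtime_type: str


def type_of_c2r(obj: PathType) -> Union[ExactR, ApproxR]:
    # An outermost object (empty path) yields the exact record type.
    if not obj.path:
        return ExactR(obj.runtime_type)
    return ApproxR(obj.path, obj.runtime_type)


def type_of_record_to_object(record: Union[ExactR, ApproxR], cls: str) -> PathType:
    # Only a record of type R(C) may be coerced to a C outermost object.
    if record != ExactR(cls):
        raise TypeError(f"only R({cls}) can be coerced to a {cls} object")
    return PathType((), cls)


print(type_of_c2r(PathType((), "C")))
print(type_of_c2r(PathType(("B",), "C")))
```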
The soundness is proved by progress and preservation, as shown below. The decidability of type checking is proved by the decidability of subtyping, also shown below, combined with the minimal type property.
13. Translating a Source Language with Multiple Inheritance and Virtual Inheritance into a Typed Intermediate Representation
This section shows how to translate an exemplary source code language with multiple inheritance and virtual inheritance into a typed intermediate representation. The exemplary language, the syntax of which is shown in Table 4, below, comprises object creation, field fetch/assignment, method invocation, local variable binding and upcasts. The source language, in this embodiment, assumes some preprocessing: explicit upcasts are inserted when necessary; member access from an object requires that the object be of the class that introduces the member; when creating a new object of class C, fields introduced by C itself are differentiated from those of C's non-virtual bases and those of C's virtual bases.
The paths in the exemplary source language are the same as the paths in the typed intermediate language that contain no path variables. A class declaration can specify direct non-virtual and virtual bases of the class, a set of new fields, and a set of method implementations, including both new and overridden methods.
1. Types
A class name C in the source language is translated to an existential type with type and path abstractions, which represents a C inner object in a subclass object, as shown below in Table 5.
2. Expressions
A “new” expression is translated by inserting vtables into inner objects, and grouping the inner objects of virtual bases. A function “|e,P|” (shown in Table 6, below) translates an inner object e that follows path P. If e contains no inner objects, (the corresponding class has no non-virtual bases and e (such as E in
To create an object of C that has no non-virtual bases, a vtable (314, 324, 334 in
The upcast expression allows upcasting in the source language. In an exemplary embodiment, downcasting is not allowed. The translation of the upcast expression adds an appropriate offset to the object to cast. Both nonvirtual and virtual bases can be cast. Casting to a non-virtual base is performed by adding the compile-time known offset to the object. Casting to a virtual base involves fetching the displacement from the vtable (314, 324, 334 in
3. Methods
As shown in Table 7, below, methods are translated to functions with explicit “this” pointers. “|m,C|” means the translation of method m that is implemented in class C.
4. This Pointer Adjustment
Creating vtables requires thunks to adjust the “this” pointers of methods. Function “adjust(this, P, P′, C)” (shown in Table 8) transforms “this” of type Path(P, C) to type Path(P′, C), provided that P and P′ are valid paths from C. If P does not involve a virtual base, then P′ must be a prefix of P. P−P′ is the difference between the two paths.
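For the case in which P does not involve a virtual base, the path difference can be sketched as follows. This is one plausible reading, written in Python, and is not a reproduction of Table 8: the names `path_difference` and `adjust_steps` are hypothetical, and the assumption that the adjustment leaves one inner object per class of the difference, innermost first, is made for the example.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def path_difference(p: Path, p_prime: Path) -> Path:
    """Return the classes of P that remain after removing the prefix P'."""
    if p[:len(p_prime)] != p_prime:
        raise ValueError(f"{p_prime} is not a prefix of {p}")
    return p[len(p_prime):]


def adjust_steps(p: Path, p_prime: Path) -> List[str]:
    # Assumed reading: leave the innermost object first, so the classes of
    # the difference are undone in reverse order.
    return list(reversed(path_difference(p, p_prime)))


print(path_difference(("B1", "B", "A"), ("B1",)))
print(adjust_steps(("B1", "B", "A"), ("B1",)))
```

When the two paths are equal, the difference is empty and no adjustment step is needed in this sketch.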
Table 9 shows the creation of adjuster thunks. Suppose class A introduces a method m of type (τ1, . . . , τn)→τ and P:(C,A]. “|m,P,C|” creates an adjuster thunk for the m entry in the vtable for path P, which has “this” pointer type Path(P, C). The body of the new function calls lm, the implementation of m in class A′, where P′:(C,A′] and no classes between A′ and C override m. The thunk uses the “adjust” function to get a “this” pointer of type |A′| expected by lm. The thunk can share the same address with lm if they differ only in coercions of “this” pointers.
14. An Exemplary Method for Type Checking an Exemplary Typed Intermediate Representation
Typing rules related to code portions such as expressions can be based on path types, both those known and unknown at runtime. Exact paths are determined at compile time to define objects whose types are known at compile time. Abstract paths are used at compile time for objects with unknown paths and unknown runtime types. Address arithmetic for adding and subtracting offsets from subclass objects to superclass objects is used to capture pointer adjustment. At 1120, such typing rules are applied to evaluate the type safety of at least one code portion of the typed intermediate representation. Later at 1130, once the type safety evaluation of the code portion is complete, the results of the evaluation are determined.
15. Computing Environment
With reference to
A computing environment may have additional features. For example, the computing environment 1200 includes storage 1240, one or more input devices 1250, one or more output devices 1260, and one or more communication connections 1270. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 1200. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 1200, and coordinates activities of the components of the computing environment 1200.
The storage 1240 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 1200. The storage 1240 stores instructions for the software 1280 implementing methods of generating typed intermediate language, and of type-checking the generated intermediate language.
The input device(s) 1250 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, touchscreen, or another device that provides input to the computing environment 1200. For audio, the input device(s) 1250 may be a sound card or similar device that accepts audio input in analog or digital form, or a CD-ROM reader that provides audio samples to the computing environment. The output device(s) 1260 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 1200.
The communication connection(s) 1270 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, compressed graphics information, or other data in a modulated data signal. These connections may include network connections, which may be wireless connections, may include dial-up connections, and so on.
Computer-readable media are any available tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 1200, computer-readable media include memory 1220, storage 1240, communication media, and combinations of any of the above.
In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.