The present invention relates generally to computer programming languages and more particularly to compilers and type inference.
A type system defines the organization of a computer programming language. Among other things, the type system specifies how data types are declared and employed. The process of verifying data types against the type system is referred to as type checking. A language whose types are checked at compile time is referred to as statically typed, whereas a language whose types are checked at run time is called dynamically typed. In statically typed languages, each variable typically has a single fixed data type. Conventionally, programmers specify types explicitly. For example, int x=47; int y=11; int z=x+y. Here, each of the additive components, x and y, is specified as type integer. Similarly, the result, z, is also expressly denoted as an integer. Thus, if z were later assigned a string value within the same scope, the compiler would generate an error.
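By way of illustration, static type checking of the kind just described can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not an actual compiler: the tuple encoding of statements and expressions and the names check_block, type_of, and TypeCheckError are hypothetical. Every declaration and assignment is checked before anything runs, so assigning a string to the integer z is rejected up front.

```python
class TypeCheckError(Exception):
    """A violation of the (hypothetical) static typing rules, found before run time."""


def type_of(expr, env):
    """Derive the static type of an expression from the declared variable types."""
    kind = expr[0]
    if kind == "lit":                      # ("lit", 47) or ("lit", "hello")
        return {int: "int", str: "string", bool: "bool"}[type(expr[1])]
    if kind == "var":                      # ("var", "x")
        if expr[1] not in env:
            raise TypeCheckError(f"undeclared variable {expr[1]}")
        return env[expr[1]]
    if kind == "+":                        # ("+", left, right)
        left, right = type_of(expr[1], env), type_of(expr[2], env)
        if left != right:
            raise TypeCheckError(f"cannot add {left} and {right}")
        return left
    raise TypeCheckError(f"unknown expression {expr!r}")


def check_block(statements):
    """Type check every statement up front; return the declared variable types."""
    env = {}
    for stmt in statements:
        if stmt[0] == "decl":              # ("decl", declared_type, name, expr)
            _, declared, name, expr = stmt
            if name in env:
                raise TypeCheckError(f"{name} is already declared")
            actual = type_of(expr, env)
            if actual != declared:
                raise TypeCheckError(f"cannot initialize {declared} {name} with a {actual}")
            env[name] = declared
        else:                              # ("assign", name, expr)
            _, name, expr = stmt
            if name not in env:
                raise TypeCheckError(f"undeclared variable {name}")
            actual = type_of(expr, env)
            if actual != env[name]:
                raise TypeCheckError(f"cannot assign a {actual} to {env[name]} {name}")
    return env


# int x=47; int y=11; int z=x+y
program = [
    ("decl", "int", "x", ("lit", 47)),
    ("decl", "int", "y", ("lit", 11)),
    ("decl", "int", "z", ("+", ("var", "x"), ("var", "y"))),
]
print(check_block(program))   # {'x': 'int', 'y': 'int', 'z': 'int'}
try:
    check_block(program + [("assign", "z", ("lit", "hello"))])
except TypeCheckError as err:
    print(err)                # cannot assign a string to int z
```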
As type systems become increasingly sophisticated, it becomes increasingly cumbersome for programmers to write explicit type annotations on local variable declarations and on generic method invocations, for example. Consider the following conventional C# declaration of a generic method MkArray:
To mitigate the burden on programmers and improve succinctness, some conventional languages have employed type inference. Type inference allows programmers to omit type annotations from expressions and/or variables whenever the types can be determined automatically from the context by compilers and/or interpreters. This eliminates unnecessary verbosity, thereby making programs more concise and easier to read. For example, in C# it is possible to invoke the MkArray method without explicitly specifying a type argument:
Through type inference, the compiler automatically determines the type arguments int and string from the arguments to the method. Without type inference, a programmer would have been forced to write more verbose assignments. For example, consider the following:
A simple type inference mechanism or methodology proceeds by first deriving the types of the arguments of the function. In the first call, for instance, the compiler determines that both 5 and 213 have type int, written as 5<:int, 213<:int. In the second call, the compiler determines that both “foo” and “bar” are strings. Given the actual types of the arguments, the type inference mechanism then matches these actual types against the types of the formal parameters, producing a substitution that binds type variables to types. In this scenario, the inferred binding is T:=int for the first call and T:=string for the second call. Given such a substitution, the compiler subsequently verifies that the substitution is complete, that is, that it provides a binding for every generic type parameter, and that it is consistent, in the sense that each type parameter is bound to the same type across all arguments. In the above example, the substitution is both complete and consistent. Given a complete and consistent substitution, the compiler can then insert the correct type arguments into the generic method invocation. Accordingly, a programmer can simply write:
However, it should be appreciated that in the previous example type inference is employed only to infer type arguments; programmers still had to write the types for the result, that is, for the left-hand side of the assignment. More sophisticated type inference mechanisms can perform inference on this side as well. For example, having determined that T:=int for the first call and T:=string for the second call, the compiler can apply the same bindings to the declared result type of the method to obtain the type of each result. The type of the left-hand side can thus be resolved from the type determined for the right-hand side. Hence, a programmer need not specify the result type and can write the assignments in the more concise format, without any types, as follows:
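The mechanism described above (derive the argument types, match them against the formal parameter types, verify that the substitution is complete and consistent, then apply it to the result type) can be sketched as follows. This is a hypothetical Python model rather than a compiler's implementation: types are strings, constructed types are tuples such as ("Array", "T"), and the signature T[] MkArray<T>(T a, T b) is an assumption made only for illustration.

```python
class InferenceError(Exception):
    """No complete and consistent substitution exists for a generic call."""


def literal_type(value):
    """Derive an argument's type, e.g. 5 -> "int", "foo" -> "string"."""
    return {bool: "bool", int: "int", str: "string"}[type(value)]


def infer_substitution(type_params, formal_types, args):
    """Match actual argument types to formal parameter types, binding type parameters."""
    if len(formal_types) != len(args):
        raise InferenceError("wrong number of arguments")
    subst = {}
    for formal, arg in zip(formal_types, args):
        actual = literal_type(arg)
        if formal in type_params:
            bound = subst.setdefault(formal, actual)
            if bound != actual:                      # consistency check
                raise InferenceError(f"{formal}:={bound} conflicts with {formal}:={actual}")
        elif formal != actual:
            raise InferenceError(f"{actual} argument does not match {formal}")
    missing = [p for p in type_params if p not in subst]
    if missing:                                      # completeness check
        raise InferenceError(f"no binding for {', '.join(missing)}")
    return subst


def substitute(t, subst):
    """Apply a substitution to a type; constructed types are tuples like ("Array", "T")."""
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(arg, subst) for arg in t[1:])
    return subst.get(t, t)


def infer_local_type(type_params, formal_types, return_type, args):
    """Type of the left-hand side of `x = M(args)`, inferred from the right-hand side."""
    return substitute(return_type, infer_substitution(type_params, formal_types, args))


# Hypothetical signature, assumed for illustration: T[] MkArray<T>(T a, T b)
print(infer_substitution(["T"], ["T", "T"], [5, 213]))                      # {'T': 'int'}
print(infer_local_type(["T"], ["T", "T"], ("Array", "T"), ["foo", "bar"]))  # ('Array', 'string')
```

Mixing argument types, as in MkArray(5, "foo"), fails the consistency check, and a type parameter that no argument mentions fails the completeness check.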
The actual method of type inference can get much more complicated than the simple examples provided thus far. For example, consider the following variable assignments:
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an extensive overview of the invention. It is not intended to identify key/critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject invention concerns systems and methods for inferring types. In particular, the invention identifies several problems with conventional systems and provides novel and efficient solutions thereto.
According to one aspect of the invention, local type components are introduced to alias inferred types. Computer programming can be improved in many ways, including ease of use and conciseness, if inferred types are made available for use by programmers. Conventionally, types are inferred by a compiler and stored as internal types that are unrecognizable and inaccessible to programmers. Accordingly, the subject invention provides an inference component and methodology that bind such an internal type to a type component provided by a programmer. Hence, inferred types can be utilized as regular types, for example, to annotate variables or as type arguments to generic methods, among other things.
Programmers do not always wish to utilize inferred types, so it would be inefficient to generate type aliases regardless of whether they are used. Therefore, type components can be omitted when they are not needed, in which case inference proceeds much as it does conventionally. However, the conventional technology has problems that have gone unnoticed, particularly with respect to variable declarations. Thus, in accordance with another aspect of the subject invention, a new variable indicator is supplied to indicate when a new local variable is being declared. This indicator, which can be expressed as a keyword, resolves an otherwise ambiguous construct. Without such an indicator, as in conventional technologies, it is uncertain whether a new local variable is meant to be declared or whether a variable already in scope is meant to be utilized. The new variable indicator solves this problem.
According to yet another aspect of the invention, a new, more efficient type inference system and method are disclosed that infer and bind types to elements upon initial examination. Conventionally, when a compiler first encounters an element such as a variable, its type is not inferred and bound until the entire program block has been scanned for additional declarations of the same variable; if any are found, a complicated type unification algorithm is employed. The subject system instead infers and binds the type upon initial examination and generates a compile-time error if the variable is later reused in the context of a different type. The subject invention also contemplates identifying such errors at compile time while deferring their reporting until run time.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the invention are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the invention may be practiced, all of which are intended to be covered by the present invention. Other advantages and novel features of the invention may become apparent from the following detailed description of the invention when considered in conjunction with the drawings.
The foregoing and other aspects of the invention will become apparent from the following detailed description and the appended drawings described in brief hereinafter.
The present invention is now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention.
As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Furthermore, the present invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer and implement the subject invention. The term “article of manufacture” (or alternatively, “computer program product”) as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick). Additionally it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the subject invention.
Turning initially to
In essence, type inference component 112 generates a type for every expression and sub-expression in a program from context data. As illustrated here, type inference component 112 can interact with a local program module 120 to infer types for expressions defined therein. However, when inference component 112 infers a type, it produces an internally generated type 114 that can be used for type checking. The generated type is inaccessible to users and programmers and is stored internally under some obscure compiler-generated name. Nevertheless, it would be beneficial if programmers were able to utilize these compiler-generated types. For instance, consider the following pseudo code:
Here, rather than returning a type directly as in the previous example, the method returns another generic type, Collection<T>. Consequently, the result can be a collection of T, which can be an array, a hash table, or a list, among other things. Now assume the following local expression:
It should be noted that the present examples have been kept deliberately simple for purposes of clarity, so one could easily look at them and determine the type. However, types can be arbitrarily large and complex, such as a table of strings of lists of strings and integers, and the like. Therefore, programmers may not easily be able to infer the type themselves. Moreover, there might not even be a way for programmers to specify the inferred type.
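The following minimal sketch, under hypothetical assumptions, models an inference component that assigns a type to every expression and sub-expression and gives anonymous record types internal names that a programmer could not write in source. The expression encoding, the InferenceComponent class, and the `<>T0001` naming scheme are illustrative only and do not reflect any particular compiler.

```python
import itertools


class InferenceComponent:
    """Hypothetical model: infers a type for every expression and sub-expression."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.generated = {}    # internal name -> structure, hidden from programmers
        self.annotations = []  # (expression, inferred type) for every node visited

    def infer(self, expr):
        kind = expr[0]
        if kind == "lit":                      # ("lit", "Alice") or ("lit", 30)
            t = {bool: "bool", int: "int", str: "string"}[type(expr[1])]
        elif kind == "record":                 # ("record", (("name", e1), ("age", e2)))
            t = self._generated_name(tuple((f, self.infer(e)) for f, e in expr[1]))
        elif kind == "collection":             # ("collection", (e1, e2, ...))
            element_types = {self.infer(e) for e in expr[1]}
            if len(element_types) != 1:
                raise TypeError(f"no single element type in {element_types}")
            t = ("Collection", element_types.pop())
        else:
            raise TypeError(f"unknown expression kind {kind!r}")
        self.annotations.append((expr, t))
        return t

    def _generated_name(self, fields):
        """Reuse or create an obscure internal name for an anonymous record type."""
        for name, structure in self.generated.items():
            if structure == fields:            # same shape -> same generated type
                return name
        name = f"<>T{next(self._ids):04d}"
        self.generated[name] = fields
        return name


people = ("collection", (
    ("record", (("name", ("lit", "Alice")), ("age", ("lit", 30)))),
    ("record", (("name", ("lit", "Bob")), ("age", ("lit", 25)))),
))
component = InferenceComponent()
print(component.infer(people))     # ('Collection', '<>T0001')
print(component.generated)         # {'<>T0001': (('name', 'string'), ('age', 'int'))}
print(len(component.annotations))  # 7
```

Even in this small case, the programmer has no name to write for the element type; it exists only as `<>T0001` inside the component.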
Consequently, one problem with pure type inference is that it is not possible to utilize the inferred type. Typical type inference only allows the type of a variable or the type arguments of a generic method to be inferred. This is problematic if programmers want to declare another variable of an inferred type, need to pass the inferred type as a type argument to another generic method, or otherwise need to have the type in hand. In accordance with an aspect of the subject invention, this problem can be solved by introducing a type component 122.
Type component 122 acts as a local type alias. The inference component 112 binds or links an inferred and internally generated type 114 to the type component 122. Once declared, a type component 122 can be employed like a regular type. The type component 122 therefore bridges the worlds of fully explicitly typed languages and fully implicitly typed languages and hence is pivotal in providing expressive power to programmers. In conventional programming languages, which lack this mechanism, programmers are forced to either provide all their types explicitly or provide no type annotations at all.
Type component 122 provides a name for some type, which the compiler 110 binds to a generated type 114 via inference component 112. The type component is specified by a programmer in a local program module or block 120. In order to identify the type component 122, an identifier can be associated therewith. The identifier tells the compiler 110, and more specifically the inference component 112, that a type component is being provided that should be bound to the inferred type of the element (e.g., variable) with which it is associated. Various mechanisms can be utilized as the identifier, such as the “type” keyword.
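As a sketch of the binding just described, the hypothetical Python class below records the types an inference component has bound to programmer-supplied aliases and substitutes them wherever an alias appears, including inside constructed types, so the alias can be used like a regular type. The class and method names and the internal type name `<>T0042` are assumptions; the sketch does not model the surface syntax of the “type” keyword.

```python
class LocalTypeAliases:
    """Hypothetical table of programmer-named aliases for inferred types."""

    def __init__(self):
        self._bindings = {}

    def bind(self, alias, inferred_type):
        """Bind an alias to the type the inference component inferred for an element."""
        bound = self._bindings.setdefault(alias, inferred_type)
        if bound != inferred_type:
            raise TypeError(f"alias {alias} is already bound to {bound}")

    def resolve(self, t):
        """Replace aliases anywhere in t; constructed types are tuples."""
        if isinstance(t, tuple):              # e.g. ("Collection", "P")
            return (t[0],) + tuple(self.resolve(arg) for arg in t[1:])
        return self._bindings.get(t, t)

    def accepts(self, declared, actual):
        """Check a later declaration `declared y = e` against e's inferred type."""
        return self.resolve(declared) == actual


aliases = LocalTypeAliases()
aliases.bind("P", "<>T0042")                  # internal, compiler-generated type
print(aliases.resolve(("Collection", "P")))   # ('Collection', '<>T0042')
print(aliases.accepts("P", "<>T0042"))        # True
print(aliases.accepts("P", "int"))            # False
```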
Consider the following more concrete example:
It should be appreciated that other identifiers and mechanisms can be utilized to introduce the type component to alias local types. One alternative is to prefix an identifier, for example a special symbol such as #, to one or more defining occurrences of the type component. For example:
The type component can also be utilized outside of the variable declaration context. For instance, the type component can be utilized as a type argument or within a constructed type, to name but a few examples. Case in point:
The type component 122 is beneficial in many ways; however, it is particularly useful when dealing with complex data types. As described above, inferred types can be quite complex such that it may not be easy or practically possible for programmers to infer and/or specify such types themselves. One area in which types become quite complex is queries. For instance:
From this, it is known that this expression returns some collection of values that have a name and an age. For example, name can be of type string while age can be an integer. There are at least two problems associated with this example. First, such a type can quickly become unmanageable; for example, it could include name, age, date of birth, country, address, zip code, phone number, email, etc. The second, even worse, problem is that programmers may not even have a way to write such a type. Therefore, the result type is a collection of something, collection<T>, and it is known that the type T is defined as:
However, the actual type T is not known. In fact, during type inference this is a compiler-generated type 114, which is hidden from and inaccessible to programmers. In essence, the type inference component 112 will generate some type T, but the name of such type is not exposed, and even if it were, it would be in an incomprehensible compiler format (e.g., T1034F6V). With the subject system 100, however, this is no longer a problem, as the type component 122 can alias the compiler-generated type in friendly terms. For instance, the result type can be written collection<#Q>. Now, a programmer can simply refer to the type component “Q” rather than the hidden, obscure generated type 114. The type can then be easily employed, for example:
It should be noted and appreciated that the compiler 110 utilizes the inference component 112 to infer types and bind them to type components 122. The scope of the type component 122 that aliases an inferred type is local. The type component 122 resides in a local program module or block 120. The scope of the type component 122 can therefore be limited to that block or module similar to the scope of a local constant or variable declaration. Of course, it is possible to have different scoping rules for type component aliases than for local variable or local constant declarations.
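A minimal sketch of block-local type aliases follows. It assumes a hypothetical representation in which "#Q" marks a defining occurrence of alias Q and constructed types are tuples such as ("Collection", "#Q"). Binding matches the declared type against the inferred type, so Q is bound to the hidden generated type (here the example name T1034F6V); lookup then walks outward through enclosing blocks, mirroring local-variable scoping, which is only one of the possible scoping rules.

```python
class Block:
    """Hypothetical local block holding the type aliases declared in it."""

    def __init__(self, parent=None):
        self.parent = parent
        self.aliases = {}

    def bind(self, declared, inferred):
        """Bind each #-prefixed alias in `declared` to the matching part of `inferred`."""
        if isinstance(declared, str) and declared.startswith("#"):
            alias = declared[1:]
            bound = self.aliases.setdefault(alias, inferred)
            if bound != inferred:
                raise TypeError(f"{alias} cannot be both {bound} and {inferred}")
        elif (isinstance(declared, tuple) and isinstance(inferred, tuple)
              and declared[0] == inferred[0] and len(declared) == len(inferred)):
            for d, i in zip(declared[1:], inferred[1:]):
                self.bind(d, i)
        elif declared != inferred:
            raise TypeError(f"{inferred} does not match {declared}")

    def lookup(self, alias):
        """Resolve an alias here or in an enclosing block, like a local variable."""
        block = self
        while block is not None:
            if alias in block.aliases:
                return block.aliases[alias]
            block = block.parent
        raise NameError(f"type alias {alias} is not in scope")


method_body = Block()
query_block = Block(parent=method_body)
# Result type of the query: a collection of the hidden generated type T1034F6V.
query_block.bind(("Collection", "#Q"), ("Collection", "T1034F6V"))
print(query_block.lookup("Q"))    # T1034F6V
try:
    method_body.lookup("Q")       # Q was declared in the inner block only
except NameError as err:
    print(err)                    # type alias Q is not in scope
```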
Local type alias components can be bound wherever types are inferred. Consider, for instance, the following local variable declarations:
In this first example, the type component alias T is consistently bound to int, so the type checker component 210 would not generate an error.
As per this second example, type aliases S and T are both bound to R; hence S and T are equal, each simply another alias for R.
In the above example, the type alias T is inconsistently bound to both int and bool, and hence this would lead to a compile-time error generated by the type checker component 210.
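The consistency requirement illustrated by these three examples can be sketched as a function that pools the bindings inferred for each declaration and reports a compile-time error when one alias receives two different types. Representing bindings as dictionaries and the names pool_bindings and CompileTimeError are hypothetical choices for this illustration.

```python
class CompileTimeError(Exception):
    """Reported by the (hypothetical) type checker for inconsistent bindings."""


def pool_bindings(per_declaration):
    """Pool the alias bindings inferred from each local declaration in a block.

    per_declaration: one dict per declaration, e.g. {"T": "int"}.
    Returns the pooled substitution if it is consistent.
    """
    pooled = {}
    for bindings in per_declaration:
        for alias, bound in bindings.items():
            previous = pooled.setdefault(alias, bound)
            if previous != bound:
                raise CompileTimeError(
                    f"type alias {alias} is bound to both {previous} and {bound}")
    return pooled


print(pool_bindings([{"T": "int"}, {"T": "int"}]))   # {'T': 'int'}
pooled = pool_bindings([{"S": "R"}, {"T": "R"}])
print(pooled["S"] == pooled["T"])                    # True
try:
    pool_bindings([{"T": "int"}, {"T": "bool"}])
except CompileTimeError as err:
    print(err)   # type alias T is bound to both int and bool
```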
Conventional type aliasing rules imply that it is not possible to bind type aliases, for example, in the context of inferring type parameters of a generic method. It should be noted, however, that it is in fact possible to devise alternative rules that would allow local type aliases to be bound even in the context of generic method type parameters. According to an aspect of the subject invention, inference rules 212 are provided for binding type component aliases to types or leaving them unbound. Furthermore, at the expense of added complexity, more liberal rules can be utilized to allow type aliases to be bound to other type aliases and to allow constructed types to include type aliases. An exemplary set of rules 212 is provided hereinafter.
The rules 212 for inferring the type of a local variable declaration P x=e can follow the same rules as type inference for generic method invocations; alternatively, a separate set of rules 212 can be employed to compute a set of bindings for local type aliases from a declared type and a derived type. Assume that the initializer expression e has type A, where every type alias Tx that appears in A has been replaced by its bound type Sx given the currently computed set of substitutions Tx:=Sx, and that the declared type of variable x is P. Type inference can then operate on the types A and P according to the following steps and produce a set of new bindings Tx:=Sx, where Tx is a type alias and Sx is a type that does not contain any type aliases. Nothing is inferred from the initializer expression e, but type inference succeeds with the empty binding set, if any of the following is true: (1) P does not involve any local type alias, or P is equal to A; (2) the initializer expression e is the null literal; (3) the initializer expression e is an anonymous method; or (4) the initializer expression e is a method group. Otherwise, if P is a local type alias and A does not contain any local type aliases, type inference succeeds for this declaration with the substitution P:=A. If P is an array type and A is an array type of the same rank, then A and P are replaced with their respective element types and the step is repeated. If P is a constructed type, A does not contain any local type aliases, and, for each local type alias Tx that occurs in P, exactly one type Sx can be determined such that replacing each Tx with Sx produces a type to which A is convertible by a standard implicit conversion, then inference succeeds for this local variable declaration with the substitution set Tx:=Sx. Otherwise, type inference fails.
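The steps above can be sketched in Python as follows. This is a hypothetical model, not an implementation of any particular compiler: types are encoded as strings or tuples (("alias", name), ("array", rank, element), ("generic", name, args...)), and the standard implicit conversions are reduced to a tiny, assumed table in which List<X> converts to IEnumerable<X>. Each candidate conversion target of A is matched against P, and inference succeeds only when exactly one set of bindings results.

```python
class InferenceFailure(Exception):
    """Type inference failed for a local variable declaration P x = e."""


def alias(name):
    return ("alias", name)


def has_alias(t):
    """True if type t mentions a local type alias."""
    if isinstance(t, str):
        return False
    if t[0] == "alias":
        return True
    return any(has_alias(part) for part in t[2:])   # array element or generic arguments


def implicit_targets(a):
    """A plus the types it converts to implicitly (hypothetical, tiny table)."""
    targets = [a]
    if isinstance(a, tuple) and a[0] == "generic" and a[1] == "List":
        targets.append(("generic", "IEnumerable") + a[2:])
    return targets


def match(p, a, bindings):
    """Structurally match P against alias-free A, recording alias bindings."""
    if isinstance(p, tuple) and p[0] == "alias":
        return bindings.setdefault(p[1], a) == a
    if isinstance(p, str) or isinstance(a, str):
        return p == a
    if p[:2] != a[:2] or len(p) != len(a):
        return False
    return all(match(pp, aa, bindings) for pp, aa in zip(p[2:], a[2:]))


def infer_declaration(p, initializer, a=None):
    """Return new bindings {Tx: Sx} for `P x = e`.

    initializer: "null", "anonymous_method", "method_group", or "expr";
    a: the derived type A of e, with already-bound aliases substituted.
    """
    if not has_alias(p) or p == a:                                    # rule (1)
        return {}
    if initializer in ("null", "anonymous_method", "method_group"):   # rules (2)-(4)
        return {}
    if has_alias(a):
        raise InferenceFailure(f"A = {a} still contains type aliases")
    while True:
        if p[0] == "alias":                         # P is an alias: P := A
            return {p[1]: a}
        if (p[0] == "array" and isinstance(a, tuple) and a[0] == "array"
                and a[1] == p[1]):                  # same rank: compare element types
            p, a = p[2], a[2]
            continue
        if p[0] == "generic":                       # constructed type
            candidates = []
            for target in implicit_targets(a):
                bindings = {}
                if match(p, target, bindings) and bindings not in candidates:
                    candidates.append(bindings)
            if len(candidates) == 1:                # exactly one Sx for each Tx
                return candidates[0]
        raise InferenceFailure(f"cannot infer aliases in {p} from {a}")


T = alias("T")
print(infer_declaration(T, "expr", "int"))                                 # {'T': 'int'}
print(infer_declaration(("array", 1, T), "expr", ("array", 1, "string")))  # {'T': 'string'}
print(infer_declaration(("generic", "IEnumerable", T), "expr",
                        ("generic", "List", "int")))                       # {'T': 'int'}
print(infer_declaration(T, "null"))                                        # {}
```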
If every local variable declaration in a block passes through the above rules 212 successfully, then all inferences produced from those local variable declarations can be pooled. This pooled set of inferences must then have the following properties. First, if a type alias occurs more than once, all of the inferences for that type alias must bind it to the same type; in short, the set of inferences must be consistent. Second, at any given point in the block where the type bound to a local type alias is needed (e.g., for overloading resolution or in the derived type of a variable initializer), the type alias must already have been bound. This ensures that a type alias is never bound to another, still unbound, type alias. The example below is valid because the type alias T is bound at the point where overloading resolution is applied in the Console.WriteLine statement:
The following example leads to a compile-time error since type alias T would be bound to another type alias S instead of a type:
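The pooling and ordering requirements described above can be sketched as a single pass over a block. In this hypothetical encoding, ("bind", alias, type) stands for a declaration that binds an alias and ("need", alias) for a point, such as overloading resolution, where the bound type is required; an alias bound to an already-bound alias takes that alias's type, while binding to a still-unbound alias is rejected. All names here are illustrative assumptions.

```python
class AliasError(Exception):
    """A type alias is needed, or bound to another alias, before it is bound."""


def check_alias_order(statements, alias_names):
    """Walk a block in order and return the pooled alias bindings.

    statements: ("bind", alias, t) for a declaration binding alias to type t,
                or ("need", alias) where the alias's type is required
                (e.g. for overloading resolution).
    alias_names: the names in the block that denote local type aliases.
    """
    bound = {}
    for stmt in statements:
        if stmt[0] == "bind":
            _, alias, t = stmt
            if t in alias_names:
                if t not in bound:
                    raise AliasError(f"{alias} would be bound to alias {t}, not a type")
                t = bound[t]                      # use the type the alias stands for
            previous = bound.setdefault(alias, t)
            if previous != t:
                raise AliasError(f"{alias} is bound to both {previous} and {t}")
        elif stmt[1] not in bound:                # ("need", alias)
            raise AliasError(f"{stmt[1]} is needed before it is bound")
    return bound


print(check_alias_order([("bind", "T", "int"), ("need", "T")], {"T"}))   # {'T': 'int'}
try:
    check_alias_order([("bind", "T", "S"), ("bind", "S", "int")], {"S", "T"})
except AliasError as err:
    print(err)   # T would be bound to alias S, not a type
```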
There may be times, however, when type component aliases are not needed because the type is not going to be used again. In other words, a programmer wants to utilize type inference on an expression but never intends to employ the inferred type, for example, as a type argument. Turning to
To fully appreciate the subject system 300, it helps to understand one of the problems it solves. Consider the following pseudo code, for example:
Here, there is a class x with a local variable S1 declared as type integer. Within the scope of this variable, in F( ), S1 is again employed and assigned some expression. Accordingly, ambiguity arises as to whether the programmer meant to introduce a new local variable or to assign to the variable previously declared, and therefore the compiler 110 does not know whether to infer a new type. If the type were given, for example, Bool S1=expression, then the compiler would recognize that this is a new local variable; however, this is precisely how it is done without type inference. Hence, simply leaving out the type is not good enough in the case of type inference, because the inference component 112 cannot distinguish between creating a new local variable and assigning to a variable already in scope. Accordingly, the subject invention provides a new variable indicator component 310 associated with a variable in a variable expression of a local program module 120. In accordance with an aspect of the invention, the new variable indicator component 310 can include a keyword, including but not limited to var, let, or dim. For example:
Here, the new variable indicator component 310 represented as the keyword Var in the above pseudo code informs the compiler 110 that the S1 in the function F( ) is a new variable distinct from the other variable S1 in scope. In accordance therewith, the inference component 112 can infer the type from the local context, namely expression.
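As a hypothetical sketch of how a compiler might treat the new variable indicator, the Python model below distinguishes `var name = e`, which always introduces a new local (possibly shadowing S1 from the enclosing scope) with the type inferred from e, from a plain `name = e`, which must assign to a variable already in scope. The Scope class, the process function, and the statement encoding are assumptions made for illustration.

```python
class Scope:
    """A block scope mapping variable names to their (declared or inferred) types."""

    def __init__(self, parent=None):
        self.parent = parent
        self.variables = {}

    def lookup(self, name):
        """Find a variable in this scope or any enclosing scope; None if absent."""
        scope = self
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        return None


def process(scope, statement):
    """Handle ("var", name, t) for `var name = e` or ("assign", name, t) for `name = e`.

    t is the type inferred for the initializer or right-hand side e.
    """
    kind, name, inferred = statement
    if kind == "var":
        # The indicator says: new local variable, even if `name` exists outside.
        if name in scope.variables:
            raise TypeError(f"{name} is already declared in this block")
        scope.variables[name] = inferred
        return ("declared", name, inferred)
    declared = scope.lookup(name)                     # plain assignment: must be in scope
    if declared is None:
        raise TypeError(f"{name} is not declared")
    if declared != inferred:
        raise TypeError(f"cannot assign {inferred} to {name} of type {declared}")
    return ("assigned", name, declared)


class_x = Scope()
class_x.variables["S1"] = "int"                      # S1 declared as an integer
body_of_f = Scope(parent=class_x)                    # the body of F( )
print(process(body_of_f, ("var", "S1", "bool")))     # ('declared', 'S1', 'bool')
print(body_of_f.lookup("S1"), class_x.lookup("S1"))  # bool int
```

Without the indicator, the same statement is treated as an assignment to the outer integer S1 and is rejected, which is exactly the distinction the keyword makes explicit.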
To summarize what has been presented thus far, in system 100 of
Conventional technology infers types in a complicated and inefficient manner. In particular, it infers the most general type of a plurality of assignments. By way of example, consider the following variable assignments:
The subject invention addresses this problem by binding an element to an inferred type when the element is first encountered. If the inference component 112 encounters the same element again, the element must be bound to the same type; otherwise, the component 112 generates a compile-time error. This approach is more efficient than conventional approaches and does not blow up in terms of inference time. It should be noted that conventional inference technology can degenerate to a scenario that superficially resembles the subject invention, for example, if there is only a single variable declaration in a local programming block, such as x=“hello.” Here, conventional technology will not immediately infer and bind type string to x as the subject invention does, but rather will scan the entire local code section for additional instances of the variable x so that a supertype can be calculated. Only after failing to locate an instance of x with a different type would the conventional technology infer and bind type string to x. The subject invention infers and binds the type to x as soon as it is encountered and returns an error if it is later found that the same variable is to be bound to a different type. In essence, system 400 is much more efficient. Furthermore, it should be appreciated that conventional languages employing type inference up to the time of this invention do not employ subtypes but rather utilize a lengthy and time-consuming unification calculation to determine the most general type.
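The bind-on-first-sight approach can be sketched as a single pass over the assignments in program order. This is a hypothetical model: each assignment is reduced to a (variable, inferred type) pair, and the names infer_on_first_sight and CompileTimeError are illustrative. A variable's type is fixed at its first assignment, and a later assignment of a different type is reported immediately, without first scanning the rest of the block or computing a most general type.

```python
class CompileTimeError(Exception):
    """Raised as soon as a variable is reused with a different type."""


def infer_on_first_sight(assignments):
    """Bind each variable's type at its first assignment and check later ones.

    assignments: (variable, inferred type of the right-hand side) in program order.
    """
    bindings = {}
    for position, (name, inferred) in enumerate(assignments, start=1):
        bound = bindings.setdefault(name, inferred)   # first occurrence: bind now
        if bound != inferred:                         # later occurrence: must agree
            raise CompileTimeError(
                f"statement {position}: {name} is bound to {bound}, not {inferred}")
    return bindings


print(infer_on_first_sight([("x", "string")]))   # {'x': 'string'}, e.g. x = "hello"
try:
    infer_on_first_sight([("x", "string"), ("y", "int"), ("x", "int")])
except CompileTimeError as err:
    print(err)   # statement 3: x is bound to string, not int
```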
In view of the exemplary systems described supra, a methodology that may be implemented in accordance with the present invention will be better appreciated with reference to the flow charts of
Additionally, it should be further appreciated that the methodologies disclosed hereinafter and throughout this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computers. The term article of manufacture, as used, is intended to encompass a computer program accessible from any computer-readable device, carrier, or media.
Turning to
Turning to
Throughout this detailed description, generation of errors has been described specifically in the context of compile-time errors. It is often advantageous to locate errors at compile time so that such errors can be remedied early in the developmental process. It should be appreciated, however, that the subject invention also contemplates generating run-time errors even though the system could identify them at compile time. In essence, detected compile-time errors can be delayed until run time. To enable such functionality, a flag can be set, for example, in the type checker component to specify when such errors are to be delayed. This provides additional flexibility with respect to when such errors are to be addressed.
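A minimal sketch of such a flag follows, under hypothetical assumptions: the TypeChecker class and its defer_errors parameter are illustrative names, not part of any existing compiler. When the flag is off, a detected error is recorded as a compile-time error; when it is on, the checker emits a run-time action that raises the same error only if that code is actually executed.

```python
class TypeChecker:
    """Hypothetical type checker whose detected errors may be deferred to run time."""

    def __init__(self, defer_errors=False):
        self.defer_errors = defer_errors      # flag: delay detected errors until run time
        self.compile_errors = []

    def check_assignment(self, name, declared, actual):
        """Check `name = value`; return the run-time action for the statement."""
        if declared == actual:
            return lambda: None               # well typed: no run-time check needed
        message = f"cannot assign {actual} to {name} of type {declared}"
        if not self.defer_errors:
            self.compile_errors.append(message)   # reported at compile time
            return None

        def fail_at_run_time():
            # The error was detected at compile time but is raised only when executed.
            raise TypeError(message)

        return fail_at_run_time


strict = TypeChecker()
strict.check_assignment("z", "int", "string")
print(strict.compile_errors)   # ['cannot assign string to z of type int']

lenient = TypeChecker(defer_errors=True)
action = lenient.check_assignment("z", "int", "string")
print(lenient.compile_errors)  # []
try:
    action()
except TypeError as err:
    print("run time:", err)    # run time: cannot assign string to z of type int
```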
In order to provide a context for the various aspects of the invention,
With reference to
The system bus 818 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, 8-bit bus, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MCA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI).
The system memory 816 includes volatile memory 820 and nonvolatile memory 822. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 812, such as during start-up, is stored in nonvolatile memory 822. By way of illustration, and not limitation, nonvolatile memory 822 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory 820 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
Computer 812 also includes removable/non-removable, volatile/non-volatile computer storage media.
It is to be appreciated that
A user enters commands or information into the computer 812 through input device(s) 836. Input devices 836 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 814 through the system bus 818 via interface port(s) 838. Interface port(s) 838 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 840 use some of the same type of ports as input device(s) 836. Thus, for example, a USB port may be used to provide input to computer 812, and to output information from computer 812 to an output device 840. Output adapter 842 is provided to illustrate that there are some output devices 840 like displays (e.g., flat panel and CRT), speakers, and printers, among other output devices 840, that require special adapters. The output adapters 842 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 840 and the system bus 818. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 844.
Computer 812 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 844. The remote computer(s) 844 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 812. For purposes of brevity, only a memory storage device 846 is illustrated with remote computer(s) 844. Remote computer(s) 844 is logically connected to computer 812 through a network interface 848 and then physically connected via communication connection 850. Network interface 848 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like. WAN technologies include, but are not limited to, point-to-point links, circuit-switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
Communication connection(s) 850 refers to the hardware/software employed to connect the network interface 848 to the bus 818. While communication connection 850 is shown for illustrative clarity inside computer 812, it can also be external to computer 812. The hardware/software necessary for connection to the network interface 848 includes, for exemplary purposes only, internal and external technologies such as, modems including regular telephone grade modems, cable modems, power modems and DSL modems, ISDN adapters, and Ethernet cards.
What has been described above includes examples of the present invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the present invention, but one of ordinary skill in the art may recognize that many further combinations and permutations of the present invention are possible. Accordingly, the present invention is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.