In certain programming languages that are not strongly typed, i.e., in which a variable is not constrained to hold only one clearly defined type of data, such as a character, string, integer, floating point number, or Boolean value, a function or operation may exhibit a first behavior if a variable that is an argument of the function is of a first data type and a second behavior if the variable is instead of a second data type. For example, in JavaScript, which is not strongly typed, the “+” operator sums its operands when applied to two numbers and concatenates them when applied to two strings. That is, in JavaScript 2+3 yields 5, the sum of the numbers 2 and 3, whereas “two”+“three” yields “twothree”, i.e., the two strings concatenated together. The “+” operator may be used within the definition of a function, e.g., function b(m,n) {return m+n}, in which case the arguments are summed if they are numbers or concatenated if they are strings, as determined dynamically at runtime. “Length” is another example of a function or operation that in JavaScript exhibits variable type dependent behavior. The statement “x.length” results in a first behavior if the variable x is a string and a second, different behavior if the variable x is not a string, e.g., if x is an array.
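The type-dependent behavior described above can be observed directly in JavaScript. The following minimal sketch uses the function b from the text; the variable names sum, joined, stringLength, and arrayLength are introduced here only for illustration.

```javascript
// Function from the text: the behavior of "+" is chosen at runtime
// according to the types of the operands.
function b(m, n) {
  return m + n;
}

const sum = b(2, 3);              // two numbers: summed, giving 5
const joined = b("two", "three"); // two strings: concatenated, giving "twothree"

// ".length" likewise depends on the type of the value it is applied to.
const stringLength = "two".length;           // characters in the string: 3
const arrayLength = [10, 20, 30, 40].length; // elements in the array: 4

console.log(sum, joined, stringLength, arrayLength);
```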
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Variable type knowledge based call specialization is disclosed. In some embodiments, during compilation of code written in a loosely typed or un-typed programming language, such as JavaScript, where the behavior of one or more operations or functions varies based on the data type of one or more arguments on which the operation or function operates, a determination is made as to which one or more data types actually are or would be encountered in the course of execution of the associated code. For each data type determined to be associated with the argument(s) at a call site at which the operation or function is invoked, code implementing the corresponding behavior is generated. In the example described above, in some embodiments, where a “+” operator is used in JavaScript, for example in a function definition, a determination is made as to the data type(s) of the arguments of the operation at the call site, i.e., the portion of the code at which the “+” operator is invoked. If, for example, analysis of the source code or an intermediate representation thereof, such as LLVM intermediate representation (IR) or bytecode, indicates that the arguments (e.g., variables) to which the operator is to be applied will always be numbers or will always be strings, machine (or intermediate) code is generated only for the behavior corresponding to data of that type, for example summing if it is determined the variables will always be numbers or concatenating if it is determined that the variables will always be strings. In some embodiments, the data type is determined based on an analysis of the source code and/or an intermediate representation thereof, such as LLVM IR, bytecode, or another similar representation.
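As a rough illustration of this idea, the sketch below models code generation for a single “+” call site. It is not the disclosed implementation: the function name emitAdd, the knownType parameter, and the operation names such as NUM_ADD and STR_CONCAT are hypothetical, and the type information is assumed to have been produced by an earlier analysis.

```javascript
// Hypothetical emitter for one "+" call site. `knownType` stands for the
// result of an assumed earlier analysis: "number", "string", or undefined
// when the operand type could not be determined.
function emitAdd(knownType) {
  switch (knownType) {
    case "number":
      // Operands are always numbers: emit only the summing behavior.
      return ["NUM_ADD"];
    case "string":
      // Operands are always strings: emit only the concatenating behavior.
      return ["STR_CONCAT"];
    default:
      // Type unknown: keep the runtime type check and both behaviors.
      return ["CHECK_TYPE", "BRANCH_IF_STRING", "NUM_ADD", "JUMP_END", "STR_CONCAT"];
  }
}

const numericSite = emitAdd("number");
const stringSite = emitAdd("string");
const unknownSite = emitAdd(undefined);

console.log(numericSite, stringSite, unknownSite);
```

In this toy model the specialized call sites contain a single operation, while the unspecialized site carries the type check and both behaviors.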
In some embodiments, the determination is made by observing the JavaScript or other code as executed by an interpreter and noting the data type of the variables and/or the type-dependent behavior of the operator or function as observed at runtime.
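One way such observation might be approximated in plain JavaScript is to wrap a function and record the types of its arguments each time it is called. The sketch below is an illustration under stated assumptions, not the interpreter instrumentation itself; observeTypes, profiledAdd, and the observed accessor are hypothetical names.

```javascript
// Hypothetical helper: wraps fn and records the typeof signature of the
// arguments on every call.
function observeTypes(fn) {
  const seen = new Set();
  function wrapped(...args) {
    seen.add(args.map((arg) => typeof arg).join(","));
    return fn(...args);
  }
  // Returns the distinct argument-type signatures observed so far.
  wrapped.observed = () => Array.from(seen);
  return wrapped;
}

const profiledAdd = observeTypes((m, n) => m + n);
profiledAdd(2, 3);
profiledAdd(40, 2);

// Only one signature was seen, so a compiler could consider generating
// just the numeric behavior for this call site.
const signatures = profiledAdd.observed(); // ["number,number"]
console.log(signatures);
```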
Using the approaches described herein, optimized machine code can be generated that does not include unneeded and costly conditional statements to determine variable data type at runtime and provide data type-dependent behavior. Instead, where the data type of variables or other arguments can be determined a priori, for example by analyzing source code and/or an intermediate representation thereof, or by observing the behavior of code such as JavaScript as executed by an interpreter, machine code to implement only a behavior corresponding to a predetermined data type is generated. This eliminates both the need to implement and evaluate conditional statements at runtime and the needless generation of machine code to implement behaviors associated with data types that will never be encountered.
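The contrast between the two kinds of generated code can be sketched as follows. The tagged-value representation and the names num, str, genericAdd, and addNumbers are assumptions made for illustration only; real generated machine code would look different.

```javascript
// Tagged values, as a simple runtime might represent them (an assumption).
const num = (v) => ({ tag: "num", v });
const str = (v) => ({ tag: "str", v });

// Generic "+": evaluates type checks on every call and carries both behaviors.
function genericAdd(a, b) {
  if (a.tag === "num" && b.tag === "num") return num(a.v + b.v);
  if (a.tag === "str" && b.tag === "str") return str(a.v.concat(b.v));
  throw new TypeError("mixed operand types are outside this sketch");
}

// Specialized "+": valid only where the operands were determined beforehand
// to be numbers. It takes raw numbers, performs no tag checks, and contains
// no concatenation path.
function addNumbers(a, b) {
  return a + b;
}

const genericSum = genericAdd(num(2), num(3)).v;
const genericJoined = genericAdd(str("two"), str("three")).v;
const specializedSum = addNumbers(2, 3);

console.log(genericSum, genericJoined, specializedSum);
```

The specialized function is correct only under its stated assumption about operand types, which is why the type determination has to come first.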
CPU 702 is coupled bi-directionally with memory 710, which can include a first primary storage area, typically a random access memory (RAM), and a second primary storage area, typically a read-only memory (ROM). As is well known in the art, primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data. It can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on CPU 702. As is also well known in the art, primary storage typically includes basic operating instructions, program code, data, and objects used by the CPU 702 to perform its functions. Primary storage devices 710 may include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional. CPU 702 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).
A removable mass storage device 712 provides additional data storage capacity for the computer system 700, and is coupled either bi-directionally (read/write) or uni-directionally (read only) to CPU 702. Storage 712 may also include computer-readable media such as magnetic tape, flash memory, signals embodied on a carrier wave, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices. A fixed mass storage 720 can also provide additional data storage capacity. The most common example of mass storage 720 is a hard disk drive. Mass storage 712, 720 generally store additional programming instructions, data, and the like that typically are not in active use by the CPU 702. It will be appreciated that the information retained within mass storage 712, 720 may be incorporated, if needed, in standard fashion as part of primary storage 710 (e.g., RAM) as virtual memory.
In addition to providing CPU 702 access to storage subsystems, bus 714 can be used to provide access to other subsystems and devices as well. In the described embodiment, these can include a display monitor 718, a network interface 716, a keyboard 704, and a pointing device 706, as well as an auxiliary input/output device interface, a sound card, speakers, and other subsystems as needed. The pointing device 706 may be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.
The network interface 716 allows CPU 702 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. Through the network interface 716, it is contemplated that the CPU 702 might receive information, e.g., data objects or program instructions, from another network, or might output information to another network in the course of performing the above-described method steps. Information, often represented as a sequence of instructions to be executed on a CPU, may be received from and outputted to another network, for example, in the form of a computer data signal embodied in a carrier wave. An interface card or similar device and appropriate software implemented by CPU 702 can be used to connect the computer system 700 to an external network and transfer data according to standard protocols. That is, method embodiments of the present invention may execute solely upon CPU 702, or may be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote CPU that shares a portion of the processing. Additional mass storage devices (not shown) may also be connected to CPU 702 through network interface 716.
An auxiliary I/O device interface (not shown) can be used in conjunction with computer system 700. The auxiliary I/O device interface can include general and customized interfaces that allow the CPU 702 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.
In addition, embodiments of the present invention further relate to computer storage products with a computer readable medium that contains program code for performing various computer-implemented operations. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. The media and program code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known to those of ordinary skill in the computer software arts. Examples of computer-readable media include, but are not limited to, all the media mentioned above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. The computer-readable medium can also be distributed as a data signal embodied in a carrier wave over a network of coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Examples of program code include both machine code, as produced, for example, by a compiler, and files containing higher level code that may be executed using an interpreter.
The computer system shown in
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.