This document generally relates to interpretive environments, and more particularly, to scripting languages used in interpretive environments.
In general, there are two types of code: compiled code and interpreted code. Compiled code has two general categories. The first category compiles source code into object code and then links one or more object codes together to create an executable, which is executed at run-time. The second category compiles the source code into an intermediate language, which undergoes just-in-time compilation at run-time to create native code that is executed. Hereinafter, the executable and the native code are both referred to as executable code.
In order to assure that the executable code does not harm a computer system, the executable code and its associated source code undergo extensive testing and review. While this minimizes the amount of executable code that is harmful, some harmful executable code still exists. Some of the harmful executable code is unforeseen, while other harmful executable code is specifically written to cause harm to computer systems (e.g., viruses, worms, and Trojan Horse attacks). Fortunately, several mechanisms have been developed to further minimize the risk of harmful code. One mechanism is a feature provided by a processor for recognizing pages in memory as executable or non-executable. Because the operating system directly configures the executable code in memory, the operating system may mark certain memory pages as executable and others as non-executable. Then, when an instruction in the executable code attempts to execute code in the memory page marked as non-executable, the processor throws an exception. This prevents stray pointers and malicious code from harming the information in the memory that is marked as non-executable.
Unfortunately, this feature of the processor is not available when processing interpreted code. Interpreted code is processed at run-time by an interpreter. The interpreter is responsible for processing the interpreted code into commands that the processor can execute. Conceptually, the interpreter operates in a serial manner, inputting a string and interpreting the string into a command. The command is associated with a set of executable instructions that perform the command when executed by the processor. Because the operating system does not manage the memory for interpreted code, the operating system is unable to mark pages in memory as non-executable or executable. From the processor's perspective, it is executing the interpreter software module (i.e., the interpreter) that has been loaded into memory and is being managed by the operating system. The interpreter software module is responsible for processing the interpreted code. In other words, the processor views the interpreted code as "data". Thus, security problems arise when interpreted code (e.g., a script) contains "data" that is interpreted into harmful commands (e.g., format c:).
While there are various ways in which a harmful command may be "inserted" into an otherwise useful and harmless script, one way is via an input file. For example, a script may input a text file containing several lines. Each line may list a user's name. The script may then specify a command using each user's name. In this example, the harmful command may be "inserted" by editing one of the lines in the input file and appending a malicious string (e.g., format c:) after one of the user's names. Because the interpreter "interprets" its input into commands, the interpreter will interpret the malicious string into the "correct", but harmful, command. Then, when the "correct" command is executed, undesirable and/or harmful actions occur.
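To make the injection scenario concrete, the following Python sketch models a naive interpreter that splits its input on ";" and treats the first word of each statement as a command. The command names, the ";" separator, and the `COMMANDS` table are hypothetical assumptions chosen for illustration; the "format" entry only records that it would have run, so nothing is actually executed.

```python
# Hypothetical toy interpreter: commands are recorded, never really executed.
executed = []

COMMANDS = {
    "create-user": lambda arg: executed.append(("create-user", arg)),
    # Stands in for a harmful command such as "format c:".
    "format": lambda arg: executed.append(("format", arg)),
}


def interpret(text):
    """Naively interpret text: ';' separates statements, first word is the verb."""
    for statement in text.split(";"):
        words = statement.split()
        if not words:
            continue
        verb, *args = words
        COMMANDS[verb](" ".join(args))


def run_script(input_lines):
    # The script builds one command per user name read from the input file.
    for name in input_lines:
        interpret(f"create-user {name}")


# The second "user name" has a malicious string appended to it.
run_script(["alice", "bob; format c:"])
print(executed)
# [('create-user', 'alice'), ('create-user', 'bob'), ('format', 'c:')]
```

Because the interpreter makes no distinction between the script's own text and text it read from the file, the appended string becomes a third command.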
One way to minimize the security problems associated with scripts is to have the scripts and any data that is input into the script undergo a formal review and testing procedure similar to source code for compiled code. However, this solution is not ideal, and may not even be attainable.
Thus, until now, an adequate solution for minimizing security problems with scripts in an interpretive environment has eluded those skilled in the art.
The techniques and mechanisms described herein are directed to a scripting security mechanism that minimizes security risks associated with interpreting a script written with a scripting language. The scripting security mechanism includes a scripting-language syntax for designating code and data. The scripting-language syntax includes a data construct for designating data within the script. When the interpreter encounters the data construct within the script, the interpreter interprets information associated with the data construct using a subset of the total operations available to the interpreter. By allowing the information to be interpreted using the subset of operations, the interpreter reduces the likelihood that the information associated with the data construct will cause harm to a computer system. Thus, the entire script becomes more robust and secure.
The scripting-language syntax may include an export option for selecting which variables to export from the data construct. The export option aids in preventing malicious attacks that overwrite existing variables in the operating environment.
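A minimal Python sketch of how an export option could behave, assuming that a data block evaluates into a private scope and that only names on its export list are copied into the enclosing environment. The function `run_data_block` and the list-of-pairs representation of the block's assignments are hypothetical.

```python
def run_data_block(assignments, export, caller_scope):
    """Evaluate (name, value) pairs in a private scope, then copy out only exported names."""
    block_scope = {}
    for name, value in assignments:
        block_scope[name] = value  # stays private to the data block
    for name in export:
        # Only explicitly exported names reach the caller; a missing name raises KeyError.
        caller_scope[name] = block_scope[name]
    return caller_scope


env = {"PATH": "/usr/bin", "users": []}
# Input data tries to overwrite PATH from inside the block.
run_data_block(
    [("users", ["alice", "bob"]), ("PATH", "/tmp/evil")],
    export=["users"],
    caller_scope=env,
)
print(env)  # {'PATH': '/usr/bin', 'users': ['alice', 'bob']}
```

The attempted overwrite of `PATH` is simply discarded because `PATH` is not on the export list.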
The scripting-language syntax may also include a constraint option for specifying a constraint for any of the variables that are exported from the data construct. The constraint options further restrict the output from the data construct.
Non-limiting and non-exhaustive embodiments are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Briefly, the present mechanism and technique minimize the security risks associated with interpreting a script written with a scripting language. The scripting security mechanism includes a scripting-language syntax for designating code and data. The scripting-language syntax includes a data construct for designating data within the script. When the interpreter encounters the data construct within the script, the interpreter interprets information associated with the data construct using a subset of the total operations available to the interpreter. By allowing the information to be interpreted using the subset of operations, the interpreter reduces the likelihood that the information associated with the data construct will cause harm to a computer system. Thus, the entire script becomes more robust and secure. These and other advantages will become clear after reading the following detailed description.
Exemplary Computing Environment
The various embodiments of the present scripting security mechanism may be implemented in different computer environments. The computer environment shown in
With reference to
Computing device 100 may have additional features or functionality. For example, computing device 100 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Computing device 100 may also contain communication connections 116 that allow the device to communicate with other computing devices 118, such as over a network. Communication connection(s) 116 is one example of communication media. Communication media may typically be embodied by computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer-readable media as used herein includes both storage media and communication media.
Various modules and techniques may be described herein in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. for performing particular tasks or implementing particular abstract data types. These program modules and the like may be executed as native code or may be downloaded and executed, such as in a virtual machine or other just-in-time compilation execution environment. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
An implementation of these modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer readable media may comprise “computer storage media” and “communications media.”
Interpreter 204 (also commonly referred to as a script engine) may be one or more software modules implemented within the operating system 105 illustrated in
However, by implementing the present scripting-security mechanism, the risk of executing harmful commands within script 202 may be significantly reduced. The scripting-security mechanism includes a scripting-security syntax written within script 202. Briefly, the scripting-security syntax, described in detail below in conjunction with
The scripting-language syntax provides a data construct that includes a data label (e.g., data labels 310 and 320). Data labels 310 and 320 illustrated in
Data blocks 316 and 326 include one or more lines of script designated as data. While the lines of script within data blocks 316 and 326 may have the same text as lines within traditional script lines 302 and 304, the interpreter processes the lines within data blocks 316 and 326 using a more restrictive set of operations. The restrictive set of operations is defined within interpreter 204 and acts as a default set of operations that are allowed within data blocks. Typically, the default set of operations does not include operations that affect processes, system state, and the like. However, as will be described below, the default set of operations may be modified to allow additional operations within a particular data block. Thus, the present scripting-security mechanism provides flexibility to the script writer while still providing a higher degree of security than in the past. As shown, there may be one or more data blocks within the script. Each data block may specify its own output and processing requirements. Alternatively, the output and/or processing requirements may be specified at a global level as a policy.
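The relationship between the interpreter's default set, a per-block modification, and a global policy can be sketched in Python as follows. The operation category names are hypothetical (the document does not enumerate them), and the precedence shown (block override first, then global policy, then the interpreter default) is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical operation categories known to the interpreter.
ALL_OPERATIONS = frozenset({
    "literal", "constant_expression", "assignment",
    "method_call", "file_write", "process_start",
})
# Assumed default for data blocks: nothing that affects processes or system state.
DEFAULT_DATA_BLOCK_OPERATIONS = frozenset({"literal", "constant_expression", "assignment"})


@dataclass(frozen=True)
class DataBlock:
    name: str
    allowed: Optional[FrozenSet[str]] = None  # per-block override, if any

    def __post_init__(self):
        # Reject operation names the interpreter does not know about.
        if self.allowed is not None and not self.allowed <= ALL_OPERATIONS:
            unknown = sorted(self.allowed - ALL_OPERATIONS)
            raise ValueError(f"unknown operations for {self.name}: {unknown}")


def effective_operations(block, global_policy=None):
    """Block override wins, then a global policy, then the interpreter default."""
    if block.allowed is not None:
        return block.allowed
    if global_policy is not None:
        return global_policy
    return DEFAULT_DATA_BLOCK_OPERATIONS


users = DataBlock("Users")
config = DataBlock("Config", allowed=DEFAULT_DATA_BLOCK_OPERATIONS | {"method_call"})
print(sorted(effective_operations(users)))  # ['assignment', 'constant_expression', 'literal']
print("method_call" in effective_operations(config))  # True
```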
Lines 330, 332, 340, and 342 are now compared with respect to how interpreter 204 processes them. Lines 330, 332, and 342 are within the traditional lines of script 302, while line 340 is within data block 310. Lines 332 and 342 are identical in appearance. However, the processing of each of these lines is markedly different. If line 342 had been included within the script, an error would have occurred because line 342 may run code of a constructor for a User object, which could potentially execute malicious and/or harmful code. While a typical script may not have identical lines, identical lines aid in illustrating the difference in processing between them. As shown, both lines create a new user using information assigned to a variable (e.g., new user $y). For a traditional interpreter, the variable $y is assigned by copying the entire contents of "file" in line 330. In contrast, for the present interpreter that implements the scripting security mechanism, the variable $y is assigned via a return value from a data block (e.g., Datablock 310), within which the contents of "file" were included. Continuing with the example described above, assume that one of the lines within the input file at lines 330 and 340 has been appended with a malicious string, such as "format c:". Line 330, processed in the traditional manner, will interpret the malicious string into the "correct", but harmful, command, which then gets executed. Line 340, processed in accordance with the present security-scripting technique, will also interpret the malicious string into the "correct", but harmful, command. However, because this command is not one of the allowed operations within a data block, it will not be executed. In other words, because line 340 is within a data block, the interpreter will not allow line 342 to receive the potentially harmful command. Thus, anything included from "file" is restricted to being data.
The security-scripting technique is described in further detail below in conjunction with
This security-scripting technique provides several advantages to a script writer. First, the script writer does not need to expend effort on reviewing the input file or adding checks within the script to catch harmful code trying to pass itself off as data. Second, the script writer can be assured that the script will not execute harmful code if the harmful code appears within the data block.
It is envisioned that constraints and/or the actions to take upon a constraint violation may be used in many different scenarios. For example, in one scenario, a system administrator may use them as a confidence check of the script. In this scenario, the number of data elements processed by the script provides a level of confidence to the administrator that the script performed correctly. If the system administrator sees that the script processed a certain number of data elements, the system administrator can be relatively confident that the script performed correctly. However, if unusually large or small numbers of data elements are processed, the system administrator may want to investigate the reason for the unexpected results. Using the present security mechanism, the system administrator may set a constraint on the variable that keeps track of the number of data elements that are processed. The constraint may specify a certain range for the variable. During the interpretation of the script, if the interpreter does not obtain a number within the defined constraint, the interpreter may throw an exception, generate an error, request confirmation to proceed or stop, and the like.
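The range check described in this scenario might look like the following Python sketch. The function `enforce_range`, the `ConstraintViolation` exception, and the action names "raise", "warn", and "confirm" are hypothetical; they simply mirror the three responses listed above (throw an exception, generate an error, or request confirmation).

```python
class ConstraintViolation(Exception):
    """Raised when an exported variable falls outside its declared constraint."""


def enforce_range(name, value, low, high, action="raise", confirm=None):
    """Check value against an inclusive [low, high] range.

    Returns True if script processing may continue, False if it should stop.
    """
    if low <= value <= high:
        return True
    message = f"{name}={value} is outside the expected range [{low}, {high}]"
    if action == "raise":
        raise ConstraintViolation(message)
    if action == "warn":
        print(f"warning: {message}")
        return True  # report the problem but keep going
    if action == "confirm":
        # By default, ask the administrator whether to proceed.
        ask = confirm or (lambda msg: input(f"{msg}. Continue? [y/N] ").strip().lower() == "y")
        return ask(message)
    raise ValueError(f"unknown action: {action!r}")


# A count of 250 processed elements is within the expected range.
enforce_range("processed", 250, 100, 1000)
# An unusually small count, with an administrator who declines to continue.
enforce_range("processed", 5, 100, 1000, action="confirm", confirm=lambda msg: False)
```

Passing `confirm` as a callable keeps the check testable and lets a host application substitute its own prompt.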
In another embodiment, the scripting-language syntax may allow the default set of operations to be modified for a particular data block. The following is an exemplary syntax for modifying the default set of operations:
[OperationsAllowed(Literals, ConstantExpressions)]
datablock Foo { . . . }
The interpreter will then restrict the data block to allow only literals and constant expressions.
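The document does not define exactly what "Literals" and "ConstantExpressions" admit. One possible reading is sketched below in Python using the standard `ast` module: literals are constants plus list and tuple displays, and constant expressions add basic arithmetic over them. The mapping of these two names to AST node types, along with `evaluate` and `DisallowedOperation`, are assumptions for illustration only.

```python
import ast
import operator

# Assumed meaning of "ConstantExpressions": a few arithmetic operators only.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARYOPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


class DisallowedOperation(Exception):
    pass


def evaluate(source, allowed=("Literals", "ConstantExpressions")):
    """Evaluate one data-block expression, permitting only the allowed node kinds."""
    def walk(node):
        if "Literals" in allowed:
            if isinstance(node, ast.Constant):
                return node.value
            if isinstance(node, (ast.List, ast.Tuple)):
                items = [walk(element) for element in node.elts]
                return items if isinstance(node, ast.List) else tuple(items)
        if "ConstantExpressions" in allowed:
            if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
                return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
                return _UNARYOPS[type(node.op)](walk(node.operand))
        # Calls, names, attribute access, etc. are rejected before anything runs.
        raise DisallowedOperation(f"{type(node).__name__} is not allowed in this data block")

    return walk(ast.parse(source, mode="eval").body)


print(evaluate("60 * 60 * 24"))      # 86400
print(evaluate("['alice', 'bob']"))  # ['alice', 'bob']
```

An input such as `__import__('os').remove('x')` parses to a `Call` node, which is rejected without evaluating any of its parts. Exponentiation is deliberately left out of the operator table in this sketch.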
At block 602, a line from the script is retrieved. Retrieving the line is performed in any well-known manner. Processing continues at decision block 604.
At decision block 604, a determination is made whether the line contains the scripting-security syntax in accordance with the present scripting security mechanism. As mentioned earlier, the scripting-security syntax may take several different forms. For example, the syntax may include a label, such as “DataBlock”, “Data”, and the like. It also may include a start designation and an end designation, such as “{” and “}”, respectively. If the line does not contain the label for the scripting-security syntax, processing continues at block 606.
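Decision block 604 can be approximated with a regular expression. The Python sketch below assumes the "datablock Foo { . . . }" form shown earlier and the "Data" label mentioned above, case-insensitive matching, and an optional block name; the pattern and `find_data_block_start` are hypothetical, and the sketch ignores the bracketed attribute line.

```python
import re

# Assumed surface syntax: a "datablock" or "data" label, an optional name,
# then the start designation "{". Anything after the brace is returned as-is.
DATA_START = re.compile(
    r"^\s*(?:datablock|data)\b\s*(?P<name>\w+)?\s*\{(?P<rest>.*)$",
    re.IGNORECASE,
)


def find_data_block_start(line):
    """Return (name, remainder) if the line opens a data block, else None."""
    match = DATA_START.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("rest")


print(find_data_block_start("DataBlock Users {"))  # ('Users', '')
print(find_data_block_start("$data = 5"))          # None
```

The word boundary after the label keeps identifiers such as `database_backup` from being mistaken for the security syntax.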
At block 606, the line is processed without the application of security mechanisms. At decision block 608, a determination is made whether there is another line within the script. If there is another line, processing loops back to block 602 to get another line. Otherwise, processing is complete. As the flow diagram shows, if the script does not contain the scripting-security syntax, the script is processed as it would have been in the past. However, if the scripting-security syntax is identified in the line at block 604, processing continues at decision block 610.
At decision block 610, a determination is made whether the line is allowed to be processed. As mentioned earlier, only a subset of the total operations is allowed to be performed on the lines in the data block. The subset may be defined to disallow deletion of files, changing of specific state variables, and the like. This prevents the execution of unexpected executable code. However, some executable code that will not harm the computing system when executed, such as $A=32, may be included within the subset of operations. If the line is associated with an operation not defined within the subset of operations, processing continues at block 612.
At block 612, an indication is output. The indication indicates that the line attempted to execute code that was not authorized within the data block. The indication may be an error message, stopping the processing of the script, and/or the like. Alternatively, the indication may be a message that requires input for either proceeding with the script or exiting the script. If the input to the message indicates proceeding with the script, processing continues at block 614. Otherwise, processing is ended for the script.
At block 614, the line is processed. For the line that includes the label, this may involve identifying export variables and any constraints associated with the export variables. For other lines that are processed within the data block, the variables that are used during the processing of the line are not immediately exported. Instead, as will be described below in conjunction with block 624, only the variables specified for export are exported. Processing continues at block 616.
At block 616, the next line in the script is retrieved. Each line within the data block is processed using the subset of operations defined. This ensures that malicious executable code will not be executed, and if an attempt is made to execute code that is not allowed, the administrator will be alerted. Processing continues at decision block 618.
At decision block 618, a determination is made whether the next line includes an indication for the end of the data block. In the case that the next line has a command along with the end indication, the process performs block 614 on the command portion and then exits the processing within the data block by proceeding to decision block 620. If the next line does not include the end indication, processing loops back to decision block 610 and proceeds as described above.
At decision block 620, a determination is made whether each of the variables that had constraints defined for it was within the constraints specified. If one or more of the variables did not meet their constraints, processing continues at block 622.
At block 622, an indication that one or more of the variables did not meet their constraints is output. Similar to the indication described in block 612, the indication may be an error message, stopping the processing of the script, and/or the like. Alternatively, the indication may be a message that requires input for either proceeding with the script or exiting the script. If the input to the message indicates proceeding with the script, processing continues at block 624. Otherwise, processing is ended for the script.
At block 624, the variables defined for exporting are exported. For example, system state variables may be updated with new values and the like. Processing then continues at decision block 608 to check if there are more lines to process in the script. Processing then continues as described above.
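The flow from block 602 through block 624 can be sketched end to end in Python. Everything concrete here is an assumption made for illustration: a toy command language ("set", "echo", "delete", where "delete" only records what it would do), a default data-block subset containing only "set", a `specs` dictionary standing in for the export and constraint declarations that the label line would carry, and an `on_violation` callback standing in for the indications output at blocks 612 and 622. For simplicity the sketch requires the closing "}" to appear on its own line.

```python
import re

# Hypothetical toy command language; nothing touches the real file system.
def op_set(scope, args, log):
    name, value = args.split(maxsplit=1)
    scope[name] = int(value)


def op_echo(scope, args, log):
    log.append(f"echo {args}")


def op_delete(scope, args, log):
    log.append(f"delete {args}")  # records the deletion instead of performing it


OPERATIONS = {"set": op_set, "echo": op_echo, "delete": op_delete}
DATA_BLOCK_OPERATIONS = {"set"}  # assumed default subset for data blocks
START = re.compile(r"^\s*datablock\s+(\w+)\s*\{\s*$", re.IGNORECASE)


def execute(line, scope, log):
    verb, _, args = line.strip().partition(" ")
    OPERATIONS[verb](scope, args, log)


def process_script(lines, specs, on_violation):
    """specs maps block name -> (exports, {var: (low, high)});
    on_violation(message) returns True to proceed or False to stop the script."""
    env, log, i = {}, [], 0
    while i < len(lines):                              # decision 608: more lines?
        line = lines[i]                                # block 602: retrieve a line
        i += 1
        match = START.match(line)                      # decision 604: security syntax?
        if match is None:
            execute(line, env, log)                    # block 606: unrestricted
            continue
        exports, constraints = specs[match.group(1)]   # block 614 on the label line
        scope = {}                                     # variables private to the block
        while lines[i].strip() != "}":                 # blocks 616/618: until end marker
            line = lines[i]
            i += 1
            verb = line.strip().partition(" ")[0]
            if verb not in DATA_BLOCK_OPERATIONS:      # decision 610
                if not on_violation(f"'{verb}' is not allowed in a data block"):  # block 612
                    return env, log
            execute(line, scope, log)                  # block 614
        i += 1                                         # step past the closing brace
        for var, (low, high) in constraints.items():   # decision 620
            value = scope.get(var)
            if value is None or not low <= value <= high:
                if not on_violation(f"{var}={value} outside [{low}, {high}]"):  # block 622
                    return env, log
        env.update({var: scope[var] for var in exports if var in scope})  # block 624
    return env, log


script = ["echo start", "datablock Users {", "set count 3",
          "delete important.txt", "}", "echo done"]
specs = {"Users": (["count"], {"count": (1, 100)})}
env, log = process_script(script, specs, on_violation=lambda message: False)
```

With a handler that declines, processing stops at the disallowed "delete" line, so nothing from the block is exported. With a handler that approves, the line is processed as the text describes for block 612 and the constrained, exported `count` reaches the enclosing environment.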
As one can easily imagine, there may be multiple data blocks within one script. Alternatively, the entire script may be written within one data block. By utilizing the present scripting security mechanisms in a script, an administrator responsible for running the script may be provided information alerting the administrator to potentially corrupt data, possible security problems within the script, and the like, thus providing another level of security when running scripts.
The present scripting security mechanism provides many advantages to script writers. The scripting security mechanism is especially important when the script operates on large data sets that do not lend themselves to convenient inspection for potentially harmful instructions. By utilizing the present scripting security mechanism, the script writer may be provided some assurance that the script will not execute harmful instructions. Thus, the script writer does not have to inspect the code within the data block.
Using the above teachings, the present scripting security mechanisms may be implemented in different interpretive environments by those skilled in the art. Each of the interpretive environments can then achieve the advantages outlined above. For example, the scripting security mechanism may be implemented within the MONAD shell developed by the Microsoft Corporation of Redmond, Wash.
Reference has been made throughout this specification to “one embodiment,” “an embodiment,” or “an example embodiment” meaning that a particular described feature, structure, or characteristic is included in at least one embodiment of the present invention. Thus, usage of such phrases may refer to more than just one embodiment. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
One skilled in the relevant art may recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, resources, materials, etc. In other instances, well known structures, resources, or operations have not been shown or described in detail merely to avoid obscuring aspects of the invention.
While example embodiments and applications have been illustrated and described, it is to be understood that the invention is not limited to the precise configuration and resources described above. Various modifications, changes, and variations apparent to those skilled in the art may be made in the arrangement, operation, and details of the methods and systems of the present invention disclosed herein without departing from the scope of the claimed invention.