The present disclosure generally relates to the field of computer architecture, and more particularly, to a method and an apparatus for processing self-modifying codes.
Self-modifying codes may refer to a set of computer codes that modifies itself while being executed by a computer processor. Self-modifying codes are widely used for run-time code generation (e.g., during Just-In-Time compilation). Self-modifying codes are also widely used for embedded applications to optimize memory usage during the execution of the codes, thereby improving code density.
Software codes 102 include a self-modifying code section 106, which includes a “memcpy old_code, new_code, size” (memory copy) instruction and a “jmp old_code” (jump) branching instruction. The execution of the “memcpy” instruction of self-modifying code section 106 can cause the computer processor to acquire data from the “new_code” memory location, and store the acquired data at the “old_code” memory location. After executing the “memcpy” instruction, at least a part of software codes 102 stored at the “old_code” memory location can be overwritten with software codes 104. Moreover, the execution of the “jmp old_code” branching instruction of self-modifying code section 106 also causes the computer processor to acquire and execute software codes stored at a target location, in this case the “old_code” memory location. As discussed above, the software codes at the “old_code” memory location have been updated with software codes 104. Therefore, at least a part of software codes 102 are modified as the computer processor executes the software codes, hence the software codes are “self-modifying.”
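As a non-limiting illustration, the effect of the “memcpy” and “jmp” instructions described above can be sketched with a toy interpreter. The memory model (a dictionary of labeled instruction lists), the function name `run`, and the placeholder instructions `nop_old`, `nop_new`, and `halt` are assumptions made only for this sketch. The sketch copies the whole labeled region and ignores the “size” operand.

```python
def run(memory, entry):
    """Execute a toy program and return the list of executed instructions.

    memory maps a label to a list of instruction tuples. This is a
    hypothetical model, not the memory layout of any real processor.
    """
    trace = []
    label, index = entry, 0
    while index < len(memory[label]):
        instr = memory[label][index]
        trace.append(instr)
        op = instr[0]
        if op == "memcpy":
            # memcpy dst, src: overwrite the code stored at dst with the code at src.
            dst, src = instr[1], instr[2]
            memory[dst] = list(memory[src])
            index += 1
        elif op == "jmp":
            # Transfer control to the first instruction at the target label.
            label, index = instr[1], 0
        elif op == "halt":
            break
        else:
            index += 1
    return trace


memory = {
    "main": [("memcpy", "old_code", "new_code"), ("jmp", "old_code")],
    "old_code": [("nop_old",), ("halt",)],
    "new_code": [("nop_new",), ("halt",)],
}
trace = run(memory, "main")
# The jump lands on the overwritten region, so nop_new runs instead of nop_old.
print([instr[0] for instr in trace])
```

In this sketch the instruction originally stored at “old_code” is never executed, because the region is overwritten before the jump is taken.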
To reduce the effect of memory access latency, a computer processor typically employs a pre-fetching scheme, in which the computer processor pre-fetches a set of instructions from the memory, and stores the pre-fetched instructions in an instruction fetch buffer. When the computer processor needs to execute an instruction, it can acquire the instruction from the instruction fetch buffer instead of from the memory. The instruction fetch buffer typically requires a shorter access time than the memory. Using the illustrative example of
Self-modifying codes can create a pipeline hazard for the aforementioned pre-fetching scheme, in that the assumption of the execution sequence of the instructions, based on which a set of instructions are selected for pre-fetching, is no longer valid following the modification to the codes. As a result, the instruction fetch buffer may pre-fetch incorrect instructions and provide incorrect instructions for execution. This can lead to execution failure and add to the processing delay of the computer processor. Therefore, to ensure proper and timely execution of the modified software codes, the computer processor needs to be able to detect the modification of the software codes, and to take measures to ensure that the instruction fetch buffer pre-fetches a correct set of instructions after the software codes are modified.
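As a non-limiting illustration of one aspect of this hazard, the following sketch shows pre-fetched instruction data becoming stale once the software codes overwrite the corresponding memory location. The helper `prefetch`, the 4-byte block size, and the byte values are assumptions chosen for the sketch. The bytes 0x31 0xC0 encode an x86 “xor eax, eax” and 0xEB 0x02 encode a short jump.

```python
def prefetch(memory, address, block_size=4):
    """Copy one fetch block out of memory (hypothetical helper)."""
    return bytes(memory[address:address + block_size])


# Memory initially holds "xor eax, eax" followed by single-byte NOPs (0x90).
memory = bytearray(b"\x31\xc0\x90\x90\x90\x90\x90\x90")

# The block is pre-fetched before the code modifies itself.
buffered = prefetch(memory, 0)

# The program then overwrites its own first instruction with a short jump.
memory[0:2] = b"\xeb\x02"

# The buffered copy no longer matches what is stored in memory.
stale = buffered != bytes(memory[0:4])
print(stale)
```

Unless the modification is detected, the processor in this sketch would continue to execute the original bytes held in the buffer.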
Embodiments of the present disclosure provide a method for handling self-modifying codes, the method being performed by a computer processor and comprising: receiving a fetch block of instruction data from an instruction fetch buffer; before transmitting the fetch block of instruction data to a decoding unit of the computer processor, determining whether the fetch block includes instruction data of self-modifying codes; and responsive to determining that the fetch block includes instruction data of self-modifying codes, transmitting a flush signal to reset one or more internal buffers of the computer processor.
Embodiments of the present disclosure also provide a system comprising a memory that stores instruction data, and a computer processor being configured to process the instruction data. The processing of the instruction data comprises the computer processor being configured to: acquire a fetch block of the instruction data from an instruction fetch buffer; before transmitting the fetch block of instruction data to a decoding unit, determine whether the fetch block of the instruction data contains self-modifying codes; and responsive to determining that the fetch block of the instruction data contains self-modifying codes, reset one or more internal buffers of the computer processor.
Embodiments of the present disclosure also provide a computer processor comprising: a branch prediction buffer configured to store a pairing between an address associated with a predetermined branching instruction and a target address of a predicted taken branch; an instruction fetch buffer configured to store instruction data prefetched from a memory according to the pairing stored in the branch prediction buffer; an instruction fetch unit configured to: receive a fetch block of instruction data from the instruction fetch buffer; before transmitting the fetch block of instruction data to a decoding unit of the computer processor, determine, based on information stored in at least one of the branch prediction buffer and the instruction fetch buffer, whether the fetch block includes instruction data of self-modifying codes; and responsive to determining that the fetch block includes instruction data of self-modifying codes, transmit a flush signal to reset one or more internal buffers of the computer processor.
Additional objects and advantages of the disclosed embodiments will be set forth in part in the following description, and in part will be apparent from the description, or may be learned by practice of the embodiments. The objects and advantages of the disclosed embodiments may be realized and attained by the elements and combinations set forth in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosed embodiments, as claimed.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the invention. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the invention as recited in the appended claims.
Embodiments of the present disclosure provide a method and an apparatus for handling self-modifying codes. With an embodiment of the present disclosure, instructions of self-modifying codes can be detected from pre-fetched instruction data, before the instruction data are forwarded for decoding and execution. As a result, the likelihood of identifying and executing incorrect instructions due to the aforementioned pipeline hazards caused by self-modifying codes can be reduced. Moreover, corrective actions can also be taken when the pipeline hazards are detected before the pre-fetched instructions are decoded and executed, so that incorrect decoding results can be prevented from propagating through the pipeline. As a result, proper and timely execution of the modified software codes can be ensured.
Reference is now made to
Computer processor 202 further includes a processing pipeline for acquiring and executing the instructions in stages. As shown in
Instruction fetch unit 203 can acquire the instructions for execution in binary form and extract information used for decoding the instructions. The information may include, for example, a length of the instructions. In a case where the instructions have variable lengths (e.g., the instructions being a part of the Intel x86 instruction set), the instruction length information may be needed to identify the instructions. In some cases, the instruction length information can be determined based on the first byte of instruction data. As an illustrative example, if instruction fetch unit 203 identifies from the instruction data an escape byte, which is associated with the hexadecimal value of 0x0F, instruction fetch unit 203 may determine that at least the subsequent byte of data corresponds to an opcode, which may indicate that the instruction length is at least two bytes. Moreover, instruction fetch unit 203 may also extract different fields for an instruction, and based on the values of these fields, determine whether additional bytes are needed to determine the instruction length. As an illustrative example, for an Intel x86 instruction, instruction fetch unit 203 may extract the values for fields such as the Mod field and R/M field of the ModR/M byte, and based on the values of these fields, determine whether additional data (e.g., SIB byte) is needed to determine the instruction length.
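As a non-limiting illustration, the field-based length determination described above can be sketched as follows. The function names `min_opcode_bytes` and `bytes_after_modrm` are hypothetical. The sketch assumes 32-bit x86 addressing and ignores prefixes, immediates, and three-byte opcodes, so it is a simplification rather than a complete length decoder.

```python
def min_opcode_bytes(first_byte):
    """Minimum opcode length implied by the first byte.

    0x0F is the escape byte: at least one more opcode byte follows.
    """
    return 2 if first_byte == 0x0F else 1


def bytes_after_modrm(modrm, sib=None):
    """Number of SIB and displacement bytes that follow a ModR/M byte.

    Returns None when the SIB byte is required to decide but was not
    supplied, i.e. additional instruction data must be examined first.
    """
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if mod == 0b11:
        return 0                      # register operand: nothing follows
    has_sib = rm == 0b100             # R/M = 100 selects a SIB byte
    if mod == 0b01:
        disp = 1                      # 8-bit displacement
    elif mod == 0b10:
        disp = 4                      # 32-bit displacement
    elif rm == 0b101:
        disp = 4                      # Mod = 00, R/M = 101: disp32 only
    elif has_sib:
        if sib is None:
            return None               # cannot decide without the SIB byte
        disp = 4 if (sib & 0b111) == 0b101 else 0
    else:
        disp = 0
    return int(has_sib) + disp


print(min_opcode_bytes(0x0F), bytes_after_modrm(0xC0), bytes_after_modrm(0x04))
```

A return value of `None` corresponds to the case in which the available instruction data is not sufficient to determine the instruction length.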
Instruction fetch unit 203 can then transmit the information, including the instruction length, to instruction decode unit 206, which uses the information to identify the instruction. Based on an output of instruction decode unit 206, instruction execution unit 208 can then perform the operation associated with the instruction. Memory access unit 210 may also be involved in accessing data from memory system 220 and providing the data to instruction execution unit 208 for processing. Write back unit 212 may also be involved in storing a result of processing by instruction execution unit 208 in a set of internal registers (not shown in
The acquisition of an instruction by instruction fetch unit 203 can be based on an address stored in a program counter 204. For example, when computer processor 202 starts executing the first instruction of software codes 102, program counter 204 may store a value of 0x00, which is the memory address of the first instruction of software codes 102 (“xorl %eax, %eax”). The program counter value can also be used for pre-fetching a set of instructions. For example, if the instructions are expected to be executed sequentially following the order by which they are stored in memory system 220, instruction fetch unit 203 can acquire a set of consecutive instructions stored at a memory address indicated by program counter 204. Typically, the set of instructions is pre-fetched in blocks of 4 bytes. After instruction fetch unit 203 acquires an instruction and finishes processing it (e.g., by extracting the instruction length information), the address stored in program counter 204 can be updated to point to the next instruction to be processed by instruction fetch unit 203.
As an illustrative example, software codes 104 of
On the other hand, if instruction fetch unit 203 has finished processing a branching instruction, instruction fetch unit 203 may perform a branch prediction operation, and pre-fetch a target instruction from a target location of the branching instruction, before the branching instruction is executed by instruction execution unit 208. As an illustrative example, referring to software codes 102 of
With such an arrangement, computer processor 202 does not need to wait until the execution of the branching instruction by instruction execution unit 208 to determine the target instruction, and the branching operation can be sped up considerably.
Branch prediction buffer 216 can provide information that allows instruction fetch unit 203 to perform the aforementioned branch prediction operation. For example, branch prediction buffer 216 can maintain a mapping table that pairs an address of a fetched instruction with a target address. The address of the fetched instruction can be the address stored in program counter 204. The fetched instruction can be a branching instruction, or an instruction next to a branching instruction. The target address can be associated with a target instruction to be executed as a result of execution of the branching instruction. The pairing may be created based on prior history of branching operations. As an illustrative example, computer processor 202 can maintain a prior execution history of software codes 102 of
After instruction fetch unit 203 pre-fetches a first set of instructions based on the address stored in program counter 204, instruction fetch unit 203 can also access branch prediction buffer 216 to determine whether a pairing between the address and a target address exists. If such a pairing can be found, instruction fetch unit 203 may pre-fetch a second set of instructions including the target instruction from the target address. On the other hand, if such a pairing cannot be found, instruction fetch unit 203 can assume the instructions are to be executed sequentially following the order by which they are stored in memory system 220, and can pre-fetch a second set of consecutive instructions immediately following the first set of instructions. Instruction fetch unit 203 then stores the pre-fetched instructions in instruction fetch buffer 214, and then acquires the pre-fetched instructions later for processing and execution.
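As a non-limiting illustration, the selection of the next pre-fetch address described above can be sketched as follows. The dictionary used to model branch prediction buffer 216, the function name `next_prefetch_address`, and the constant `FETCH_BLOCK_SIZE` are assumptions made for this sketch.

```python
FETCH_BLOCK_SIZE = 4  # bytes; follows the 4-byte fetch blocks mentioned above


def next_prefetch_address(pc, branch_prediction_buffer):
    """Return (address, predicted_taken) for the second pre-fetch.

    branch_prediction_buffer is modeled as a dict that pairs a fetched
    instruction address with a predicted target address.
    """
    target = branch_prediction_buffer.get(pc)
    if target is not None:
        # A pairing exists: pre-fetch from the predicted branch target.
        return target, True
    # No pairing: assume sequential execution and fetch the next block.
    return pc + FETCH_BLOCK_SIZE, False


bpb = {0x00: 0x100}
print(next_prefetch_address(0x00, bpb))  # pairing found
print(next_prefetch_address(0x00, {}))   # no pairing, sequential fetch
```

The second return value in this sketch plays the role of a per-block indication of whether the block was fetched as a result of a predicted taken branch.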
Despite the speed and performance improvement brought about by branch prediction and pre-fetching, self-modifying codes can pose potential pipeline hazards to these operations. Reference is now made to
In the illustrative example shown in
Referring to
For fetch block 1, however, instruction fetch unit 203 may acquire a target address from the pairing stored in branch prediction buffer 216, and then control instruction fetch buffer 214 to acquire the instruction data from address 0x100 at memory system 220, instead of acquiring the instruction data from address location 0x04 for the remaining byte of the “movsbl” instruction data. As a result, as shown in
A pipeline hazard may occur in the scenario depicted in
To mitigate the aforementioned pipeline hazards, computer processor 202 may need to remove the branch prediction decision that leads to the fetching of fetch blocks 0 and 1 (e.g., by removing the pairing stored in branch prediction buffer 216 shown in
On the other hand, if the fetch block 0 in
Reference is now made to
As shown in
In some embodiments, as shown in
When instruction fetch unit 203 accesses instruction fetch buffer 214 again to acquire fetch blocks 0 and 1 for processing, instruction fetch unit 203 may then determine, based on the indications provided by pre-fetch state register 402, that the software codes being processed have been modified. For example, if the branch indication bit of fetch block 0 is “one,” which indicates that it has a predicted taken branch, instruction fetch unit 203 may determine that the instructions in fetch block 0 include a branch instruction. Based on this determination, instruction fetch unit 203 may also determine that fetch block 0 includes complete data for every instruction included in the fetch block, and that fetch block 1 should not include data for decoding any instruction in fetch block 0. Therefore, when extracting information of an instruction of fetch block 0, if instruction fetch unit 203 determines that some data from fetch block 1 is also needed to extract the information (e.g., to determine the instruction length) of the instruction, instruction fetch unit 203 may determine that fetch block 0 no longer includes a branching instruction with a target instruction in fetch block 1, contrary to what the associated branch indication bit indicates. Therefore, instruction fetch unit 203 may determine that the software codes are likely to have been modified. Based on this determination, instruction fetch unit 203 (or some other internal logic of computer processor 202) may transmit a signal to branch prediction buffer 216 to remove the pairing entry between address 0x00 and target address 0x100. The internal buffers of instruction fetch unit 203, instruction decode unit 206, write back unit 212, etc., can also be reset to ensure correct execution of the modified software codes.
On the other hand, if the branch indication bit of fetch block 0 is “zero,” which indicates that fetch block 0 does not have a predicted taken branch, instruction fetch unit 203 may determine that the fetch block 0 does not include a branch instruction. Therefore, instruction fetch unit 203 may determine that fetch blocks 0 and 1 likely contain consecutive instructions, and pipeline hazards are unlikely to occur, as explained above. Therefore, instruction fetch unit 203 does not need to take additional actions, and can just process fetch blocks 0 and 1 and provide the fetch block data to instruction decode unit 206 for decoding.
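As a non-limiting illustration, the check based on the branch indication bit can be sketched as follows. The function names, the list-of-lengths representation of a fetch block, and the return values "flush" and "forward" are assumptions made for this sketch. The instruction lengths are taken as already determined, and the fetch block is assumed to be 4 bytes.

```python
def spills_into_next_block(instr_lengths, block_size=4):
    """True if any instruction starting in the block extends past its end."""
    offset = 0
    for length in instr_lengths:
        if offset + length > block_size:
            return True
        offset += length
    return False


def handle_fetch_block(branch_bit, instr_lengths, bpb, fetch_address):
    """Decide whether to flush or forward a fetch block.

    bpb models the branch prediction buffer as a dict of
    fetched-address -> target-address pairings (hypothetical model).
    """
    if branch_bit and spills_into_next_block(instr_lengths):
        # A predicted taken branch should end inside this block, so an
        # instruction that needs bytes from the next block suggests that
        # the codes were modified. Drop the stale pairing and flush.
        bpb.pop(fetch_address, None)
        return "flush"
    return "forward"


bpb = {0x00: 0x100}
# A 2-byte instruction followed by a 3-byte instruction needs 5 bytes.
print(handle_fetch_block(True, [2, 3], bpb, 0x00), bpb)
```

When the branch indication bit is zero, the sketch forwards the block unchanged, consistent with the case in which consecutive fetch blocks hold consecutive instruction data.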
In some embodiments, computer processor 202 may also include a pre-fetch state register 404 configured to store the byte locations of a predetermined branching instruction (e.g., the “jmp” branching instruction). The byte locations may include, for example, a starting byte location, an ending byte location, etc., and can be associated with a fetched instruction address (and the associated target address) stored in branch prediction buffer 216. The byte locations can also be used to determine whether an instruction stored in a particular fetch block has been modified, which can also provide an indication that the piece of software codes being executed by computer processor 202 have been modified. Although
Referring to
In some embodiments, the detection of self-modifying codes can also be based on a combination of information provided by pre-fetch state registers 402 and 404. For example, pre-fetch state register 404 may only store the starting byte location of the predetermined branching instruction. Instruction fetch unit 203 may determine that an instruction of fetch block 0 is associated with a matching starting byte location, but its ending byte location (based on the extracted instruction length information) indicates that the instruction data extends into fetch block 1. If the branch indication bit (stored in pre-fetch state register 402) of fetch block 1 is “one,” which may indicate that fetch block 1 is fetched as a result of branch prediction and does not include any data of an instruction of fetch block 0, instruction fetch unit 203 may also determine that instructions stored in fetch block 0 have been modified, and that the piece of software codes being executed by computer processor 202 has been modified. The same determination can also be made if instruction fetch unit 203 determines that data from fetch block 1 is needed to determine the instruction length, and that the branch indication bit of fetch block 1 is “one,” as discussed above. Instruction fetch unit 203 may then reset its internal buffers, and transmit reset signals to internal buffers of instruction decode unit 206, write back unit 212, etc., to prevent the incorrect decoding result from being propagated through the pipeline.
With embodiments of the present disclosure, instructions of self-modifying codes can be detected from pre-fetched instruction data, before the instruction data are forwarded for decoding and execution. As a result, the likelihood of identifying and executing incorrect instructions due to the aforementioned pipeline hazards caused by self-modifying codes can be reduced. Moreover, corrective actions can also be taken when the pipeline hazards are detected before the pre-fetched instructions are decoded and executed, so that incorrect decoding results can be prevented from propagating through the pipeline. As a result, proper and timely execution of the modified software codes can be ensured.
Reference is now made to
After an initial start, method 500 proceeds to step 502, where computer processor 202 receives a fetch block of instruction data from instruction fetch buffer 214.
After receiving the fetch block, at step 504, computer processor 202 determines whether the fetch block has a predicted taken branch. The determination can be based on, for example, a branch indication bit of pre-fetch state register 402 associated with the fetch block. If computer processor 202 determines, in step 506, that the fetch block does not have a predicted taken branch, it can then determine that the fetch block is not associated with a branch prediction operation, and there is no need to take further action. Therefore, method 500 can then proceed to the end.
If computer processor 202 determines that the fetch block has a predicted taken branch (in step 506), it can then determine whether the fetch block has sufficient data for instruction length determination, in step 508. Instruction length determination can be based on the first byte of instruction data, as well as the values of various fields of an instruction (e.g., ModR/M byte, SIB byte, etc.). As discussed above, in a case where the fetch block has a predicted taken branch, the fetch block should include complete data for every instruction included in the fetch block, and none of these instructions should extend into another fetch block that includes the branching target instruction. If computer processor 202 determines that the fetch block does not include sufficient data for instruction length determination, in step 510, it can proceed to determine that self-modifying codes are detected, and perform additional actions including, for example, removing a pairing entry from branch prediction buffer 216, flushing the internal buffers of computer processor 202, etc., in step 512.
If computer processor 202 determines that the fetch block includes sufficient data for instruction length determination (in step 510), computer processor 202 can proceed to determine instruction lengths and byte locations for each instruction in the fetch block, in step 514. In step 516, computer processor 202 can then receive the byte locations for a predetermined branching instruction in the fetch block. As discussed above, the byte locations can include, for example, a starting byte location and an ending byte location of the predetermined branching instruction. Computer processor 202 may receive the byte locations information from, for example, pre-fetch state register 404.
After receiving the byte locations information from pre-fetch state register 404 and determining the byte locations information of the instructions of the fetch block, computer processor 202 can then proceed to determine whether there is at least one instruction of the fetch block with starting and ending byte locations that match those of the predetermined branching instruction, in step 518. If computer processor 202 determines that no instruction of the fetch block has the matching starting and ending byte locations (in step 520), which can indicate that the data of at least one instruction extends beyond the fetch block and cannot be the predetermined branching instruction, it can then proceed to step 512 and determine that the instruction of the fetch block has been modified, and self-modifying codes are detected. On the other hand, if an instruction with matching starting and ending byte locations (or just matching ending byte locations) is found in step 520, computer processor 202 may determine that either the software codes being executed are not self-modifying codes, or that the fetch block includes complete data for the instructions, and can proceed to the end without taking additional actions. Computer processor 202 may also discard a subsequent instruction (if any) to the predetermined branching instruction in the fetch block, because of the branch prediction operation.
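As a non-limiting illustration, the decision flow of method 500 can be sketched as a single function. The function name `method_500`, the use of `None` to mark an instruction whose length cannot be determined from the fetch block alone, and the string return values are assumptions made for this sketch. Byte locations are modeled as inclusive (start, end) offsets within the fetch block, and only the full start-and-end match is modeled.

```python
def method_500(branch_bit, instr_lengths, branch_span):
    """Sketch of steps 504 through 520 for one fetch block.

    branch_bit    -- branch indication bit for the block (steps 504/506)
    instr_lengths -- lengths of the instructions in the block, in order;
                     None marks a length that cannot be determined
    branch_span   -- (start, end) byte locations of the predetermined
                     branching instruction (as received in step 516)
    """
    # Steps 504/506: no predicted taken branch, so no further action.
    if not branch_bit:
        return "no_action"
    # Steps 508/510: insufficient data for instruction length determination.
    if any(length is None for length in instr_lengths):
        return "smc_detected"        # step 512
    # Step 514: derive byte locations for each instruction in the block.
    spans = []
    start = 0
    for length in instr_lengths:
        spans.append((start, start + length - 1))
        start += length
    # Steps 518/520: look for an instruction matching the branch locations.
    if branch_span in spans:
        return "no_action"
    return "smc_detected"            # step 512


print(method_500(True, [2, 2], (2, 3)))
print(method_500(True, [2, 3], (2, 3)))
```

A result of "smc_detected" in this sketch stands for the actions of step 512, such as removing the pairing entry and flushing the internal buffers.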
It will be appreciated that the present invention is not limited to the exact construction that has been described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from the scope thereof. It is intended that the scope of the invention should only be limited by the appended claims.