This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-038984, filed on Feb. 27, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a program profiler circuit, a processor, and a program counting method.
For example, as a tool for analyzing the performance of a program executed by a processor such as a central processing unit (CPU), a profiler device configured to measure execution times of various events executed by the program or the numbers of executions of the events is known. The profiler device of this type includes an index table that receives values (addresses), generated by the processor, of a program counter and outputs function numbers indicating functions (subroutines) corresponding to the values of the program counter. Then, the profiler device measures execution times of the functions based on time periods for continuously outputting the function numbers from the index table and measures the numbers of executions of the functions based on changes in the function numbers. The execution times and the numbers of the executions that are measured by the profiler device are stored in a memory, and the performance of the program is analyzed based on the information stored in the memory (refer to, for example, Japanese Laid-open Patent Publication No. 2004-348635).
In addition, in order to write information such as the number of occurrences of a specific instruction or the like in the memory, the profiler device writes information having a high degree of importance over information having a low degree of importance and stored in the memory and thereby suppresses insufficiency of a region in which the information is to be written (refer to, for example, Japanese Laid-open Patent Publication No. 2002-342125).
If an address size of a storage region for storing the program to be analyzed is larger than an address size of the index table in the aforementioned profiler device, the index table may convert a value of the program counter into an erroneous function number. It is, therefore, difficult for the profiler device to measure execution times of the functions included in the program stored in the storage region that has the address size larger than that of the index table.
According to an aspect, a program profiler circuit, a processor, and a program counting method that are disclosed herein aim to measure execution times of subroutines included in a program regardless of the size of the program.
According to an aspect of the invention, a program profiler circuit includes: a stack processing unit having a first storage region for stacking, when an instruction to call a subroutine is detected, a head address of the subroutine and for unstacking a lastly stacked head address when a restoration instruction to return to a source from which the subroutine is called is detected; a matching determining unit that has a plurality of second storage regions in which head addresses of subroutines are registered and is configured to output region information indicating a second storage region having registered therein a head address that matches the head address lastly stacked by the stack processing unit; and an accumulator that has a plurality of accumulation regions corresponding to the plurality of second storage regions and is configured to add a predetermined value to a value stored in an accumulation region corresponding to the region information output from the matching determining unit.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Hereinafter, embodiments are described with reference to the accompanying drawings. For signal lines through which signals are transmitted, reference symbols that are the same as the names of the signals are used. A signal with “/” at the head of its signal name is a negative-logic signal.
The arithmetic processing device 200 detects an instruction to call a subroutine SR, generates call information JSR based on the detection of the call instruction, and outputs a head address HADD of the subroutine SR to be called. In addition, the arithmetic processing device 200 detects a restoration instruction to cause the program to return to a source from which the subroutine SR is called, and the arithmetic processing device 200 generates restoration information RTS based on the detection of the restoration instruction.
The arithmetic processing device 200 outputs an address ADD to a memory 400, fetches an instruction included in the program stored in the memory 400, and executes the fetched instruction. In the example illustrated in
The program profiler circuit 300 includes a stack processing unit 310, a matching determining unit 320, and an accumulator 330. The stack processing unit 310 includes a storage region 312 for sequentially holding the head addresses HADD of the subroutines and stacks the head addresses HADD output from the arithmetic processing device 200 in the storage region 312 based on the call information JSR. The stack processing unit 310 outputs a lastly stacked head address HADD. The stack processing unit 310 unstacks the lastly stacked head address HADD from the storage region 312 based on the restoration information RTS. In this manner, the stack processing unit 310 operates in a so-called first-in-last-out scheme.
The stack processing unit 310 executes the operation of stacking a head address HADD based on the call information JSR generated by the arithmetic processing device 200 upon the detection of a call instruction and executes the operation of unstacking a head address HADD based on the restoration information RTS generated by the arithmetic processing device 200 upon the detection of a restoration instruction. The call information JSR and the restoration information RTS exist in a conventional processing device. Thus, the stacking operation and unstacking operation of the stack processing unit 310 may be achieved merely by adding, to the arithmetic processing device 200, signal lines through which the call information JSR and the restoration information RTS are transmitted to the outside, without the addition of a new circuit to the arithmetic processing device 200.
The number (storage capacity of the stack processing unit 310) of head addresses able to be held by the stack processing unit 310 is set based on the maximum number of nests of the subroutines described in the program. For example, if the maximum number of the nests is “16”, it is sufficient if the stack processing unit 310 has 16 regions for holding head addresses.
The matching determining unit 320 includes multiple storage regions 322 in which the head addresses of the subroutines are registered in advance. In
In the example illustrated in
The number of storage regions 322 included in the matching determining unit 320 is set based on the maximum number of subroutines described in the program. Thus, the number of the storage regions 322 included in the matching determining unit 320 may be reduced, compared with a case where storage regions 322 that correspond to all addresses of the program to be measured by the program profiler circuit 300 are included in the matching determining unit 320. In other words, by installing the stack processing unit 310, only the head addresses HADD of the subroutines may be supplied to the matching determining unit 320, instead of the supply of all the addresses of the program to be measured, and the size of a circuit of the matching determining unit 320 may be reduced.
The accumulator 330 includes multiple accumulation regions 332 corresponding to the multiple storage regions 322 of the matching determining unit 320, respectively. Specifically, the number of the accumulation regions 332 included in the accumulator 330 is set based on the maximum number of subroutines described in the program, like the matching determining unit 320. During the time when the region information AINF is output from the matching determining unit 320, the accumulator 330 repeats an accumulation process of adding a predetermined value to a value stored in an accumulation region 332 corresponding to the region information AINF and storing a value obtained by the addition in the accumulation region 332 corresponding to the region information AINF. Thus, accumulated values that correspond to time periods for which the subroutines SR are executed are stored in the accumulation regions 332 corresponding to the subroutines SR. The execution times of the subroutines SR are indicated by products of the accumulated values stored in the accumulation regions 332 and the cycle (period) of the accumulation process. In other words, the accumulated values stored in the accumulation regions 332 indicate the execution times of the subroutines SR. In this way, the program profiler circuit 300 achieves a program counting method of measuring the execution times of the subroutines SR.
For example, the process of adding the predetermined value and storing values in the accumulation regions 332 is executed for each cycle of a clock to be used to cause the program profiler circuit 300 to operate. In this case, in order to execute the addition process without an erroneous operation of the accumulator 330, it is preferable that the frequency of the clock to be used to cause the program profiler circuit 300 to operate be lower than the frequency of a clock to be used to cause the arithmetic processing device 200 to operate. If the process of adding the predetermined value and storing values in the accumulation regions 332 is executed for each cycle of the clock, the execution times of the subroutines SR are indicated by products of the accumulated values stored in the accumulation regions 332 and the cycle of the clock.
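For illustration only, the cooperation of the stack processing unit 310, the matching determining unit 320, and the accumulator 330 may be sketched as the following behavioral model. The class name, method names, and the sequence of events in the usage lines are assumptions introduced for this sketch and are not part of the embodiment; the model describes behavior only and does not represent the circuit.

```python
# Behavioral sketch only; all names here are hypothetical.

class ProfilerModel:
    def __init__(self, registered_head_addresses):
        # Storage regions 322: head addresses registered in advance.
        self.regions = list(registered_head_addresses)
        # Accumulation regions 332: one value per storage region 322.
        self.accumulated = [0] * len(self.regions)
        # Storage region 312: head addresses stacked so far.
        self.stack = []

    def call(self, head_address):
        """Call information JSR: stack the head address HADD."""
        self.stack.append(head_address)

    def restore(self):
        """Restoration information RTS: unstack the lastly stacked address."""
        if self.stack:
            self.stack.pop()

    def tick(self, predetermined_value=1):
        """One accumulation cycle for the lastly stacked head address."""
        if not self.stack:
            return  # nothing stacked, so no region information is output
        top = self.stack[-1]
        if top in self.regions:
            region = self.regions.index(top)  # region information AINF
            self.accumulated[region] += predetermined_value


# Usage with two hypothetical subroutines at 0x0200 and 0x0300.
p = ProfilerModel([0x0200, 0x0300])
p.call(0x0200); p.tick(); p.tick()   # two cycles in the first subroutine
p.call(0x0300); p.tick()             # nested call, one cycle
p.restore(); p.tick()                # back in the first subroutine
p.restore(); p.tick()                # stack empty, nothing accumulated
print(p.accumulated)                 # [3, 1]
```

In this sketch, cycles spent in a nested subroutine are credited only to the nested subroutine, because only the lastly stacked head address is compared.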
In the embodiment illustrated in
In addition, the stacking operation and unstacking operation of the stack processing unit 310 may be achieved by using the call information JSR and restoration information RTS generated by the arithmetic processing device 200 without the addition of a new circuit to the arithmetic processing device 200.
Furthermore, the execution times of the subroutines are measured by the program profiler circuit 300 without the insertion of an intercept process routine for measurement or the like in the program. Thus, the execution times of the subroutines may be measured without a reduction in the efficiency of executing the program.
The CPU 200A operates in synchronization with a clock CLK and executes programs stored in a memory such as a main memory or a cache memory. The programs to be executed by the CPU 200A include an operating system, a program to be evaluated whose subroutine execution times are to be measured, and an evaluation program. The evaluation program is executed to control the program profiler circuit 300A and measure the execution times of the subroutines (functions) included in the program (application program or the like) to be evaluated. The cache memory may be installed in the processor 100A. The CPU 200A is an example of an arithmetic processing device.
The processor 100A has a function of setting predetermined information in the register IOREG based on the evaluation program executed by the CPU 200A. In addition, the processor 100A has a function of generating an address EAD, a write enable signal EWE, a chip select signal ECS, a data input signal EDI, and a control signal CAMWR based on the evaluation program executed by the CPU 200A. In addition, the processor 100A has a function of receiving a data output signal EDO from the program profiler circuit 300A based on the evaluation program executed by the CPU 200A. The register IOREG outputs a mode signal EMD and a task run signal TRUN based on the set information.
The CPU 200A decodes a jump subroutine (JSR) instruction to call a subroutine and outputs, based on the decoding of the JSR instruction, a head address CAD of the subroutine and call information JSR indicating the execution of the JSR instruction to the program profiler circuit 300A. The JSR instruction is an example of a call instruction. In addition, the CPU 200A decodes a return subroutine (RTS) instruction to cause the program to return to a source from which a subroutine is called, and the CPU 200A outputs, based on the decoding of the RTS instruction, restoration information RTS indicating the execution of the RTS instruction to the program profiler circuit 300A. The RTS instruction is an example of a restoration instruction. An example of the CPU 200A is illustrated in
The program profiler circuit 300A includes a stack processing unit 10, a flip-flop 11, a content addressable memory (CAM) 20, a selector 30, and a random access memory (RAM) 40. The program profiler circuit 300A also includes a register 50, an incrementer 60, a divider 70, a memory controller 80, a decoder 90, switches SW1 and SW2, and OR circuits OR1 and OR2.
When receiving the mode signal EMD of a first logical level, the program profiler circuit 300A transitions to a measurement mode in which the program profiler circuit 300A measures the execution times of the subroutines included in the program (application program or the like) to be evaluated. When receiving the mode signal EMD of a second logical level different from the first logical level, the program profiler circuit 300A initializes the CAM 20 and the RAM 40 and transitions to an evaluation mode in which the program profiler circuit 300A reads, from the RAM 40, information indicating the execution times measured in the measurement mode. Outlines of the evaluation mode and measurement mode are illustrated in
The OR circuit OR1 outputs an enable signal EN during the time when the OR circuit OR1 receives the call information JSR or the restoration information RTS from the CPU 200A. The OR circuit OR2 outputs the task run signal TRUN or the chip select signal ECS to a chip select terminal CS of the RAM 40. The task run signal TRUN is asserted when the program to be evaluated is to be executed by the CPU 200A in the measurement mode. The chip select signal ECS is generated by the processor 100A based on the execution of the evaluation program.
The stack processing unit 10 includes storage regions (flip-flops FF1 to FF16 illustrated in
The flip-flop 11 (D-FF) latches the address HAD received from the stack processing unit 10 in synchronization with a divided clock DCLK and outputs the latched address as an address HADd to the CAM 20.
The CAM 20 has multiple storage regions in which the head addresses of the subroutines are registered in advance. An example of the state of the CAM 20 in which the head addresses are already registered is illustrated in
The CAM 20 registers, in the storage regions based on a control signal CAMWR generated by the processor 100A, data indicating the head addresses of the subroutines included in the program executed by the CPU 200A. The control signal CAMWR includes addresses indicating the storage regions of the CAM 20, data signals indicating logical levels of the data to be written in the storage regions, and a signal that controls the writing of the data. The data is written in the CAM 20 based on the control signal CAMWR before the measurement of the execution times of the subroutines is started by the evaluation program executed by the CPU 200A.
The decoder 90 decodes the address EAD generated within the processor 100A in the evaluation mode and asserts the one of the word line signals EWL that is indicated by the address EAD. The selector 30 transfers the word line signal EWL received from the decoder 90 through a word line WL to the RAM 40 in the evaluation mode. The selector 30 transfers the data signals DT received from the CAM 20 through word lines WL to the RAM 40 in the measurement mode.
The RAM 40 executes a writing operation when receiving a high-level signal by the chip select terminal CS and receiving a low-level signal by a write enable terminal /WE. In the writing operation, the RAM 40 writes data received by data input terminals DI in memory cells connected to the word line WL that is at a high level and through which the RAM 40 receives the signal through the selector 30. The data input terminals DI receive the data input signal EDI through the switch SW1 in the evaluation mode and receive a value output from the incrementer 60 through the switch SW1 in the measurement mode. For example, the data input terminals DI are 32-bit terminals.
The RAM 40 executes a reading operation when receiving a high-level signal by the chip select terminal CS and receiving a high-level signal by the write enable terminal /WE. In the reading operation, the RAM 40 outputs, from data output terminals DO, data read from memory cells connected to the word line WL at the high level. For example, the data output terminals DO are 32-bit terminals. The RAM 40 is an example of a storage unit that reads, based on a read request, values held by memory cells MC connected to the word line WL at the high level and writes, based on a write request, values in the memory cells MC connected to the word line WL at the high level.
Logical levels of the chip select terminal CS, write enable terminal /WE, and word lines WL when the RAM 40 operates depend on a circuit of the RAM 40 and are not limited to the aforementioned levels. An example of the RAM 40 is illustrated in
The register 50 holds, in synchronization with a clock RCLK, the data output from the data output terminals DO of the RAM 40 and outputs the held data to the incrementer 60. The register 50 is an example of a holder that holds a value read from the RAM 40.
The incrementer 60 receives the data output from the register 50, increments values of the received data by 1, and outputs the incremented data to the RAM 40 through data input terminals DI. The incrementer 60 is an example of an adder that adds a predetermined value “1” to values held by the register 50 and outputs values obtained by the addition to the RAM 40. The RAM 40, the register 50, and the incrementer 60 are an example of an accumulator (counter) that repeats a process of adding the predetermined value to a value stored in an accumulation region corresponding to region information during the time when the CAM 20 outputs the region information.
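As a minimal sketch, the read, increment, and write-back sequence performed by the RAM 40, the register 50, and the incrementer 60 may be modeled as follows. The class and method names are hypothetical, and wrapping the value at 32 bits is an assumption of this sketch; the embodiment states only that the data terminals are 32-bit terminals.

```python
# Behavioral sketch only; names and the 32-bit wrap are assumptions.

class AccumulatorModel:
    def __init__(self, words):
        self.ram = [0] * words   # RAM 40: one 32-bit word per word line WL
        self.register = 0        # register 50

    def cycle(self, word_lines):
        """One read request followed by one write request.

        word_lines is a list of 0/1 levels, one per word line WL.
        """
        if sum(word_lines) != 1:
            return  # no word line at the high level, so nothing is counted
        row = word_lines.index(1)
        self.register = self.ram[row]                     # read, held by register 50
        self.ram[row] = (self.register + 1) & 0xFFFFFFFF  # incrementer 60, write back


acc = AccumulatorModel(words=3)
acc.cycle([1, 0, 0])
acc.cycle([1, 0, 0])
acc.cycle([0, 0, 1])
acc.cycle([0, 0, 0])
print(acc.ram)  # [2, 0, 1]
```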
The divider 70 divides the frequency of the clock CLK and generates the divided clock DCLK. The divider 70 may be installed outside the program profiler circuit 300A. Alternatively, a clock that is different from the clock CLK may be supplied to the processor 100A from outside the processor 100A, in which case the divider 70 need not be installed in the program profiler circuit 300A. It is sufficient if the frequency of the divided clock DCLK is a frequency that enables the reading operation and writing operation of the RAM 40 to be executed within one cycle of the divided clock DCLK.
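The condition on the divided clock DCLK can be expressed as simple arithmetic. The following sketch computes the smallest integer division ratio for which one cycle of the divided clock DCLK is at least as long as the reading operation plus the writing operation. The function name and all timing values are hypothetical, and margins such as setup and hold times are ignored.

```python
# Illustrative arithmetic only; the timing values are not from the embodiment.

def min_division_ratio(clk_period_ns, read_ns, write_ns):
    """Smallest integer n with n * clk_period_ns >= read_ns + write_ns."""
    needed = read_ns + write_ns
    # Integer ceiling division avoids floating-point rounding.
    return max(1, (needed + clk_period_ns - 1) // clk_period_ns)


# Hypothetical example: 2 ns clock CLK, 3 ns read, 4 ns write.
print(min_division_ratio(2, 3, 4))  # 4, so DCLK has a period of 8 ns
```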
The memory controller 80 generates the write enable signal DWE and the clock RCLK in synchronization with the divided clock DCLK. When the write enable signal DWE is at a high level, the write enable signal DWE indicates the read request to be transmitted to the RAM 40. When the write enable signal DWE is at a low level, the write enable signal DWE indicates the write request to be transmitted to the RAM 40. Examples of waveforms of the write enable signal DWE and clock RCLK generated by the memory controller 80 are illustrated in
The switch SW2 transfers the write enable signal EWE generated by the processor 100A to the write enable terminal /WE of the RAM 40 in the evaluation mode. The switch SW2 transfers the write enable signal DWE received from the memory controller 80 to the write enable terminal /WE of the RAM 40 in the measurement mode.
The program counter PC outputs an address received from the selector S1 to the incrementer INC and the selector S2. The incrementer INC increments the address received from the program counter PC and outputs the incremented address PC+ to the selector S1.
If a selection signal ASEL output from the instruction decoder DEC indicates that instructions are fetched in order of addresses, the selector S1 selects the address PC+ received from the incrementer INC. If the selection signal ASEL indicates the execution of an address change instruction to change an address to an address other than the address PC+, the selector S1 selects an address CAD received from the operating unit OPU. In this case, the address change instruction is the JSR instruction, the RTS instruction, a branch instruction, a jump instruction, or the like. Then, the selector S1 outputs the selected address to the program counter PC. If the instructions are to be sequentially fetched, the selector S2 selects an address output from the program counter PC. If the address change instruction is executed or the CPU 200A outputs or receives data based on a load instruction, a store instruction, or the like, the selector S2 selects an address output from the address register AREG. Then, the selector S2 outputs the selected address to the memory such as the main memory or the cache memory.
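The selections made by the selectors S1 and S2 may be sketched as two small functions. The function names, the boolean parameters standing in for the control signals, and the increment step of 1 are assumptions made for this sketch; the embodiment does not state the step by which the incrementer INC increments the address.

```python
# Behavioral sketch only; names and the step size are hypothetical.

def selector_s1(pc, cad, address_change, step=1):
    """Next value of the program counter PC.

    address_change stands in for the selection signal ASEL indicating
    an address change instruction (JSR, RTS, branch, jump, or the like).
    """
    pc_plus = pc + step  # incrementer INC
    return cad if address_change else pc_plus


def selector_s2(pc, areg, use_areg):
    """Address output to the memory: program counter PC or address register AREG."""
    return areg if use_areg else pc


print(hex(selector_s1(0x0100, 0x0200, address_change=False)))  # 0x101
print(hex(selector_s1(0x0100, 0x0200, address_change=True)))   # 0x200
```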
In order for the CPU 200A to fetch an instruction, the instruction is read as read data from the memory based on the address output from the selector S2 and is stored in the instruction register IREG. If the CPU 200A executes the load instruction, data is read from the memory based on the address output from the selector S2 and is stored in the register file REG. If the CPU 200A executes the store instruction, data output from the data register DREG is written as write data in the memory based on the address output from the selector S2.
The instruction register IREG has multiple regions for holding instructions received from the memory and sequentially outputs the held instructions to the instruction decoder DEC. The instruction decoder DEC decodes the instructions received from the instruction register IREG and generates, based on the results of the decoding, multiple control signals that control operations of the operating unit OPU, selectors S1 and S2, and the like. The multiple control signals include the call information JSR, the restoration information RTS, and the selection signal ASEL.
The data register DREG has multiple regions for holding data output from the operating unit OPU upon the execution of the store instruction. The address register AREG has multiple regions for holding addresses output from the operating unit OPU upon the execution of the address change instruction, load instruction, or store instruction.
The register file REG has multiple registers for holding data read from the memory or data output from the executor EX. The register file REG outputs, to the executor EX, data held by at least any of the multiple registers of the register file REG based on the control signals received from the instruction decoder DEC.
The executor EX executes calculation in accordance with the instructions decoded by the instruction decoder DEC and outputs results of the calculation to the register file REG, the data register DREG, the address register AREG, or the selector S1.
The holder HLD15 includes 16 flip-flops FF (FF1 to FF16) that hold a 16th-bit [15] of the address CAD. The flip-flops FF1 to FF16 are an example of a first storage region. The holder HLD15 includes 16 multiplexers MUX (MUX1 to MUX16) configured to stack or unstack the address CAD bit [15] in or from the adjacent flip-flops FF. In
The number of flip-flops FF and the number of multiplexers MUX are set to numbers that are equal to or larger than the number of nests (layers) of the subroutines included in the program. In other words, the program profiler circuit 300A that includes the stack processing unit 10 illustrated in
The flip-flops FF operate in synchronization with the clock CLK received by clock terminals CK during the time when the enable signal EN is at the high level in the measurement mode in which the mode signal EMD is negated to the low level. The flip-flops FF latch values of the address CAD [15] received by input terminals IN in synchronization with the clock CLK and output the latched values from output terminals OUT. The output terminal OUT of the flip-flop FF1 is connected to an input terminal IN1 of the multiplexer MUX2 and an address line through which an address HAD [15] among 16-bit addresses HAD [0:15] is transmitted.
The output terminals OUT of the flip-flops FF2 to FF16 are connected to input terminals IN2 of the multiplexers MUX1 to MUX15 corresponding to the flip-flops FF1 to FF15 each located at the next higher stage (upper side of
The multiplexers MUX1 to MUX16 output, from output terminals OUT, the address CAD [15] received by the input terminals IN1 during the time when the multiplexers MUX1 to MUX16 receive the low-level restoration information RTS by selection terminals SEL. In addition, the multiplexers MUX1 to MUX16 output, from the output terminals OUT, the address CAD [15] received by the input terminals IN2 during the time when the multiplexers MUX1 to MUX16 receive the high-level restoration information RTS by the selection terminals SEL. The output terminals OUT of the multiplexers MUX1 to MUX16 are connected to the input terminals IN of the flip-flops FF1 to FF16, respectively. The multiplexer MUX1 receives the address CAD [15] from the CPU 200A by the input terminal IN1. In
When receiving the high-level enable signal EN and the low-level restoration information RTS, the holder HLD15 executes the stacking operation of holding the address CAD [15] received from the CPU 200A. The high-level enable signal EN and the low-level restoration information RTS indicate the execution of the JSR instruction. In the stacking operation, the address CAD [15] held by the flip-flops FF1 to FF15 is transferred to the flip-flops FF2 to FF16 each located at the next lower stage. The holder HLD15 outputs the newly stacked address CAD [15] as the address HAD [15].
When receiving the high-level enable signal EN and the high-level restoration information RTS, the holder HLD15 executes the unstacking operation of transferring the address CAD [15] held by the flip-flops FF to the flip-flops FF each located at the next higher stage. The high-level enable signal EN and the high-level restoration information RTS indicate the execution of the RTS instruction. Then, the holder HLD15 outputs, as the address HAD [15], the address CAD [15] transferred from the flip-flop FF2 to the flip-flop FF1. Operations of the holders HLD1 to HLD14 are the same as the operations of the holder HLD15, except for the fact that bits of the addresses CAD held by the holders HLD1 to HLD14 are different from bits of the address CAD held by the holder HLD15.
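The stacking and unstacking operations of the holders may be sketched as a shift register per address bit. The class names are hypothetical, and the value shifted into the flip-flop FF16 during the unstacking operation is assumed to be 0 in this sketch because the text above does not specify it. Sixteen holders, one per bit of a 16-bit address, are also an assumption based on the 16-bit addresses HAD [0:15].

```python
# Behavioral sketch only; names and the fill value for FF16 are assumptions.

class Holder:
    """One bit slice: flip-flops FF1..FF16 fed by multiplexers MUX1..MUX16."""

    def __init__(self, depth=16):
        self.ff = [0] * depth  # ff[0] models FF1, ff[-1] models FF16

    def clock(self, en, rts, cad_bit):
        if not en:
            return  # enable signal EN low: the flip-flops keep their values
        if rts:
            # Unstacking: each flip-flop takes the value of the next lower stage.
            self.ff = self.ff[1:] + [0]
        else:
            # Stacking: the new bit enters FF1, older bits move one stage lower.
            self.ff = [cad_bit] + self.ff[:-1]

    @property
    def had_bit(self):
        return self.ff[0]  # output of FF1


class StackUnitModel:
    def __init__(self, width=16, depth=16):
        self.holders = [Holder(depth) for _ in range(width)]

    def clock(self, jsr, rts, cad):
        en = 1 if (jsr or rts) else 0  # OR circuit OR1
        for bit, holder in enumerate(self.holders):
            holder.clock(en, rts, (cad >> bit) & 1)

    @property
    def had(self):
        return sum(h.had_bit << bit for bit, h in enumerate(self.holders))


s = StackUnitModel()
s.clock(jsr=1, rts=0, cad=0x0200)
s.clock(jsr=1, rts=0, cad=0x0400)
print(hex(s.had))                  # 0x400
s.clock(jsr=0, rts=1, cad=0)
print(hex(s.had))                  # 0x200
```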
In the evaluation mode in which the mode signal EMD is asserted to the high level, the flip-flops FF do not execute the operation of latching data and the stack processing unit 10 stops the operations of stacking and unstacking the addresses CAD.
The program to be evaluated includes a main routine including an instruction JSR(A) to call a subroutine A and an instruction JSR(B) to call a subroutine B and includes the subroutines A and B and a subroutine C. The subroutine A includes an instruction JSR(C) to call the subroutine C. For example, the main routine is stored from an address “0100h” of the main memory, and the subroutines A, B, and C are stored from addresses “0200h”, “0300h”, and “0400h” of the main memory.
The evaluation program writes the head address “0200h” of the subroutine A illustrated in
The CAM 20 compares a value of the address HAD received from the stack processing unit 10 with the address values held in the multiple storage regions. If the value of the address HAD matches any of the address values held in the multiple storage regions, the CAM 20 asserts, to the high level, a data line DT corresponding to a storage region holding the matched address value. As illustrated in
The CAM 20 sets the data line DT0 to the high level during the time when the CAM 20 receives the address HAD “0200h” from the stack processing unit 10. The CAM 20 sets the data line DT1 to the high level during the time when the CAM 20 receives the address HAD “0300h” from the stack processing unit 10. The CAM 20 sets the data line DT2 to the high level during the time when the CAM 20 receives the address HAD “0400h” from the stack processing unit 10.
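The matching operation of the CAM 20 for the three registered head addresses may be sketched as follows. The function name is hypothetical, and the address 0500h is used only as an example of a value that is not registered.

```python
# Behavioral sketch only; cam_match is a hypothetical name.

def cam_match(entries, had):
    """Return the levels of the data lines DT0, DT1, ... for an address HAD."""
    return [1 if entry == had else 0 for entry in entries]


entries = [0x0200, 0x0300, 0x0400]  # subroutines A, B, and C
print(cam_match(entries, 0x0200))   # [1, 0, 0]  DT0 at the high level
print(cam_match(entries, 0x0300))   # [0, 1, 0]  DT1 at the high level
print(cam_match(entries, 0x0400))   # [0, 0, 1]  DT2 at the high level
print(cam_match(entries, 0x0500))   # [0, 0, 0]  no registered address matches
```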
The control circuit CNTL outputs a write control signal WR when receiving the high-level task run signal TRUN by a chip select terminal CS and receiving a low-level signal by a write enable terminal /WE. In addition, the control circuit CNTL outputs a read control signal RD when receiving the high-level task run signal TRUN by the chip select terminal CS and receiving a high-level signal by the write enable terminal /WE. Either of the write enable signals DWE and EWE is supplied to the write enable terminal /WE. The control circuit CNTL includes a circuit that prevents a time period for which the write control signal WR is at the high level from overlapping a time period for which the read control signal RD is at the high level.
The write amplifiers WA amplify signal amounts of data received by the data input terminals DI (DI0 to DI31) based on the write control signal WR and output the amplified data to the bit lines BL (BL0 to BL31). The read amplifiers RA amplify signal amounts of data on the bit lines BL (BL0 to BL31) based on the read control signal RD and output the amplified data to the data output terminals DO (DO0 to DO31).
First, in step S10, the CPU 200A causes the processor 100A to generate the control signal CAMWR and registers the head addresses of the subroutines of the program to be evaluated in the CAM 20. The head addresses to be registered in the CAM 20 are generated at a stage of compiling the program to be evaluated, coupling the program to be evaluated to a load module by a linker, and loading the program to be evaluated in a memory by a loader.
Next, in step S20, the CPU 200A causes the processor 100A to generate the address EAD, the write enable signal EWE, the chip select signal ECS, and the data input signal EDI and causes the RAM 40 to execute the writing operation. Then, the CPU 200A writes “0” in all the memory cells MC of the RAM 40. By steps S10 and S20, the program profiler circuit 300A is initialized.
Next, in step S30, the CPU 200A causes the processor 100A to assert the task run signal TRUN and thereby causes the program profiler circuit 300A to transition from the evaluation mode to the measurement mode.
Next, in step S40, the CPU 200A starts to execute the program to be evaluated. After the execution of the program to be evaluated is terminated, the CPU 200A causes the processor 100A to negate the task run signal TRUN and causes the program profiler circuit 300A to transition from the measurement mode to the evaluation mode in step S50.
Next, in step S60, the CPU 200A causes the processor 100A to generate the address EAD, the write enable signal EWE, and the chip select signal ECS and causes the RAM 40 to execute the reading operation. Then, the CPU 200A reads, from the RAM 40, the data output signal EDO indicating the numbers of execution cycles of the subroutines A, B, and C included in the program to be evaluated.
Next, in step S70, the CPU 200A calculates products of the cycle of the divided clock DCLK and the numbers, read from the RAM 40, of the execution cycles of the subroutines A, B, and C. The calculated products indicate the execution times of the subroutines A, B, and C. The CPU 200A outputs the calculated products. Then, a person who designed the program to be evaluated or the like checks the validity of the execution times, indicated by the calculated products, of the subroutines A, B, and C.
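The calculation in step S70 may be sketched as follows. The function name, the cycle counts, the period of the clock CLK, and the division ratio are all hypothetical values chosen for illustration; times are kept in integer nanoseconds to avoid rounding.

```python
# Illustrative arithmetic only; the numeric values are not measured results.

def execution_times_ns(cycle_counts, clk_period_ns, division_ratio):
    """Multiply each execution cycle count by the cycle of the divided clock DCLK."""
    dclk_period_ns = clk_period_ns * division_ratio
    return {name: count * dclk_period_ns for name, count in cycle_counts.items()}


counts = {"A": 120, "B": 45, "C": 30}  # hypothetical values read from the RAM 40
times = execution_times_ns(counts, clk_period_ns=10, division_ratio=4)
print(times)  # {'A': 4800, 'B': 1800, 'C': 1200}
```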
The divided clock DCLK is output regardless of a logical value of the mode signal EMD ((a) illustrated in
The stack processing unit 10 illustrated in
The CPU 200A sequentially increments a value of the program counter PC and executes the subroutine A ((e) illustrated in
Since the logical value of the mode signal EMD is “0” (or indicates the measurement mode), the selector 30 illustrated in
The memory controller 80 generates, in synchronization with the divided clock DCLK, the write enable signal DWE delayed by a predetermined time and having a predetermined pulse width ((g) illustrated in
The flip-flop 11 latches the address HAD in synchronization with the divided clock DCLK obtained by dividing the frequency of the clock CLK to be used to cause the CPU 200A to operate and generates the address HADd in synchronization with the divided clock DCLK. Thus, the memory controller 80 may sequentially generate, in synchronization with the divided clock DCLK, a read request and a write request that are each indicated by the write enable signal DWE. The timing of outputting the write enable signal /WE (read request) to be supplied to the RAM 40 may match the timing of changing a word line WL to the high level based on the address HADd. Since the memory controller 80 generates the clock RCLK synchronized with the divided clock DCLK, the register 50 may latch the output of the RAM 40 after a certain time after the supply of the read request. As a result, the reading operation and the writing operation may be accurately executed by the RAM 40.
The memory controller 80 may generate the high-level write enable signal DWE based on falling edges of the divided clock DCLK and generate the low-level write enable signal DWE based on rising edges of the divided clock DCLK. In this case, the memory controller 80 sequentially generates the high-level write enable signal DWE (read request) and the low-level write enable signal DWE (write request) upon the assertion of the data line DT0.
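The alternation of the read request and the write request described above may be modeled behaviorally as follows. This is a hedged sketch, not a circuit description: the class `CounterRam`, its method names, and the function `run_dclk_cycles` are hypothetical, and the "+1" applied between the read and the write is assumed to be provided by an incrementer placed after the register 50.

```python
class CounterRam:
    """Behavioral stand-in for the RAM 40 with one counter per word line."""

    def __init__(self, num_word_lines):
        self.cells = [0] * num_word_lines  # all cells hold 0 after step S20
        self.register = 0                  # stands in for the register 50

    def on_falling_edge(self, word_line):
        # High-level DWE (read request): latch the stored count.
        self.register = self.cells[word_line]

    def on_rising_edge(self, word_line):
        # Low-level DWE (write request): write back the count plus 1
        # (the incrementer is an assumption of this sketch).
        self.cells[word_line] = self.register + 1


def run_dclk_cycles(ram, word_line, cycles):
    """Apply `cycles` DCLK cycles while the given word line stays selected."""
    for _ in range(cycles):
        ram.on_falling_edge(word_line)
        ram.on_rising_edge(word_line)
    return ram.cells[word_line]


ram = CounterRam(num_word_lines=3)
count_wl0 = run_dclk_cycles(ram, word_line=0, cycles=7)
```

Each DCLK cycle therefore performs one read and one write on the selected word line, so the stored value equals the number of DCLK cycles during which the corresponding data line was asserted.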
In the example illustrated in
The control circuit CNTL of the RAM 40 illustrated in
The RAM 40 asserts the write control signal WR in synchronization with the falling edges of the divided clock DCLK and executes the writing operation ((k) illustrated in
After that, the memory controller 80 repeatedly generates the write enable signal DWE and the control circuit CNTL of the RAM 40 alternately generates the read control signal RD and the write control signal WR based on the write enable signal DWE. Then, the reading operation and the writing operation are alternately executed, and the data stored in the memory cells MC connected to the word line WL0 is increased by 1. In the example illustrated in
The instruction decoder DEC of the CPU 200A outputs the call information JSR(C) based on the decoding of the instruction JSR(C) ((n) illustrated in
The CAM 20 asserts, to the high level, the data line DT2 corresponding to a storage region storing the same value (“0400h”) as the address HAD and negates the data line DT0 to the low level ((q) and (r) illustrated in
The instruction decoder DEC of the CPU 200A outputs the restoration information RTS(C) based on the decoding of the instruction RTS(C) ((t) illustrated in
The stack processing unit 10 executes the unstacking operation based on the enable signal EN and the high-level restoration information RTS and transfers the value (“0200h”) of the address CAD held by the flip-flop FF2 illustrated in
After that, the program profiler circuit 300A operates in the same manner as the operations executed upon the execution of the subroutine A and alternately executes the reading operation and the writing operation on the memory cells MC connected to the word line WL0 corresponding to the data line DT0 of the RAM 40 ((w) illustrated in
The instruction decoder DEC of the CPU 200A outputs the restoration information RTS(A) based on the decoding of the instruction RTS(A) ((x) illustrated in
Next, the instruction decoder DEC of the CPU 200A outputs the call information JSR(B) based on the decoding of the instruction JSR(B) ((a) illustrated in
The CAM 20 asserts, to the high level, the data line DT1 corresponding to a storage region storing the same value (“0300h”) as the address HAD ((d) illustrated in
The instruction decoder DEC of the CPU 200A outputs the restoration information RTS(B) based on the decoding of the instruction RTS(B) ((f) illustrated in
The stack processing unit 10 executes the unstacking operation based on the enable signal EN and the high-level restoration information RTS, transfers the initial value “0” held by the flip-flop FF2 illustrated in
By the aforementioned operations, “7” is held by memory cells MC corresponding to the subroutine A and “5” is held by memory cells MC corresponding to the subroutine B in the RAM 40 after the execution of the program to be evaluated. In addition, “3” is held by memory cells MC corresponding to the subroutine C. In the actual program, the numbers of execution cycles of the subroutines A, B, and C are larger than the numbers of cycles that are illustrated in
In the second embodiment illustrated in
In the second embodiment illustrated in
The memory controller 80 generates the read request indicated by the write enable signal DWE in synchronization with the divided clock DCLK and the flip-flop 11 latches the address HAD in synchronization with the divided clock DCLK and generates the address HADd to be supplied to the CAM 20. Thus, the timing of the read request may match the timing of changing a word line WL to the high level. As a result, the supply of the read request to the RAM 40 before the change of the word line WL to the high level may be suppressed and an erroneous operation of the RAM 40 may be suppressed.
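The measurement behavior of the second embodiment (stacking on a call, unstacking on a restoration, address matching in the CAM 20, and counting in the RAM 40 per cycle of the divided clock DCLK) may be sketched as a small behavioral model. The sketch is illustrative only: the class `ProfilerModel` and its method names are hypothetical, and the split of the seven cycles of the subroutine A into four cycles before the call of the subroutine C and three cycles after the return is an assumption made for this example. The head addresses 0200h, 0300h, and 0400h and the final counts 7, 5, and 3 are taken from the description.

```python
class ProfilerModel:
    """Behavioral sketch of the stack processing unit 10, the CAM 20, and the RAM 40."""

    def __init__(self, head_addresses):
        # CAM: head address -> data line index (registered in step S10).
        self.cam = {addr: line for line, addr in enumerate(head_addresses)}
        self.ram = [0] * len(head_addresses)  # cleared in step S20
        self.stack = []
        self.had = 0  # initial value "0": no subroutine is being executed

    def jsr(self, head_address):
        # Stacking operation: save the current address, adopt the callee's head address.
        self.stack.append(self.had)
        self.had = head_address

    def rts(self):
        # Unstacking operation: restore the caller's head address.
        self.had = self.stack.pop()

    def dclk_tick(self, cycles=1):
        # On each DCLK cycle, the matching data line selects a counter to increment.
        for _ in range(cycles):
            line = self.cam.get(self.had)
            if line is not None:
                self.ram[line] += 1


A, B, C = 0x0200, 0x0300, 0x0400
profiler = ProfilerModel([A, B, C])  # DT0 -> A, DT1 -> B, DT2 -> C

profiler.jsr(A)
profiler.dclk_tick(4)  # assumed: 4 cycles of A before the call of C
profiler.jsr(C)
profiler.dclk_tick(3)
profiler.rts()         # back in A; the address 0200h is restored
profiler.dclk_tick(3)  # assumed: 3 more cycles of A
profiler.rts()
profiler.jsr(B)
profiler.dclk_tick(5)
profiler.rts()         # the initial value 0 is restored
```

Because the stack restores the head address of the caller, the cycles spent in the subroutine A after the subroutine C returns are credited to the subroutine A even though the program counter no longer holds the head address of the subroutine A.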
The CAM 20B is different from the CAM 20 illustrated in
The selector 30B transmits the address EAD as an address AD to the RAM 40B during the assertion of the mode signal EMD (evaluation mode). In addition, the selector 30B transmits a data signal DT received from the CAM 20B as the address AD to the RAM 40B during the negation of the mode signal EMD (measurement mode).
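The selection performed by the selector 30B may be expressed as a two-input multiplexer. The following is a minimal sketch; the function name `selector_30b` and the representation of the signals as Python values are assumptions made for illustration.

```python
def selector_30b(emd_asserted, ead, dt):
    """Return the address AD supplied to the RAM 40B.

    emd_asserted: True in the evaluation mode, False in the measurement mode.
    ead: address EAD supplied for the evaluation mode.
    dt: data signal DT received from the CAM 20B.
    """
    return ead if emd_asserted else dt


ad_evaluation = selector_30b(True, ead=0x12, dt=0x02)    # EAD is passed through
ad_measurement = selector_30b(False, ead=0x12, dt=0x02)  # DT is passed through
```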
The RAM 40B has an address decoder corresponding to the decoder 90 illustrated in
In the third embodiment illustrated in
The characteristic points and advantages of the embodiments will be clarified from the above description. This indicates that the claims include the characteristic points and advantages of the aforementioned embodiments without departing from the spirit and scope of the claims. In addition, persons who have common knowledge in the art may easily conceive various modifications and changes. Thus, it is not intended that the scope of the inventive embodiments is limited to the above description. The embodiments may be based on appropriate modifications and equivalents included in the scope disclosed in the embodiments.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2015-038984 | Feb 2015 | JP | national |