Claims
- 1. A method for analyzing instruction completion delays in a processor, said method comprising:
identifying at least one potential completion delay cause for completing a group of instructions; assigning a counter for each said at least one potential completion delay cause; counting in every said counter a number of clock cycles required to complete execution of said group of instructions; and upon completion of said group of instructions:
identifying a final completion delay cause, from said at least one potential completion delay cause, that delayed execution of a final instruction in said group of instructions; retaining said number of clock cycles stored in said counter associated with said final completion delay cause; and subtracting said number of clock cycles counted for all other completion delay causes that are not said final completion delay cause.
- 2. The method of claim 1, wherein said retaining step and said subtracting step are cumulative for multiple groups of instructions.
- 3. The method of claim 1, wherein said final completion delay cause is a cache miss.
- 4. The method of claim 1, wherein said final completion delay cause is a data dependency, said data dependency resulting from a wait for a result from another instruction.
- 5. The method of claim 1, wherein said final completion delay cause is an execution delay, said execution delay resulting from a time required for an execution unit in said computer processor to execute an instruction having all required data.
- 6. The method of claim 1, wherein said group of instructions comprises only one instruction.
- 7. A system for analyzing instruction completion delays in a processor, said system comprising:
means for identifying at least one potential completion delay cause for completing a group of instructions; means for counting, for each said at least one potential completion delay cause, a number of clock cycles spent during completion of said group of instructions; and means for, upon completion of said group of instructions:
identifying a final completion delay cause, from said at least one potential completion delay cause, that delayed execution of a final instruction in said group of instructions; retaining said number of clock cycles stored in said counter associated with said final completion delay cause; and subtracting said number of clock cycles counted for all other completion delay causes that are not said final completion delay cause.
- 8. The system of claim 7, wherein said means for retaining and subtracting are used to cumulatively retain and store clock cycles associated with final completion delay causes for multiple groups of instructions.
- 9. The system of claim 7, wherein said final completion delay cause is a cache miss.
- 10. The system of claim 7, wherein said final completion delay cause is a data dependency, said data dependency resulting from a wait for a result from another instruction.
- 11. The system of claim 7, wherein said final completion delay cause is an execution delay, said execution delay resulting from a time required for an execution unit in said computer processor to execute an instruction having all required data.
- 12. The system of claim 7, wherein said group of instructions comprises only one instruction.
- 13. A computer usable medium for analyzing instruction completion delays in a processor, said computer usable medium comprising:
computer program code for identifying at least one potential completion delay cause for completing a group of instructions; computer program code for assigning a counter for each said at least one potential completion delay cause; computer program code for counting in every said counter a number of clock cycles required to complete execution of said group of instructions; and computer program code for, upon completion of said group of instructions:
identifying a final completion delay cause, from said at least one potential completion delay cause, that delayed execution of a final instruction in said group of instructions; retaining said number of clock cycles stored in said counter associated with said final completion delay cause; and subtracting said number of clock cycles counted for all other completion delay causes that are not said final completion delay cause.
- 14. The computer usable medium of claim 13, wherein said retaining and said subtracting are cumulative for multiple groups of instructions.
- 15. The computer usable medium of claim 13, wherein said final completion delay cause is a cache miss.
- 16. The computer usable medium of claim 13, wherein said final completion delay cause is a data dependency, said data dependency resulting from a wait for a result from another instruction.
- 17. The computer usable medium of claim 13, wherein said final completion delay cause is an execution delay, said execution delay resulting from a time required for an execution unit in said computer processor to execute an instruction having all required data.
- 18. The computer usable medium of claim 13, wherein said group of instructions comprises only one instruction.
- 19. A method for analyzing instruction completion delays in a processor, said method comprising:
identifying a completion delay cause for instructions in a group of instructions being processed by a processor; counting a number of clock cycles during which each said completion delay cause delays completion of said group of instructions; and storing said number of clock cycles for each said completion delay cause.
- 20. The method of claim 19, wherein said storing of said number of clock cycles spent for each of said potential completion delay causes is cumulative for multiple groups of instructions.
- 21. A system for analyzing instruction completion delays in a processor, said system comprising:
a status indicator that identifies at least one potential completion delay cause for completing a current group of instructions; at least one counter, each said counter associated with a specific said status indicator and each said counter counting a number of clock cycles required to complete execution of said current group of instructions; and a reset mechanism, wherein upon completion of said group of instructions, said reset mechanism resets, to a value representing a number of clock cycles counted before initiating execution of said current group of instructions, each said counter that is not associated with a status indicator that identifies a final completion delay cause that delayed execution of a final instruction of said current group of instructions.
- 22. The system of claim 21, wherein a counter, which is associated with said status indicator that identifies said final completion delay cause, cumulatively retains and stores clock cycles associated with final completion delay causes for multiple groups of instructions.
- 23. The system of claim 21, wherein said final completion delay cause is a cache miss.
- 24. The system of claim 21, wherein said final completion delay cause is a data dependency, said data dependency resulting from a wait for a result from another instruction.
- 25. The system of claim 21, wherein said final completion delay cause is an execution delay, said execution delay resulting from a time required for an execution unit in said computer processor to execute an instruction having all required data.
- 26. The system of claim 21, wherein said group of instructions comprises only one instruction.
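By way of illustration only, the counting scheme recited in claims 1 and 21 can be modeled in software. The sketch below is not the claimed hardware and is not taken from the specification. The class name `CompletionDelayProfiler`, its method names, and the use of a per-group checkpoint to implement the "subtracting" step are assumptions made for this example. The three cause labels follow claims 3 through 5. Every counter counts every clock cycle while a group is in flight. Once the group completes, only the counter for the final completion delay cause keeps its count, and all other counters are returned to the value they held before the group began.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionDelayProfiler:
    """Hypothetical software model of per-cause completion delay counters."""

    causes: tuple
    counters: dict = field(init=False)
    _checkpoint: dict = field(init=False)

    def __post_init__(self):
        # One counter per potential completion delay cause.
        self.counters = {cause: 0 for cause in self.causes}
        self._checkpoint = dict(self.counters)

    def begin_group(self):
        # Remember each counter's value before the group starts executing.
        self._checkpoint = dict(self.counters)

    def tick(self, cycles=1):
        # Every counter counts every cycle spent completing the group.
        for cause in self.counters:
            self.counters[cause] += cycles

    def complete_group(self, final_cause):
        # Retain the count for the final cause; rewind all other counters
        # to their checkpointed values (the "subtracting" step).
        if final_cause not in self.counters:
            raise KeyError(final_cause)
        for cause in self.counters:
            if cause != final_cause:
                self.counters[cause] = self._checkpoint[cause]


profiler = CompletionDelayProfiler(
    ("cache_miss", "data_dependency", "execution_delay")
)

# First group: 12 cycles, finally delayed by a cache miss.
profiler.begin_group()
profiler.tick(12)
profiler.complete_group("cache_miss")

# Second group: 5 cycles, finally delayed by a data dependency.
profiler.begin_group()
profiler.tick(5)
profiler.complete_group("data_dependency")

print(profiler.counters)
```

Because the checkpoint is taken at the start of each group, the retained counts accumulate across groups in the manner of claims 2 and 22: after the two groups above, the cache miss counter holds 12, the data dependency counter holds 5, and the execution delay counter holds 0.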
RELATED APPLICATION
[0001] The present invention is related to the subject matter of the following commonly assigned, copending U.S. patent application Ser. No. 09/______ (Attorney Docket No. AUS920020225US1) entitled “SPECULATIVE COUNTING OF PERFORMANCE EVENTS WITH REWIND COUNTER” and filed ______, 2002. The content of the above-referenced application is incorporated herein by reference.