The present invention relates to a control method, an information processing device, and a storage medium.
A central processing unit (CPU) installed in most computers has a parallel processing function that executes a plurality of programs simultaneously. The parallel processing function speeds up program execution by scheduling the simultaneously executed programs so that they use the plurality of instruction execution units built into the CPU. The CPU is sometimes called a processor, and an instruction execution unit in the CPU is sometimes called an arithmetic unit.
For example, in the hyper-threading technique implemented in Intel CPUs, when two threads are executed simultaneously in one CPU and there is an instruction execution unit that is not used by one thread, this instruction execution unit is allocated to the other thread. This achieves parallel processing as if two CPUs were executing the two threads in parallel, even though a single CPU is executing both.
In this manner, to execute a plurality of programs in parallel, efficiently allocating the instruction execution units built in the CPU to each program is an important technique in the parallel processing.
In relation to the parallel processing, a multithread execution processor capable of minimizing thread exchange overhead is known (see Patent Document 1, for example).
Patent Document 1: Japanese Laid-open Patent Publication No. 2019-160352.
According to an aspect of the embodiments, a control method for a computer to execute a process includes: in response to a request to generate a certain processing result, specifying a second process that includes a second instruction different from a first instruction included in a first process that is being executed by an execution unit of an arithmetic processing device, from among a plurality of processes that each generate the certain processing result, based on a relationship between the first process and the plurality of processes; and controlling the execution unit to execute the second process.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
When a plurality of threads is executed in parallel within a CPU, waiting time sometimes occurs in the instruction execution units built into the CPU, and speed-up by parallel processing is not necessarily achieved.
Note that such a difficulty arises not only when a plurality of threads is executed in parallel within a CPU, but also when various processes are executed within various arithmetic processing devices.
In one aspect, an object of the present invention is to suppress the occurrence of an instruction waiting to be executed in a process executed by an arithmetic processing device.
According to one aspect, the occurrence of an instruction waiting to be executed may be suppressed in a process executed by an arithmetic processing device.
Hereinafter, embodiments will be described in detail with reference to the drawings.
In the parallel processing, the instruction execution units 111 to 114 are allocated to each thread such that the instruction execution units used between the threads 211 and 212 do not overlap. When the parallel processing ends, the CPU 101 integrates the processing results of the threads 211 and 212 in step 203.
In this manner, in a case where there is little overlap of the instruction execution units used by each thread when a plurality of threads is executed in parallel, the plurality of threads is able to execute instructions simultaneously, and parallel processing as if a plurality of CPUs were working is achieved.
However, in a case where there is a lot of overlap of the instruction execution units used by each thread and the number of instruction execution units built in the CPU is smaller than the number of threads, while a certain thread uses a specific instruction execution unit, other threads are sometimes put into a waiting state. In this case, the CPU stands by for the execution of instructions of other threads until the specific instruction execution unit is released.
In the parallel processing, the threads 311 and 312 both execute the instruction A only. In this case, the instruction execution unit 111 that executes the instruction A is regularly in a busy state, and while one thread is using the instruction execution unit 111, the other thread is put into a waiting state, causing waiting time to occur. When the parallel processing ends, the CPU 101 integrates the processing results of the threads 311 and 312 in step 303.
In this manner, when two threads repeatedly execute only the same instruction using one instruction execution unit, the processing time taken is doubled compared with a case where two threads are allowed to execute the same instruction simultaneously.
As an example of an application where such events occur, 1:N biometric authentication can be mentioned. In a biometric authentication system that performs the 1:N biometric authentication, a sensor reads biometric information such as the fingerprint, iris, and vein pattern of a person to be authenticated, and coded biometric feature information is generated from the read biometric information. By coding the biometric information, it becomes possible to perform a high-speed comparison (verification) process.
In the comparison process, the biometric feature information on the person to be authenticated is compared with the biometric feature information on many registrants registered in advance in the biometric authentication system, and similarity between the biometric feature information on the person to be authenticated and the biometric feature information on each registrant is calculated. Then, the similarity is compared with a predetermined threshold value, and when there is a registrant having similarity greater than the threshold value, it is determined that the person to be authenticated really is that registrant.
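The threshold decision described above can be sketched in C as follows. This is a minimal illustration, not the method of the embodiment: the function name `find_match`, the use of `double` for similarity, and the choice to return the registrant with the highest similarity when several exceed the threshold are all assumptions made for the example.

```c
#include <stddef.h>

/* Returns the index of the registrant whose similarity is the highest among
 * those strictly greater than the threshold, or -1 if no registrant
 * qualifies. Picking the highest one is an assumption of this sketch. */
int find_match(const double *similarity, size_t n, double threshold) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (similarity[i] > threshold &&
            (best < 0 || similarity[i] > similarity[best])) {
            best = (int)i;
        }
    }
    return best;
}
```

A return value of -1 corresponds to the case where the person to be authenticated matches no registrant.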
The biometric feature information on tens of thousands to millions of registrants is sometimes registered in the biometric authentication system. In this case, in order to compare the biometric feature information on many registrants with the biometric feature information on the person to be authenticated in a short time, it is effective to execute the comparison process in parallel with a plurality of threads.
A comparison algorithm for the biometric feature information is common to a plurality of threads executed in parallel, and the comparison process is repeated for the biometric feature information on many registrants. Accordingly, the plurality of threads will repeatedly execute the same instruction. For this reason, situations close to the parallel processing illustrated in
Even when the comparison process is executed by a plurality of threads, if there is no free instruction execution unit, the processing time will not be much shorter than in the case where the comparison process is executed by one thread, and speed-up by the parallel processing will not be achieved.
As illustrated in
According to the information processing device 601 in
The memory 812 is, for example, a semiconductor memory such as a read only memory (ROM), a random access memory (RAM), or a flash memory and stores programs and data used for processing. The CPU 811 (processor) corresponds to the arithmetic processing device 611 in
For example, the input device 813 is a keyboard, a pointing device, or the like and is used for inputting directions or information from an operator or a user. For example, the output device 814 is a display device, a printer, a speaker, or the like and is used for presenting inquiries to the operator or the user or for outputting a processing result. When the information processing device 801 performs the 1:N biometric authentication, the processing result may be an authentication result for the person to be authenticated.
For example, the auxiliary storage device 815 is a magnetic disk device, an optical disk device, a magneto-optical disk device, a tape device, or the like. The auxiliary storage device 815 may be a flash memory or a hard disk drive. The information processing device 801 may store programs and data in the auxiliary storage device 815 and load the stored programs and data into the memory 812 for use.
The medium driving device 816 drives a portable recording medium 802 and accesses the contents recorded in the portable recording medium 802. The portable recording medium 802 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like. The portable recording medium 802 may be a compact disk read only memory (CD-ROM), a digital versatile disk (DVD), a universal serial bus (USB) memory, or the like. The operator or the user may store programs and data in this portable recording medium 802 and load the stored programs and data into the memory 812 for use.
As described above, a computer-readable recording medium that stores programs and data to be used for processing is a physical (non-transitory) recording medium such as the memory 812, the auxiliary storage device 815, or the portable recording medium 802.
The network connection device 817 is a communication interface circuit that is connected to a communication network such as a local area network (LAN) or a wide area network (WAN) and performs data conversion associated with communication. The information processing device 801 may receive programs and data from an external device via the network connection device 817 and load the received programs and data into the memory 812 for use.
The execution unit 901 includes instruction execution units 911 to 913. The instruction execution unit 911 executes an instruction “popcnt”, the instruction execution unit 912 executes a numerical operation instruction, and the instruction execution unit 913 executes a bit operation instruction. The execution unit 901 and the instruction execution units 911 to 913 are hardware circuits.
In the information processing device 801 in
In the 1:N biometric authentication, a request is made to generate a comparison result for the biometric feature information on each of a plurality of registrants. When such a request is made, the CPU 811 selects one program from among the plurality of programs, based on a relationship between the program being executed by the execution unit 901 and the plurality of programs. When a program different from the program being executed is selected, the selected program contains an instruction different from the instruction contained in the program being executed and uses an instruction execution unit different from the instruction execution unit used by the program being executed.
Next, the CPU 811 controls the execution unit 901 to execute the selected program. This suppresses overlap of the instruction execution units used by each program and avoids the occurrence of waiting time in the instruction execution units. Therefore, the occurrence of an instruction waiting to be executed may be suppressed, and the comparison process for the biometric feature information on many registrants may be sped up.
The term (*piTmp1++)^(*piTmp2++) contained in the programs P1 and P2 is a part that calculates the exclusive OR of the biometric feature information on the person to be authenticated and the biometric feature information on the registrant and is common to the two programs. However, the two programs differ from each other in the part that counts the number of logic “1” bits contained in the calculated exclusive OR bit string.
In the program P1, the number of logic “1” bits is counted by executing only one instruction “popcnt”. Meanwhile, in the program P2, the same process as the instruction “popcnt” is achieved by complex operations combining numerical operations (addition and subtraction) and bit operations (logical product and bit shift).
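The difference between the two styles of program can be sketched in C as follows. This is an illustration under assumptions, not the listing of the programs P1 and P2: the function names are hypothetical, 32-bit words are assumed, `__builtin_popcount` is a GCC/Clang builtin that compiles to the `popcnt` instruction only when the target supports it (for example with `-mpopcnt` on x86), and the second function uses a widely known bit-parallel counting routine built from subtraction, addition, logical product, and bit shift.

```c
#include <stddef.h>
#include <stdint.h>

/* P1-style sketch: XOR the two feature words, then count the logic "1"
 * bits with a single population-count operation per word. */
uint32_t count_diff_popcnt(const uint32_t *piTmp1, const uint32_t *piTmp2,
                           size_t n) {
    uint32_t sum = 0;
    while (n--) {
        sum += (uint32_t)__builtin_popcount((*piTmp1++) ^ (*piTmp2++));
    }
    return sum;
}

/* P2-style sketch: the same count obtained only with numerical operations
 * (addition, subtraction) and bit operations (logical product, bit shift). */
uint32_t count_diff_bitops(const uint32_t *piTmp1, const uint32_t *piTmp2,
                           size_t n) {
    uint32_t sum = 0;
    while (n--) {
        uint32_t x = (*piTmp1++) ^ (*piTmp2++);
        x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
        x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
        x = x + (x >> 8);                                 /* 16-bit sums */
        x = x + (x >> 16);                                /* 32-bit sum  */
        sum += x & 0x3Fu;
    }
    return sum;
}
```

Both functions return the same value for the same inputs; they differ only in which kinds of instructions, and therefore which instruction execution units, do the counting.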
The program P1 uses the instruction execution units 911 to 913 in
Since the (*piTmp1++)^(*piTmp2++) part is common to the two programs, when the two programs are executed in parallel, there is a possibility that overlap of the instruction execution unit 912 or the instruction execution unit 913 occurs in terms of the processing of this part. However, since the processing time of this part occupies a small proportion of the entire processing time of the comparison process, the probability of overlap occurring at the same time point is low, and even if overlap occurs, the delay due to waiting time is small.
Note that the number of programs that perform the comparison process for the biometric feature information is not limited to two, and three or more programs that generate the same comparison result may be prepared. Also in this case, the combination of instructions contained in each program is different from the combinations of instructions contained in other programs, and each program uses a combination of instruction execution units different from the combinations of the other programs.
The memory 812 stores a program selection candidate list. The program selection candidate list records average processing time of each of the plurality of programs and the number of threads executing those programs.
The average processing time of each program is obtained in advance and recorded in the program selection candidate list. The average processing time may be the time calculated arithmetically from the processing time of the instruction execution unit used by the program, or may be the time measured by experiment. In the initial state, the number of threads for all the programs is set to zero.
In the parallel processing in
When a plurality of programs has been selected (step 1302, YES), the CPU 811 selects the program with the shortest average processing time from among the selected programs (step 1303). When a plurality of programs has the same average processing time, the CPU 811 randomly selects one of these programs. This makes it possible to select a single program even when a plurality of programs has the smallest number of threads.
Next, the CPU 811 supplies the selected program to the p-th thread (step 1304) and increments the number of threads for the supplied program by one in the program selection candidate list (step 1305). When only one program has been selected (step 1302, NO), the CPU 811 performs the processes from step 1304 onwards.
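The selection based on the program selection candidate list can be sketched in C as follows. This is a minimal illustration under assumptions: the `Candidate` structure and the function names are hypothetical, the first selection criterion is taken to be the smallest number of threads, as the text indicates, and a remaining tie is resolved by taking the first entry so that the sketch is deterministic, whereas the text selects randomly at that point.

```c
#include <stddef.h>

/* One row of a hypothetical program selection candidate list. */
typedef struct {
    const char *name;   /* program name, e.g. "P1" */
    double avg_time;    /* average processing time, obtained in advance */
    int thread_count;   /* number of threads currently executing it */
} Candidate;

/* Selection: the smallest number of threads first, then the shortest
 * average processing time. A remaining tie keeps the earliest entry
 * (the text picks randomly instead). */
size_t select_candidate(const Candidate *list, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (list[i].thread_count < list[best].thread_count ||
            (list[i].thread_count == list[best].thread_count &&
             list[i].avg_time < list[best].avg_time)) {
            best = i;
        }
    }
    return best;
}

/* Supplying: choose a program for a thread and increment its thread
 * count in the list. Returns the index of the supplied program. */
size_t supply_program(Candidate *list, size_t n) {
    size_t chosen = select_candidate(list, n);
    list[chosen].thread_count++;
    return chosen;
}
```

With two candidates whose thread counts start at zero, successive calls to `supply_program` alternate between them, which is the behavior that keeps the two threads on different programs.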
When only two programs are registered in the program selection candidate list, the processes in steps 1302 and 1303 may be omitted. In this case, in step 1301, an unexecuted program different from the program already being executed is selected from among the two programs.
After supplying the program to the p-th thread, the CPU 811 increments p by one (step 1103) and compares p with M (step 1104). M represents the maximum number of threads that can be executed simultaneously in the CPU 811. When p is smaller than M (step 1104, YES), the CPU 811 repeats the processes from step 1102 onwards. This causes the zeroth to (M-1)-th threads to be executed in parallel.
When p reaches M (step 1104, NO), the CPU 811 stands by until the end of execution of any thread (step 1105). Then, when the execution of a q-th (q = 0 to M - 1) thread ends (step 1106), the CPU 811 decrements the number of threads for the program that has been executed by the q-th thread by one in the program selection candidate list (step 1107).
Next, the CPU 811 checks whether or not the biometric feature information on all registrants has been processed (step 1108). When an unprocessed registrant remains (step 1108, NO), the CPU 811 supplies any program to the q-th thread in order to compare the biometric feature information on the person to be authenticated and the biometric feature information on the unprocessed registrant (step 1109). The execution unit 901 uses the instruction execution unit according to the combination of instructions contained in the supplied program to execute the supplied program.
The program supplying process in step 1109 is similar to the program supplying process in
When the biometric feature information on all registrants has been processed (step 1108, YES), the CPU 811 aggregates the comparison results for the biometric feature information on all registrants and sorts the registrants in descending order of similarity (step 1110).
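The final sorting step can be sketched in C with the standard library function `qsort`. The `Result` structure, its field names, and the function names are assumptions made for the example; the text does not specify how comparison results are stored.

```c
#include <stdlib.h>

/* Hypothetical record holding one comparison result. */
typedef struct {
    int registrant_id;
    double similarity;
} Result;

/* Comparator for qsort: larger similarity sorts first. */
static int by_similarity_desc(const void *a, const void *b) {
    const Result *ra = (const Result *)a;
    const Result *rb = (const Result *)b;
    if (ra->similarity > rb->similarity) return -1;
    if (ra->similarity < rb->similarity) return 1;
    return 0;
}

/* Sorts the aggregated results in descending order of similarity. */
void sort_results(Result *results, size_t n) {
    qsort(results, n, sizeof results[0], by_similarity_desc);
}
```

After sorting, the first element is the registrant most similar to the person to be authenticated.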
Note that, when the program P2 is being executed in the thread 1501, the number of threads for the program P1 is zero, and the number of threads for the program P2 is one in the program selection candidate list. Therefore, the program P1, which has the smallest number of threads, is selected from among the programs P1 and P2 and supplied to the thread 1502.
According to the parallel processing in
In addition, since the program supplying process is performed by the CPU 811 executing the control program, new hardware for control does not have to be added, and the hardware amount of the CPU 811 does not increase.
In the parallel processing in
For example, when a program Q1 that performs a process different from the comparison process for the biometric feature information is being executed in a thread, any program Q2 that performs the comparison process for the biometric feature information is also allowed to be selected and supplied to another thread. In this case, the programs Q1 and Q2 are executed in parallel, the process achieved by the program Q1 corresponds to the first process, and the process achieved by the program Q2 corresponds to the second process.
Next, parallel processing for selecting a program using an instruction usage frequency table instead of the program selection candidate list will be described. In this case, the CPU 811 performs parallel processing similar to the parallel processing in
The memory 812 stores the instruction usage frequency table for each selection candidate program. The instruction usage frequency table records instructions contained in programs, instruction usage frequencies, and instruction processing time.
The processing time for the instruction “^” is “1”, the processing time for the two instructions “++” is “2”, the processing time for the instruction “+” is “1”, and the processing time for the instruction “popcnt” is “10”. Therefore, the total processing time of the program P1 is “14”.
The processing time for the instruction “^” is “1”, the processing time for the two instructions “++” is “2”, and the processing time for the five instructions “+” is “5”. The processing time for the five instructions “>>” is “5”, the processing time for the five instructions “&” is “5”, and the processing time for the instruction “-” is “1”. Therefore, the total processing time of the program P2 is “19”.
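The totals stated above can be reproduced with a small C sketch of an instruction usage frequency table. The `UsageRow` layout and the identifiers are assumptions made for the example; each row stores an instruction, its usage frequency, and the processing time for all of its uses, with the values taken from the description of the programs P1 and P2.

```c
#include <stddef.h>

/* One row of a hypothetical instruction usage frequency table. */
typedef struct {
    const char *instruction; /* e.g. "popcnt" */
    int frequency;           /* number of uses in the program */
    int time;                /* processing time for all of those uses */
} UsageRow;

/* Values as given in the text for the program P1. */
const UsageRow P1_TABLE[] = {
    {"^", 1, 1}, {"++", 2, 2}, {"+", 1, 1}, {"popcnt", 1, 10},
};

/* Values as given in the text for the program P2. */
const UsageRow P2_TABLE[] = {
    {"^", 1, 1}, {"++", 2, 2}, {"+", 5, 5},
    {">>", 5, 5}, {"&", 5, 5}, {"-", 1, 1},
};

/* Total processing time of a program: the sum over all table rows. */
int total_processing_time(const UsageRow *table, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += table[i].time;
    }
    return total;
}
```

Calling `total_processing_time(P1_TABLE, 4)` gives 14 and `total_processing_time(P2_TABLE, 6)` gives 19, matching the totals in the text.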
The instruction “^”, the instruction “++”, and the instruction “+” are overlapping instructions commonly contained in the programs P1 and P2.
TA represents the total sum of the processing time of overlapping instructions commonly contained in a program PX already being executed in any thread and a selection candidate program PY, among instructions contained in the program PY. TB represents the total processing time of the program PY. The overlap ratio R represents the ratio of TA to TB. The overlap ratio R is an example of a statistical value regarding instructions overlapping with instructions contained in the first process and indicates the probability of waiting time occurring due to overlap of instruction execution units used by each thread.
For example, when the programs P1 and P2 illustrated in
First, when the program PY is the program P1, since all the instructions overlap between the programs PX and PY, the overlap ratio R of the program P1 is calculated by the following formula.
Next, when the program PY is the program P2, since the instruction “^”, the instruction “++”, and the instruction “+” overlap between the programs PX and PY, the overlap ratio R of the program P2 is calculated by the following formula.
Meanwhile, when the program PX is the program P2 and the program PY is the program P1, the overlap ratio R of the program P1 is calculated by the following formula.
Next, when the program PX is the program P2 and the program PY is the program P2, the overlap ratio R of the program P2 is calculated by the following formula.
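The four cases above can be worked through with a C sketch of the time-based overlap ratio, R = TA / TB, expressed as a percentage. The `InstrTime` structure and the function names are assumptions made for the example, and instructions are matched by their textual name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table row: an instruction and the processing time spent
 * on all of its uses in one program. */
typedef struct {
    const char *instruction;
    int time;
} InstrTime;

/* Returns 1 if the table lists the given instruction, 0 otherwise. */
static int uses_instruction(const InstrTime *table, size_t n,
                            const char *instruction) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].instruction, instruction) == 0) return 1;
    }
    return 0;
}

/* Overlap ratio R of the candidate PY against the running program PX,
 * in percent. TA is the time of the instructions of PY that PX also
 * contains; TB is the total processing time of PY. */
double overlap_ratio(const InstrTime *px, size_t nx,
                     const InstrTime *py, size_t ny) {
    int ta = 0;
    int tb = 0;
    for (size_t i = 0; i < ny; i++) {
        tb += py[i].time;
        if (uses_instruction(px, nx, py[i].instruction)) {
            ta += py[i].time;
        }
    }
    return tb > 0 ? 100.0 * ta / tb : 0.0;
}
```

With the processing times given for P1 and P2, the sketch yields 14/14 = 100% and 8/19, about 42%, when PX is P1, and 4/14, about 29%, and 19/19 = 100% when PX is P2.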
The CPU 811 may calculate the overlap ratio R of each program with the following formula.
NA represents the total sum of the number of overlapping instructions commonly contained in the program PX already being executed in any thread and the selection candidate program PY, among instructions contained in the program PY. NB represents the entire number of instructions contained in the program PY. The overlap ratio R represents the ratio of NA to NB.
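From the definitions of NA and NB, the count-based overlap ratio can be written as follows. This is a sketch under two assumptions that the text does not confirm: the ratio is expressed as a percentage, like the time-based ratio, and every use of an instruction is counted separately. The worked line takes PX as the program P2 and PY as the program P1, using the instruction counts given for P1 (one “^”, two “++”, one “+”, one “popcnt”).

```latex
R = \frac{N_A}{N_B} \times 100\ [\%]
\qquad\text{e.g.}\qquad
R_{P1} = \frac{1 + 2 + 1}{1 + 2 + 1 + 1} \times 100 = \frac{4}{5} \times 100 = 80\ [\%]
```

Under these assumptions the count-based ratio for P1 is higher than the time-based ratio because the single instruction “popcnt”, which dominates the processing time of P1, counts as only one instruction.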
Note that, when p = 0 holds, since none of the programs are being executed, the CPU 811 sets the overlap ratio R of each program to the same value.
Next, the CPU 811 selects the program with the lowest overlap ratio R from among the plurality of selection candidate programs (step 1702) and checks whether or not a plurality of programs has been selected (step 1703).
When a plurality of programs has been selected (step 1703, YES), the CPU 811 selects the program with the shortest total processing time from among the selected programs (step 1704). When a plurality of programs has the same total processing time, the CPU 811 randomly selects one of these programs. This makes it possible to select a single program even when a plurality of programs has the lowest overlap ratio R.
Next, the CPU 811 supplies the selected program to the p-th thread (step 1705). When only one program has been selected (step 1703, NO), the CPU 811 performs the process in step 1705.
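Steps 1702 to 1704 can be sketched in C as follows. The `RatedCandidate` structure and the function name are assumptions made for the example, the overlap ratios are assumed to have been calculated beforehand, and a remaining tie keeps the first entry so that the sketch is deterministic, whereas the text selects randomly at that point.

```c
#include <stddef.h>

/* Hypothetical selection candidate with a precomputed overlap ratio. */
typedef struct {
    const char *name;     /* program name, e.g. "P1" */
    double overlap_ratio; /* R against the running program, in percent */
    int total_time;       /* total processing time of the program */
} RatedCandidate;

/* Lowest overlap ratio first (step 1702), then the shortest total
 * processing time (step 1704). A remaining tie keeps the earliest entry
 * (the text picks randomly instead). */
size_t select_by_overlap(const RatedCandidate *list, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (list[i].overlap_ratio < list[best].overlap_ratio ||
            (list[i].overlap_ratio == list[best].overlap_ratio &&
             list[i].total_time < list[best].total_time)) {
            best = i;
        }
    }
    return best;
}
```

When no program is running yet and every candidate is given the same overlap ratio, the comparison falls through to the total processing time, so the faster program is supplied to the first thread.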
The program supplying process in step 1109 is similar to the program supplying process in
Note that, when the program P2 is being executed in the thread 1801, the program P1 has an overlap ratio R of 29%, and the program P2 has an overlap ratio R of 100%, as indicated by formulas (4) and (5). Therefore, the program P1, which has the lowest overlap ratio R, is selected from among the programs P1 and P2 and supplied to the thread 1802.
In step 1701, when a plurality of programs has already been executed, the CPU 811 may calculate the overlap ratio R using each program being executed as the program PX and obtain a statistical value of the overlap ratios R for each of the plurality of programs PX. As the statistical value of the overlap ratios R, an average value, a total sum, a median value, or the like can be used. In this case, in step 1702, the CPU 811 selects the program with the smallest statistical value of the overlap ratios R from among the plurality of selection candidate programs.
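The variant with several running programs can be sketched in C using the average value as the statistical value. The function name and the row-major layout of the `ratios` array are assumptions made for the example; a total sum or a median value could be substituted for the mean.

```c
#include <stddef.h>

/* ratios[y * n_running + x] holds the overlap ratio R of candidate y
 * against running program x. Returns the index of the candidate whose
 * mean overlap ratio is the smallest; ties keep the earliest candidate.
 * Returns 0 when no program is running. */
size_t select_by_mean_overlap(const double *ratios, size_t n_candidates,
                              size_t n_running) {
    size_t best = 0;
    double best_mean = 0.0;
    if (n_running == 0) {
        return 0;
    }
    for (size_t y = 0; y < n_candidates; y++) {
        double sum = 0.0;
        for (size_t x = 0; x < n_running; x++) {
            sum += ratios[y * n_running + x];
        }
        double mean = sum / (double)n_running;
        if (y == 0 || mean < best_mean) {
            best = y;
            best_mean = mean;
        }
    }
    return best;
}
```

Averaging treats every running program equally; weighting by how long each running program is expected to continue would be a different design and is not described in the text.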
According to the parallel processing that selects a program using the instruction usage frequency table, among a plurality of programs that generate the same comparison result, the program with a smaller number of instructions overlapping with instructions of the program being executed is selected and executed. This suppresses overlap of the instruction execution units used by each thread and avoids the occurrence of waiting time in the instruction execution units. Accordingly, the comparison process for the biometric feature information on many registrants may be sped up.
The configurations of the information processing device 601 in
In the information processing device 801 in
The configurations of the CPU 101 in
The flowcharts in
The parallel processing illustrated in
The program selection candidate lists illustrated in
Calculation formulas (1) to (6) are merely examples, and the information processing device 801 may calculate the overlap ratio R using another calculation formula.
While the disclosed embodiments and the advantages thereof have been described in detail, those skilled in the art will be able to make various modifications, additions, and omissions without departing from the scope of the present invention explicitly set forth in the claims.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority or inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2020/024186 filed on Jun. 19, 2020 and designated the U.S., the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/JP2020/024186 | Jun 2020 | US
Child | 17983153 | | US