The present invention relates to an image recognition apparatus for detecting an object from an image with the use of a neural network.
Various methods for driving a vehicle such as an automobile safely have been devised. For example, a method has been devised in which a camera that photographs the direction of travel of a vehicle is installed, and a function of detecting a pedestrian or the like from an image photographed by the camera is provided to a car navigation system. The car navigation system informs the driver of the presence of a pedestrian when the pedestrian is detected. The driver can then drive the vehicle while remaining aware of the movement of the pedestrian.
As a method for detecting a pedestrian from an image photographed by a camera, a method using a neural network is available. A neural network is an information processing system modeled on the human cranial nervous system, and is employed not only in detection of a pedestrian but also in character recognition and the like.
Patent Document 1 describes an image recognition apparatus for detecting a pedestrian from an image photographed by a camera mounted on a vehicle. The image recognition apparatus according to Patent Document 1 detects a candidate object, which is a candidate for identification, from an input image with the use of a remarkability calculation method. By applying a neural network to the candidate object, it determines whether or not the candidate object is a pedestrian.
However, the quantity of calculation of a neural network (neuro calculation) is enormous. In a case where a pedestrian is detected from an image photographed by a camera with the use of neuro calculation, image recognition processing must be carried out in real time. Accordingly, large-scale hardware is needed for neuro calculation. Also, in a case where neuro calculation is performed by software processing, a processor having a high clock frequency is indispensable, which causes a problem of increased power consumption.
An image recognition apparatus according to the present invention is for determining whether or not an object which is to be detected is included in a frame, and includes: a pre-processing unit configured to generate, from an image in a first area in the frame, first calculation image data of a predetermined size smaller than the first area; a neuro calculation unit configured to calculate a neuro calculation value which indicates whether or not the object which is to be detected is included in the first area, by performing neuro calculation on pixel values of the first calculation image data; and a post-processing unit configured to generate result data which indicates whether or not the object which is to be detected is included in the frame, by using the neuro calculation value.
By using the first calculation image data, whose size is smaller than that of the image in the first area, for neuro calculation, it is possible to reduce the quantity of calculation in neuro calculation. This allows neuro calculation to be performed in real time, and reduces the size of the hardware.
The image recognition apparatus according to the present invention further includes: a first block buffer in which the first calculation image data is stored; and a second block buffer in which second calculation image data, generated by the pre-processing unit from an image in a second area in the frame different from the first area, is stored, wherein the pre-processing unit generates the second calculation image data and stores the second calculation image data in the second block buffer while the neuro calculation unit performs neuro calculation by using the first calculation image data stored in the first block buffer, and the pre-processing unit generates the first calculation image data and stores the first calculation image data in the first block buffer while the neuro calculation unit performs neuro calculation by using the second calculation image data stored in the second block buffer.
Generation of the calculation image data on which neuro calculation is to be performed and the neuro calculation itself can thus be carried out in parallel, so that image recognition processing can be carried out in real time.
Also, in the image recognition apparatus according to the present invention, the neuro calculation unit is implemented by a configurable processor.
A processor having a lower clock frequency than in a case where the neuro calculation process is carried out by software processing can be employed. This reduces power consumption in performing neuro calculation.
An object of the present invention is to provide techniques which lead to size reduction of hardware and allow image recognition processing using neuro calculation to be carried out in real time.
Also, another object of the present invention is to provide techniques which allow for reduction in power consumption in carrying out image recognition processing using neuro calculation.
These and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.
Below, a preferred embodiment of the present invention will be described with reference to the accompanying drawings.
<1. Overall Structure>
As shown in
The input interface 1 receives a frame 30F of moving-image data provided by the vehicle-mounted camera. The input interface 1 extracts a brightness component from the frame 30F, and outputs a brightness frame 31F to the pre-processing unit 2.
The pre-processing unit 2 partitions the brightness frame 31F into blocks each of which has a predetermined size, to generate block data 33. The block data 33 is stored in either the block buffer 3A or 3B.
The pre-processing unit 2 includes a frame parameter detector 21, a frame storage unit 22, a block parameter detector 23, and a block generation unit 24. The frame parameter detector 21 calculates statistical values 31 of pixel values of the brightness frame 31F. The statistical values 31 include the maximum value, a standard deviation, and the like of pixel values. In the frame storage unit 22, the brightness frame 31F received from the frame parameter detector 21 is stored in either a frame buffer 22A or 22B. The block parameter detector 23 identifies a block in which a pedestrian is to be detected, in the brightness frame 31F, and calculates statistical values 32 of pixel values of the block. The block generation unit 24 receives the brightness frame 31F from either the frame buffer 22A or 22B. The block generation unit 24 generates the block data 33 of the block identified by the block parameter detector 23, from the received brightness frame 31F.
The neuro calculation unit 4 receives the block data 33 from either the block buffer 3A or 3B, and performs neuro calculation on the block data 33. As a result of neuro calculation, an output synapse 34 is output from the neuro calculation unit 4. In the coefficient table 5, weighting coefficients used for neuro calculation are stored.
The post-processing unit 6 generates result data 35 by using the frame 30F and the output synapse 34. In the photographed-data storage unit 7, the frame 30F provided by the vehicle-mounted camera is stored without any modification thereto.
<2. Outline of Image Recognition Processing>
The input interface 1 receives the frame 30F (step S1), and extracts the brightness frame 31F. The input interface 1 stores the frame 30F into the photographed-data storage unit 7. The pre-processing unit 2 generates the block data 33 used for neuro calculation, from the brightness frame 31F (step S2). The neuro calculation unit 4 performs neuro calculation on each pixel of the block data 33 (step S3). The post-processing unit 6 determines whether or not a pedestrian is detected based on a value of the output synapse 34. When it is determined that a pedestrian is detected, the post-processing unit 6 generates the result data 35 which is composed by superimposing a block where the pedestrian is detected on the frame 30F (step S4).
Hereinbelow, generation of the block data 33 will be outlined.
In
The detection block BL_B is larger in size than the detection block BL_A. The detection block BL_B is set to be larger in order to detect a pedestrian present in the neighborhood of the location of photographing. On the other hand, the detection block BL_A is set to be smaller in order to detect a pedestrian present at a greater distance. Using detection blocks of various sizes allows pedestrians in various places to be detected.
The reason why the size of the block data 33 is kept constant is as follows. If the number of pixels in the block data 33 varied according to the size of a detection block, the neuro calculation unit 4 would have to change the particulars of the neuro calculation process according to the number of pixels in the block data. In contrast, by keeping the size of the block data 33 constant irrespective of the size of the detection block, it is possible to simplify the neuro calculation process. Also, by reducing the size of the block data 33 to a size smaller than that of the detection block BL_A or BL_B, the quantity of calculation in the neuro calculation process (step S3) is reduced.
Refer back to
<3. Operations of the Pre-Processing Unit (Step S2)>
Hereinbelow, operations of the pre-processing unit 2 which carries out the step S2 (refer to
<3.1. Processes Carried Out by the Frame Parameter Detector 21>
Now, operations of the frame parameter detector 21 will be described with reference to
The frame parameter detector 21 calculates the statistical values 31 of the brightness frame 31Fa (step S201). The maximum value, the minimum value, a sum, a variance, and a standard deviation of the pixel values included in the brightness frame 31Fa are calculated as the statistical values 31. The frame parameter detector 21 designates the frame buffer 22A as the storage location of the brightness frame 31Fa (step S202). The brightness frame 31Fa is stored in the frame buffer 22A (step S203).
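As a rough illustration of step S201, the C sketch below computes the same kinds of statistics over an 8-bit brightness frame; the structure, the function name, and the 8-bit pixel representation are assumptions made for illustration and are not taken from the embodiment.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical container for the statistical values 31 (step S201). */
typedef struct {
    uint8_t  max;
    uint8_t  min;
    uint64_t sum;
    double   variance;
    double   std_dev;
} FrameStats;

/* Scan the brightness frame once for max, min and sum, then a second
 * time for the variance and standard deviation. */
static FrameStats calc_frame_stats(const uint8_t *luma, int width, int height)
{
    FrameStats s = { .max = 0, .min = 255, .sum = 0 };
    long n = (long)width * height;

    for (long i = 0; i < n; i++) {
        uint8_t v = luma[i];
        if (v > s.max) s.max = v;
        if (v < s.min) s.min = v;
        s.sum += v;
    }
    double mean = (double)s.sum / n;
    double acc = 0.0;
    for (long i = 0; i < n; i++) {
        double d = luma[i] - mean;
        acc += d * d;
    }
    s.variance = acc / n;
    s.std_dev  = sqrt(s.variance);
    return s;
}
```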
The pre-processing unit 2 receives the brightness frame 31Fb as a next frame following after the brightness frame 31Fa. The statistical values 31 of the brightness frame 31Fb are calculated (step S201). The frame buffer 22B is designated as a storage location of the brightness frame 31Fb (step S202). Thereafter, until all of the brightness frames 31F are completely input (“No”, in step S204), processes from the step S201 to the step S204 are repeated. As shown in
As shown in
At a time T12, not only a process of generating the block data 33 from the brightness frame 31Fa is completed, but also a process of storing the brightness frame 31Fb in the frame buffer 22B is finished. The pre-processing unit 2 can start to generate the block data 33 from the brightness frame 31Fb at a time T12. By employing a double-buffered structure in the frame storage unit 22 in which the brightness frames 31F are stored, it is possible to carry out a process of the frame parameter detector 21 and a process of generating the block data 33 in parallel with each other. Therefore, image recognition processing can be achieved efficiently.
<3.2. Processes of the Block Parameter Detector 23 and the Block Generation Unit 24>
(Determination of Block)
The pre-processing unit 2 determines which frame buffer should be a target of read-out (step S251). In a case where the block data 33 is generated from the brightness frame 31Fa, the frame buffer 22A is a target of read-out. The pre-processing unit 2 determines one detection block in which a pedestrian is to be detected, in the brightness frame 31F by using a preset block parameter table 221 (refer to
As shown in
“BL_SIZE_X” and “BL_SIZE_Y” are parameters which determine sizes along an X direction and a Y direction of a detection block. “BL_START_X” and “BL_START_Y” are parameters which indicate coordinates of a detection block BL1 which is firstly determined in the area 32F, and correspond to coordinates of an upper-left vertex of the detection block BL1.
“BL_OFS_X” and “BL_OFS_Y” are offset values of an X coordinate and a Y coordinate of a detection block, and are used for calculating an upper-left vertex of a new detection block. For example, an X coordinate of an upper-left vertex of a detection block BL2 is a value obtained by adding “BL_OFS_X” to “BL_START_X”. In calculating coordinates of an upper-left vertex of a detection block BL11 in the second stage, a value obtained by adding “BL_OFS_Y” to “BL_START_Y” is a Y coordinate of an upper-left vertex of the detection block BL11.
“BL_RPT_X” and “BL_RPT_Y” are parameters which determine the number of times of cutting out a detection block from the area 32F. For example, when “BL_RPT_X” is set at 10, the number of detection blocks which are cut out along an X axis is 10. When “BL_RPT_Y” is set at five, the number of detection blocks which are cut out along a Y axis is five.
The sequence of processes of determining a detection block will be described. After determining the detection block BL1, the pre-processing unit 2 determines detection blocks along the X axis the number of times (10 times) set by "BL_RPT_X". More specifically, the detection blocks BL1, BL2, . . . , BL10 are sequentially determined in the first stage. Next, the pre-processing unit 2 sequentially determines the detection blocks (BL11 to BL20) in the second stage. When "BL_RPT_Y" is set at five, the pre-processing unit 2 repeats the above-described processes until the detection blocks in the fifth stage are determined. As a result, 50 detection blocks are designated as areas in each of which a pedestrian is to be detected.
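A minimal sketch of how the detection blocks could be enumerated from such a parameter table is shown below; the structure and field names are assumptions made for illustration, not the embodiment's data format.

```c
#include <stdio.h>

/* Hypothetical in-memory form of one entry of the block parameter table 221. */
typedef struct {
    int size_x, size_y;    /* BL_SIZE_X,  BL_SIZE_Y  */
    int start_x, start_y;  /* BL_START_X, BL_START_Y */
    int ofs_x, ofs_y;      /* BL_OFS_X,   BL_OFS_Y   */
    int rpt_x, rpt_y;      /* BL_RPT_X,   BL_RPT_Y   */
} BlockParams;

/* Walk the area 32F stage by stage: rpt_x blocks along X per stage,
 * rpt_y stages along Y, each block offset by (ofs_x, ofs_y). */
static void enumerate_detection_blocks(const BlockParams *bp)
{
    for (int j = 0; j < bp->rpt_y; j++) {          /* stages (Y direction)  */
        for (int i = 0; i < bp->rpt_x; i++) {      /* blocks within a stage */
            int x = bp->start_x + i * bp->ofs_x;   /* upper-left vertex     */
            int y = bp->start_y + j * bp->ofs_y;
            printf("block %d: (%d,%d) %dx%d\n",
                   j * bp->rpt_x + i + 1, x, y, bp->size_x, bp->size_y);
        }
    }
}
```

With BL_RPT_X set at 10 and BL_RPT_Y set at five, this loop yields the 50 detection blocks BL1 through BL50 described above.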
The block parameter table 221 is prepared in accordance with each of the sizes of the detection blocks BL_A and BL_B. The block parameter table 221 shown in
Refer back to the flow chart. The block parameter detector 23 calculates the statistical values 32 of the pixel values of the detection block determined in the step S252 (step S253).
(Normalization of Brightness Frame)
The block generation unit 24 normalizes the brightness frame 31F using the statistical values 31 of the brightness frame 31F (step S254). Normalization of the brightness frame 31F is a process of changing each of pixel values (brightness values) of the brightness frame 31F so as to agree with a preset typical brightness distribution. The neuro calculation process (step S3, refer to
For example, the brightness values of a brightness frame 31F obtained by photographing at night are low as a whole. If neuro calculation is performed without normalizing such a brightness frame 31F, it is probable that a pedestrian will not be detected. By normalizing the brightness frame 31F, it is possible to prevent a reduction of accuracy in detecting a pedestrian.
Subsequently, the block generation unit 24 cuts out image data of the detection block BL1 from the normalized brightness frame 31F, and further normalizes the image data of the detection block BL1 with the use of the statistical values 32 (step S255). Even though the brightness frame 31F is normalized, still there is variation in a spatial distribution of brightness. If an area covered by the detection block BL1 is dark in the normalized brightness frame 31F, it is probable that accuracy in neuro calculation for the detection block BL1 is reduced. Thus, also image data of the detection block BL1 is normalized in the same manner as the brightness frame 31F.
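One plausible reading of the normalization in steps S254 and S255 is a shift-and-scale of the pixel values so that their mean and standard deviation match a preset target distribution. The sketch below follows that reading; the target parameters, the clamping to the 8-bit range, and the function name are assumptions rather than details taken from the embodiment.

```c
#include <stdint.h>

/* Normalize a brightness buffer in place so that its mean and standard
 * deviation match a preset "typical" distribution (steps S254/S255).
 * target_mean and target_std are assumed parameters of that distribution. */
static void normalize_brightness(uint8_t *buf, long n,
                                 double mean, double std_dev,
                                 double target_mean, double target_std)
{
    if (std_dev <= 0.0) return;          /* flat image: nothing to stretch */
    double gain = target_std / std_dev;

    for (long i = 0; i < n; i++) {
        double v = (buf[i] - mean) * gain + target_mean;
        if (v < 0.0)   v = 0.0;          /* clamp back to the 8-bit range  */
        if (v > 255.0) v = 255.0;
        buf[i] = (uint8_t)(v + 0.5);
    }
}
```

Under these assumptions, the same routine would be applied once to the whole brightness frame 31F with the statistical values 31, and again to the cut-out detection block with the statistical values 32.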
(Sobel Filter Process)
Hereinafter, image data of the normalized detection block BL1 will be referred to as “block image data”. The block generation unit 24 carries out Sobel filter process on the block image data (step S256). Sobel filter process is a process of enhancing an edge of an object in an image.
Now, Sobel filter process will be described in detail. First, matrixes S0, S1, S2, and S3 each of which is a three-by-three matrix are defined as shown in <Formula 1>.
When block image data is expressed by a matrix P, the matrix P is as shown in <Formula 2>. In <Formula 2>, “M” indicates a coordinate in a horizontal direction (X-axis direction). “N” indicates a coordinate in a vertical direction (Y-axis direction).
A pixel value of block image data after Sobel filter process will be denoted by “SBL(m, n)”. Note that “(m, n)” are coordinates in an X-axis direction and a Y-axis direction. “SBL(m, n)” is calculated by <Formula 3>.
SBL(m, n) = Coring(|S0*P|) + Coring(|S1*P|) + Coring(|S2*P|) + Coring(|S3*P|) <Formula 3>
A coring function in <Formula 3> is expressed by the following <Formula 4>.
Also, an operator “*” in <Formula 3> indicates convolution. A formula of convolution is <Formula 5> as follows. A matrix S used in <Formula 5> is shown in <Formula 6>.
In <Formula 5>, “out(m, n)” indicates a pixel value of block image data after convolution. Also, “p(m-k, n-r)” indicates a pixel value of block image data before convolution. The matrix S is any one of the matrixes S0, S1, S2, and S3 shown in <Formula 1>, and “s(k, r)” is each element of the matrix S.
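The following C sketch illustrates the combination of <Formula 3> and <Formula 5>: a three-by-three convolution, an absolute value, and a coring step for each of the four direction kernels. The kernel values of <Formula 1> and the exact coring function of <Formula 4> are not reproduced in the text, so the kernels are passed in as parameters, and the threshold-type coring and the zero handling at the image border are assumptions of this sketch, not part of the embodiment.

```c
#include <stdlib.h>

/* 3x3 convolution at (m, n): out(m,n) = sum_k sum_r s(k,r) * p(m-k, n-r),
 * following the convolution of <Formula 5>.  Out-of-range pixels are
 * treated as 0 here, which is an assumption of this sketch. */
static int conv3x3(const unsigned char *p, int w, int h,
                   const int s[3][3], int m, int n)
{
    int acc = 0;
    for (int k = -1; k <= 1; k++)
        for (int r = -1; r <= 1; r++) {
            int x = m - k, y = n - r;
            if (x >= 0 && x < w && y >= 0 && y < h)
                acc += s[k + 1][r + 1] * p[y * w + x];
        }
    return acc;
}

/* The coring function of <Formula 4> is not reproduced in the text; a common
 * form, assumed here, suppresses magnitudes below a threshold. */
static int coring(int v, int threshold)
{
    return (v < threshold) ? 0 : v;
}

/* SBL(m,n) per <Formula 3>: the cored absolute responses of the four
 * kernels S0..S3 (passed in as s[0]..s[3]) are summed. */
static int sobel_pixel(const unsigned char *p, int w, int h, int m, int n,
                       const int s[4][3][3], int threshold)
{
    int sbl = 0;
    for (int d = 0; d < 4; d++)
        sbl += coring(abs(conv3x3(p, w, h, s[d], m, n)), threshold);
    return sbl;
}
```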
(Gaussian Filter Process)
Refer back to the flow chart. The block generation unit 24 carries out Gaussian filter process on the block image data after Sobel filter process (step S257).
<Formula 7> shows a point spread function W used in Gaussian filter process. A point spread function W is a five-by-five matrix.
A pixel of block image data after Gaussian filter process will be denoted by “g(m, n)”. A matrix of pixels of block image data after Sobel filter process will be denoted by “P1”. The matrix P1 has the same composition as that of <Formula 2>. Note that “g(m, n)” is obtained by performing convolution of the point spread function W and the matrix P1 as shown in <Formula 8>. As a result of Gaussian filter process, noises in block image data can be reduced.
(Smoothing Process)
The pre-processing unit 2 performs a smoothing process (step S258) on block image data after Gaussian filter process. A matrix L used in a smoothing process is shown in <Formula 9>. The matrix L is a three-by-three matrix.
A pixel of block image data after a smoothing process will be denoted by “low(m, n)”. A matrix of pixels of block image data after Gaussian filter process will be denoted by “P2”. The matrix P2 has the same composition as that of <Formula 2>. As shown in <Formula 10>, “low(m, n)” is obtained by performing convolution of the matrix L and the matrix P2.
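Gaussian filter process (<Formula 8>) and the smoothing process (<Formula 10>) are both plain convolutions of the block image data with a fixed kernel, so they can share one routine in a sketch. The kernel values of <Formula 7> and <Formula 9> are not reproduced in the text; the binomial and averaging kernels below are stand-ins chosen only to make the sketch concrete.

```c
/* Stand-ins for the point spread function W (<Formula 7>) and the matrix L
 * (<Formula 9>), whose actual values are not reproduced in the text. */
static const int W5[5][5] = {
    {1, 4, 6, 4, 1}, {4, 16, 24, 16, 4}, {6, 24, 36, 24, 6},
    {4, 16, 24, 16, 4}, {1, 4, 6, 4, 1}          /* divide result by 256 */
};
static const int L3[3][3] = {
    {1, 1, 1}, {1, 1, 1}, {1, 1, 1}              /* divide result by 9   */
};

/* Generic k x k convolution usable for both <Formula 8> and <Formula 10>.
 * Out-of-range pixels are treated as 0, an assumption of this sketch. */
static int conv_kxk(const unsigned char *p, int w, int h,
                    const int *kern, int ksz, int m, int n)
{
    int half = ksz / 2, acc = 0;
    for (int k = -half; k <= half; k++)
        for (int r = -half; r <= half; r++) {
            int x = m - k, y = n - r;
            if (x >= 0 && x < w && y >= 0 && y < h)
                acc += kern[(k + half) * ksz + (r + half)] * p[y * w + x];
        }
    return acc;
}
```

Under these assumptions, a call such as conv_kxk(p, w, h, &W5[0][0], 5, m, n) / 256 would give g(m, n), and the same routine with L3 and a divisor of 9 would give low(m, n).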
Next, the block generation unit 24 changes a size of block image data on which a smoothing process has been performed, into a predetermined size (step S259). As a result of this, the block data 33 is generated. The size of the block data 33 is 16 pixels along an X-axis direction and 32 pixels along a Y-axis direction, for example (refer to
The block generation unit 24 stores the block data 33 into either the block buffer 3A or the block buffer 3B (step S260). The pre-processing unit 2 checks whether or not respective pieces of block data 33 which correspond to all of detection blocks are generated from the brightness frame 31F (step S261). If generation of the block data 33 is not finished (“No” in step S261), the pre-processing unit 2 turns back to the step S252, and repeats the processes from the step S252 to the step S261. If all pieces of block data 33 are generated (“Yes” in step S261), the pre-processing unit 2 terminates the flow chart in
<4. Writing and Read-Out Performed on Block Buffer>
The pre-processing unit 2 stores the generated block data 33 into either the block buffer 3A or 3B (refer to
The image recognition apparatus 100 is able to perform a process of generating the block data 33 (refer to
At a time T21, writing of the block data 33 of the detection block BL1 is completed. The neuro calculation unit 4 starts the neuro calculation process (step S3) on the block data 33 of the detection block BL1 at a time T21. In other words, the neuro calculation unit 4 reads out the block data 33 of the detection block BL1 from the block buffer 3A in a period from a time T21 to a time T22. The pre-processing unit 2 generates the block data 33 of the detection block BL2 (refer to
In a period from a time T22 to a time T23, the neuro calculation unit 4 reads out the block data 33 of the detection block BL2 from the block buffer 3B. The pre-processing unit 2 writes the block data 33 of the detection block BL3 (refer to
<5. Neuro Calculation Process (Step S3)>
Hereinbelow, the neuro calculation process (step S3) will be described in detail.
<5.1. Outline of Neuro Calculation>
The input layer 51 includes input synapses 41-1 through 41-H. The input synapses 41-1 through 41-H respectively correspond to the pixels of the block data 33. Hereinafter, the input synapses 41-1 through 41-H may be collectively referred to as “input synapses 41”, as occasion arises. The size of the block data 33 is 16×32 pixels, so that the number of the input synapses 41 is 512. The neuro calculation unit 4 carries out an intermediate synapse calculation process (step S300), to calculate synapses of the intermediate layer 52 (intermediate synapses) based on the input synapses 41.
The intermediate layer 52 includes intermediate synapses 42-1 through 42-J. In the preferred embodiment of the present invention, the number of the intermediate synapses is 256. However, the number of the intermediate synapses may be any other number that is equal to or smaller than the number of the input synapses 41.
The output synapse 34 is one piece of numeric data. The neuro calculation unit 4 carries out an output synapse calculation process (S350), to calculate the output synapse 34 based on the intermediate synapses 42-1 through 42-J.
Now, a method of calculating a synapse will be described. A method of calculating an intermediate synapse and a method of calculating an output synapse 34 are identical to each other. A formula for calculating a synapse is shown in <Formula 11>.
More details of <Formula 11> will be provided by using calculation of the intermediate synapse 42-1 as an example.
The input synapses 41-1 through 41-H correspond to “Si” in <Formula 11>. Weighting coefficients W11 through W1H which are respectively set in association with the input synapses 41 correspond to “Wi” in <Formula 11>. The weighting coefficients Wi are stored in the coefficient table 5. In the preferred embodiment of the present invention, since an object of detection is a pedestrian, the weighting coefficients Wi for a pedestrian are stored in the coefficient table 5. Additionally, by changing the weighting coefficients Wi stored in the coefficient table 5, not only a pedestrian but also various objects such as an automobile and a traffic sign can be detected.
In <Formula 11>, “bm” is an initial value of the intermediate synapse 42-1. A term of Σ operator in <Formula 11> corresponds to a total value 41T, which is a sum of results obtained by respectively multiplying the input synapses 41 by the weighting coefficients. By substituting a sum of the total value 41T and the initial value bm into a sigmoid function, the intermediate synapse 42-1 can be obtained. A sigmoid function is shown in <Formula 12>.
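As an illustration of <Formula 11> and <Formula 12>, a minimal C sketch of the per-synapse calculation is shown below. The standard logistic form 1/(1+e^(-x)) is assumed for the sigmoid, since the body of <Formula 12> is not reproduced in the text; the function names are illustrative only.

```c
#include <math.h>

/* Standard sigmoid; assumed here to be the form of <Formula 12>. */
static double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

/* <Formula 11>: a synapse value is sigmoid(b + sum_i(W_i * S_i)),
 * where b is the initial value (bm for an intermediate synapse),
 * S_i are the input synapses and W_i the weighting coefficients. */
static double calc_synapse(const double *s, const double *w, int n, double b)
{
    double total = b;
    for (int i = 0; i < n; i++)
        total += w[i] * s[i];          /* the total value 41T accumulates here */
    return sigmoid(total);
}
```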
As described above, the number of multiplications and additions performed in the neuro calculation process is extremely large. Therefore, in order to speed up the neuro calculation process (step S3), the neuro calculation unit 4 performs a plurality of calculations in parallel in the intermediate synapse calculation process (S300) and the output synapse calculation process (S350). Below, each of these processes will be described in detail.
<5.2. Intermediate Synapse Calculation Process (Step S300)>
The neuro calculation unit 4 selects a group of intermediate synapses (intermediate group) which are to be calculated, from 12 intermediate synapses (step S301). As shown in
The neuro calculation unit 4 selects a group of the input synapses 41 (input group) which are used for calculation of the intermediate synapses 42-1, 42-2, 42-3, and 42-4 (step S303). At first, the input synapses 41-1, 41-2, 41-3, and 41-4 are selected as an input group. Pixel values S1, S2, S3, and S4 respectively corresponding to the input synapses 41-1, 41-2, 41-3, and 41-4 are loaded into a memory (not shown) (step S304). The neuro calculation unit 4 selects the input synapse 41-1 from the input group, and loads the weighting coefficients W11, W12, W13, and W14 associated with the input synapse 41-1 into the memory (not shown) (step S305). The weighting coefficients W11, W12, W13, and W14 are coefficients which are set in association with the input synapse 41-1 in order to calculate the intermediate synapses 42-1, 42-2, 42-3, and 42-4. The weighting coefficients W11, W12, W13, and W14 are loaded from the coefficient table 5.
The neuro calculation unit 4 multiplies the pixel value S1 by each of the weighting coefficients W11, W12, W13, and W14, and adds respective results of the multiplications to the intermediate values M1, M2, M3, and M4, respectively (step S306). More specifically, a result of multiplication of the pixel value S1 by the weighting coefficient W11 is added to the intermediate value M1. A result of multiplication of the pixel value S1 by the weighting coefficient W12 is added to the intermediate value M2. Similarly, respective results of multiplications of the pixel value S1 by the weighting coefficients W13 and W14 are added to the intermediate values M3 and M4, respectively.
Subsequently, the neuro calculation unit 4 checks whether or not all of the input synapses included in the input group are used for calculation of the intermediate values M1, M2, M3, and M4 (step S307). Since the input synapses 41-2, 41-3, and 41-4 are not yet selected (“No” in step S307), the neuro calculation unit 4 turns back to the step S305.
As shown in
The neuro calculation unit 4 carries out the processes in the steps S305 and S306 also on the input synapse 41-3. Respective results of multiplications of the pixel value of the input synapse 41-3 by the weighting coefficients are added to the intermediate values M1, M2, M3, and M4, respectively.
The neuro calculation unit 4 carries out the processes in the steps S305 and S306 also on the input synapse 41-4. As shown in
Now, refer back to
The input synapses 41-5 through 41-12 are not selected as an input group (“No” in step S308), so that the neuro calculation unit 4 turns back to the step S303, and newly selects the input synapses 41-5, 41-6, 41-7, and 41-8 as an input group.
The neuro calculation unit 4 carries out the processes in the steps S305 and S306 on each of the input synapses 41-5, 41-6, 41-7, and 41-8.
When all of the input synapses 41 are selected as an input group (“Yes” in step S308), the neuro calculation unit 4 inputs the intermediate values M1, M2, M3, and M4 into a sigmoid function (step S309). Results of calculation of a sigmoid function are stored in the memory not shown, as the intermediate synapses 42-1, 42-2, 42-3, and 42-4 (step S310).
As described above, out of the intermediate synapses 42, the intermediate synapses 42-1, 42-2, 42-3, and 42-4 are firstly calculated. Respective results of multiplications of the input synapse by the weighting coefficients are added to a plurality of intermediate values in parallel, so that four intermediate synapses 42 can be calculated at the same time. Therefore, the intermediate synapse calculation process (step S300) can be carried out at high speed.
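The following sketch mirrors the loop structure of steps S301 through S310 for one intermediate group with Q = 4. The inner loops written sequentially here correspond to multiply-accumulate operations that the neuro calculation unit 4 carries out in parallel, and the array layout of the weighting coefficients is an assumption of the sketch, not the embodiment's memory layout.

```c
#include <math.h>

#define Q 4   /* number of intermediate synapses calculated in parallel */

/* Calculate one intermediate group (steps S301-S310).  s[] holds the n1
 * input synapse values (n1 assumed to be a multiple of Q), w[i][q] the
 * weighting coefficient linking input synapse i to intermediate synapse q
 * of this group, and b[q] the initial values.  On the hardware, the inner
 * q-loop corresponds to Q parallel MAC operations. */
static void calc_intermediate_group(const double *s, int n1,
                                    const double w[][Q], const double b[Q],
                                    double out[Q])
{
    double m[Q];                               /* intermediate values M1..M4 */
    for (int q = 0; q < Q; q++)                /* Init process               */
        m[q] = b[q];

    for (int i = 0; i < n1; i += Q) {          /* select next input group    */
        for (int k = 0; k < Q; k++)            /* Li: Q pixel values         */
            for (int q = 0; q < Q; q++)        /* Lc + MAC per synapse       */
                m[q] += s[i + k] * w[i + k][q];
    }
    for (int q = 0; q < Q; q++)                /* SIG + Ss processes         */
        out[q] = 1.0 / (1.0 + exp(-m[q]));
}
```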
Refer back to
A box 45 indicated by broken lines represents a process carried out when the input synapses 41-1, 41-2, 41-3, and 41-4 are selected as an input group. A box 46 represents a process carried out when the input synapses 41-5, 41-6, 41-7, and 41-8 are selected as an input group. A box 47 represents a process carried out when the input synapses 41-9, 41-10, 41-11, and 41-12 are selected as an input group. A box 48 represents a process of calculating the intermediate synapses in one intermediate group. In
In
The MAC process will be described in detail. In the MAC process, calculation represented by a formula inside parentheses in <Formula 11> is performed. When the formula inside parentheses in <Formula 11> is defined as an operator “mac”, the operator “mac” can be expressed by the following recurrence formula, <Formula 13>.
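The drawing containing <Formula 13> is not reproduced in this text; as a reconstruction from the definition of the operator "mac" above, the recurrence would read:

```latex
\mathrm{mac}(i) = \mathrm{mac}(i-1) + S_i \cdot W_i , \qquad \mathrm{mac}(0) = 0
```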
In the MAC process, a multiplication of Si×Wi is performed in the first cycle. As described above, “Si” corresponds to the input synapses 41-1 through 41-H (refer to
In each of the boxes 45, 46, and 47, the MAC process and the process of loading the weighting coefficients (Lc process) are carried out in parallel. As a result, the process of calculating the intermediate synapses 42 can be carried out efficiently.
Also, the MAC process 45A in the box 45 and the Li process 46A and the Lc process 46B in the box 46 are carried out in parallel. In other words, when an input group is newly set, the Li process and the Lc process therefor are carried out in parallel with the MAC process carried out for the immediately preceding input group. As a result, the neuro calculation unit 4 can efficiently carry out the process of calculating the intermediate synapses 42.
Next, the number of cycles required in the intermediate synapse calculation process (step S300) will be described. As shown in the boxes 45, 46, and 47, the number of cycles required in the MAC process for each of the input groups is Q+1. Further, in order to calculate all of the intermediate synapses 42 included in an intermediate group, the processes in the boxes 45, 46, and 47 must be repeated N1/Q times. The box 48 represents the process of calculating the intermediate synapses included in one intermediate group. Thus, in order to calculate all of the intermediate synapses, the process in the box 48 must be repeated N2/Q times. As a result, the number of cycles C1 required to calculate all of the intermediate synapses is expressed by <Formula 14>.
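The drawing containing <Formula 14> is not reproduced in this text. Piecing it together from the term-by-term explanation that follows, where N1 and N2 appear to denote the numbers of input synapses and intermediate synapses, the cycle count would be (a reconstruction, not the original drawing):

```latex
C_1 = \left\{ \frac{N_1}{Q}\,(Q + 1) + 6 \right\} \times \frac{N_2}{Q}
```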
In <Formula 14>, the term of “(Q+1)” represents a period in which the MAC process is carried out in the boxes 45, 46, and 47. Also, in the term of “6”, the first three cycles correspond to the Init process, the Li process, and the Lc process for the first input group (the box 45). The remaining three cycles correspond to two cycles for the SIG process carried out after the process for the last input group (the box 47) and one cycle for the Ss process.
Next, the Li process and the Lc process will be described in detail. In <Formula 14>, “Q” indicates the number of parallels in the intermediate synapse calculation process (S300), namely, the number of intermediate synapses which are to be calculated in parallel. In the case shown in
In a case where the number of bits of a pixel value Si of the input synapse 41 is set at "d", d×4-bit data is read out from either the block buffer 3A or 3B by the Li process. In a case where the number of bits of the weighting coefficient Wi associated with the input synapse 41 is set at "e", e×4-bit data is read out from the coefficient table 5 by the Lc process. For example, if the number of bits d of the pixel value Si of the input synapse 41 is eight, the Li process causes 32-bit data to be read out from either the block buffer 3A or 3B. If the number of bits e of the weighting coefficient Wi associated with the input synapse 41 is 16, the Lc process causes 64-bit data to be loaded into the memory from the coefficient table 5.
As described above, in the intermediate synapse calculation process (S300), each of the Li process and the Lc process requires one cycle for reading out data having bits in number corresponding to the number of parallels Q. As a result, it is possible to efficiently read out data necessary for the MAC process.
<5.3 Linear Approximation of Sigmoid Function>
As shown in <Formula 12>, a sigmoid function uses an exponential function. In the process of the step S309 (refer to
Additionally, a sigmoid function shown in
<5.4. Output Synapse Calculation Process (Step S350)>
The neuro calculation unit 4 sets four partial addition values 43-1, 43-2, 43-3, and 43-4 (step S351). The partial addition values 43-1, 43-2, 43-3, and 43-4 are temporary numerical values used for calculation of the output synapse 34, and each of them is initially set to 0. As shown in
Subsequently, the neuro calculation unit 4 selects four intermediate synapses 42-1, 42-2, 42-3, and 42-4 in accordance with the number of the partial addition values 43-1, 43-2, 43-3, and 43-4, and loads them into a memory (not shown) (step S353). Weighting coefficients Wm1, Wm2, Wm3, and Wm4 (refer to
After the step S354, respective results of multiplications of the intermediate synapses by the weighting coefficients are added to the partial addition values, respectively (step S355). As shown in
As described above, in each of the processes of the steps S353, S354, and S355, a plurality of processes in accordance with the number (4) of the partial addition values are carried out in parallel, so that the output synapse calculation process (step S350) can be speeded up.
The neuro calculation unit 4 checks whether or not all of the intermediate synapses 42 are selected (step S356). Since the intermediate synapses 42-5 through 42-12 are not yet selected (“No” in the step S356), the neuro calculation unit 4 turns back to the step S353, selects the intermediate synapses 42-5, 42-6, 42-7, and 42-8, and loads them into the memory.
The neuro calculation unit 4 carries out the processes of the steps S354 and S355 on the intermediate synapses 42-5, 42-6, 42-7, and 42-8. As shown in
Subsequently, the neuro calculation unit 4 selects the intermediate synapses 42-9, 42-10, 42-11, and 42-12 and loads them (“No” in step S356, step S353). The neuro calculation unit 4 carries out the processes of the steps S354 and S355 also on the intermediate synapses 42-9, 42-10, 42-11, and 42-12. As shown in
The neuro calculation unit 4 sums up the partial addition values 43-1, 43-2, 43-3, and 43-4 (step S357), to thereby calculate a total value 44 (refer to
The neuro calculation unit 4 performs calculation of a sigmoid function to which the total value 44 is input (step S358), to thereby calculate the output synapse 34. Particulars of the process of the step S358 are identical to those of the step S309 (refer to
“Init_O (Init_O process)” is a process of providing the initial value 34i to the partial addition value 43-1, and corresponds to the step S352. “Ls (Ls process)” is a process of loading the selected intermediate synapses, and corresponds to the step S353. “Lc (Lc process)” is a process of loading the weighting coefficients associated with the loaded intermediate synapses, and corresponds to the step S354.
“MAC (MAC process)” includes a process of multiplying the intermediate synapses 42 by the weighting coefficients and a process of adding the results of the multiplications to the partial addition values, and corresponds to the step S355. The MAC process is identical to the MAC process shown in
“SUM (SUM process)” is a process of summing up two partial addition values. The SUM process is repeated F/S times, so that the total value 44 is calculated (step S357). “S” is the number of parallels in an addition process carried out in the step S357. The number of cycles required for carrying out the SUM process one time is one. “Ss (Ss process)” is a process of storing the output synapse 34 in the memory of the neuro calculation unit 4, and is identical to the Ss process in
Referring to
Next, description will be made about the number of cycles required for the output synapse calculation process (S350). It is unnecessary to consider the number of cycles required for the Ls process and the Lc process which are carried out in parallel with the MAC process. The number of times the MAC process is carried out can be expressed by the number of intermediate synapses (N2=12)/the number of selections of intermediate synapses (F=4).
Besides, each of the Ls process and the Lc process is singly carried out one time. The SUM process (one cycle) is repeated F/S times. Each of the Init_O process (one cycle), the SIG process (two cycles), and the Ss process (one cycle) is singly carried out.
As a result, the number of cycles required for the output synapse calculation process (S350) can be expressed by <Formula 16>.
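As with the other formulas, the drawing of <Formula 16> is not reproduced in this text; writing C2 here for the number of cycles, it can be reconstructed from the explanation in the next paragraph as:

```latex
C_2 = \frac{N_2}{F} \times 2 + \frac{F}{S} + 6
```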
In <Formula 16>, the term of “(N2/F)×2” indicates the number of cycles required for the MAC process. The term of “F/S” indicates the number of times the SUM process is carried out. The term of “6” is a total value of the respective numbers of cycles required for the Init process, the Ls process, the Lc process, the SIG process, and the Ss process each of which is singly carried out.
Next, the Ls process and the Lc process which are carried out in the output synapse calculation process (S350) will be described in detail. In <Formula 16>, “F” denotes the number of parallels in the output synapse calculation process (S350), and corresponds to the number of the partial addition values. In the example shown in
In a case where the intermediate synapse is f-bit data, f×4-bit data is loaded into a memory not shown by the Ls process. In a case where the number of bits of the weighting coefficient Wi associated with the intermediate synapse is “g”, g×4-bit data is read out from the coefficient table 5 by the Lc process. For example, if the intermediate synapse is 8-bit data, the Ls process causes 32-bit data to be loaded into the memory not shown. If the number of bits of the weighting coefficient Wi is 16, the Lc process causes 64-bit data to be loaded into the memory from the coefficient table 5.
As described above, in the output synapse calculation process (S350), each of the Ls process and the Lc process requires one cycle for reading data having bits in number corresponding to the number of parallels F. This allows data necessary for the MAC process to be efficiently read out.
As is made clear from the above description, the image recognition apparatus 100 calculates a plurality of intermediate synapses 42 in parallel in the intermediate synapse calculation process (S300). Also, the image recognition apparatus 100 carries out processes of respectively adding respective results of multiplications of the intermediate synapses 42 by the weighting coefficients to the partial addition values, in parallel, in the output synapse calculation process (S350). In this manner, by carrying out various processes in parallel, it is possible to calculate the output synapse 34 at high speed.
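Putting steps S351 through S358 together, a compact sketch of the output synapse calculation is shown below. As before, the plain loops stand in for operations that the hardware performs in parallel (F = 4 partial addition values), the standard logistic sigmoid is assumed, and the function name and the array layout of the weighting coefficients are assumptions of the sketch.

```c
#include <math.h>

#define F 4   /* number of partial addition values (number of parallels) */

/* Output synapse calculation (steps S351-S358).  mid[] holds the n2
 * intermediate synapse values (n2 assumed to be a multiple of F), wm[] the
 * weighting coefficients Wm1..Wmn2, and init the initial value 34i, which
 * is given to the partial addition value 43-1 only (step S352). */
static double calc_output_synapse(const double *mid, const double *wm,
                                  int n2, double init)
{
    double part[F] = { init, 0.0, 0.0, 0.0 };   /* partial addition values */

    for (int i = 0; i < n2; i += F)             /* select F intermediates  */
        for (int q = 0; q < F; q++)             /* Ls + Lc + MAC           */
            part[q] += mid[i + q] * wm[i + q];

    double total = 0.0;                         /* SUM process (step S357) */
    for (int q = 0; q < F; q++)
        total += part[q];

    return 1.0 / (1.0 + exp(-total));           /* SIG process (step S358) */
}
```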
Additionally, it is preferable that the neuro calculation unit 4 is composed by using a configurable processor. In a case where a general-purpose CPU or digital signal processor (DSP) is employed, the neuro calculation unit 4 is implemented by software processing. However, in some cases, the number of bits of data that can be loaded into a CPU or a DSP is fixed, and/or the number of commands that can be executed in parallel is fixed. For this reason, in carrying out the image recognition processing of the preferred embodiment of the present invention in real time, power consumption is increased because a CPU or DSP with a high clock frequency must be used.
In a case where the neuro calculation unit 4 is composed by using a hardware circuit, the circuit configuration becomes complicated, which causes a problem of increased cost. In such a case, the number of commands, synapses, stages of perceptron, and so on, which can be processed in parallel, cannot be easily changed.
A configurable processor is a processor to which a command compliant with the image recognition processing of the preferred embodiment of the present invention can be added. For example, the structure of the configurable processor can be changed so as to allow the weighting coefficients Wi to be loaded in one cycle. Also, the structure of the configurable processor can be changed in accordance with the numbers of parallels in the intermediate synapse calculation process (S300) and the output synapse calculation process (S350). Even if a configurable processor having a lower clock frequency than a general-purpose CPU or DSP is used, the image recognition processing of the preferred embodiment of the present invention can be carried out in real time. Further, the particulars of neuro calculation can be changed more easily in this case than in a case where the neuro calculation unit 4 is implemented by a hardware circuit. Therefore, the neuro calculation unit 4 which can handle image recognition processing for not only a pedestrian but also various objects can be easily implemented.
In the above preferred embodiment, description has been made about an example in which the block generation unit 24 normalizes the brightness frame 31F (step S254) before normalizing a detection block (step S255, refer to
In the above preferred embodiment, description has been made about an example in which the neuro calculation unit 4 carries out neuro calculation with three-layer perceptron (refer to
While the invention has been shown and described in detail, the foregoing description is in all aspects illustrative and not restrictive. It is therefore understood that numerous modifications and variations can be devised without departing from the scope of the invention.
Foreign Patent Documents

Number | Date | Country
---|---|---
5-20455 | Jan. 1993 | JP
6-119454 | Apr. 1994 | JP
7-220087 | Aug. 1995 | JP
11-120158 | Apr. 1999 | JP
2003-85560 | Mar. 2003 | JP
2007-25902 | Feb. 2007 | JP
2008-21034 | Jan. 2008 | JP
2009-80693 | Apr. 2009 | JP