Object detection device and object detection method based on neural network

Information

  • Patent Grant
  • Patent Number
    11,495,015
  • Date Filed
    Wednesday, September 23, 2020
  • Date Issued
    Tuesday, November 8, 2022
Abstract
An object detection device and an object detection method based on a neural network are provided. The object detection method includes: receiving an input image and identifying an object in the input image according to an improved YOLO-V2 neural network. The improved YOLO-V2 neural network includes a residual block, a third convolution layer, and a fourth convolution layer. A first input of the residual block is connected to a first convolution layer of the improved YOLO-V2 neural network, and an output of the residual block is connected to a second convolution layer of the improved YOLO-V2 neural network. Here, the residual block is configured to transmit, to the second convolution layer, a summation result corresponding to the first convolution layer. The third convolution layer and the fourth convolution layer are generated by decomposing a convolution layer of an original YOLO-V2 neural network.
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the priority benefit of Taiwan patent application serial no. 109110751, filed on Mar. 30, 2020. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.


BACKGROUND
Technical Field

The disclosure relates to an object detection device and an object detection method, and in particular to an object detection device and an object detection method based on a neural network.


Description of Related Art

Detecting and identifying objects through deep learning is currently one of the mainstream technologies in the field of image identification. In order to achieve favorable detection results, a large and deep neural network framework is usually adopted. The detection results obtained through such a neural network framework may be accurate; however, the demands on computation and memory storage capacity increase significantly, making it difficult to apply the framework to an edge computing device with limited computing power.


Accordingly, providing a neural network framework characterized by low complexity and high identification accuracy is one of the focuses of persons skilled in the art.


SUMMARY

The disclosure provides an object detection device and an object detection method based on a neural network, which may improve a YOLO-V2 neural network to lower computational burden of the YOLO-V2 neural network.


The object detection device based on the neural network in an embodiment of the disclosure includes a processor, a storage medium, and a transceiver. The storage medium stores an improved YOLO-V2 neural network. The processor is coupled to the storage medium and the transceiver. The processor receives an input image through the transceiver and identifies an object in the input image according to the improved YOLO-V2 neural network. The improved YOLO-V2 neural network includes: a residual block, wherein a first input of the residual block is connected to a first convolution layer of the improved YOLO-V2 neural network, an output of the residual block is connected to a second convolution layer of the improved YOLO-V2 neural network, and the residual block is configured to transmit, to the second convolution layer, a summation result corresponding to the first convolution layer; and a third convolution layer including a first number of filters and a fourth convolution layer including a second number of filters, wherein the third convolution layer and the fourth convolution layer are generated by decomposing a convolution layer of an original YOLO-V2 neural network via the processor, the convolution layer includes a third number of filters, and the first number is less than the third number.


In an embodiment of the disclosure, the object detection device further includes a concatenation layer. A second input of the concatenation layer is connected to a pooling layer and a fifth convolution layer of the improved YOLO-V2 neural network.


In an embodiment of the disclosure, the processor adjusts the second number to be less than half of its original value.


In an embodiment of the disclosure, the first convolution layer includes an activation function. The first input of the residual block is connected to the activation function of the first convolution layer.


In an embodiment of the disclosure, the activation function is a leaky rectified linear unit.


In an embodiment of the disclosure, the first input of the residual block is further connected to a pooling layer of the improved YOLO-V2 neural network. The residual block is configured to transmit, to the second convolution layer, a summation result of the first convolution layer and the pooling layer.


In an embodiment of the disclosure, the first input of the residual block is further connected to a fifth convolution layer of the improved YOLO-V2 neural network. The residual block is configured to transmit, to the second convolution layer, a summation result of the first convolution layer and the fifth convolution layer.


In an embodiment of the disclosure, the improved YOLO-V2 neural network does not include a reorganization layer.


The object detection method based on the neural network in an embodiment of the disclosure includes: receiving an input image and identifying an object in the input image according to the improved YOLO-V2 neural network. The improved YOLO-V2 neural network includes a residual block, a third convolution layer including a first number of filters, and a fourth convolution layer including a second number of filters. A first input of the residual block is connected to a first convolution layer of the improved YOLO-V2 neural network. An output of the residual block is connected to a second convolution layer of the improved YOLO-V2 neural network. The residual block is configured to transmit, to the second convolution layer, a summation result corresponding to the first convolution layer. The third convolution layer and the fourth convolution layer are generated by decomposing a convolution layer of an original YOLO-V2 neural network, the convolution layer includes a third number of filters, and the first number is less than the third number.


Based on the above, the computational burden of the improved YOLO-V2 neural network provided by one or more embodiments of the disclosure may be reduced, and the identification accuracy of the neural network may be improved, by adding the residual block, decomposing the convolution layers, reducing the number of the filters of the convolution layers, removing the reorganization layer, and so on.


Several exemplary embodiments accompanied with figures are described in detail below to further describe the disclosure.





BRIEF DESCRIPTION OF THE DRAWINGS

Reference will now be made in detail to the present preferred embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.



FIG. 1 is a schematic diagram of an object detection device based on a neural network according to an embodiment of the disclosure.



FIG. 2 is a flowchart of an object detection method based on a neural network according to an embodiment of the disclosure.





DESCRIPTION OF THE EMBODIMENTS

In order to make the content of the present disclosure more comprehensible, embodiments are described below as examples of implementation of the present disclosure. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts, components or steps.


In recent years, the field of object detection has made great progress owing to the development of deep learning. Deep learning approaches to object detection may be divided into two-phase methods and one-phase methods. Generally, a two-phase method uses a region proposal network (RPN) to locate an object in an image and then determines the category of the object with a classification neural network. In contrast, a one-phase method uses a single neural network framework both to locate an object and to determine its category. The two-phase method provides good detection accuracy, but its computational complexity is also relatively high. The one-phase method has lower complexity and requires less computation; based on these advantages, the one-phase method is preferred by users. One-phase methods include, for example, the single-shot multibox detector (SSD) and YOLO. The framework of a traditional YOLO-V2 neural network is shown in Table 1. In Table 1, a field at a higher position represents a lower layer of the YOLO-V2 neural network. For example, the convolution layer 1 represents the first (namely, the bottommost) layer of the YOLO-V2 neural network, and the convolution layer 22 represents the last (namely, the topmost) layer. In addition, adjacent fields are connected to each other. For example, the output of the convolution layer 1 is connected to the input of the pooling layer 1, and the output of the pooling layer 1 is connected to the input of the convolution layer 2.


TABLE 1

Name                      Type                                        Number of filters          Size/stride   Bottom
                                                                      (or convolution kernels)
Convolution layer 1       Convolution kernel + BN + SC + leaky ReLU   32                         3 × 3
Pooling layer 1           Maxpooling                                                             2 × 2/2
Convolution layer 2       Convolution kernel + BN + SC + leaky ReLU   64                         3 × 3
Pooling layer 2           Maxpooling                                                             2 × 2/2
Convolution layer 3       Convolution kernel + BN + SC + leaky ReLU   128                        3 × 3
Convolution layer 4       Convolution kernel + BN + SC + leaky ReLU   64                         1 × 1
Convolution layer 5       Convolution kernel + BN + SC + leaky ReLU   128                        3 × 3
Pooling layer 5           Maxpooling                                                             2 × 2/2
Convolution layer 6       Convolution kernel + BN + SC + leaky ReLU   256                        3 × 3
Convolution layer 7       Convolution kernel + BN + SC + leaky ReLU   128                        1 × 1
Convolution layer 8       Convolution kernel + BN + SC + leaky ReLU   256                        3 × 3
Pooling layer 8           Maxpooling                                                             2 × 2/2
Convolution layer 9       Convolution kernel + BN + SC + leaky ReLU   512                        3 × 3
Convolution layer 10      Convolution kernel + BN + SC + leaky ReLU   256                        1 × 1
Convolution layer 11      Convolution kernel + BN + SC + leaky ReLU   512                        3 × 3
Convolution layer 12      Convolution kernel + BN + SC + leaky ReLU   256                        1 × 1
Convolution layer 13      Convolution kernel + BN + SC + leaky ReLU   512                        3 × 3
Pooling layer 13          Maxpooling                                                             2 × 2/2
Convolution layer 14      Convolution kernel + BN + SC + leaky ReLU   1024                       3 × 3
Convolution layer 15      Convolution kernel + BN + SC + leaky ReLU   512                        1 × 1
Convolution layer 16      Convolution kernel + BN + SC + leaky ReLU   1024                       3 × 3
Convolution layer 17      Convolution kernel + BN + SC + leaky ReLU   512                        1 × 1
Convolution layer 18      Convolution kernel + BN + SC + leaky ReLU   1024                       3 × 3
Convolution layer 19      Convolution kernel + BN + SC + leaky ReLU   1024                       3 × 3
Convolution layer 20      Convolution kernel + BN + SC + leaky ReLU   1024                       3 × 3
Reorganization layer 13   Reorganization                              2048                                     SC 13
Concatenation layer       Concatenation                                                                        Reorganization layer 13, SC 20
Convolution layer 21      Convolution kernel + BN + SC + leaky ReLU   1024                       3 × 3
Convolution layer 22      Convolution kernel + BN + SC + leaky ReLU   425                        1 × 1

The convolution layer 1 may include a plurality of convolution kernels, batch normalization (BN), scaling (SC), and an activation function. The activation function is, for example, a leaky rectified linear unit (leaky ReLU). The convolution layer 1 may have 32 convolution kernels, wherein a size of each convolution kernel is, for example, 3×3.


The pooling layer 1 may be configured to perform maxpooling. A size of the pooling layer 1 is, for example, 2×2, and a stride of the pooling layer 1 is, for example, 2.
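
For illustration, the pattern shared by the convolution layer 1 and the pooling layer 1 can be sketched in a few lines of code. This is a minimal sketch assuming PyTorch (the patent does not name a framework), an RGB input, and a 0.1 leaky ReLU slope; the Caffe-style scaling (SC) step is represented here by the affine parameters of BatchNorm2d:

    # Minimal sketch of "Convolution kernel + BN + SC + leaky ReLU" plus maxpooling,
    # assuming PyTorch (framework, input size, and ReLU slope are assumptions).
    import torch
    import torch.nn as nn

    def conv_bn_sc_leaky(in_ch, out_ch, kernel_size):
        # BatchNorm2d with affine parameters covers both BN and the Caffe-style scale (SC).
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    conv1 = conv_bn_sc_leaky(3, 32, 3)             # convolution layer 1: 32 kernels, 3 x 3
    pool1 = nn.MaxPool2d(kernel_size=2, stride=2)  # pooling layer 1: 2 x 2, stride 2

    x = torch.randn(1, 3, 416, 416)                # 416 x 416 input is illustrative only
    y = pool1(conv1(x))                            # output shape: (1, 32, 208, 208)

The remaining convolution and pooling layers of Table 1 follow the same pattern, with the kernel counts and sizes listed in the table.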


The convolution layer 2 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 2 may have 64 convolution kernels. A size of each convolution kernel of the convolution layer 2 is, for example, 3×3.


A pooling layer 2 may be configured to perform maxpooling. A size of the pooling layer 2 is, for example, 2×2, and a stride of the pooling layer 2 is, for example, 2.


A convolution layer 3 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 3 may have 128 convolution kernels. A size of each convolution kernel of the convolution layer 3 is, for example, 3×3.


A convolution layer 4 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 4 may have 64 convolution kernels. A size of each convolution kernel of the convolution layer 4 is, for example, 1×1.


A pooling layer 5 may be configured to perform maxpooling. A size of the pooling layer 5 is, for example, 2×2, and a stride of the pooling layer 5 is, for example, 2.


A convolution layer 6 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 6 may have 256 convolution kernels. A size of each convolution kernel of the convolution layer 6 is, for example, 3×3.


A convolution layer 7 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 7 may have 128 convolution kernels. A size of each convolution kernel of the convolution layer 7 is, for example, 1×1.


A convolution layer 8 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 8 may have 256 convolution kernels. A size of each convolution kernel of the convolution layer 8 is, for example, 3×3.


A pooling layer 8 may be configured to perform maxpooling. A size of the pooling layer 8 is, for example, 2×2, and a stride of the pooling layer 8 is, for example, 2.


A convolution layer 9 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 9 may have 512 convolution kernels. A size of each convolution kernel of the convolution layer 9 is, for example, 3×3.


A convolution layer 10 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 10 may have 256 convolution kernels. A size of each convolution kernel of the convolution layer 10 is, for example, 1×1.


A convolution layer 11 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 11 may have 512 convolution kernels. A size of each convolution kernel of the convolution layer 11 is, for example, 3×3.


A convolution layer 12 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 12 may have 256 convolution kernels. A size of each convolution kernel of the convolution layer 12 is, for example, 1×1.


A convolution layer 13 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 13 may have 512 convolution kernels. A size of each convolution kernel of the convolution layer 13 is, for example, 3×3.


A pooling layer 13 may be configured to perform maxpooling. A size of the pooling layer 13 is, for example, 2×2, and a stride of the pooling layer 13 is, for example, 2.


A convolution layer 14 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 14 may have 1024 convolution kernels. A size of each convolution kernel of the convolution layer 14 is, for example, 3×3.


A convolution layer 15 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 15 may have 512 convolution kernels. A size of each convolution kernel of the convolution layer 15 is, for example, 1×1.


A convolution layer 16 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 16 may have 1024 convolution kernels. A size of each convolution kernel of the convolution layer 16 is, for example, 3×3.


A convolution layer 17 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 17 may have 512 convolution kernels. A size of each convolution kernel of the convolution layer 17 is, for example, 1×1.


A convolution layer 18 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 18 may have 1024 convolution kernels. A size of each convolution kernel of the convolution layer 18 is, for example, 3×3.


A convolution layer 19 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 19 may have 1024 convolution kernels. A size of each convolution kernel of the convolution layer 19 is, for example, 3×3.


A convolution layer 20 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 20 may have 1024 convolution kernels. A size of each convolution kernel of the convolution layer 20 is, for example, 3×3.


A bottom of a reorganization layer 13 is connected to the SC 13 in the convolution layer 13. In other words, an input of the reorganization layer 13 is connected to an output of the SC 13. The reorganization layer 13 may be configured to reorganize the output of the SC 13.
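
For concreteness, the reorganization is a space-to-depth rearrangement that trades spatial resolution for channels, which is how the 512-channel output of the convolution layer 13 matches the 2048-filter entry of the reorganization layer in Table 1. A minimal sketch, assuming PyTorch and illustrative shapes; pixel_unshuffle is one such rearrangement, and YOLO-V2's exact channel ordering may differ:

    # Space-to-depth sketch of the reorganization layer, assuming PyTorch.
    # Shapes are illustrative (a 416 x 416 input would give 26 x 26 here).
    import torch
    import torch.nn.functional as F

    sc13_out = torch.randn(1, 512, 26, 26)                       # output of the SC 13
    reorg_out = F.pixel_unshuffle(sc13_out, downscale_factor=2)  # rearrange 2 x 2 blocks
    print(reorg_out.shape)                                       # torch.Size([1, 2048, 13, 13])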


A bottom of a concatenation layer is connected to the reorganization layer 13 and the SC 20 in the convolution layer 20. In other words, an input of the concatenation layer is connected to an output of the reorganization layer 13 and an output of the SC 20. The concatenation layer may be configured to concatenate the output of the reorganization layer 13 and the output of the SC 20.


A convolution layer 21 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 21 may have 1024 convolution kernels. A size of each convolution kernel of the convolution layer 21 is, for example, 3×3.


The convolution layer 22 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The convolution layer 22 may have 425 convolution kernels. A size of each convolution kernel of the convolution layer 22 is, for example, 1×1.


However, the one-phase method still leaves room for improvement. For example, the framework of the YOLO neural network requires relatively heavy computation and provides relatively low detection accuracy. In order to lower the computation required by the framework of the YOLO neural network and to improve its accuracy, the disclosure provides an object detection device 100 based on a neural network. The object detection device 100 may detect an object by utilizing an improved YOLO-V2 neural network.



FIG. 1 is a schematic diagram of the object detection device 100 based on the neural network according to an embodiment of the disclosure. The object detection device 100 may include a processor 110, a storage medium 120, and a transceiver 130.


The processor 110 is, for example, a central processing unit (CPU), or other programmable general-purpose or special-purpose elements, such as a micro control unit (MCU), a microprocessor, a digital signal processor (DSP), a programmable controller, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), an image signal processor (ISP), an image processing unit (IPU), an arithmetic logic unit (ALU), a complex programmable logic device (CPLD) and a field programmable gate array (FPGA), or other similar elements or a combination of the above elements. The processor 110 may be coupled to the storage medium 120 and the transceiver 130, and accesses and executes a plurality of modules and various application programs stored in the storage medium 120.


The storage medium 120 is, for example, any type of fixed or removable element, such as a random access memory (RAM), a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid state drive (SSD), or similar elements or a combination of the above elements, and is configured to store the plurality of modules or the various application programs executable by the processor 110. In the present embodiment, the storage medium 120 may store the improved YOLO-V2 neural network configured to detect an object, wherein the improved YOLO-V2 neural network is obtained by improving the YOLO-V2 neural network shown in Table 1.


The transceiver 130 transmits and receives signals in a wireless or wired manner. The transceiver 130 may further perform operations such as low-noise amplification, impedance matching, frequency mixing, up- or down-conversion, filtering, and amplification. The processor 110 may receive an input image through the transceiver 130 and identify an object in the input image according to the improved YOLO-V2 neural network in the storage medium 120. A framework of the improved YOLO-V2 neural network is shown in Table 2. It should be noted that the improved YOLO-V2 neural network may not include a reorganization layer. Therefore, the improved YOLO-V2 neural network may save the computation and memory that the reorganization layer would otherwise consume.


TABLE 2

Name                                   Type                                        Number of filters          Size/stride   Bottom
                                                                                   (or convolution kernels)
New convolution layer 1                Convolution kernel + BN + SC + leaky ReLU   32                         3 × 3
New pooling layer 1                    Maxpooling                                                             2 × 2/2
New convolution layer 2                Convolution kernel + BN + SC + leaky ReLU   64                         3 × 3
New pooling layer 2                    Maxpooling                                                             2 × 2/2
New convolution layer 3                Convolution kernel + BN + SC + leaky ReLU   128                        3 × 3
New convolution layer 4                Convolution kernel + BN + SC + leaky ReLU   64                         1 × 1
New convolution layer 5                Convolution kernel + BN + SC + leaky ReLU   128                        3 × 3
New pooling layer 5                    Maxpooling                                                             2 × 2/2
New convolution layer 6                Convolution kernel + BN + SC + leaky ReLU   256                        3 × 3
New convolution layer 7                Convolution kernel + BN + SC + leaky ReLU   128                        1 × 1
Residual block 7                       Residual computation                                                                 New pooling layer 5, Leaky ReLU 7
New convolution layer 8                Convolution kernel + BN + SC + leaky ReLU   256                        3 × 3
New pooling layer 8                    Maxpooling                                                             2 × 2/2
New convolution layer 9                Convolution kernel + BN + SC + leaky ReLU   512                        3 × 3
New convolution layer 10               Convolution kernel + BN + SC + leaky ReLU   256                        1 × 1
New convolution layer 11               Convolution kernel + BN + SC + leaky ReLU   512                        3 × 3
New convolution layer 12               Convolution kernel + BN + SC + leaky ReLU   256                        1 × 1
Residual block 12                      Residual computation                                                                 Leaky ReLU 10, Leaky ReLU 12
New convolution layer 13               Convolution kernel + BN + SC + leaky ReLU   512                        3 × 3
New pooling layer 13                   Maxpooling                                                             2 × 2/2
New convolution layer 14_lower layer   Convolution kernel                          64                         3 × 3
New convolution layer 14_upper layer   Convolution kernel + BN + SC + leaky ReLU   1024                       1 × 1
New convolution layer 15               Convolution kernel + BN + SC + leaky ReLU   512                        1 × 1
Residual block 15                      Residual computation                                                                 New pooling layer 13, Leaky ReLU 15
New convolution layer 16_lower layer   Convolution kernel                          64                         3 × 3
New convolution layer 16_upper layer   Convolution kernel + BN + SC + leaky ReLU   1024                       1 × 1
New convolution layer 17               Convolution kernel + BN + SC + leaky ReLU   512                        1 × 1
New convolution layer 18_lower layer   Convolution kernel                          64                         3 × 3
New convolution layer 18_upper layer   Convolution kernel + BN + SC + leaky ReLU   1024                       1 × 1
New convolution layer 19_lower layer   Convolution kernel                          64                         3 × 3
New convolution layer 19_upper layer   Convolution kernel + BN + SC + leaky ReLU   1024                       1 × 1
New convolution layer 20_lower layer   Convolution kernel                          64                         3 × 3
New convolution layer 20_upper layer   Convolution kernel + BN + SC + leaky ReLU   1024                       1 × 1
New concatenation layer                Concatenation                                                                        New pooling layer 13, Leaky ReLU 20
New convolution layer 21_lower layer   Convolution kernel                          64                         3 × 3
New convolution layer 21_upper layer   Convolution kernel + BN + SC + leaky ReLU   1024                       1 × 1
New convolution layer 22               Convolution kernel + BN + SC + leaky ReLU   40                         1 × 1


A new convolution layer 1 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 1 may have 32 convolution kernels. A size of each convolution kernel of the new convolution layer 1 is, for example, 3×3. In an embodiment, the new convolution layer 1 shown in Table 2 may be the same as the convolution layer 1 shown in Table 1.


A new pooling layer 1 may be configured to perform maxpooling. A size of the new pooling layer 1 is, for example, 2×2, and a stride of the new pooling layer 1 is, for example, 2. In an embodiment, the new pooling layer 1 shown in Table 2 may be the same as the pooling layer 1 shown in Table 1.


A new convolution layer 2 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 2 may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 2 is, for example, 3×3. In an embodiment, the new convolution layer 2 shown in Table 2 may be the same as the convolution layer 2 shown in Table 1.


A new pooling layer 2 may be configured to perform maxpooling. A size of the new pooling layer 2 is, for example, 2×2, and a stride of the new pooling layer 2 is, for example, 2. In an embodiment, the new pooling layer 2 shown in Table 2 may be the same as the pooling layer 2 shown in Table 1.


A new convolution layer 3 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 3 may have 128 convolution kernels. A size of each convolution kernel of the new convolution layer 3 is, for example, 3×3. In an embodiment, the new convolution layer 3 shown in Table 2 may be the same as the convolution layer 3 shown in Table 1.


A new convolution layer 4 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 4 may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 4 is, for example, 1×1. In an embodiment, the new convolution layer 4 shown in Table 2 may be the same as the convolution layer 4 shown in Table 1.


A new convolution layer 5 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 5 may have 128 convolution kernels. A size of each convolution kernel of the new convolution layer 5 is, for example, 3×3. In an embodiment, the new convolution layer 5 shown in Table 2 may be the same as the convolution layer 5 shown in Table 1.


A new pooling layer 5 may be configured to perform maxpooling. A size of the new pooling layer 5 is, for example, 2×2, and a stride of the new pooling layer 5 is, for example, 2. In an embodiment, the new pooling layer 5 shown in Table 2 may be the same as the pooling layer 5 shown in Table 1.


A new convolution layer 6 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 6 may have 256 convolution kernels. A size of each convolution kernel of the new convolution layer 6 is, for example, 3×3. In an embodiment, the new convolution layer 6 shown in Table 2 may be the same as the convolution layer 6 shown in Table 1.


A new convolution layer 7 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 7 may have 128 convolution kernels. A size of each convolution kernel of the new convolution layer 7 is, for example, 1×1. In an embodiment, the new convolution layer 7 shown in Table 2 may be the same as the convolution layer 7 shown in Table 1.


A bottom of a residual block 7 is connected to the new pooling layer 5 and the leaky ReLU 7 in the new convolution layer 7. In other words, an input of the residual block 7 is connected to an output of the new pooling layer 5 and an output of the leaky ReLU 7 of the new convolution layer 7. The residual block 7 may be configured to sum the output of the new pooling layer 5 and the output of the leaky ReLU 7 to generate a summation result. The residual block 7 may further transmit the summation result to a new convolution layer 8. By adding the residual block into the framework of the YOLO-V2 neural network, the accuracy of object detection can be effectively improved.
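
In other words, the residual block is an element-wise addition of two earlier feature maps. A minimal sketch, assuming PyTorch and matching tensor shapes for the two inputs (the shapes below are illustrative, not taken from the patent):

    # Sketch of residual block 7: sum the new pooling layer 5 output and the
    # leaky ReLU 7 output, then pass the result to the new convolution layer 8.
    import torch

    pool5_out = torch.randn(1, 128, 52, 52)    # illustrative shape
    relu7_out = torch.randn(1, 128, 52, 52)    # illustrative shape
    summation_result = pool5_out + relu7_out   # element-wise summation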


The new convolution layer 8 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 8 may have 256 convolution kernels. A size of each convolution kernel of the new convolution layer 8 is, for example, 3×3. In an embodiment, the new convolution layer 8 shown in Table 2 may be the same as the convolution layer 8 shown in Table 1.


A new pooling layer 8 may be configured to perform maxpooling. A size of the new pooling layer 8 is, for example, 2×2, and a stride of the new pooling layer 8 is, for example, 2. In an embodiment, the new pooling layer 8 shown in Table 2 may be the same as the pooling layer 8 shown in Table 1.


A new convolution layer 9 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 9 may have 512 convolution kernels. A size of each convolution kernel of the new convolution layer 9 is, for example, 3×3. In an embodiment, the new convolution layer 9 shown in Table 2 may be the same as the convolution layer 9 shown in Table 1.


A new convolution layer 10 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 10 may have 256 convolution kernels. A size of each convolution kernel of the new convolution layer 10 is, for example, 1×1. In an embodiment, the new convolution layer 10 shown in Table 2 may be the same as the convolution layer 10 shown in Table 1.


A new convolution layer 11 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 11 may have 512 convolution kernels. A size of each convolution kernel of the new convolution layer 11 is, for example, 3×3. In an embodiment, the new convolution layer 11 shown in Table 2 may be the same as the convolution layer 11 shown in Table 1.


A new convolution layer 12 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 12 may have 256 convolution kernels. A size of each convolution kernel of the new convolution layer 12 is, for example, 1×1. In an embodiment, the new convolution layer 12 shown in Table 2 may be the same as the convolution layer 12 shown in Table 1.


A bottom of a residual block 12 is connected to the leaky ReLU 10 in the new convolution layer 10 and the leaky ReLU 12 in the new convolution layer 12. In other words, an input of the residual block 12 is connected to an output of the leaky ReLU 10 and an output of the leaky ReLU 12. The residual block 12 may be configured to sum the output of the leaky ReLU 10 and the output of the leaky ReLU 12 to generate a summation result. The residual block 12 may further transmit the summation result to a new convolution layer 13.


The new convolution layer 13 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 13 may have 512 convolution kernels. A size of each convolution kernel of the new convolution layer 13 is, for example, 3×3. In an embodiment, the new convolution layer 13 shown in Table 2 may be the same as the convolution layer 13 shown in Table 1.


A new pooling layer 13 may be configured to perform maxpooling. A size of the new pooling layer 13 is, for example, 2×2, and a stride of the new pooling layer 13 is, for example, 2. In an embodiment, the new pooling layer 13 shown in Table 2 may be the same as the pooling layer 13 shown in Table 1.


A new convolution layer 14_lower layer and a new convolution layer 14_upper layer are generated by decomposing the convolution layer 14 shown in Table 1 by the processor 110. The number of convolution kernels of the new convolution layer 14_lower layer may be less than the number of the convolution kernels of the convolution layer 14. For example, the new convolution layer 14_lower layer may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 14_lower layer is, for example, 3×3. The new convolution layer 14_upper layer may have 2048 convolution kernels. A size of each convolution kernel of the new convolution layer 14_upper layer is, for example, 1×1. In an embodiment, the processor 110 may adjust the number of the convolution kernels of the new convolution layer 14_upper layer to be half (namely 1024) of the current convolution kernel number (namely 2048) of the new convolution layer 14_upper layer or to be less than half of the current convolution kernel number, so that the computation required to be consumed by the new convolution layer 14_upper layer is lowered.


An original YOLO-V2 neural network model occupies approximately 260 million bytes of memory, which is a large burden for an edge computing device with limited computing power. In order to shrink the model, the processor 110 may decompose a convolution layer (such as the 3×3 convolution layer 14) of the original YOLO-V2 neural network model into two new convolution layers, which are respectively a new convolution layer_lower layer (such as the 3×3 new convolution layer 14_lower layer) and a new convolution layer_upper layer (such as the 1×1 new convolution layer 14_upper layer). The number of convolution kernels of the new convolution layer_lower layer is far less than the number of convolution kernels of the original convolution layer. Therefore, the number of parameters can be substantially reduced, and the computation speed is increased. The processor 110 may decompose the convolution layers located on the upper layers of the original YOLO-V2 neural network model (such as the convolution layers 14, 16, and 18 to 21 shown in Table 1) so as to generate the new convolution layer_lower layers and the new convolution layer_upper layers of the improved YOLO-V2 neural network.
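
The saving can be verified with a quick parameter count. Taking the convolution layer 14 as an example (1024 filters of size 3×3; the 512 input channels follow from the convolution layer 13), and comparing it with the decomposed pair after the kernel count of the upper layer is adjusted to 1024, a rough calculation that ignores BN, SC, and bias parameters:

    # Rough parameter count for decomposing convolution layer 14 (BN/SC/bias ignored).
    original = 512 * 1024 * 3 * 3   # 3 x 3 convolution, 512 -> 1024 channels
    lower    = 512 * 64 * 3 * 3     # new layer 14_lower: 3 x 3, 512 -> 64 channels
    upper    = 64 * 1024 * 1 * 1    # new layer 14_upper: 1 x 1, 64 -> 1024 channels
    print(original)                 # 4718592
    print(lower + upper)            # 360448, roughly a 13x reduction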


A new convolution layer 15 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 15 may have 512 convolution kernels. A size of each convolution kernel of the new convolution layer 15 is, for example, 1×1. In an embodiment, the new convolution layer 15 shown in Table 2 may be the same as the convolution layer 15 shown in Table 1.


A bottom of a residual block 15 is connected to the new pooling layer 13 and the leaky ReLU 15 in the new convolution layer 15. In other words, an input of the residual block 15 is connected to an output of the new pooling layer 13 and an output of the leaky ReLU 15. The residual block 15 may be configured to sum the output of the new pooling layer 13 and the output of the leaky ReLU 15 to generate a summation result. The residual block 15 may further transmit the summation result to a new convolution layer 16_lower layer.


The new convolution layer 16_lower layer and a new convolution layer 16_upper layer are generated by decomposing the convolution layer 16 shown in Table 1 by the processor 110. The number of convolution kernels of the new convolution layer 16_lower layer may be less than the number of the convolution kernels of the convolution layer 16. For example, the new convolution layer 16_lower layer may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 16_lower layer is, for example, 3×3. The new convolution layer 16_upper layer may have 2048 convolution kernels. A size of each convolution kernel of the new convolution layer 16_upper layer is, for example, 1×1. In an embodiment, the processor 110 may adjust the number of the convolution kernels of the new convolution layer 16_upper layer to be half (namely 1024) of the current convolution kernel number (namely 2048) of the new convolution layer 16_upper layer or to be less than half of the current convolution kernel number, so that the computation required to be consumed by the new convolution layer 16_upper layer is lowered.


A new convolution layer 17 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 17 may have 512 convolution kernels. A size of each convolution kernel of the new convolution layer 17 is, for example, 1×1. In an embodiment, the new convolution layer 17 shown in Table 2 may be the same as the convolution layer 17 shown in Table 1.


A new convolution layer 18_lower layer and a new convolution layer 18_upper layer are generated by decomposing the convolution layer 18 shown in Table 1 by the processor 110. The number of convolution kernels of the new convolution layer 18_lower layer may be less than the number of the convolution kernels of the convolution layer 18. For example, the new convolution layer 18_lower layer may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 18_lower layer is, for example, 3×3. The new convolution layer 18_upper layer may have 2048 convolution kernels. A size of each convolution kernel of the new convolution layer 18_upper layer is, for example, 1×1. In an embodiment, the processor 110 may adjust the number of the convolution kernels of the new convolution layer 18_upper layer to be half (namely 1024) of the current convolution kernel number (namely 2048) of the new convolution layer 18_upper layer or to be less than half of the current convolution kernel number, so that the computation required to be consumed by the new convolution layer 18_upper layer is lowered.


A new convolution layer 19_lower layer and a new convolution layer 19_upper layer are generated by decomposing the convolution layer 19 shown in Table 1 by the processor 110. The number of convolution kernels of the new convolution layer 19_lower layer may be less than the number of the convolution kernels of the convolution layer 19. For example, the new convolution layer 19_lower layer may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 19_lower layer is, for example, 3×3. The new convolution layer 19_upper layer may have 2048 convolution kernels. A size of each convolution kernel of the new convolution layer 19_upper layer is, for example, 1×1. In an embodiment, the processor 110 may adjust the number of the convolution kernels of the new convolution layer 19_upper layer to be half (namely 1024) of the current convolution kernel number (namely 2048) of the new convolution layer 19_upper layer or to be less than half of the current convolution kernel number, so that the computation required to be consumed by the new convolution layer 19_upper layer is lowered.


A new convolution layer 20_lower layer and a new convolution layer 20_upper layer are generated by decomposing the convolution layer 20 shown in Table 1 by the processor 110. The number of convolution kernels of the new convolution layer 20_lower layer may be less than the number of the convolution kernels of the convolution layer 20. For example, the new convolution layer 20_lower layer may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 20_lower layer is, for example, 3×3. The new convolution layer 20_upper layer may have 2048 convolution kernels. A size of each convolution kernel of the new convolution layer 20_upper layer is, for example, 1×1. In an embodiment, the processor 110 may adjust the number of the convolution kernels of the new convolution layer 20_upper layer to be half (namely 1024) of the current convolution kernel number (namely 2048) of the new convolution layer 20_upper layer or to be less than half of the current convolution kernel number, so that the computation required to be consumed by the new convolution layer 20_upper layer is lowered.


A bottom of a new concatenation layer is connected to the new pooling layer 13 and the leaky ReLU 20 in the new convolution layer 20_upper layer. In other words, an input of the new concatenation layer is connected to an output of the new pooling layer 13 and an output of the leaky ReLU 20. The new concatenation layer may be configured to concatenate the output of the new pooling layer 13 and the output of the leaky ReLU 20.
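
A minimal sketch of this channel-wise concatenation, assuming PyTorch and illustrative shapes (the channel counts below follow Table 2, but the spatial sizes are assumptions):

    # Sketch of the new concatenation layer: concatenate the new pooling layer 13
    # output and the leaky ReLU 20 output along the channel dimension.
    import torch

    pool13_out = torch.randn(1, 512, 13, 13)    # illustrative shape
    relu20_out = torch.randn(1, 1024, 13, 13)   # illustrative shape
    concat_out = torch.cat([pool13_out, relu20_out], dim=1)   # (1, 1536, 13, 13)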


A new convolution layer 21_lower layer and a new convolution layer 21_upper layer are generated by decomposing the convolution layer 21 shown in Table 1 by the processor 110. The number of convolution kernels of the new convolution layer 21_lower layer may be less than the number of the convolution kernels of the convolution layer 21. For example, the new convolution layer 21_lower layer may have 64 convolution kernels. A size of each convolution kernel of the new convolution layer 21_lower layer is, for example, 3×3. The new convolution layer 21_upper layer may have 2048 convolution kernels. A size of each convolution kernel of the new convolution layer 21_upper layer is, for example, 1×1. In an embodiment, the processor 110 may adjust the number of the convolution kernels of the new convolution layer 21_upper layer to be half (namely 1024) of the current convolution kernel number (namely 2048) of the new convolution layer 21_upper layer or to be less than half of the current convolution kernel number, so that the computation required to be consumed by the new convolution layer 21_upper layer is lowered.


A new convolution layer 22 may include a plurality of convolution kernels, BN, SC and an activation function. The activation function is, for example, a leaky ReLU. The new convolution layer 22 may have 425 convolution kernels. A size of each convolution kernel of the new convolution layer 22 is, for example, 1×1. In an embodiment, the new convolution layer 22 shown in Table 2 may be the same as the convolution layer 22 shown in Table 1.



FIG. 2 is a flowchart of an object detection method based on a neural network according to an embodiment of the disclosure. The object detection method may be implemented by the object detection device 100 shown in FIG. 1. In step S210, an input image is received. In step S220, an object in the input image is identified according to an improved YOLO-V2 neural network. The improved YOLO-V2 neural network includes a residual block, a third convolution layer including a first number of filters, and a fourth convolution layer including a second number of filters. A first input of the residual block is connected to a first convolution layer of the improved YOLO-V2 neural network. An output of the residual block is connected to a second convolution layer of the improved YOLO-V2 neural network. The residual block is configured to transmit, to the second convolution layer, a summation result corresponding to the first convolution layer. The third convolution layer and the fourth convolution layer are generated by decomposing a convolution layer of an original YOLO-V2 neural network, the convolution layer includes a third number of filters, and the first number is less than the third number.
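
Expressed as code, the method of FIG. 2 is a receive-then-infer sequence. The sketch below is a hypothetical illustration: receive_image, improved_yolo_v2, and decode_detections stand in for the transceiver interface, the network of Table 2, and the output decoding, none of which are defined as code in the patent:

    # Hypothetical sketch of the object detection method of FIG. 2.
    def detect_objects(transceiver, improved_yolo_v2, decode_detections):
        input_image = transceiver.receive_image()     # step S210: receive an input image
        predictions = improved_yolo_v2(input_image)   # step S220: identify the object with
        return decode_detections(predictions)         # the improved YOLO-V2 neural network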


Based on the above, according to the improved YOLO-V2 neural network provided by the disclosure, the residual block can be added to the original YOLO-V2 neural network to improve the accuracy of identification. In addition, the improved YOLO-V2 neural network further includes the two convolution layers generated by decomposing a single convolution layer of the original YOLO-V2 neural network. Compared with the single convolution layer of the original YOLO-V2 neural network, the number of the filters in the two convolution layers can be greatly reduced. Therefore, the computation required by the improved YOLO-V2 neural network will be significantly lowered. Moreover, in the improved YOLO-V2 neural network, the reorganization layer of the original YOLO-V2 neural network is removed, so that the computational burden of the neural network is reduced.

Claims
  • 1. An object detection device based on a neural network, comprising: a transceiver; a storage medium, storing an improved YOLO-V2 neural network; and a processor, coupled to the storage medium and the transceiver, wherein the processor receives an input image through the transceiver and identifies an object in the input image according to the improved YOLO-V2 neural network, and the improved YOLO-V2 neural network comprises: a residual block, wherein a first input of the residual block is connected to a first convolution layer of the improved YOLO-V2 neural network, an output of the residual block is connected to a second convolution layer of the improved YOLO-V2 neural network, and the residual block is configured to transmit, to the second convolution layer, a summation result corresponding to the first convolution layer; and a third convolution layer and a fourth convolution layer, the third convolution layer comprising a first number of filters, the fourth convolution layer comprising a second number of filters, wherein the third convolution layer and the fourth convolution layer are generated by decomposing a convolution layer of an original YOLO-V2 neural network by the processor, the convolution layer comprises a third number of filters, and the first number is less than the third number.
  • 2. The object detection device according to claim 1, further comprising: a concatenation layer, wherein a second input of the concatenation layer is connected to a pooling layer and a fifth convolution layer of the improved YOLO-V2 neural network.
  • 3. The object detection device according to claim 1, wherein the processor adjusts the second number to be less than half the second number.
  • 4. The object detection device according to claim 1, wherein the first convolution layer comprises an activation function, and the first input of the residual block is connected to the activation function of the first convolution layer.
  • 5. The object detection device according to claim 4, wherein the activation function is a leaky rectified linear unit.
  • 6. The object detection device according to claim 1, wherein the first input of the residual block is further connected to a pooling layer of the improved YOLO-V2 neural network, and the residual block is configured to transmit, to the second convolution layer, a summation result of the first convolution layer and the pooling layer.
  • 7. The object detection device according to claim 1, wherein the first input of the residual block is further connected to a fifth convolution layer of the improved YOLO-V2 neural network, and the residual block is configured to transmit, to the second convolution layer, a summation result of the first convolution layer and the fifth convolution layer.
  • 8. The object detection device according to claim 1, wherein the improved YOLO-V2 neural network does not comprise a reorganization layer.
  • 9. An object detection method based on a neural network, comprising: receiving an input image; and identifying an object in the input image according to an improved YOLO-V2 neural network, wherein the improved YOLO-V2 neural network comprises: a residual block, wherein a first input of the residual block is connected to a first convolution layer of the improved YOLO-V2 neural network, an output of the residual block is connected to a second convolution layer of the improved YOLO-V2 neural network, and the residual block is configured to transmit, to the second convolution layer, a summation result corresponding to the first convolution layer; and a third convolution layer and a fourth convolution layer, the third convolution layer comprising a first number of filters, the fourth convolution layer comprising a second number of filters, wherein the third convolution layer and the fourth convolution layer are generated by decomposing a convolution layer of an original YOLO-V2 neural network, the convolution layer comprises a third number of filters, and the first number is less than the third number.
Priority Claims (1)
Number Date Country Kind
109110751 Mar 2020 TW national
US Referenced Citations (15)
Number Name Date Kind
10426442 Schnorr Oct 2019 B1
20180047272 Chandraker Feb 2018 A1
20190114544 Sundaram Apr 2019 A1
20190122113 Chen Apr 2019 A1
20190130204 Li May 2019 A1
20190130275 Chen May 2019 A1
20190147335 Wang May 2019 A1
20190187718 Zou Jun 2019 A1
20190286982 Ambai Sep 2019 A1
20200043475 Nguyen et al. Feb 2020 A1
20200104706 Sandler Apr 2020 A1
20200372325 Yamamoto Nov 2020 A1
20210406663 Timofejevs Dec 2021 A1
20220198243 Brothers, III Jun 2022 A1
20220261623 Sung Aug 2022 A1
Foreign Referenced Citations (5)
Number Date Country
2019101224 Jan 2020 AU
107888843 Apr 2018 CN
109447066 Mar 2019 CN
110287835 Sep 2019 CN
110310227 Oct 2019 CN
Non-Patent Literature Citations (4)
Entry
Tsang, Sik-Ho, Review: YOLOV2 and YOLO9000—You Only Look Once(Object Detection), Nov. 21, 2018, Towards Data Science, 13 pages (Year: 2018).
Wu Zhao Qi, “Study of Object Detection Based on Infrared Imaging Video”, Thesis of Master Degree, University of Electronic Science and Technology of China, Jan. 15, 2020, with English abstract, pp. 1-89.
Ricky Fok et al., “Decoupling the Layers in Residual Networks”, International Conference on Learning Representations, Dec. 31, 2018, pp. 1-11.
“Office Action of China Counterpart Application”, dated Aug. 26, 2022, p. 1-p. 5.
Related Publications (1)
Number Date Country
20210303850 A1 Sep 2021 US