This application claims priority to Chinese Patent Application No. 201610084119.0, filed on Feb. 6, 2016, which is hereby incorporated by reference in its entirety.
Embodiments of the present invention relate to the field of image processing technologies, and specifically, to an object detection method and a computer device.
Object detection refers to a process in which a computer marks out an object in an input image, and is a basic issue in machine vision. As shown in
To resolve the foregoing problem that the detection efficiency of the object detection device is relatively low, an existing solution mainly uses a non-maximum suppression method, in which the object detection device selects the region currently having the highest score each time, and then deletes every region that has a relatively high coincidence degree with the region currently having the highest score. This process is repeated until all regions are selected or deleted.
However, once the detection accuracy value of a region in an image is sufficiently high, the score of a candidate region and the actual location accuracy of the candidate region are no longer strongly correlated (the Pearson correlation coefficient is lower than 0.3). Therefore, it is difficult to guarantee the accuracy of a target region determined by selecting the region with the highest score each time while ignoring the information carried by the other regions.
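For reference, the non-maximum suppression procedure described above can be sketched as follows. This sketch is illustrative only and is not part of the claimed solution: it assumes that regions are axis-aligned rectangles given as (x1, y1, x2, y2) pixel coordinates and that intersection-over-union is used as the coincidence degree; the function names are placeholders.

import numpy as np

def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned regions (x1, y1, x2, y2).
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_maximum_suppression(boxes, scores, overlap_threshold=0.5):
    # Repeatedly keep the highest-scoring remaining region and delete regions
    # whose coincidence degree with it exceeds the threshold (the existing solution).
    order = [int(i) for i in np.argsort(scores)[::-1]]
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= overlap_threshold]
    return kept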
Embodiments of the present invention provide an object detection method and a computer device, which help improve accuracy of detecting a location of an object by the computer device.
According to a first aspect, an embodiment of the present invention provides an object detection method, including:
obtaining a to-be-processed image;
obtaining, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, where n is an integer greater than 1;
determining sample reference regions in the n reference regions, where coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values are greater than a preset threshold; and
determining, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image.
With reference to the first aspect, in some possible implementation manners, the determining, based on the sample reference regions, a target region corresponding to the to-be-detected object includes:
normalizing coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions, where the coordinate values of the sample reference regions are used to represent the sample reference regions;
determining, based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions; and
determining, based on the characteristic values, a coordinate value used to identify the target region corresponding to the to-be-detected object in the to-be-processed image.
It can be learned that, in this embodiment of the present invention, a reference region with a relatively high region coincidence degree is not simply deleted; instead, sample reference regions of relatively high quality are used to predict the location of the target region of the object, with the relationships among the sample reference regions fully considered, which helps improve the accuracy of detecting the location of the object.
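As a concrete illustration of the selection of sample reference regions (and only an illustration: the function name, the use of the iou helper sketched above as the coincidence degree, and the default threshold are assumptions, not limitations of this application), the determination could look like this:

import numpy as np

def select_sample_reference_regions(boxes, scores, coincidence_threshold=0.5):
    # boxes: (n, 4) reference regions; scores: the n detection accuracy values.
    # Keep every reference region whose coincidence degree with the reference
    # region having the maximum detection accuracy value exceeds the threshold.
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    best = int(np.argmax(scores))
    keep = [i for i in range(len(boxes))
            if iou(boxes[best], boxes[i]) > coincidence_threshold]
    return boxes[keep], scores[keep]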
With reference to the first aspect, in some possible implementation manners, after the determining a target region corresponding to the to-be-detected object, the method further includes:
outputting the to-be-processed image with the target region identified.
With reference to the first aspect, in some possible implementation manners, the normalizing coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions includes:
calculating, based on the following formula, the normalized coordinate values of the sample reference regions:
where
a quantity of the sample reference regions is p, p is a positive integer less than or equal to n, and x1i is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the ith reference region in the sample reference regions;
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the jth reference region in the sample reference regions, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of the pixel that is located in the upper-left corner of the ith reference region; or
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-left corner of the jth reference region, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of a pixel that is located in a lower-left corner of the ith reference region; and
I(sj) is an indicator function, where when a detection accuracy value sj corresponding to the jth reference region is greater than a preset accuracy value, I(sj) is 1, when a detection accuracy value sj corresponding to the jth reference region is less than or equal to the preset accuracy value, I(sj) is 0, Π=Σj=1pI(sj), and both i and j are positive integers less than or equal to p.
In the normalization processing step in this embodiment of the present invention, the coordinate values of the sample reference regions are normalized, which helps reduce the impact of a reference region with a relatively low detection accuracy value on object detection accuracy, and further improves the object detection accuracy.
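The normalization formula itself appears as an image in the original filing and is not reproduced here. Purely as a hedged reading consistent with the quantities defined above (x1j, x2j, the indicator I(sj), and Π=Σj=1pI(sj)), the coordinates of each sample reference region could be centred and scaled by indicator-weighted statistics of the high-accuracy regions; the sketch below illustrates that reading and should not be taken as the exact formula.

import numpy as np

def normalize_sample_regions(boxes, scores, preset_accuracy):
    # boxes: (p, 4) sample reference regions as (x1, y1, x2, y2); scores: (p,) accuracy values.
    # Assumes at least one region exceeds the preset accuracy value, so Π > 0.
    boxes = np.asarray(boxes, dtype=float)
    indicator = (np.asarray(scores, dtype=float) > preset_accuracy).astype(float)  # I(s_j)
    pi = indicator.sum()                                                           # Π = Σ_j I(s_j)
    mean_x1 = (indicator * boxes[:, 0]).sum() / pi
    mean_y1 = (indicator * boxes[:, 1]).sum() / pi
    mean_w = (indicator * (boxes[:, 2] - boxes[:, 0])).sum() / pi
    mean_h = (indicator * (boxes[:, 3] - boxes[:, 1])).sum() / pi
    normalized = np.empty_like(boxes)
    normalized[:, 0] = (boxes[:, 0] - mean_x1) / mean_w   # one reading of x̂1i
    normalized[:, 1] = (boxes[:, 1] - mean_y1) / mean_h   # ŷ1i
    normalized[:, 2] = (boxes[:, 2] - mean_x1) / mean_w   # x̂2i
    normalized[:, 3] = (boxes[:, 3] - mean_y1) / mean_h   # ŷ2i
    return normalized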
With reference to the first aspect, in some possible implementation manners, the characteristic values include a first characteristic value, and the determining, based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions includes:
calculating, based on the following formula, the first characteristic value:
where
the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, the first characteristic value u({circumflex over (B)}) includes ut, Πt=Σi=1pgt(si), si is a detection accuracy value corresponding to the ith reference region in the sample reference regions, a function gt(si) is a function of si, the function gt(si) is used as a weighting function of {circumflex over (b)}i, {circumflex over (b)}i represents the normalized coordinate values of the ith reference region in the sample reference regions, i is a positive integer less than or equal to p, {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i}, and {circumflex over (B)} represents the sample reference regions; and
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-right corner of the ith reference region; or
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-right corner of the ith reference region.
It should be noted that {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i} in the foregoing formula of ut specifically refers to:
if a currently calculated first characteristic value is a first characteristic value corresponding to an x1 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}1i; if a currently calculated first characteristic value is a first characteristic value corresponding to a y1 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ1i; if a currently calculated first characteristic value is a first characteristic value corresponding to an x2 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}2i; or if a currently calculated first characteristic value is a first characteristic value corresponding to a y2 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ2i, where the x1 coordinate corresponds to the foregoing x1j coordinate, and the x2 coordinate corresponds to the foregoing x2j coordinate.
In this embodiment of the present invention, because the first characteristic value is a weighted average of values obtained by applying different weighting functions to the coordinates of all the sample reference regions, the impact of the coordinate value of each sample reference region on the target region of the to-be-detected object is comprehensively considered in the coordinate value, of the target region of the to-be-detected object, that is determined based on the first characteristic value, which helps improve object detection accuracy.
With reference to the first aspect, in some possible implementation manners, the first characteristic value u({circumflex over (B)})=[u1, . . . , ud]T, d is a positive integer, t is a positive integer less than or equal to d, ut is the tth characteristic value of the first characteristic value, the function gt(si) is the tth weighting function of weighting functions of {circumflex over (b)}i, and the weighting functions of {circumflex over (b)}i include at least one of the following:
where
the ρ1, τ1, ρ2, τ2, ρ3, and τ3 are normalization coefficients.
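The weighting functions and the formula for ut are given as images in the original filing. The description above does, however, state that each component ut is a weighted average of the normalized coordinates {circumflex over (b)}i with weights gt(si) and normalizer Πt=Σi=1pgt(si). The sketch below follows that reading; the three example weighting functions (affine, exponential, and power forms using ρ and τ as normalization coefficients) are hypothetical stand-ins, not the functions actually claimed.

import numpy as np

# Hypothetical weighting functions g_t(s); the actual functions and the
# normalization coefficients ρ_t and τ_t are defined in the original formulas.
def g1(s, rho=1.0, tau=1.0):
    return rho * s + tau            # example: affine in the detection accuracy value

def g2(s, rho=1.0, tau=1.0):
    return rho * np.exp(s / tau)    # example: exponential emphasis on high scores

def g3(s, rho=1.0, tau=2.0):
    return rho * s ** tau           # example: power-law emphasis

def first_characteristic_value(normalized_boxes, scores, weighting_functions=(g1, g2, g3)):
    # normalized_boxes: (p, 4) normalized coordinates b̂_i; scores: (p,) accuracy values s_i.
    # For each weighting function, u_t = Σ_i g_t(s_i) b̂_i / Π_t with Π_t = Σ_i g_t(s_i),
    # computed separately for each of the four coordinates.
    normalized_boxes = np.asarray(normalized_boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    components = []
    for g in weighting_functions:
        weights = g(scores)                              # g_t(s_i) for every sample region
        pi_t = weights.sum()                             # Π_t
        components.append(weights @ normalized_boxes / pi_t)
    return np.concatenate(components)                    # u(B̂) = [u_1, ..., u_d]^T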
With reference to the first aspect, in some possible implementation manners, the characteristic values further include a second characteristic value, and the determining, based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions includes:
calculating, based on the following formula, the second characteristic value:
where
M({circumflex over (B)}) is the second characteristic value, the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, a matrix D includes the normalized coordinate values of the sample reference regions, the ith row in the matrix D includes the normalized coordinate values of the ith reference region in the sample reference regions, and {circumflex over (B)} represents the sample reference regions.
In the embodiments of the present invention, because the second characteristic value is obtained by means of calculation based on a matrix that includes the coordinates of the sample reference regions, two-dimensional relationships among the coordinates of different sample reference regions are comprehensively considered in the coordinate value, of a target region of a to-be-detected object, that is determined based on the second characteristic value, which helps improve object detection accuracy.
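The formula for M({circumflex over (B)}) is also given as an image in the original filing. One candidate that is consistent with the description above, namely a characteristic value built from the matrix D of normalized coordinates that captures pairwise (two-dimensional) relationships among them, is a scaled second-moment (Gram) matrix; the sketch below uses that candidate purely as an assumption.

import numpy as np

def second_characteristic_value(normalized_boxes):
    # normalized_boxes: the (p, 4) matrix D whose i-th row holds the normalized
    # coordinate values of the i-th sample reference region.
    D = np.asarray(normalized_boxes, dtype=float)
    p = D.shape[0]
    return D.T @ D / p          # one plausible choice of M(B̂): a 4x4 second-moment matrix

def vectorize_second_characteristic_value(M):
    # m(B̂): the vector form of M(B̂) used when assembling R(B̂).
    return np.asarray(M).reshape(-1)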
With reference to the first aspect, in some possible implementation manners, the determining, based on the characteristic values, a coordinate value of the target region corresponding to the to-be-detected object includes:
calculating, according to the following formula, the coordinate value of the target region:
where
h1({circumflex over (B)}) is the coordinate value of the target region corresponding to the to-be-detected object, u({circumflex over (B)}) is the first characteristic value, m({circumflex over (B)})T is a vector form of the second characteristic value M({circumflex over (B)}), λ, Λ1, and Λ2 are coefficients, Λ=[λ,Λ1T,Λ2T]T, R({circumflex over (B)})=[1, u({circumflex over (B)})T, m({circumflex over (B)})T]T, and {circumflex over (B)} represents the sample reference regions.
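The formula for the coordinate value is given as an image in the original filing, but the definitions above, together with the decomposition stated later in the description (f0({circumflex over (B)},Λ0)=λ, f1({circumflex over (B)},Λ1)=Λ1Tu({circumflex over (B)}), f2({circumflex over (B)},Λ2)=Λ2Tm({circumflex over (B)})), are consistent with the linear combination h1({circumflex over (B)})=λ+Λ1Tu({circumflex over (B)})+Λ2Tm({circumflex over (B)})=ΛTR({circumflex over (B)}). A minimal sketch under that reading:

import numpy as np

def predict_target_coordinate(u, m, coefficients):
    # u: first characteristic value u(B̂); m: vectorized second characteristic value m(B̂);
    # coefficients: Λ = [λ, Λ1^T, Λ2^T]^T, with length 1 + len(u) + len(m).
    R = np.concatenate(([1.0], np.asarray(u, dtype=float), np.asarray(m, dtype=float)))
    return float(np.asarray(coefficients, dtype=float) @ R)   # h1(B̂) = Λ^T R(B̂)

In practice one such predictor would be evaluated four times, once per target coordinate (x1, y1, x2, y2), each with its own learned coefficient vector Λ.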
With reference to the first aspect, in some possible implementation manners, a value of the coefficient Λ is determined by using the following model:
where
C and ε are preset values, K is a quantity of pre-stored training sets, {circumflex over (Z)}1k is a preset coordinate value of a target region corresponding to a reference region in the kth training set of the K training sets, and {circumflex over (B)}k represents the reference region in the kth training set.
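The training model is likewise shown as a formula image in the original filing. The presence of the preset values C and ε and of the K training pairs ({circumflex over (B)}k, {circumflex over (Z)}1k) is consistent with an ε-insensitive, margin-regularized regression of h1 onto the preset coordinates; the objective below is offered only as a hedged reading of such a model, not as the claimed formula.

import numpy as np

def lambda_training_objective(coefficients, R_vectors, targets, C, epsilon):
    # coefficients: candidate Λ; R_vectors: the K vectors R(B̂_k);
    # targets: the K preset coordinate values Ẑ1k; C, epsilon: preset values.
    coefficients = np.asarray(coefficients, dtype=float)
    loss = 0.5 * coefficients @ coefficients            # regularization on Λ
    for R, z in zip(R_vectors, targets):
        residual = abs(z - coefficients @ np.asarray(R, dtype=float))
        loss += C * max(0.0, residual - epsilon)        # ε-insensitive training error
    return loss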
According to a second aspect, an embodiment of the present invention discloses a computer device, including:
an obtaining unit, configured to obtain a to-be-processed image, where
the obtaining unit is further configured to obtain, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, where n is an integer greater than 1;
a first determining unit, configured to determine sample reference regions in the n reference regions, where coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values are greater than a preset threshold; and
a second determining unit, configured to determine, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image.
With reference to the second aspect, in some possible implementation manners, the second determining unit includes:
a normalizing unit, configured to normalize coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions, where the coordinate values of the sample reference regions are used to represent the sample reference regions;
a characteristic value determining unit, configured to determine, based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions; and
a coordinate value determining unit, configured to determine, based on the characteristic values, a coordinate value used to identify the target region corresponding to the to-be-detected object in the to-be-processed image.
With reference to the second aspect, in some possible implementation manners, the normalizing unit is specifically configured to:
calculate, based on the following formula, the normalized coordinate values of the sample reference regions:
where
a quantity of the sample reference regions is p, p is a positive integer less than or equal to n, and x1i is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the ith reference region in the sample reference regions;
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the jth reference region in the sample reference regions, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of the pixel that is located in the upper-left corner of the ith reference region; or
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-left corner of the jth reference region, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of a pixel that is located in a lower-left corner of the ith reference region; and
I(sj) is an indicator function, where when a detection accuracy value sj corresponding to the jth reference region is greater than a preset accuracy value, I(sj) is 1, when a detection accuracy value sj corresponding to the jth reference region is less than or equal to the preset accuracy value, I(sj) is 0, Π=Σj=1pI(sj), and both i and j are positive integers less than or equal to p.
With reference to the second aspect, in some possible implementation manners, the characteristic values include a first characteristic value, and the characteristic value determining unit is specifically configured to:
calculate, based on the following formula, the first characteristic value:
where
the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, the first characteristic value u({circumflex over (B)}) includes ut, Πt=Σi=1pgt(si), si is a detection accuracy value corresponding to the ith reference region in the sample reference regions, a function gt(si) is a function of si, the function gt(si) is used as a weighting function of {circumflex over (b)}i, {circumflex over (b)}i represents the normalized coordinate values of the ith reference region in the sample reference regions, i is a positive integer less than or equal to p, {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i}, and {circumflex over (B)} represents the sample reference regions; and
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-right corner of the ith reference region; or
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-right corner of the ith reference region.
It should be noted that {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i} in the foregoing formula of ut specifically refers to:
if a currently calculated first characteristic value is a first characteristic value corresponding to an x1 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}1i; if a currently calculated first characteristic value is a first characteristic value corresponding to a y1 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ1i; if a currently calculated first characteristic value is a first characteristic value corresponding to an x2 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}2i; or if a currently calculated first characteristic value is a first characteristic value corresponding to a y2 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ2i, where the x1 coordinate corresponds to the foregoing x1j coordinate, and the x2 coordinate corresponds to the foregoing x2j coordinate.
With reference to the second aspect, in some possible implementation manners, the first characteristic value u({circumflex over (B)})=[u1, . . . , ud]T, d is a positive integer, t is a positive integer less than or equal to d, ut is the tth characteristic value of the first characteristic value, the function gt(si) is the tth weighting function of weighting functions of {circumflex over (b)}i, and the weighting functions of {circumflex over (b)}i include at least one of the following:
where
the ρ1, τ1, ρ2, τ2, ρ3, and τ3 are normalization coefficients.
With reference to the second aspect, in some possible implementation manners, the characteristic values further include a second characteristic value, and the characteristic value determining unit is specifically configured to:
calculate, based on the following formula, the second characteristic value:
where
M({circumflex over (B)}) is the second characteristic value, the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, a matrix D includes the normalized coordinate values of the sample reference regions, the ith row in the matrix D includes the normalized coordinate values of the ith reference region in the sample reference regions, and {circumflex over (B)} represents the sample reference regions.
With reference to the second aspect, in some possible implementation manners, the coordinate value determining unit is specifically configured to:
calculate, according to the following formula, the coordinate value of the target region:
where
h1({circumflex over (B)}) is the coordinate value of the target region corresponding to the to-be-detected object, u({circumflex over (B)}) is the first characteristic value, m({circumflex over (B)})T is a vector form of the second characteristic value M({circumflex over (B)}), λ, Λ1, and Λ2 are coefficients, Λ=[λ,Λ1T,Λ2T]T, R({circumflex over (B)})=[1, u({circumflex over (B)})T, m({circumflex over (B)})T]T, and {circumflex over (B)} represents the sample reference regions.
With reference to the second aspect, in some possible implementation manners, a value of the coefficient Λ is determined by using the following model:
where
C and ε are preset values, K is a quantity of pre-stored training sets, {circumflex over (Z)}1k is a preset coordinate value of a target region corresponding to a reference region in the kth training set of the K training sets, and {circumflex over (B)}k represents the reference region in the kth training set.
According to a third aspect, an embodiment of the present invention discloses a computer device, where the computer device includes a memory and a processor that is coupled with the memory, the memory is configured to store executable program code, and the processor is configured to run the executable program code, to perform some or all of steps described in any method in the first aspect of the embodiments of the present invention.
According to a fourth aspect, an embodiment of the present invention discloses a computer readable storage medium, where the computer readable storage medium stores program code to be executed by a computer device, the program code specifically includes an instruction, and the instruction is used to perform some or all of steps described in any method in the first aspect of the embodiments of the present invention.
In the embodiments of the present invention, after n reference regions used to identify a to-be-detected object in a to-be-processed image and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions are obtained, and sample reference regions are determined in the n reference regions, a target region corresponding to the to-be-detected object can be determined based on the sample reference regions, where the target region is used to identify the to-be-detected object in the to-be-processed image, coincidence degrees of the sample reference regions are greater than a preset threshold, and the coincidence degrees of the sample reference regions are the coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values. It can be learned that, in the embodiments of the present invention, a reference region with a relatively high region coincidence degree is not simply deleted; instead, sample reference regions of relatively high quality are used to predict the location of the target region of the object, with the relationships among the sample reference regions fully considered, which helps improve the accuracy of detecting the location of the object.
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly describes the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
The following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention.
In the specification, claims, and accompanying drawings of the present invention, the terms “first”, “second”, “third”, “fourth”, and so on are intended to distinguish between different objects but do not indicate a particular order. In addition, the terms “include”, “contain”, and any other variants thereof are intended to cover a non-exclusive inclusion. For example, a process, a method, a system, a product, or a device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes an unlisted step or unit, or optionally further includes another inherent step or unit of the process, the method, the product, or the device.
To facilitate understanding of the embodiments of the present invention, the following first briefly describes a method of detecting a location of a to-be-detected object in an image by a computer device in the prior art. The computer device first generates, by using a potential region classification method, multiple reference regions used to identify the to-be-detected object, classifies the reference regions by using a region based convolutional neural network (Region Based Convolutional Neural Network, RCNN) classifier, determines detection accuracy values, of the to-be-detected object, corresponding to the reference regions, and then selects the reference region corresponding to the maximum detection accuracy value as the target region of the to-be-detected object. However, once the detection accuracy value of a reference region in the image is sufficiently high, the score of the reference region and the actual location accuracy of the reference region are no longer strongly correlated (the Pearson correlation coefficient is lower than 0.3), which makes it difficult to guarantee the accuracy of the finally determined target region of the to-be-detected object.
Based on this, an object detection method is proposed in the solutions of the present invention. After obtaining n reference regions used to identify a to-be-detected object in a to-be-processed image and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, and determining sample reference regions in the n reference regions, a computer device may determine, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image, coincidence degrees of the sample reference regions are greater than a preset threshold, and the coincidence degrees of the sample reference regions are the coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values. It can be learned that, in the embodiments of the present invention, a reference region with a relatively high region coincidence degree is not simply deleted; instead, sample reference regions of relatively high quality are used to predict the location of the target region of the object, with the relationships among the sample reference regions fully considered, which helps improve the accuracy of detecting the location of the object.
A detailed description is given below.
Referring to
The computer device may further include an output device 305 and an input device 306. The output device 305 communicates with the processor 301 and may display information in multiple manners. The input device 306 communicates with the processor 301 and may accept an input from a user in multiple manners.
In specific implementation, the foregoing computer device may be, for example, a desktop computer, a portable computer, a network server, a palm computer (Personal Digital Assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, a communications device, an embedded device, or a device that has a structure similar to the structure shown in
The processor 301 in the foregoing computer device may be coupled to the at least one memory 303. The memory 303 pre-stores program code, where the program code specifically includes an obtaining module, a first determining module, and a second determining module. In addition, the memory 303 further stores a kernel module, where the kernel module includes an operating system (for example, WINDOWS™, ANDROID™, or IOS™).
The processor 301 of the computer device invokes the program code to execute the object detection method disclosed in this embodiment of the present invention, which specifically includes the following steps:
running, by the processor 301 of the computer device, the obtaining module in the memory 303, to obtain a to-be-processed image, and obtain, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, where n is an integer greater than 1, where
the detection accuracy values, of the to-be-detected object, corresponding to the reference regions may be obtained by means of calculation by using a region based convolutional neural network (Region Based Convolutional Neural Network, RCNN) classifier;
running, by the processor 301 of the computer device, the first determining module in the memory 303, to determine sample reference regions in the n reference regions, where coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values are greater than a preset threshold, where
if a coincidence degree corresponding to two reference regions that completely coincide is 1, the preset threshold may be, for example, 0.99 or 0.98; or if a coincidence degree corresponding to two reference regions that completely coincide is 100, the preset threshold may be, for example, 99, 98, or 95, and the preset threshold may be set by a user in advance; and
running, by the processor 301 of the computer device, the second determining module in the memory 303, to determine, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image.
It can be learned that the computer device provided in this embodiment of the present invention does not simply delete a reference region with a relatively high region coincidence degree; instead, it uses sample reference regions of relatively high quality to predict the location of the target region of the object, with the relationships among the sample reference regions fully considered, which helps improve the accuracy of detecting the location of the object.
Optionally, after the processor 301 determines the target region corresponding to the to-be-detected object, the processor 301 is further configured to:
output the to-be-processed image with the target region identified.
Optionally, a specific implementation manner of the determining, by the processor 301 and based on the sample reference regions, a target region corresponding to the to-be-detected object is:
normalizing coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions, where the coordinate values of the sample reference regions are used to represent the sample reference regions;
determining, based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions; and
determining, based on the characteristic values, a coordinate value used to identify the target region corresponding to the to-be-detected object in the to-be-processed image.
Optionally, a specific implementation manner of the normalizing, by the processor 301, coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions is:
calculating, based on the following formula, the normalized coordinate values of the sample reference regions:
where
a quantity of the sample reference regions is p, p is a positive integer less than or equal to n, and x1i is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the ith reference region in the sample reference regions;
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the jth reference region in the sample reference regions, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of the pixel that is located in the upper-left corner of the ith reference region; or
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-left corner of the jth reference region, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of a pixel that is located in a lower-left corner of the ith reference region; and
I(sj) is an indicator function, where when a detection accuracy value sj corresponding to the jth reference region is greater than a preset accuracy value, I(sj) is 1, when a detection accuracy value sj corresponding to the jth reference region is less than or equal to the preset accuracy value, I(sj) is 0, Π=Σj=1pI(sj), and both i and j are positive integers less than or equal to p.
The preset accuracy value may be set by a user in advance, or may be a reference value obtained by means of calculation according to the maximum value in the n detection accuracy values, which is not uniquely limited in this embodiment of the present invention.
In the normalization processing step in this embodiment of the present invention, the coordinate values of the sample reference regions are normalized, which helps reduce the impact of a reference region with a relatively low detection accuracy value on object detection accuracy, and further improves the object detection accuracy.
Optionally, the characteristic values include a first characteristic value, and a specific implementation manner of the determining, by the processor 301 and based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions is:
calculating, based on the following formula, the first characteristic value:
where
the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, the first characteristic value u({circumflex over (B)}) includes ut, Πt=Σi=1pgt(si), si is a detection accuracy value corresponding to the ith reference region in the sample reference regions, a function gt(si) is a function of si, the function gt(si) is used as a weighting function of {circumflex over (b)}i, {circumflex over (b)}i represents the normalized coordinate values of the ith reference region in the sample reference regions, i is a positive integer less than or equal to p, {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i}, and {circumflex over (B)} represents the sample reference regions; and
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-right corner of the ith reference region; or
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-right corner of the ith reference region.
It should be noted that {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i} in the foregoing formula of ut specifically refers to:
if a currently calculated first characteristic value is a first characteristic value corresponding to an x1 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}1i; if a currently calculated first characteristic value is a first characteristic value corresponding to a y1 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ1i; if a currently calculated first characteristic value is a first characteristic value corresponding to an x2 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}2i; or if a currently calculated first characteristic value is a first characteristic value corresponding to a y2 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ2i, where the x1 coordinate corresponds to the foregoing x1j coordinate, and the x2 coordinate corresponds to the foregoing x2j coordinate.
In this embodiment of the present invention, because the first characteristic value is a weighted average of values obtained by applying different weighting functions to the coordinates of all the sample reference regions, the impact of the coordinate value of each sample reference region on the target region of the to-be-detected object is comprehensively considered in the coordinate value, of the target region of the to-be-detected object, that is determined based on the first characteristic value, which helps improve object detection accuracy.
Optionally, the first characteristic value u({circumflex over (B)})=[u1, . . . , ud]T, d is a positive integer, t is a positive integer less than or equal to d, ut is the tth characteristic value of the first characteristic value, the function gt(si) is the tth weighting function of weighting functions of {circumflex over (b)}i, and the weighting functions of {circumflex over (b)}i include at least one of the following:
where
the ρ1, τ1, ρ2, τ2, ρ3, and τ3 are normalization coefficients.
Optionally, the characteristic values further include a second characteristic value, and a specific implementation manner of the determining, by the processor 301 and based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions is:
calculating, based on the following formula, the second characteristic value:
where
M({circumflex over (B)}) is the second characteristic value, the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, a matrix D includes the normalized coordinate values of the sample reference regions, the ith row in the matrix D includes the normalized coordinate values of the ith reference region in the sample reference regions, and {circumflex over (B)} represents the sample reference regions.
In this embodiment of the present invention, because the second characteristic value is obtained by means of calculation based on a matrix that includes the coordinates of the sample reference regions, two-dimensional relationships among the coordinates of different sample reference regions are comprehensively considered in the coordinate value, of a target region of a to-be-detected object, that is determined based on the second characteristic value, which helps improve object detection accuracy.
Optionally, a specific implementation manner of the determining, by the processor 301 and based on the characteristic values, a coordinate value of the target region corresponding to the to-be-detected object is:
calculating, according to the following formula, the coordinate value of the target region:
where
h1({circumflex over (B)}) is the coordinate value of the target region corresponding to the to-be-detected object, f0({circumflex over (B)},Λ0)=λ, f1({circumflex over (B)},Λ1)=Λ1Tu({circumflex over (B)}), f2({circumflex over (B)},Λ2)=Λ2Tm({circumflex over (B)}), u({circumflex over (B)}) is the first characteristic value, m({circumflex over (B)})T is a vector form of the second characteristic value M({circumflex over (B)}), λ, Λ1, and Λ2 are coefficients, Λ=[λ,Λ1T,Λ2T]T, R({circumflex over (B)})=[1, u({circumflex over (B)})T, m({circumflex over (B)})T]T, and {circumflex over (B)} represents the sample reference regions.
Optionally, a value of the coefficient Λ is determined by using the following model:
where
C and ε are preset values, K is a quantity of pre-stored training sets, {circumflex over (Z)}1k is a preset coordinate value of a target region corresponding to a reference region in the kth training set of the K training sets, and {circumflex over (B)}k represents the reference region in the kth training set.
It can be learned that, in this embodiment of the present invention, after obtaining n reference regions used to identify a to-be-detected object in a to-be-processed image and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, and determining sample reference regions in the n reference regions, a computer device may determine, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image, coincidence degrees of the sample reference regions are greater than a preset threshold, and the coincidence degrees of the sample reference regions are the coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values. A reference region with a relatively high region coincidence degree is therefore not simply deleted; instead, sample reference regions of relatively high quality are used to predict the location of the target region of the object, with the relationships among the sample reference regions fully considered, which helps improve the accuracy of detecting the location of the object.
Being consistent with the foregoing technical solutions, referring to
As shown in
S401: A computer device obtains a to-be-processed image.
S402: The computer device obtains, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, where n is an integer greater than 1.
The detection accuracy values, of the to-be-detected object, corresponding to the reference regions may be obtained by means of calculation by using a region based convolutional neural network (Region Based Convolutional Neural Network, RCNN) classifier.
S403: The computer device determines sample reference regions in the n reference regions, where coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values are greater than a preset threshold.
If a coincidence degree corresponding to two reference regions that completely coincide is 1, the preset threshold may be, for example, 0.99 or 0.98; or if a coincidence degree corresponding to two reference regions that completely coincide is 100, the preset threshold may be, for example, 99, 98, or 95. The preset threshold may be set by a user in advance.
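The coincidence degree itself is not tied to any particular measure here. Purely as an illustration, if intersection-over-union is used (an assumption of this sketch, reusing the iou helper introduced earlier), the check against the preset threshold on either of the two scales mentioned above could look like this:

def coincidence_degree(box_a, box_b, full_coincidence_value=1.0):
    # Coincidence degree on the chosen scale: pass 1.0 when completely coinciding
    # regions score 1, or 100.0 when they are defined to score 100.
    return full_coincidence_value * iou(box_a, box_b)

# Example threshold checks corresponding to the two conventions above:
#   coincidence_degree(a, b) > 0.98          on the 0-1 scale
#   coincidence_degree(a, b, 100.0) > 98.0   on the 0-100 scale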
S404: The computer device determines, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image.
It can be learned that, in this embodiment of the present invention, after obtaining n reference regions used to identify a to-be-detected object in a to-be-processed image and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, and determining sample reference regions in the n reference regions, a computer device may determine, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image, coincidence degrees of the sample reference regions are greater than a preset threshold, and the coincidence degrees of the sample reference regions are the coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values. A reference region with a relatively high region coincidence degree is therefore not simply deleted; instead, sample reference regions of relatively high quality are used to predict the location of the target region of the object, with the relationships among the sample reference regions fully considered, which helps improve the accuracy of detecting the location of the object.
Optionally, in this embodiment of the present invention, after the computer device determines the target region corresponding to the to-be-detected object, the computer device is further configured to:
output the to-be-processed image with the target region identified.
Optionally, in this embodiment of the present invention, a specific implementation manner of the determining, by the computer device and based on the sample reference regions, a target region corresponding to the to-be-detected object is:
normalizing, by the computer device, coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions, where the coordinate values of the sample reference regions are used to represent the sample reference regions;
determining, by the computer device and based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions; and
determining, by the computer device and based on the characteristic values, a coordinate value used to identify the target region corresponding to the to-be-detected object in the to-be-processed image.
Optionally, in this embodiment of the present invention, a specific implementation manner of the normalizing, by the computer device, coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions is:
calculating, by the computer device and based on the following formula, the normalized coordinate values of the sample reference regions:
where
a quantity of the sample reference regions is p, p is a positive integer less than or equal to n, and x1i is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the ith reference region in the sample reference regions;
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the jth reference region in the sample reference regions, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of the pixel that is located in the upper-left corner of the ith reference region; or
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-left corner of the jth reference region, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the jth reference region, and {circumflex over (x)}1i is a normalized horizontal coordinate of a pixel that is located in a lower-left corner of the ith reference region; and
I(sj) is an indicator function, where when a detection accuracy value sj corresponding to the jth reference region is greater than a preset accuracy value, I(sj) is 1, when a detection accuracy value sj corresponding to the jth reference region is less than or equal to the preset accuracy value, I(sj) is 0, Π=Σj=1pI(sj), and both i and j are positive integers less than or equal to p.
The preset accuracy value may be set by a user in advance, or may be a reference value obtained by means of calculation according to the maximum value in the n detection accuracy values, which is not uniquely limited in this embodiment of the present invention.
Optionally, in this embodiment of the present invention, the characteristic values include a first characteristic value, and a specific implementation manner of the determining, by the computer device and based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions is:
calculating, by the computer device and based on the following formula, the first characteristic value:
where
the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, the first characteristic value u({circumflex over (B)}) includes ut, Πt=Σi=1pgt(si), si is a detection accuracy value corresponding to the ith reference region in the sample reference regions, a function gt(si) is a function of si, the function gt(si) is used as a weighting function of {circumflex over (b)}i, {circumflex over (b)}i represents the normalized coordinate values of the ith reference region in the sample reference regions, i is a positive integer less than or equal to p, {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i}, and {circumflex over (B)} represents the sample reference regions; and
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-right corner of the ith reference region; or
{circumflex over (x)}1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region, {circumflex over (x)}2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-right corner of the ith reference region.
It should be noted that {circumflex over (b)}i={{circumflex over (x)}1i,ŷ1i,{circumflex over (x)}2i,ŷ2i} in the foregoing formula of ut specifically refers to:
if a currently calculated first characteristic value is a first characteristic value corresponding to an x1 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}1i; if a currently calculated first characteristic value is a first characteristic value corresponding to a y1 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ1i; if a currently calculated first characteristic value is a first characteristic value corresponding to an x2 coordinate of the sample reference regions, {circumflex over (b)}i={circumflex over (x)}2i; or if a currently calculated first characteristic value is a first characteristic value corresponding to a y2 coordinate of the sample reference regions, {circumflex over (b)}i=ŷ2i, where the x1 coordinate corresponds to the foregoing x1j coordinate, and the x2 coordinate corresponds to the foregoing x2j coordinate.
Optionally, in this embodiment of the present invention, the first characteristic value u({circumflex over (B)})=[u1, . . . , ud]T, d is a positive integer, t is a positive integer less than or equal to d, ut is the tth characteristic value of the first characteristic value, the function gt(si) is the tth weighting function of weighting functions of {circumflex over (b)}i, and the weighting functions of {circumflex over (b)}i include at least one of the following:
where
the ρ1, τ1, ρ2, τ2, ρ3, and τ3 are normalization coefficients.
Optionally, in this embodiment of the present invention, the characteristic values further include a second characteristic value, and a specific implementation manner of the determining, by the computer device and based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions is:
calculating, by the computer device and based on the following formula, the second characteristic value:
where
M({circumflex over (B)}) is the second characteristic value, the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, a matrix D includes the normalized coordinate values of the sample reference regions, the ith row in the matrix D includes the normalized coordinate values of the ith reference region in the sample reference regions, and {circumflex over (B)} represents the sample reference regions.
Optionally, in this embodiment of the present invention, a specific implementation manner of the determining, by the computer device and based on the characteristic values, a coordinate value of the target region corresponding to the to-be-detected object is:
calculating, by the computer device and according to the following formula, the coordinate value of the target region:
where
h1({circumflex over (B)}) is the coordinate value of the target region corresponding to the to-be-detected object, f0({circumflex over (B)},Λ0)=λ, f1({circumflex over (B)},Λ1)=Λ1Tu({circumflex over (B)}), f2({circumflex over (B)},Λ2)=Λ2Tm({circumflex over (B)}), u({circumflex over (B)}) is the first characteristic value, m({circumflex over (B)})T is a vector form of the second characteristic value M({circumflex over (B)}), λ, Λ1, and Λ2 are coefficients, Λ=[λ,Λ1T,Λ2T]T, R({circumflex over (B)})=[1, u({circumflex over (B)})T, m({circumflex over (B)})T]T, and {circumflex over (B)} represents the sample reference regions.
Optionally, in this embodiment of the present invention, a value of the coefficient Λ is determined by using the following model:
where
C and ε are preset values, K is a quantity of pre-stored training sets, Ẑ1k is a preset coordinate value of a target region corresponding to a reference region in the kth training set of the K training sets, and B̂k represents the reference region in the kth training set.
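The training model is also a display formula that is not reproduced above. The presence of a regularization constant C, a tolerance ε, and K training pairs (B̂k, Ẑ1k) suggests an ε-insensitive, support-vector-regression style objective; the following is only a hedged sketch of such a model, not the filed formula:

\min_{\Lambda} \; \frac{1}{2} \lVert \Lambda \rVert^{2} + C \sum_{k=1}^{K} \max\bigl( 0, \; \lvert \hat{Z}_1^{k} - \Lambda^{T} R(\hat{B}^{k}) \rvert - \varepsilon \bigr)

Under this reading, Λ is chosen so that the predicted coordinate Λ^T R(B̂k) falls within ε of the preset coordinate value Ẑ1k for as many training sets as possible, with C trading that fit off against the magnitude of the coefficients.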
Some or all of the steps performed by the foregoing computer device may be specifically implemented by the computer device by executing software modules (program code) in the foregoing memory. For example, step S401 and step S402 may be implemented by the computer device by executing the obtaining module shown in
The following is an apparatus embodiment of the present invention. Referring to
the obtaining unit 501 is configured to obtain a to-be-processed image;
the obtaining unit 501 is further configured to obtain, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions, where n is an integer greater than 1;
the first determining unit 502 is configured to determine sample reference regions in the n reference regions, where coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values are greater than a preset threshold; and
the second determining unit 503 is configured to determine, based on the sample reference regions, a target region corresponding to the to-be-detected object, where the target region is used to identify the to-be-detected object in the to-be-processed image.
Optionally, the second determining unit 503 includes:
a normalizing unit, configured to normalize coordinate values of the sample reference regions, to obtain normalized coordinate values of the sample reference regions, where the coordinate values of the sample reference regions are used to represent the sample reference regions;
a characteristic value determining unit, configured to determine, based on the normalized coordinate values of the sample reference regions, characteristic values of the sample reference regions; and
a coordinate value determining unit, configured to determine, based on the characteristic values, a coordinate value used to identify the target region corresponding to the to-be-detected object in the to-be-processed image.
Optionally, the normalizing unit is specifically configured to:
calculate, based on the following formula, the normalized coordinate values of the sample reference regions:
where
a quantity of the sample reference regions is p, p is a positive integer less than or equal to n, and x1i is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the ith reference region in the sample reference regions;
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-left corner of the jth reference region in the sample reference regions, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the jth reference region, and x̂1i is a normalized horizontal coordinate of the pixel that is located in the upper-left corner of the ith reference region; or
x1j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-left corner of the jth reference region, x2j is a horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the jth reference region, and x̂1i is a normalized horizontal coordinate of a pixel that is located in a lower-left corner of the ith reference region; and
I(sj) is an indicator function, where when a detection accuracy value sj corresponding to the jth reference region is greater than a preset accuracy value, I(sj) is 1; when a detection accuracy value sj corresponding to the jth reference region is less than or equal to the preset accuracy value, I(sj) is 0; Π = Σ_{j=1}^{p} I(sj); and both i and j are positive integers less than or equal to p.
The preset accuracy value may be set by a user in advance, or may be a reference value obtained by means of calculation according to the maximum value in the n detection accuracy values, which is not uniquely limited in this embodiment of the present invention.
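The normalization formula referenced above is a display formula that is not reproduced here. As a hedged sketch only, one reading that is consistent with the quantities defined above (the indicator I(sj), the normalizer Π, and the corner coordinates x1j and x2j) is to shift each x1 coordinate by the indicator-weighted mean of the x1 coordinates and to scale it by the indicator-weighted mean width x2 − x1; every name below is illustrative rather than taken from the original text:

import numpy as np

def normalize_x1(boxes, scores, preset_accuracy):
    # boxes: p x 4 array of sample reference regions [x1, y1, x2, y2]
    # scores: p detection accuracy values s_j
    # Indicator I(s_j): 1 if the accuracy value exceeds the preset accuracy value, 0 otherwise.
    indicator = (scores > preset_accuracy).astype(float)
    pi = indicator.sum()                                    # Π = Σ_j I(s_j)
    mean_x1 = (indicator * boxes[:, 0]).sum() / pi          # indicator-weighted mean of the x1_j
    mean_width = (indicator * (boxes[:, 2] - boxes[:, 0])).sum() / pi  # mean of (x2_j - x1_j)
    # Assumed normalization: shift by the mean x1 and scale by the mean width.
    return (boxes[:, 0] - mean_x1) / mean_width

The y1, x2, and y2 coordinates would be normalized analogously, which yields the values b̂i = {x̂1i, ŷ1i, x̂2i, ŷ2i} used by the characteristic values described next.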
Optionally, the characteristic values include a first characteristic value, and the characteristic value determining unit is specifically configured to:
calculate, based on the following formula, the first characteristic value:
where
the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, the first characteristic value u(B̂) includes ut, Πt = Σ_{i=1}^{p} gt(si), si is a detection accuracy value corresponding to the ith reference region in the sample reference regions, a function gt(si) is a function of si, the function gt(si) is used as a weighting function of b̂i, b̂i represents the normalized coordinate values of the ith reference region in the sample reference regions, i is a positive integer less than or equal to p, b̂i = {x̂1i, ŷ1i, x̂2i, ŷ2i}, and B̂ represents the sample reference regions; and
x̂1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-left corner of the ith reference region, x̂2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in a lower-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-right corner of the ith reference region; or
x̂1i is the normalized horizontal coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region in the sample reference regions, ŷ1i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the lower-left corner of the ith reference region, x̂2i is a normalized horizontal coordinate, in the to-be-processed image, of a pixel that is located in an upper-right corner of the ith reference region, and ŷ2i is a normalized vertical coordinate, in the to-be-processed image, of the pixel that is located in the upper-right corner of the ith reference region.
It should be noted that b̂i = {x̂1i, ŷ1i, x̂2i, ŷ2i} in the foregoing formula of ut specifically refers to:
if a currently calculated first characteristic value is a first characteristic value corresponding to an x1 coordinate of the sample reference regions, b̂i = x̂1i; if a currently calculated first characteristic value is a first characteristic value corresponding to a y1 coordinate of the sample reference regions, b̂i = ŷ1i; if a currently calculated first characteristic value is a first characteristic value corresponding to an x2 coordinate of the sample reference regions, b̂i = x̂2i; or if a currently calculated first characteristic value is a first characteristic value corresponding to a y2 coordinate of the sample reference regions, b̂i = ŷ2i, where the x1 coordinate corresponds to the foregoing x1j coordinate, and the x2 coordinate corresponds to the foregoing x2j coordinate.
Optionally, the first characteristic value u(B̂) = [u1, . . . , ud]^T, d is a positive integer, t is a positive integer less than or equal to d, ut is the tth element of the first characteristic value, the function gt(si) is the tth weighting function of the weighting functions of b̂i, and the weighting functions of b̂i include at least one of the following:
where
the ρ1, τ1, ρ2, τ2, ρ3, and τ3 are normalization coefficients.
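As with the normalization step, the display formula for ut is not reproduced above. Assuming the weighted-mean reading sketched earlier (ut as an accuracy-weighted average of one normalized coordinate, with Πt = Σi gt(si)), a minimal illustration in code could look as follows; the function names and the linear weighting function are assumptions:

import numpy as np

def first_characteristic_value(b_hat, scores, g):
    # b_hat: p normalized values of one coordinate (x1, y1, x2, or y2) of the sample reference regions
    # scores: p detection accuracy values s_i
    # g: weighting function g_t applied to the accuracy values
    weights = g(scores)
    pi_t = weights.sum()                    # Π_t = Σ_i g_t(s_i)
    return (weights * b_hat).sum() / pi_t   # accuracy-weighted average of the coordinate

# Example usage with an assumed linear weighting function g(s) = ρ·s + τ.
rho, tau = 1.0, 0.0
u_x1 = first_characteristic_value(
    b_hat=np.array([0.10, -0.05, 0.02]),
    scores=np.array([0.9, 0.8, 0.7]),
    g=lambda s: rho * s + tau)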
Optionally, the characteristic values further include a second characteristic value, and the characteristic value determining unit is specifically configured to:
calculate, based on the following formula, the second characteristic value:
where
M(B̂) is the second characteristic value, the quantity of the sample reference regions is p, p is a positive integer less than or equal to n, a matrix D includes the normalized coordinate values of the sample reference regions, the ith row in the matrix D includes the normalized coordinate values of the ith reference region in the sample reference regions, and B̂ represents the sample reference regions.
Optionally, the coordinate value determining unit is specifically configured to:
calculate, according to the following formula, the coordinate value of the target region:
where
h1(B̂) is the coordinate value of the target region corresponding to the to-be-detected object, f0(B̂, Λ0) = λ, f1(B̂, Λ1) = Λ1^T u(B̂), f2(B̂, Λ2) = Λ2^T m(B̂), u(B̂) is the first characteristic value, m(B̂)^T is a vector form of the second characteristic value M(B̂), λ, Λ1, and Λ2 are coefficients, Λ = [λ, Λ1^T, Λ2^T]^T, R(B̂) = [1, u(B̂), m(B̂)^T]^T, and B̂ represents the sample reference regions.
Optionally, a value of the coefficient Λ is determined by using the following model:
where
C and ε are preset values, K is a quantity of pre-stored training sets, Ẑ1k is a preset coordinate value of a target region corresponding to a reference region in the kth training set of the K training sets, and B̂k represents the reference region in the kth training set.
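The optimization model for Λ is not reproduced above. If it is indeed an ε-insensitive regression of the kind sketched earlier (an assumption), the coefficients could be fit with an off-the-shelf solver; the snippet below is illustrative only, and its feature construction mirrors R(B̂) = [1, u(B̂), m(B̂)^T]^T with the leading 1 absorbed into the intercept:

import numpy as np
from sklearn.svm import LinearSVR

def fit_coefficients(features, targets, C, epsilon):
    # features: K x d matrix whose kth row stacks u(B̂_k) and the vectorized M(B̂_k)
    # targets: K preset coordinate values Ẑ1_k of the target regions in the training sets
    model = LinearSVR(C=C, epsilon=epsilon)
    model.fit(features, targets)
    # The intercept plays the role of λ; the remaining coefficients correspond to Λ1 and Λ2.
    return model.intercept_, model.coef_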
It should be noted that the computer device described in this functional unit apparatus embodiment of the present invention is represented in a form of functional units. The term "unit" used herein should be understood in the broadest possible sense. A unit is an object used to implement the function of each "unit", and may be, for example, an application-specific integrated circuit (ASIC) or a single circuit; or a processor (a shared processor, a dedicated processor, or a chipset) and a memory for executing one or more software or firmware programs, a combinational logic circuit, and/or another appropriate component that provides and implements the foregoing functions.
For example, a person skilled in the art may understand that a hardware carrier of the computer device may specifically take the form of the computer device shown in
a function of the obtaining unit 501 may be implemented by the processor 301 and the memory 303 in the computer device, where specifically, the processor 301 runs the obtaining module in the memory 303 to obtain a to-be-processed image and obtain, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions;
a function of the first determining unit 502 may be implemented by the processor 301 and the memory 303 in the computer device, where specifically, the processor 301 runs the first determining module in the memory 303 to determine sample reference regions in the n reference regions; and
a function of the second determining unit 503 may be implemented by the processor 301 and the memory 303 in the computer device, where specifically, the processor 301 runs the second determining module in the memory 303 to determine, based on the sample reference regions, a target region corresponding to the to-be-detected object.
It can be learned that, in this embodiment of the present invention, the obtaining unit of the computer device first obtains a to-be-processed image and obtains, according to the to-be-processed image, n reference regions used to identify a to-be-detected object in the to-be-processed image, and n detection accuracy values, of the to-be-detected object, corresponding to the n reference regions; then, the first determining unit of the computer device determines sample reference regions in the n reference regions; and finally, the second determining unit of the computer device determines, based on the sample reference regions, a target region corresponding to the to-be-detected object, where coincidence degrees of the sample reference regions are greater than a preset threshold, and the coincidence degrees of the sample reference regions are the coincidence degrees between the sample reference regions and a reference region that corresponds to a maximum value in the n detection accuracy values. It can be learned that, in this embodiment of the present invention, a reference region with a relatively high region coincidence degree is not simply deleted; instead, sample reference regions with relatively high quality are used to predict a location of a target region of an object, with the relationship among the sample reference regions being fully considered, which helps improve accuracy of detecting a location of the object.
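To make the overall flow of the units concrete, the following is a deliberately simplified, self-contained sketch. It keeps the structure described above (select sample reference regions by their coincidence degree with the highest-accuracy region, then predict the target region from all of them), but replaces the learned characteristic-value regression with a plain accuracy-weighted average of the sample boxes; every name in it is illustrative rather than taken from the original text:

import numpy as np

def iou(box, boxes):
    # Coincidence degree between one box and an array of boxes, measured as intersection over union.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def detect_target_region(boxes, scores, overlap_threshold=0.5):
    # boxes: n x 4 reference regions [x1, y1, x2, y2]; scores: n detection accuracy values.
    best = boxes[np.argmax(scores)]
    # Sample reference regions: coincidence degree with the highest-accuracy region above the threshold.
    mask = iou(best, boxes) > overlap_threshold
    sample_boxes, sample_scores = boxes[mask], scores[mask]
    # Simplified stand-in for the characteristic-value regression: accuracy-weighted average box.
    weights = sample_scores / sample_scores.sum()
    return (weights[:, None] * sample_boxes).sum(axis=0)

# Example usage with two overlapping candidate regions and one outlier.
boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.3])
target = detect_target_region(boxes, scores)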
A person of ordinary skill in the art may understand that all or some of the steps of the methods in the embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer readable storage medium. The storage medium may include a flash memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
The object detection method and the computer device that are disclosed in the embodiments of the present invention have been described in detail above. The principle and the implementation manners of the present invention are described herein by using specific examples. The descriptions about the embodiments are merely provided to help understand the method and the core idea of the present invention. In addition, a person of ordinary skill in the art can make variations and modifications to the present invention regarding the specific implementation manners and the application scope, according to the idea of the present invention. Therefore, the content of this specification shall not be construed as a limitation on the present invention.