The present description relates to an object identification unit for identifying objects and a corresponding method for identifying objects using an artificial neural network. The object identification unit uses an artificial neural network to recognize objects with a high level of reliability even if the objects are rotated relative to an image acquisition unit.
Artificial neural networks can be used to classify and/or identify objects. The face of a person plays a key role in identifying that person. Identifying a person includes in particular assigning a face to a specific person.
In order that an artificial neural network can identify a person, the artificial neural network has to have been trained using an image of this person. This takes place in the so-called training phase. The artificial neural network is then capable of comparing an image of a person to be identified to all trained persons and of presenting the person who is closest to the image to be identified.
Certain specifications or requirements can be placed on the image of a person which is used for the identification process. One example of this is the so-called biometric image, in which the face of a person has to fill a specific region (requirement for the dimension of the face in the image) and also has to be acquired from a predetermined viewing angle range (requirement for the perspective).
An aspect relates to improving the reliability of the identification of persons using an artificial neural network.
According to one aspect, an object identification unit is specified for identifying objects. The object identification unit comprises an artificial neural network, ANN. The ANN contains an input layer and an output layer, wherein the ANN is designed to identify objects based on inputs at the input layer, wherein the input layer includes a plurality of input neurons, and wherein the input layer is designed to obtain an input vector, wherein the input vector contains items of object identification information on the object to be identified. The ANN is designed to identify an object based on the items of object identification information, wherein the items of object identification information correspond to a two-dimensional image of the object to be identified. The object identification unit is configured to divide the two-dimensional image into a plurality of triangles, which are delimited by characteristic points of the object to be identified at their corners. The object identification unit is configured to ascertain a rotational angle of the object to be identified in the two-dimensional image from symmetry properties of the object to be identified and relative area components of the individual triangles in the total sum of the area of the triangles and to multiply the areas of the individual triangles by a correction factor in order to obtain corrected areas of the triangles, which correspond to a symmetry condition or fulfill the symmetry condition. The object identification unit is configured to scale the corrected areas of the triangles to a rotation-invariant dimension of the object to be identified and to supply the scaled areas of the triangles to the ANN as an input vector.
The object identification unit described here can be used for identifying various objects, in particular those objects which are symmetrical with respect to an axis of symmetry. These objects can be faces of humans, images of animals and plants or of machines. The object identification unit is described hereinafter in conjunction with recognizing faces or identifying persons. However, it is to be understood that this is solely an application example and the general validity of the use of the object identification unit for identifying arbitrary objects is not restricted by the reference to a human face.
One challenge in identifying persons arises when the face of a person in a photographic image from an image acquisition unit, for example a camera for moving or still images, is rotated, i.e., the face was not acquired frontally. Such a rotation of the face makes automated machine recognition of the face more difficult, in particular if the comparison images of the persons were acquired frontally or at a viewing angle which differs from the viewing angle of the image to be identified.
The object identification unit described here meets this challenge in that sections of an image of a face to be identified are firstly provided with a correction factor in order to compensate for the deviation of the dimensions of the face caused by the rotation. The sections of the image of the face to be identified are then scaled in order to compensate for dimension differences in the images to be compared. In particular, the relative component of the area of each triangle in the total of the areas of all triangles is ascertained in order to ascertain a rotational angle of the face. The relative component of the area of each triangle in the total of the areas of all triangles is then scaled to a rotation-invariant dimension of the face. The scaled area of the triangles is supplied to the artificial neural network in order to identify a person.
In particular, it is provided that an image of a face is divided into multiple triangles, wherein each triangle is defined by three characteristic points in the image of the face. At least some of these triangles are placed in the image of the face so that in each case two triangles are located axially symmetrically with respect to a vertical or horizontal axis of the face. This arrangement of the triangles is based on the concept that a face is symmetrical with respect to a vertical centre axis. It is accordingly presumed that triangles in the two halves of the face which are defined by the same characteristic points, or by characteristic points corresponding to one another, are of equal size. This is referred to in the present case as the symmetry property or symmetry condition. If this symmetry condition is not met, it can be presumed that the face is rotated in the image. However, this rotation can be compensated for on the basis of the known symmetry property of a human face in that areas or dimensions of symmetrical triangles are related to one another.
The areas of two triangles which are arranged symmetrically with respect to a centre axis of the face can be used to form a quotient. This quotient states by how much the areas of these triangles differ from one another. In the case of an object which is symmetrical per se, such as a human face, the areas would have to be equal in size and the quotient would be 1. However, if the face is rotated, the quotient assumes a different value. The correction factor to be applied is thus dimensioned so that the area of a triangle represented in corrupted form in the image is corrected to its actual value or its actual component in the total area of the face (this relates here to the relative component of a triangle in the total of the areas of all triangles into which a face was divided). A rotation of the face is compensated for by this approach, which facilitates automated machine recognition by means of an artificial neural network.
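The relationship between the area quotient and the correction factor can be sketched as follows. This is a minimal sketch in Python; the choice of equalizing a pair of symmetric triangles at its geometric mean is an illustrative assumption, not a rule prescribed by the description.

```python
import math

def symmetry_quotient(area_left, area_right):
    """Quotient of the areas of two triangles that are mirror-symmetric with
    respect to the centre axis; 1.0 for an unrotated, symmetrical face."""
    return area_left / area_right

def correction_factors(area_left, area_right):
    """Factors by which the two areas can be multiplied so that the corrected
    areas fulfill the symmetry condition (equal size). Equalizing the pair at
    its geometric mean is one possible choice (assumption)."""
    mean = math.sqrt(area_left * area_right)
    return mean / area_left, mean / area_right
```

For a frontal image both factors are 1; the further the quotient deviates from 1, the stronger the rotation that is compensated.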
In a further step, the area of the triangles corrected in this way is scaled to a rotation-invariant dimension of the face. For example, a horizontal or vertical distance between two points in the face can be used as the rotation-invariant dimension, which does not change upon a rotation of the face. For example, the vertical distance between the height of the eyes and the centre of the upper lip does not change if the head is rotated around the vertical axis. Other vertical distances in the face also do not change if the head is solely rotated around the vertical axis. Therefore, those dimensions which are independent of a rotation can be used to scale the areas of the triangles.
It is thus provided that a two-step correction of an image of a face is executed in order to make the identification of a person using an artificial neural network more reliable: first, the face is divided into triangles which are delimited by characteristic points, specifically so that at least some of the triangles are axially symmetrical with respect to an axis of the face. The triangles which are axially symmetrical to one another are of equal size in an image of the face without rotation. If this symmetry condition does not apply in an image, it follows that the face is rotated. A correction factor can be ascertained from the dimension ratio of triangles axially symmetrical to one another, with the aid of which the relative component of the area of a triangle in the total of the areas of all triangles can be ascertained for the case that the face is not rotated. In a second step, the corrected areas of the triangles are scaled to a rotation-invariant dimension in the face. By this approach, a corrected and scaled image of a face is created, so that the rotation of the face is compensated for and the face is brought into a common image plane with the trained images of faces.
In principle, the ANN is constructed from an input layer, one or more intermediate layers, and an output layer. It is also conceivable that the ANN is a two-layer ANN without a hidden layer. Each layer contains a plurality of neurons, wherein each neuron of one layer is connected to all neurons of the next layer. An ANN can be trained using a large number of input data and thus configured so that it recognizes objects which have been learned once and supplies a corresponding output value. In the present case, the ANN is trained using images of faces which are imaged frontally and in which the facial dimension is established. These faces are also divided into a plurality of triangles, so that the areas of the triangles, the relative component of the area of each triangle in the total sum of the areas of the triangles, and the scaling of the areas of the triangles to a rotation-invariant dimension of the face are used by the ANN as identification parameters.
The artificial neural network can be executed on a computer or a computer network, wherein the computer or the computer network (in the latter case by means of parallel processing of the instructions) can execute the ANN in various configurations. For example, the ANN can be executed on a computer, a field programmable gate array (FPGA), or a processor. If the ANN is executed in various configurations, this does not necessarily change anything in the hardware on which the ANN is executed. Rather, for example, the configuration of the individual neurons and/or the weighting of the information transfer between neurons of various layers changes.
The ANN is designed, for example, to assume a plurality of different configurations, wherein each configuration of the ANN corresponds to a trained person and wherein the ANN evaluates a specific input vector against multiple configurations from the plurality of various configurations. In particular, the information of the individual configurations is contained in the weights of the individual neurons and represents a minimum on a multidimensional hypersurface. This minimum represents the minimum deviation between a face to be identified and one or more faces compared thereto. In the case of very similar faces, as can occur with twins, for example, confusions are nevertheless possible (also referred to as a "false positive"). The similarity of the image to be identified to the various faces known to the ANN can thus be ascertained. The face having the greatest similarity (or the least dissimilarity) is then output by the ANN as the result of the identification procedure.
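The selection of the face having the least dissimilarity can be sketched as a comparison of the output vector against the stored reference vectors of all trained configurations. The dictionary layout and the Euclidean distance as dissimilarity measure are illustrative assumptions:

```python
import numpy as np

def identify(output_vector, trained_references):
    """Return the trained person whose reference vector deviates least from
    the output vector of the ANN (least dissimilarity).

    trained_references: dict mapping person id -> reference vector."""
    best_person, best_error = None, float("inf")
    for person, reference in trained_references.items():
        error = np.linalg.norm(np.asarray(output_vector) - np.asarray(reference))
        if error < best_error:
            best_person, best_error = person, error
    return best_person, best_error
```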
Each configuration of the ANN corresponds to a person and is designed so that the ANN recognizes this person, and does so under various perspectives on the face of the person and at varying distances of the person from the acquiring image acquisition unit.
In one embodiment, the rotation-invariant dimension is a vertical or horizontal distance between two characteristic points of the object to be identified.
Depending on the rotation of the face in the image, either horizontal distances or vertical distances between unchanging points are not influenced by the rotation. In the present case, both horizontal and also vertical distances can be used as the rotation-invariant dimension.
In a further embodiment, the object identification unit is configured to be trained in a training phase using a plurality of images of objects, wherein the images show the objects without rotational angles.
By way of the above-described approach, it is possible to compare the image of a rotated face using an artificial neural network to those images of faces which were recorded frontally and without rotation of the face.
In a further embodiment, the symmetry condition is an axial symmetry with respect to an axis of the object to be identified.
The approach described here can preferably be used for those objects which have at least one axis of symmetry. The human face, for example, is generally symmetrical with respect to a vertical centre axis. This symmetry property permits rotations of the face around a vertical axis to be recognized, because the areas of triangles on the left half of the face differ from the areas of corresponding triangles on the right half of the face (or vice versa), although they would have to be identical due to the symmetry property.
For objects which are axially symmetrical with respect to a horizontal axis, the same statement applies for the upper and the lower half of the image of the object. A rotation of the object around the transverse axis can be compensated for in this case. A missile or guided missile which is observed from the side and whose elevator is not deflected can be mentioned as an example of this application.
In a further embodiment, the object identification unit is configured to divide the two-dimensional image into a plurality of triangles so that at least some of the triangles are symmetrical with respect to an axis of the object to be identified.
A completely symmetrical division is preferably performed, which may be helpful and advantageous. However, it is also possible to execute an identification without a completely symmetrical division of the object, in that only the symmetrically arranged triangles are used.
In a further embodiment, the object identification unit is embodied to ascertain the correction factor in that a dimension ratio is ascertained for each pair of triangles consisting of two triangles corresponding to one another.
In this way, a virtual image is created, or at least triangle dimensions of the image without a rotational angle are obtained.
According to a further aspect, a method is specified for identifying objects using an artificial neural network, ANN. The method includes the following steps: dividing a two-dimensional image of an object to be identified into a plurality of triangles which are delimited by characteristic points of the object to be identified at their corners; ascertaining a rotational angle of the object to be identified in the two-dimensional image based on symmetry properties of the object to be identified and relative area components of the individual triangles in a total sum of the area of the triangles; ascertaining corrected areas of the triangles in that the areas of the triangles are multiplied by a correction factor, wherein the corrected areas of the triangles correspond to a symmetry condition; scaling the corrected areas of the triangles to a rotation-invariant dimension of the object to be identified; and supplying the scaled corrected areas to the ANN in order to carry out a comparison of objects based thereon and to identify these objects.
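These steps can be sketched end to end as follows. The helper names and the concrete choices for the correction (equalizing symmetric pairs) and the scaling (dividing areas by the square of the reference distance) are assumptions for illustration, not prescribed by the method:

```python
import numpy as np

def area2d(p, q, r):
    """Triangle area from the z component of the cross product of two edge vectors."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def identify_object(points, triangles, axis_pairs, invariant_dist, ann):
    """Sketch of the claimed method steps.

    points:         2D characteristic points of the object to be identified
    triangles:      index triples into points, delimiting the triangles
    axis_pairs:     pairs (l, r) of mirror-symmetric triangle indices
    invariant_dist: rotation-invariant reference distance in the image
    ann:            callable mapping the input vector to an identification
    """
    areas = np.array([area2d(points[a], points[b], points[c])
                      for a, b, c in triangles])
    corrected = areas.copy()
    for l, r in axis_pairs:
        # A quotient != 1 indicates a rotated object; the rotational angle
        # would be ascertained from it (formula omitted in this sketch).
        mean = np.sqrt(areas[l] * areas[r])
        corrected[l] = corrected[r] = mean   # enforce the symmetry condition
    scaled = corrected / invariant_dist ** 2  # scale to rotation-invariant dimension
    return ann(scaled)                        # supply the input vector to the ANN
```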
The same statements as are described above or below with reference to the object identification unit apply to the method. To avoid unnecessary repetition, the functional features of the object identification unit are not reproduced at this point. In any case, the functional features of the object identification unit can be implemented as method steps and vice versa. Some of the functional features are specified by way of example here as method steps, which is not to be understood as a restriction, however, in such a way that other functional features cannot be implemented as method steps.
In one embodiment of the method, the rotation-invariant dimension is a vertical or horizontal distance between two characteristic points of the object to be identified.
However, upon a rotation of the object or face around the vertical axis, a horizontal distance is only rotation-invariant after a correction of the rotational angle around the vertical axis. Expressed in general terms, a vertical distance is rotation-invariant with respect to rotations around the vertical axis and a horizontal distance is rotation-invariant with respect to rotations around the transverse axis.
In a further embodiment, the symmetry condition is an axial symmetry with respect to an axis of the object to be identified.
Exemplary embodiments are described in greater detail hereinafter on the basis of the appended drawings. The illustrations are schematic and are not to scale. Identical reference signs relate to identical or similar elements.
It is a characteristic feature of the ANN that each neuron of a layer is connected to all neurons of the following layer (located farther to the right in the illustration of the figures).
The data source 50 supplies images of one or more faces to the ANN 100, where the images are subjected to an identification procedure. The data source 50 can be, for example, a camera or a data memory, in which images are stored and can be output to the ANN 100. The data source 50 can be connected to the ANN 100 directly or via a data transmission network. The transmission of the images from the data source 50 to the ANN 100 can be initiated or controlled by the control unit 80.
The ANN 100 can access a data memory 70. For example, the parameters of various configurations of the ANN 100 can be stored in the data memory 70. The weightings for the individual neurons can be loaded with the aid of these configurations to compare an image of one person to the image of all trained persons. The result of the identification procedure is transmitted by the ANN 100 to the output unit 60.
It can be seen from the figures that, in the case of a face rotated around the vertical axis, the areas of triangles corresponding to one another are not identical in a two-dimensional image of the rotated face. For example, if the face is rotated around the vertical axis, the triangles in the half of the face turned away from the image acquisition unit appear smaller in the two-dimensional image than the corresponding triangles in the other half of the face.
The approach for identifying persons is described by way of example at this point with reference to the figures.
Firstly, the face is decomposed into a finite number of elementary basic triangles, the corner points of which are in a distance ratio to one another that is singular for each human. The idea of facial recognition is based on the concept that there are no two identical faces, with the exception of identical twins. If we compare two faces to one another, we establish that humans have noses of different lengths, different distances between the eyes, or different widths of the mouth. These features are referred to as biometric data.
Thus, lines can be drawn through the face, for example on the basis of noteworthy points such as centre of the eyes and nose, chin tip, etc. or characteristic dimensions such as distance between the ears, length of nose or upper lip, which result in a biometric profile of a human and thus represent an invariable physical feature. If one uses a characteristic dimension, such as the distance of the eye line (horizontal line which connects the eyes to one another) to the nose line (horizontal line which connects the nostrils to one another) as a scaling dimension, all humans will differ in at least one, but extremely probably in multiple, possibly even many features from the reference person.
If we now select a finite number of characteristic points in the face (circles 151 in the figures), these points can be connected to form a network of triangles covering the face.
Since a human face is curved multiple times and in different ways, it is first decomposed into a sufficient number of area elements. In the arrangement shown in the figures, each of the total of 27 corner points is at least once the starting or end point of one of the 25 triangular areas A_(i,i+1,i+2), which can be arranged in ascending sequence as A_(1,2,3), A_(2,3,4), …, A_(25,26,27).
To be able to decompose an area into m triangles, m + 2 corner points are thus required; in the present case, m = 25. The decomposition is preferably selected so that a progressive series of points results in which, as far as possible, no point is omitted, i.e., every point is the corner point of at least one basic triangle. Double counts are expressly permitted, and possibly even unavoidable. A sketch of such a progressive decomposition follows this paragraph.
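A minimal sketch of such a progressive, strip-like decomposition, assuming consecutively numbered points; the description does not prescribe this particular mesh:

```python
def strip_triangulation(n_points):
    """Decompose n ordered points into n - 2 triangles A_(i, i+1, i+2) so that
    every point is a corner of at least one triangle and consecutive
    triangles share two points (double counts are permitted)."""
    return [(i, i + 1, i + 2) for i in range(n_points - 2)]

# 27 corner points yield 25 basic triangles, as in the described arrangement.
assert len(strip_triangulation(27)) == 25
```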
All partial triangles of the image plane are added together to form a total area

S = Σ_(j=1…m) A_j.

The total area is used as a reference variable because the neurons are handled for comparative purposes as values less than or equal to 1. The following expression then applies for the scaled triangular areas:

a_j = A_j / S ≤ 1.
To calculate a triangular area in three-dimensional space, we require the surface normal, which may be represented as the cross product of the two distance vectors Δr_ij and Δr_ik from a common corner point i to each of the two other corner points j and k in the triangle, i.e.

n = Δr_ij × Δr_ik.

For plane vectors in the x-y plane, only the z component of this vector remains, so that the triangular area reads

A = ½ |Δx_ij Δy_ik − Δy_ij Δx_ik|,

wherein absolute value bars are placed since the signed area can also be negative. In detail, the following relations apply for the respective distance differences:

Δx_ij = x_j − x_i,  Δx_ik = x_k − x_i,
Δy_ij = y_j − y_i,  Δy_ik = y_k − y_i,
Δz_ij = z_j − z_i,  Δz_ik = z_k − z_i.
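The cross-product formulation translates directly into code; a small sketch using NumPy:

```python
import numpy as np

def triangle_area_3d(r_i, r_j, r_k):
    """Triangular area in space: half the norm of the cross product of the
    two distance vectors from corner i to corners j and k."""
    normal = np.cross(np.subtract(r_j, r_i), np.subtract(r_k, r_i))
    return 0.5 * np.linalg.norm(normal)

def triangle_area_2d(r_i, r_j, r_k):
    """For plane vectors in the x-y plane only the z component of the cross
    product remains; the absolute value is taken since the signed area can
    be negative."""
    dx_ij, dy_ij = r_j[0] - r_i[0], r_j[1] - r_i[1]
    dx_ik, dy_ik = r_k[0] - r_i[0], r_k[1] - r_i[1]
    return 0.5 * abs(dx_ij * dy_ik - dy_ij * dx_ik)
```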
To take into consideration a large number of possible faces, we additionally introduce a training pattern with the index p for each individual face, which is shown schematically in the figures.
To save running indices, we first define the output neurons α_pj and the target output values α̂_pj, for j = 1, 2, …, m, as the scaled triangular areas of the presented image and of the stored reference image, respectively. In the two-layer network, each input neuron is connected directly to each output neuron.
After the network has been taught using a large number of faces (learning phase), an image to be identified is presented to it, for which the two-dimensional projections of the triangular areas are evaluated, wherein A_(p,18)^⊥ = 0. To explicitly differentiate them from the three-dimensional surface elements, we have identified the two-dimensional projections with A_pj^⊥ instead of A_pj. The same applies correspondingly for the target values, with Â_(p,18)^⊥ = 0 and Â_pj^⊥ instead of Â_pj.
For this purpose, the index p is omitted hereinafter, because only a single image is observed. In the case of the biometric horizontal projection (Θ = 0), one relation applies for the reference areas with j ∈ {1, 2, 3, 4, 5, 6, 7, 8}, and a corresponding relation applies for j ∈ {9, 18, 19, 20, 21, 22, 23, 24, 25}. Because it can be presumed that the face is symmetrical, the relation for j ∈ {9, 18, 19, 20, 21, 22, 23, 24, 25} also holds in the rotated image, whereas, for the asymmetrical areas, a relation containing the rotational angle Θ applies instead. If both equations are divided through one another, relations are obtained from which the rotational angle Θ can be determined. For scaled facial areas, Â_j = A_j etc.; the angles can thus be calculated directly from the area quotients.
To scale the images, the distance h₀ between the middle of the eyes and the middle of the nose from the reference image is used. In the actually presented target image, this distance reads, for example, h. To scale an actual image to be evaluated, all position data then have to be multiplied by the factor κ = h₀/h, i.e., κ ≤ 1 if h ≥ h₀ and κ > 1 if h < h₀.
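As a small sketch, the scaling step amounts to multiplying all position data by κ; the names are illustrative:

```python
def scale_positions(points, h_reference, h_actual):
    """Scale all position data of an image to the reference image.

    kappa = h_reference / h_actual, so kappa <= 1 if the actual distance is
    at least the reference distance, and kappa > 1 otherwise."""
    kappa = h_reference / h_actual
    scaled = [(kappa * x, kappa * y) for (x, y) in points]
    return scaled, kappa
```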
Thus, for example, for a comparison person, a κ of 0.7173 results and thus a corrected error of E = 0.0012898, while the κ of another image is equal to 1.0139 and thus corresponds to an error of E = 0.0018017. The error of the second image thus lies 39.7% above that of the comparison person, which enables a clear differentiation. The training algorithm is described hereinafter.
The starting point for deriving a back propagation algorithm is the minimization rule for the learning error

E_p = ½ Σ_j (α̂_pj − α_pj)²

in a pattern p. The following approach is based on the consideration that the calculation of the gradient for the total error function E can be carried out by summation of the gradients over all error functions E_p of the individual training patterns, since the total error E is nothing more than the sum to be minimized

E = Σ_(p=1…q) E_p

of all q learning errors. In order that the learning error E decreases, i.e., ΔE < 0, the following has to apply for i = 1, 2, …, n:

Δw_i = −γ ∂E/∂w_i,

wherein the proportionality factor γ indicates the learning rate. The modification rule for all n connecting weights can accordingly be written as

Δw = −γ ∇E(w).
The change of the weight vector is therefore proportional to the negative gradient of the error function. The net input of the output neurons is calculated according to the linear standard propagation function as

net_pj = Σ_i w_ij o_pi,

wherein w_ij is the weight between the transmitting neuron i and the receiving neuron j and o_pi is the output of neuron i in pattern p. The special feature of multilayer perceptrons is the use of a nonlinear sigmoid function

f(net) = 1 / (1 + e^(−net))

for activating the neurons.
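These two ingredients, the linear net input and the sigmoid activation, can be sketched compactly; the logistic form of the sigmoid is assumed here:

```python
import numpy as np

def sigmoid(net):
    """Logistic activation f(net) = 1 / (1 + exp(-net)); its derivative
    f(net) * (1 - f(net)) is used in the weight adaptation below."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(weights, inputs):
    """Linear standard propagation followed by the sigmoid activation.

    weights: matrix w_ij of shape (n_inputs, n_outputs)
    inputs:  vector o_i of the transmitting neurons"""
    net = inputs @ weights  # net_j = sum_i w_ij * o_i
    return sigmoid(net)
```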
for activating the neurons. The error in the training pattern p over all output neurons thus has to become minimal after use of the activation function, i.e.
The learning problem may be approximately described verbally in that the connecting weights wij are to be determined so that the square total error difference between target and actual output (wherein the latter represents a nonlinear function of the weighted net input) becomes minimal for all output neurons. The Euclidean distance between target vector and output vector is thus to be minimized. The required condition for solving the extreme value task has the consequence that the gradient of the error function has to become zero, i.e.
In this case, the coefficients for all modifiable connecting weights can be represented more comprehensibly as a connection matrix

W = (w_ij).

The gradient descent method is based on the concept of performing the change of the individual connecting weights proportionally to the negative gradient of the error function:

Δw_ij = −γ ∂E/∂w_ij,

wherein the above equation for the total error E was used. If the activation function were linear, any multilayer back propagation network could thus be reduced to a two-stage network without a hidden layer, since a linear propagation function and a linear activation function connected in series could be combined into a single linear function.
The partial derivative of the above activation function required to fulfill the stated object reads

f′(net) = f(net) (1 − f(net)).

If the net input to the ANN,

net_pj = Σ_k w_kj o_pk,

is derived according to w_ij, the following is finally obtained:

∂net_pj / ∂w_ij = o_pi.

If all expressions obtained up to this point are inserted into the expression for the gradient of the above error function, at the end it reads

∂E_p / ∂w_ij = −(α̂_pj − α_pj) f′(net_pj) o_pi.

In summary, with the error signal

δ_pj = (α̂_pj − α_pj) f′(net_pj),

and because i = k and j = l, the following expression results:

∂E_p / ∂w_ij = −δ_pj o_pi.

Ultimately, the following is obtained as the weight adaptation:

Δw_ij = γ Σ_p δ_pj o_pi.
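The derived delta rule for a two-layer network and a single training pattern can be sketched as follows, reusing sigmoid from the sketch above; the shapes and names are illustrative:

```python
import numpy as np

def delta_rule_update(weights, inputs, targets, learning_rate):
    """One gradient step delta_w_ij = gamma * delta_pj * o_pi.

    weights: matrix w_ij, inputs: vector o_pi, targets: vector alpha_hat_pj."""
    outputs = sigmoid(inputs @ weights)                          # forward pass
    delta = (targets - outputs) * outputs * (1.0 - outputs)     # error signal delta_pj
    weights = weights + learning_rate * np.outer(inputs, delta)  # weight adaptation
    error = 0.5 * np.sum((targets - outputs) ** 2)              # learning error E_p
    return weights, error
```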
The general considerations up to this point may now be transferred to a specific starting situation, which enables a step-by-step adaptation of the weights. The starting values w_ij^0 can be selected arbitrarily. With the corresponding network input

net_pj^0 = Σ_i w_ij^0 o_pi,

the zeroth output may be specified directly as

α_pj^0 = f(net_pj^0),

and the zeroth weight adaptation reads

Δw_ij^0 = γ δ_pj^0 o_pi.

The connecting weights thus result in the next learning step from

w_ij^1 = w_ij^0 + Δw_ij^0,

and with the network input net_pj^1 and the output α_pj^1 = f(net_pj^1), the first difference weight follows as

Δw_ij^1 = γ δ_pj^1 o_pi.

The second approximation thus reads

w_ij^2 = w_ij^1 + Δw_ij^1,

and with net_pj^2 and α_pj^2 = f(net_pj^2), the second difference weight follows as

Δw_ij^2 = γ δ_pj^2 o_pi.

The third weight results similarly:

w_ij^3 = w_ij^2 + Δw_ij^2.

In general, if w_ij^n is known, the iterative sequence

net_pj^n → α_pj^n → δ_pj^n → Δw_ij^n → w_ij^(n+1)

applies.
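Collected into a loop, the iteration can be sketched as follows, reusing delta_rule_update from above; the stopping criterion and the data layout are assumptions:

```python
import numpy as np

def train(weights, patterns, targets, learning_rate=0.1, epochs=1000, tol=1e-6):
    """Iterate the weight adaptation w^(n+1) = w^n + delta_w^n over all q
    training patterns until the total error E = sum_p E_p falls below a
    tolerance or the epoch budget is exhausted (both assumptions)."""
    for _ in range(epochs):
        total_error = 0.0
        for o_p, target_p in zip(patterns, targets):
            weights, e_p = delta_rule_update(weights, o_p, target_p, learning_rate)
            total_error += e_p
        if total_error < tol:
            break
    return weights
```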
The application of neural networks is divided into two phases, a learning phase and an application phase. In the learning phase, a set of training faces is presented to the network, on the basis of which the connecting weights of the neurons are adjusted so that the network fulfills the desired behaviour and the difference between the output value and the desired target value is minimal. In the application phase, input vectors are again offered, wherein the output vectors are calculated independently with retrieval of the learned knowledge from the network.
In addition, it is to be noted that “comprising” or “including” does not exclude other elements or steps and “a” or “one” does not exclude a plurality. Furthermore, it is to be noted that features or steps which have been described with reference to one of the above exemplary embodiments can also be used in combination with other features or steps of other above-described exemplary embodiments. Reference numerals in the claims are not to be viewed as a restriction.
While at least one exemplary embodiment of the present invention(s) is disclosed herein, it should be understood that modifications, substitutions and alternatives may be apparent to one of ordinary skill in the art and can be made without departing from the scope of this disclosure. This disclosure is intended to cover any adaptations or variations of the exemplary embodiment(s). In addition, in this disclosure, the terms “comprise” or “comprising” do not exclude other elements or steps, the terms “a” or “one” do not exclude a plural number, and the term “or” means either or both. Furthermore, characteristics or steps which have been described may also be used in combination with other characteristics or steps and in any order unless the disclosure or context suggests otherwise. This disclosure hereby incorporates by reference the complete disclosure of any patent or application from which it claims benefit or priority.