This application claims the benefit of Chinese Patent Application No. 201810721890.3, filed Jul. 4, 2018, which is hereby incorporated by reference herein in its entirety.
The present invention relates to image processing, and more particularly to, for example, attribute recognition.
Since person attributes can generally depict the appearance and/or body shape of a person, person attribute recognition (in particular, multi-tasking person attribute recognition) is widely used in monitoring processing such as crowd counting, identity verification, and the like. Here, the appearance includes, for example, age, gender, race, hair color, whether the person wears glasses, whether the person wears a mask, etc., and the body shape includes, for example, height, weight, the clothes worn by the person, whether the person carries a bag, whether the person pulls a suitcase, etc. Multi-tasking person attribute recognition means that a plurality of attributes of one person are to be recognized at the same time. In actual monitoring processing, however, the variability and complexity of the monitoring scene often cause the illumination of the captured image to be insufficient, the face or body of the person in the captured image to be occluded, and the like. How to maintain high recognition accuracy of person attribute recognition in such a variable monitoring scene therefore becomes an important part of the entire monitoring processing.
As for variable and complex scenes, an exemplary processing method is disclosed in “Switching Convolutional Neural Network for Crowd Counting” (Deepak Babu Sam, Shiv Surya, R. Venkatesh Babu; IEEE Computer Society, 2017: 4031-4039), which estimates the crowd density in an image by using two neural networks that are independent of each other. Specifically, one neural network is first used to determine a level corresponding to the crowd density in the image, where the level indicates a range of the number of persons that may exist in the image; secondly, a neural network candidate corresponding to the determined level is selected from a set of neural network candidates, where each candidate in the set corresponds to one level of the crowd density; and then the actual crowd density in the image is estimated by using the selected candidate, so as to ensure the accuracy of estimating the crowd density at different levels.
According to the above exemplary processing method, for person attribute recognition in different scenes (i.e., variable and complex scenes), the accuracy of recognition can likewise be improved by using two neural networks that are independent of each other. For example, one neural network may first be used to recognize the scene of an image, where the scene may be recognized, for example, by a certain attribute of a person in the image (e.g., whether or not a mask is worn); then a neural network corresponding to that scene is selected to recognize a person attribute (e.g., age, gender, etc.) in the image. However, the scene recognition operation and the person attribute recognition operation performed by the two neural networks are independent of each other, and the result of the scene recognition operation is merely used to select a suitable neural network for the person attribute recognition operation; the mutual association and mutual influence that may exist between the two recognition operations are not considered, so the entire recognition processing requires a long time.
In view of the above description in the Description of the Related Art, the present disclosure is directed to solving at least one of the above issues.
According to one aspect of the present disclosure, there is provided an attribute recognition apparatus comprising: an extraction unit that extracts a first feature from an image by using a feature extraction neural network; a first recognition unit that recognizes a first attribute of an object in the image based on the first feature by using a first recognition neural network; a determination unit that determines a second recognition neural network from a plurality of second recognition neural network candidates based on the first attribute; and a second recognition unit that recognizes at least one second attribute of the object based on the first feature by using the second recognition neural network. Here, the first attribute is, for example, whether the object is occluded by an occluder.
According to another aspect of the present disclosure, there is provided an attribute recognition method comprising: an extracting step of extracting a first feature from an image by using a feature extraction neural network; a first recognizing step of recognizing a first attribute of an object in the image based on the first feature by using a first recognition neural network; a determining step of determining a second recognition neural network from a plurality of second recognition neural network candidates based on the first attribute; and a second recognizing step of recognizing at least one second attribute of the object based on the first feature by using the second recognition neural network.
Since the present disclosure uses a feature extraction neural network to extract, in advance, a feature (i.e., the first feature) that the subsequent first recognition operation and second recognition operation need to use in common, redundant operations (for example, repeated extraction of features) between the first recognition operation and the second recognition operation can be greatly reduced, and the time required for the entire recognition processing can therefore be greatly reduced.
Further features and advantages of the present disclosure will become apparent from the following description of typical embodiments with reference to the attached drawings.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present disclosure and, together with the description of the embodiments, serve to explain the principles of the present disclosure.
Exemplary embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings. It should be noted that the following description is essentially illustrative and exemplary only, and is in no way intended to limit the invention or its application or use. The relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in the embodiments do not limit the scope of the invention unless specifically stated otherwise. In addition, techniques, methods, and devices known to those skilled in the art may not be discussed in detail, but are intended, where appropriate, to be a part of the specification.
It is noted that similar reference signs and characters refer to similar items in the drawings; therefore, once an item is defined in one figure, it need not be discussed again for the following figures.
As for object attribute recognition (for example, person attribute recognition) in different scenes, and especially multi-tasking object attribute recognition, the inventor has found that the recognition operations for the scenes and/or the object attributes in an image are in fact recognition operations performed on the same image for different purposes/tasks, and thus these recognition operations will necessarily use certain features of the image (for example, features that are identical or similar in semantics) in common. Therefore, the inventor believes that, before a neural network (for example, the “first recognition neural network” and the “second recognition neural network” referred to hereinafter) is used to perform a corresponding recognition operation, if these features (for example, the “first feature” and the “shared feature” referred to hereinafter) can first be extracted from the image by a dedicated network (for example, the “feature extraction neural network” referred to hereinafter) and then be used by the subsequent recognition operations respectively, redundant operations (for example, repeated extraction of features) between the recognition operations can be greatly reduced, and the time required for the entire recognition processing can be greatly reduced as well.
Further, as for multi-tasking object attribute recognition, the inventor has found that, when a certain attribute of an object is recognized, the features associated with this attribute are the ones mainly used. For example, when recognizing whether a person wears a mask, a feature that is mainly used is, for example, the probability distribution of the mask. Moreover, the inventor has found that, when a certain attribute of the object has already been recognized and other attributes of the object need to be recognized subsequently, if the feature associated with the already recognized attribute can be removed so as to obtain, for example, the “second feature” or the “filtered feature” referred to hereinafter, the interference caused by the removed feature on the recognition of the other attributes of the object can be reduced, so that the accuracy of the entire recognition processing can be improved and the robustness of the object attribute recognition can be enhanced. For example, after it has been recognized that a person wears a mask, if attributes such as age and gender of the person still need to be recognized, removing the feature associated with the mask reduces the interference that this feature would cause on the recognition of those attributes.
The present disclosure has been proposed in view of the findings described above and will be described below in detail with reference to the accompanying drawings.
(Hardware Configuration)
A hardware configuration which can implement the technique described below will be described at first with reference to
The hardware configuration 100 includes, for example, a central processing unit (CPU) 110, a random access memory (RAM) 120, a read only memory (ROM) 130, a hard disk 140, an input device 150, an output device 160, a network interface 170, and a system bus 180. Further, the hardware configuration 100 may be implemented by, for example, a camera, a video camera, a personal digital assistant (PDA), a tablet, a laptop, a desktop, or other suitable electronic devices.
In one implementation, the attribute recognition according to the present disclosure is configured by hardware or firmware and functions as a module or a component of the hardware configuration 100. For example, the attribute recognition apparatus 200, which will be described below in detail with reference to
The CPU 110 is any suitable programmable control device such as a processor, and may execute various functions to be described below by executing various application programs stored in the ROM 130 or the hard disk 140 (such as a memory). The RAM 120 is used to temporarily store programs or data loaded from the ROM 130 or the hard disk 140, and is also used as a space in which the CPU 110 executes various processes (such as, carries out a technique which will be described below in detail with reference to
In one implementation, the input device 150 is used to allow a user to interact with the hardware configuration 100. In one example, the user may input image/video/data through the input device 150. In another example, the user may trigger corresponding processing of the present disclosure through the input device 150. In addition, the input device 150 may adopt various forms, such as a button, a keyboard or a touch screen. In another implementation, the input device 150 is used to receive image/video output from specialized electronic devices such as digital camera, video camera, network camera, and/or the like.
In one implementation, the output device 160 is used to display a recognition result (such as, an attribute of an object) to the user. Moreover, the output device 160 may adopt various forms such as a cathode ray tube (CRT), a liquid crystal display, or the like.
The network interface 170 provides an interface for connecting the hardware configuration 100 to a network. For example, the hardware configuration 100 may perform data communication, via the network interface 170, with another electronic device connected via the network.
Alternatively, the hardware configuration 100 may be provided with a wireless interface to perform wireless data communication. The system bus 180 provides a data transmission path for transmitting data among the CPU 110, the RAM 120, the ROM 130, the hard disk 140, the input device 150, the output device 160, the network interface 170, and the like. Although referred to as a bus, the system bus 180 is not limited to any particular data transmission technique.
The hardware configuration 100 described above is merely illustrative and is in no way intended to limit the invention and its application or use. Moreover, for the sake of brevity, only one hardware configuration is illustrated in
(Attribute Recognition)
Next, the attribute recognition according to the present disclosure will be described with reference to
In addition, the storage device 240 illustrated in
Firstly, the input device 150 illustrated in
Then, as illustrated in
The first recognition unit 220 acquires the first recognition neural network from the storage device 240, and recognizes the first attribute of an object in the received image based on the shared feature extracted by the extraction unit 210 by using the first recognition neural network. Here, the first attribute of the object is, for example, whether the object is occluded by an occluder (e.g., whether the face of the person is occluded by a mask, whether the clothes worn by the person are occluded by another object, etc.).
The second recognition unit 230 acquires the second recognition neural network from the storage device 240, and recognizes at least one second attribute (e.g., age of person, gender of person, and/or the like) of the object based on the shared feature extracted by the extraction unit 210 by using the second recognition neural network. Here, one second recognition neural network candidate is determined from a plurality of second recognition neural network candidates stored in the storage device 240 as the second recognition neural network that can be used by the second recognition unit 230, based on the first attribute recognized by the first recognition unit 220. In one implementation, the determination of the second recognition neural network can be implemented by the second recognition unit 230. In another implementation, the determination of the second recognition neural network can be implemented by a dedicated selection unit or determination unit (not illustrated).
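For concreteness, the following is a minimal sketch, in PyTorch-style Python, of how the cooperation among the extraction unit 210, the first recognition unit 220, the determination of the second recognition neural network, and the second recognition unit 230 might be organized. The backbone layers, the two candidate categories ("occluded"/"not_occluded"), and all layer sizes are illustrative assumptions, not the concrete networks of the present disclosure.

```python
import torch
import torch.nn as nn

class AttributeRecognitionApparatus(nn.Module):
    """Illustrative sketch of units 210/220/230; all layer sizes are assumptions."""

    def __init__(self, num_second_classes=8):
        super().__init__()
        # Extraction unit 210: feature extraction neural network (shared feature).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # First recognition unit 220: first attribute (occluded / not occluded).
        self.first_head = nn.Linear(64, 2)
        # Second recognition neural network candidates, one per first-attribute category.
        self.second_candidates = nn.ModuleDict({
            "occluded": nn.Linear(64, num_second_classes),
            "not_occluded": nn.Linear(64, num_second_classes),
        })

    def forward(self, image):
        # image: (1, 3, H, W) region of a single object; batch size 1 for clarity.
        shared = self.backbone(image)                     # shared (first) feature
        first_logits = self.first_head(shared)            # scores for the first attribute
        keys = ["occluded", "not_occluded"]
        first_attr = keys[first_logits.argmax(dim=1).item()]
        second_net = self.second_candidates[first_attr]   # determination based on first attribute
        second_scores = second_net(shared)                # second recognition unit 230
        return first_attr, second_scores
```

In this sketch the determination of the second recognition neural network is a simple lookup keyed by the recognized first attribute; in the second embodiment described later, the shared feature would additionally be filtered before being passed to the selected candidate.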
Finally, the first recognition unit 220 and the second recognition unit 230 transmit the recognition results (e.g., the recognized first attribute of the object, and the recognized second attribute of the object) to the output device 160 via the system bus 180 illustrated in
Here, the recognition processing performed by the attribute recognition apparatus 200 may be regarded as a multi-tasking object attribute recognition processing. For example, the operation executed by the first recognition unit 220 may be regarded as a recognition operation of a first task, and the operation executed by the second recognition unit 230 may be regarded as a recognition operation of a second task. The second recognition unit 230 can recognize a plurality of attributes of the object.
Here, what the attribute recognition apparatus 200 recognizes is an attribute of one object in the received image. In the case where a plurality of objects (e.g., a plurality of persons) are included in the received image, all of the objects in the received image may be detected first, and then, for each of the objects, its attributes may be recognized by the attribute recognition apparatus 200.
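Where several objects appear in one image, the per-object procedure just described could, under the same assumptions as the sketch above, look as follows; detect_objects and crop_region are hypothetical helpers (an object detector and a cropping routine), not components defined by the present disclosure.

```python
def recognize_all_objects(image, apparatus, detect_objects, crop_region):
    """Detect every object first, then run the attribute recognition apparatus on
    each detected region; detect_objects and crop_region are assumed helpers."""
    results = []
    for box in detect_objects(image):                  # e.g. bounding boxes of all persons
        region = crop_region(image, box)               # (1, 3, H, W) tensor for one object
        first_attr, second_scores = apparatus(region)  # per-object attribute recognition
        results.append({"box": box, "first": first_attr, "second": second_scores})
    return results
```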
The flowchart 300 illustrated in
As illustrated in
In the first recognizing step S320, the first recognition unit 220 acquires the first recognition neural network from the storage device 240, and recognizes the first attribute of the target person, i.e., whether the face of the target person is occluded by a mask, based on the shared feature extracted in the extracting step S310 by using the first recognition neural network. In one implementation, the first recognition unit 220 first acquires a scene feature of the region where the target person is located from the shared feature, then obtains a probability value (for example, P(M1)) that the face of the target person is occluded by the mask and a probability value (for example, P(M2)) that the face of the target person is not occluded by the mask based on the acquired scene feature by using the first recognition neural network, and finally selects the attribute with the larger probability value as the first attribute of the target person, where P(M1)+P(M2)=1. For example, in the case of P(M1)>P(M2), the first attribute of the target person is that the face is occluded by the mask, and the confidence of the first attribute of the target person is then Ptask1=P(M1); and in the case of P(M1)<P(M2), the first attribute of the target person is that the face is not occluded by the mask, and the confidence of the first attribute of the target person is then Ptask1=P(M2).
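A short sketch of this probability computation follows, under the assumption that the first recognition neural network ends in two scores (index 0 for “occluded by the mask”, index 1 for “not occluded”); the softmax guarantees P(M1)+P(M2)=1, and the confidence Ptask1 is the larger of the two values.

```python
import torch

def first_attribute_and_confidence(first_logits):
    """first_logits: (1, 2) scores from the first recognition neural network.
    Index 0 is assumed to mean "face occluded by the mask", index 1 "not occluded"."""
    probs = torch.softmax(first_logits, dim=1)      # ensures P(M1) + P(M2) = 1
    p_m1, p_m2 = probs[0, 0].item(), probs[0, 1].item()
    if p_m1 > p_m2:
        return "occluded", p_m1                     # confidence Ptask1 = P(M1)
    return "not_occluded", p_m2                     # confidence Ptask1 = P(M2)
```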
In step S330, the second recognition unit 230, for example, determines one second recognition neural network candidate from the plurality of second recognition neural network candidates stored in the storage device 240 as the second recognition neural network that can be used by the second recognition unit 230, based on the first attribute of the target person. For example, in the case where the first attribute of the target person is that the face is occluded by the mask, the second recognition neural network candidate trained with training samples of faces wearing masks is determined as the second recognition neural network. Conversely, in the case where the first attribute of the target person is that the face is not occluded by the mask, the second recognition neural network candidate trained with training samples of faces not wearing masks is determined as the second recognition neural network. Obviously, in the case where the first attribute of the target person is another attribute, for example, whether the clothes worn by the person are occluded by another object, the second recognition neural network candidate corresponding to that attribute may be determined as the second recognition neural network.
In the second recognizing step S340, the second recognition unit 230 recognizes the second attribute of the target person, i.e., the age of the target person, based on the shared feature extracted in the extracting step S310 by using the determined second recognition neural network. In one implementation, the second recognition unit 230 first acquires the person attribute feature of the target person from the shared feature, and then recognizes the second attribute of the target person based on the acquired person attribute feature by using the second recognition neural network.
Finally, the first recognition unit 220 and the second recognition unit 230 transmit the recognition results (e.g., whether the target person is occluded by a mask, and the age of the target person) to the output device 160 via the system bus 180 illustrated in
Further, as described above, in the multi-tasking object attribute recognition, as for an attribute that has already been recognized, if the feature associated with the recognized attribute can be removed, the interference caused by this feature on the subsequent recognition of the second attribute can be reduced, so that the accuracy of the entire recognition processing can be improved and the robustness of the object attribute recognition can be enhanced. Thus,
As illustrated in
After the first generation unit 221 generates the saliency feature, on the one hand, the classification unit 222 recognizes the first attribute of the object to be recognized based on the saliency feature generated by the first generation unit 221 by using the first recognition neural network. Here, the first recognition neural network used by the first recognition unit 220 (that is, by the first generation unit 221 and the classification unit 222) in the present embodiment may be used to generate the saliency feature in addition to recognizing the first attribute of the object, and the first recognition neural network that can be used in the present embodiment may also be obtained similarly by referring to the generation method of each neural network described with reference to
On the other hand, the second generation unit 410 generates a second feature based on the shared feature extracted by the extraction unit 210 and the saliency feature generated by the first generation unit 221. Here, the second feature is a feature associated with a second attribute of the object to be recognized by the second recognition unit 230. In other words, the operation performed by the second generation unit 410 is to perform a feature filtering operation on the shared feature extracted by the extraction unit 210 by using the saliency feature generated by the first generation unit 221, so as to remove the feature associated with the first attribute of the object (that is, to remove the feature associated with the attribute that has already been recognized). Thus, hereinafter, the generated second feature will be referred to as a “filtered feature”, for example.
After the second generation unit 410 generates the filtered feature, the second recognition unit 230 recognizes the second attribute of the object based on the filtered feature by using the second recognition neural network.
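As a hedged illustration of the second generation unit 410, one plausible way to realize the feature filtering operation is to suppress, in the shared feature, the locations that the saliency feature assigns to the occluder. The elementwise form below is an assumption chosen for the sketch; the disclosure only requires that the feature associated with the already recognized first attribute be removed.

```python
import torch.nn.functional as F

def generate_filtered_feature(shared_feature, saliency_map):
    """shared_feature: (N, C, H, W) shared feature from the extraction unit 210.
    saliency_map: (N, 1, h, w) saliency feature (e.g. probability distribution map
    of the occluder) with values in [0, 1]. The multiplicative suppression below is
    only one possible instantiation of the feature filtering operation."""
    saliency = F.interpolate(saliency_map, size=shared_feature.shape[-2:],
                             mode="bilinear", align_corners=False)
    return shared_feature * (1.0 - saliency)        # filtered (second) feature
```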
In addition, since the extraction unit 210 and the second recognition unit 230 illustrated in
The flowchart 500 illustrated in
As illustrated in
After the first generation unit 221 generates the probability distribution map of the mask in the first generating step S321, on the one hand, in the classifying step S322, the classification unit 222 recognizes the first attribute of the target person (i.e., whether the face of the target person is occluded by a mask) based on the probability distribution map of the mask generated in the first generating step S321 by using the first recognition neural network. Since the operation of the classifying step S322 is similar to the operation of the first recognizing step S320 illustrated in
On the other hand, in the second generating step S510, the second generation unit 410 generates a filtered feature (that is, the feature associated with the mask is removed from this feature) based on the shared feature extracted in the extracting step S310 and the probability distribution map of the mask generated in the first generating step S321. In one implementation, as for each pixel block (e.g., pixel block 670 as illustrated in
After the second generation unit 410 generates the filtered feature in the second generating step S510, on the one hand, in step S330, for example, the second recognition unit 230 determines the second recognition neural network that can be used by the second recognition unit 230 based on the first attribute of the target person. Since the operation of step S330 here is the same as the operation of step S330 illustrated in
In addition, since the extracting step S310 illustrated in
As described above, according to the present disclosure, on the one hand, before multi-tasking object attribute recognition is performed, a feature that needs to be used in common when recognizing each attribute (i.e., the “shared feature”) may first be extracted from the image by using a dedicated network (i.e., the “feature extraction neural network”), so that redundant operations between the attribute recognition operations can be greatly reduced and the time required for the entire recognition processing can be greatly reduced as well. On the other hand, when a certain attribute (e.g., the first attribute) of the object has been recognized and other attributes (e.g., the second attribute) of the object need to be recognized subsequently, the feature associated with the already recognized attribute may first be removed from the shared feature so as to obtain the “filtered feature”; the interference caused by the removed feature on the recognition of the other attributes of the object is thereby reduced, so that the accuracy of the entire recognition processing can be improved and the robustness of the object attribute recognition can be enhanced.
(Generation of Neural Network)
In order to generate a neural network that can be used in the first embodiment and the second embodiment of the present disclosure, a corresponding neural network may be generated in advance based on a preset initial neural network and training samples by using the generation method described with reference to
In one implementation, in order to increase the convergence and stability of the neural network,
First, as illustrated in
Then, in step S710, the CPU 110 updates the feature extraction neural network and the first recognition neural network simultaneously based on the acquired training samples in the manner of back propagation.
In one implementation, as for the first embodiment of the present disclosure, firstly, the CPU 110 passes the currently acquired training samples through the current “feature extraction neural network” (e.g., the initial “feature extraction neural network”) to obtain a “shared feature”, and passes the “shared feature” through the current “first recognition neural network” (e.g., the initial “first recognition neural network”) to obtain a predicted probability value for the first attribute of the object. For example, in the case where the first attribute of the object is whether the face of the person is occluded by an occluder, the obtained predicted probability value is the predicted probability value that the face of the person is occluded by the occluder. Secondly, the CPU 110 determines a loss between the predicted probability value and the true value for the first attribute of the object, which may be represented as Ltask1 for example, by using a loss function (e.g., the Softmax Loss function, the Hinge Loss function, the Sigmoid Cross Entropy function, etc.). Here, the true value for the first attribute of the object may be obtained from the corresponding labels in the currently acquired training samples. Then, the CPU 110 updates the parameters of each layer in the current “feature extraction neural network” and the current “first recognition neural network” based on the loss Ltask1 in the manner of back propagation, where the parameters of each layer are, for example, the weight values in each convolutional layer of the current “feature extraction neural network” and the current “first recognition neural network”. In one example, the parameters of each layer are updated based on the loss Ltask1 by using a stochastic gradient descent method.
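A minimal sketch of this update, assuming a PyTorch setting: the hypothetical backbone stands in for the feature extraction neural network and first_head for the first recognition neural network (their sizes are assumptions), nn.CrossEntropyLoss stands in for the Softmax Loss, and torch.optim.SGD for the stochastic gradient descent method, all of which the description names only as examples.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the feature extraction neural network and the
# first recognition neural network (sizes are assumptions for the sketch).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 64), nn.ReLU())
first_head = nn.Linear(64, 2)

criterion = nn.CrossEntropyLoss()   # stands in for the Softmax Loss function
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(first_head.parameters()), lr=0.01)

def update_step_s710(images, first_labels):
    """One simultaneous back-propagation update of the feature extraction
    neural network and the first recognition neural network (step S710)."""
    shared = backbone(images)                           # shared feature
    first_logits = first_head(shared)                   # predicted first attribute
    loss_task1 = criterion(first_logits, first_labels)  # Ltask1 against the true labels
    optimizer.zero_grad()
    loss_task1.backward()                               # back propagation
    optimizer.step()                                    # stochastic gradient descent update
    return loss_task1.item()
```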
In another implementation, as for the second embodiment of the present disclosure, firstly, the CPU 110 passes the currently acquired training samples through the current “feature extraction neural network” (e.g., the initial “feature extraction neural network”) to obtain the “shared feature”, passes the “shared feature” through the current “first recognition neural network” (e.g., the initial “first recognition neural network”) to obtain a “saliency feature” (e.g., a probability distribution map of the occluder), and passes the “saliency feature” through the current “first recognition neural network” to obtain the predicted probability value for the first attribute of the object. Here, the operation of passing through the current “first recognition neural network” to obtain the “saliency feature” can be realized by using a weak supervised learning algorithm. Secondly, as described above, the CPU 110 determines the loss Ltask1 between the predicted probability value and the true value for the first attribute of the object, and updates the parameters of each layer in the current “feature extraction neural network” and the current “first recognition neural network” based on the loss Ltask1.
Returning to
As an alternative to steps S710 and S720, for example, after the loss Ltask1 is determined, the CPU 110 compares the determined Ltask1 with a threshold (e.g., TH1). In the case where Ltask1 is less than or equal to TH1, the current “feature extraction neural network” and the current “first recognition neural network” are determined to have satisfied the predetermined condition, and the generation process proceeds to the other update operations (for example, step S730); otherwise, the CPU 110 updates the parameters of each layer in the current “feature extraction neural network” and the current “first recognition neural network” based on the loss Ltask1, and the generation process then re-proceeds to the operation of updating the feature extraction neural network and the first recognition neural network (e.g., step S710).
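Under the same assumptions as the step S710 sketch above, this alternative stopping criterion might be expressed as a simple threshold check; the value of TH1 is an illustrative assumption, as the disclosure does not fix one.

```python
TH1 = 0.05   # illustrative threshold; the disclosure does not fix a concrete value

def train_first_stage(sample_loader):
    """Repeat the S710 update until Ltask1 is less than or equal to TH1,
    then hand over to the next update operation (e.g. step S730)."""
    for images, first_labels in sample_loader:
        loss_task1 = update_step_s710(images, first_labels)
        if loss_task1 <= TH1:
            break   # predetermined condition satisfied
```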
Returning to
In one implementation, as for the first embodiment of the present disclosure, firstly, on the one hand, the CPU 110 passes the currently acquired training samples through the current “feature extraction neural network” (e.g., the “feature extraction neural network” updated via step S710) to obtain the “shared feature”, and passes the “shared feature” through the current “first recognition neural network” (e.g., the “first recognition neural network” updated via step S710) to obtain the predicted probability value for the first attribute of the object, for example, the predicted probability value that the face of the person is occluded by the occluder, as described above for step S710. On the other hand, the CPU 110 passes the “shared feature” through the current “nth candidate network” (e.g., the initial “nth candidate network”) to obtain a predicted probability value for the second attribute of the object, where the number of predicted probability values obtained corresponds to the number of second attributes that need to be recognized via the nth candidate network. Secondly, on the one hand, the CPU 110 determines, by using loss functions, the loss (which may be represented as Ltask1 for example) between the predicted probability value and the true value for the first attribute of the object and the loss (which may be represented as Ltask-others for example) between the predicted probability value and the true value for the second attribute of the object, respectively. Here, the true value for the second attribute of the object may also be obtained from the corresponding labels in the currently acquired training samples. On the other hand, the CPU 110 calculates a loss sum (which may be represented as L1 for example), that is, the sum of the loss Ltask1 and the loss Ltask-others. That is, the loss sum L1 may be obtained by the following formula (1):
L1 = Ltask1 + Ltask-others  (1)
Furthermore, the CPU 110 updates the parameters of each layer in the current “nth candidate network”, the current “feature extraction neural network”, and the current “first recognition neural network” based on the loss sum L1 in the manner of back propagation.
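Continuing the assumptions of the step S710 sketch (backbone, first_head, criterion), one S730 update with the loss sum of formula (1) could be sketched as follows; a single categorical second attribute per candidate is assumed for brevity, whereas the description allows several.

```python
import torch

def update_step_s730(images, first_labels, second_labels, candidate_n):
    """Jointly update the nth second recognition neural network candidate, the
    feature extraction neural network and the first recognition neural network
    with the loss sum L1 = Ltask1 + Ltask-others (formula (1))."""
    opt = torch.optim.SGD(
        list(candidate_n.parameters()) + list(backbone.parameters())
        + list(first_head.parameters()), lr=0.01)
    shared = backbone(images)                               # shared feature
    first_logits = first_head(shared)                       # first attribute prediction
    second_logits = candidate_n(shared)                     # first embodiment: shared feature in
    loss_task1 = criterion(first_logits, first_labels)      # Ltask1
    loss_others = criterion(second_logits, second_labels)   # Ltask-others
    loss_l1 = loss_task1 + loss_others                      # formula (1)
    opt.zero_grad()
    loss_l1.backward()                                      # back propagation
    opt.step()
    return loss_l1.item()
```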
In another implementation, as for the second embodiment of the present disclosure, firstly, on the one hand, the CPU 110 passes the currently acquired training samples through the current “feature extraction neural network” (e.g., the “feature extraction neural network” updated via step S710) to obtain the “shared feature”, passes the “shared feature” through the current “first recognition neural network” (e.g., the “first recognition neural network” updated via step S710) to obtain the “saliency feature”, and passes the “saliency feature” through the current “first recognition neural network” to obtain the predicted probability value for the first attribute of the object. On the other hand, the CPU 110 performs a feature filtering operation on the “shared feature” by using the “saliency feature” to obtain a “filtered feature”, and passes the “filtered feature” through the current “nth candidate network” to obtain the predicted probability value for the second attribute of the object.
Secondly, as described above, the CPU 110 determines each loss and calculates the loss sum L1, and updates the parameters of each layer in the current “nth candidate network”, the current “feature extraction neural network”, and the current “first recognition neural network” based on the loss sum L1.
Returning to
As described above, the number of second recognition neural network candidates corresponds to the number of categories of the first attribute of the object. Assuming that the number of categories of the first attribute of the object is N, in step S750, the CPU 110 determines whether all of the second recognition neural network candidates have been updated, that is, determines whether n is greater than N. In the case of n>N, the generation process proceeds to step S770. Otherwise, in step S760, the CPU 110 sets n=n+1, and the generation process re-proceeds to step S730.
In step S770, the CPU 110 updates each of the second recognition neural network candidates, the feature extraction neural network, and the first recognition neural network simultaneously based on the acquired training samples in the manner of back propagation.
In one implementation, as for the first embodiment of the present disclosure, firstly, on the one hand, the CPU 110 passes the currently acquired training samples through the current “feature extraction neural network” (e.g., the “feature extraction neural network” updated via step S730) to obtain the “shared feature”, and passes the “shared feature” through the current “first recognition neural network” (e.g., the “first recognition neural network” updated via step S730) to obtain the predicted probability value for the first attribute of the object, for example, the predicted probability value that the face of the person is occluded by the occluder, as described above for step S710. On the other hand, as for each candidate network among the second recognition neural network candidates, the CPU 110 passes the “shared feature” through the current candidate network (e.g., the candidate network updated via step S730) to obtain a predicted probability value for the second attribute of the object under this candidate network. Secondly, on the one hand, the CPU 110 determines, by using loss functions, the loss (which may be represented as Ltask1 for example) between the predicted probability value and the true value for the first attribute of the object and the loss (which may be represented as Ltask-others(n) for example) between the predicted probability value and the true value for the second attribute of the object under each candidate network, respectively. Here, Ltask-others(n) represents the loss between the predicted probability value and the true value for the second attribute of the object under the nth candidate network. On the other hand, the CPU 110 calculates a loss sum (which may be represented as L2 for example), that is, the sum of the loss Ltask1 and the losses Ltask-others(n). That is, the loss sum L2 may be obtained by the following formula (2):
L2 = Ltask1 + Ltask-others(1) + . . . + Ltask-others(n) + . . . + Ltask-others(N)  (2)
As a replacement, in order to obtain a more robust neural network, Ltask-others(n) may be weighted based on the obtained predicted probability value for the first attribute of the object during the calculation of the loss sum L2 (that is, the obtained predicted probability value for the first attribute of the object may be used as a weighting parameter for Ltask-others(n)), so that the accuracy of the prediction of the second attribute of the object can be maintained even in the case where an error occurs in the prediction of the first attribute of the object. For example, taking as an example that the first attribute of the object is whether the face of the person is occluded by an occluder, and assuming that the obtained predicted probability value that the face of the person is occluded by the occluder is P(C), the predicted probability value that the face of the person is not occluded by the occluder is 1−P(C), so that the loss sum L2 may be obtained by the following formula (3):
L2 = Ltask1 + P(C)*Ltask-others(1) + (1−P(C))*Ltask-others(2)  (3)
Here, Ltask-others(1) represents the loss between the predicted probability value and the true value for the second attribute of the person in the case where the face is occluded by an occluder, and Ltask-others(2) represents the loss between the predicted probability value and the true value for the second attribute of the person in the case where the face is not occluded by an occluder. Then, after the loss sum L2 is calculated, the CPU 110 updates the parameters of each layer in each of the current second recognition neural network candidates, the current “feature extraction neural network”, and the current “first recognition neural network” based on the loss sum L2 in the manner of back propagation.
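Continuing the same assumptions as the earlier training sketches (backbone, first_head, criterion), the weighted loss of formula (3) might be computed as below, with the two candidates keyed by the occlusion category; averaging P(C) over the batch is a simplification introduced only for this sketch.

```python
import torch

def weighted_loss_formula_3(images, first_labels, second_labels, candidates):
    """Loss sum of formula (3): each candidate's Ltask-others is weighted by the
    predicted probability of its first-attribute category. `candidates` is
    assumed to be a dict with keys "occluded" and "not_occluded"."""
    shared = backbone(images)
    first_logits = first_head(shared)
    loss_task1 = criterion(first_logits, first_labels)              # Ltask1
    p_c = torch.softmax(first_logits, dim=1)[:, 0].mean()           # P(C): face occluded
    loss_occ = criterion(candidates["occluded"](shared), second_labels)      # Ltask-others(1)
    loss_not = criterion(candidates["not_occluded"](shared), second_labels)  # Ltask-others(2)
    return loss_task1 + p_c * loss_occ + (1.0 - p_c) * loss_not     # formula (3)
```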
In another implementation, as for the second embodiment of the present disclosure, firstly, on the one hand, the CPU 110 passes the currently acquired training samples through the current “feature extraction neural network” (e.g., the “feature extraction neural network” updated via step S730) to obtain the “shared feature”, passes the “shared feature” through the current “first recognition neural network” (e.g., the “first recognition neural network” updated via step S730) to obtain the “saliency feature”, and passes the “saliency feature” through the current “first recognition neural network” to obtain the predicted probability value for the first attribute of the object. On the other hand, the CPU 110 performs the feature filtering operation on the “shared feature” by using the “saliency feature” to obtain the “filtered feature”. And for each candidate network among the second recognition neural network candidates, the CPU 110 passes the “filtered feature” through the current candidate network to obtain the predicted probability value for the second attribute of the object under this candidate network. Secondly, as described above, the CPU 110 determines each loss and calculates the loss sum L2, and updates the parameters of each layer in each of the current second recognition neural network candidates, the current “feature extraction neural network”, and the current “first recognition neural network” based on the loss sum L2.
Returning to
All of the units described above are exemplary and/or preferred modules for implementing the processing described in this disclosure. These units may be hardware units, such as field programmable gate arrays (FPGAs), digital signal processors, application specific integrated circuits, etc., and/or software modules, such as computer readable programs. The units for implementing each of the steps are not described exhaustively above. However, when there is a step that performs a particular process, there may be a corresponding functional module or unit (implemented by hardware and/or software) for implementing the same process. The technical solutions constituted by all combinations of the steps described and the units corresponding to these steps are included in the disclosed content of the present application, as long as the technical solutions they constitute are complete and applicable.
The method and apparatus of the present disclosure may be implemented in various manners. For example, the method and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination thereof. The above-described order of the steps of the method is intended to be merely illustrative, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless specified otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as a program recorded in a recording medium, which includes machine readable instructions for implementing the method according to the present disclosure. Accordingly, the present disclosure also encompasses a recording medium storing a program for implementing the method according to the present disclosure.
While some specific embodiments of the present disclosure have been shown in detail by way of examples, it is to be appreciated by those skilled in the art that the above examples are intended to be merely illustrative and do not limit the scope of the invention. It is to be appreciated by those skilled in the art that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is restricted by the appended claims.