The present invention relates to detection learning apparatus, method, and program that classify data as a positive example or a negative example.
Techniques for detecting target data from a lot of data have been devised based on machine learning approaches, and in recent years, deep learning detectors have been known to exhibit high performance on complex data.
Indicators of detector performance include a recall rate (or true positive rate), which indicates the ratio at which target data to be detected is correctly detected, and a false positive rate, which indicates the ratio at which data that should not be detected is incorrectly detected. However, because there is a trade-off between these rates, training to increase the true positive rate (TPR) disadvantageously increases the false positive rate (FPR) as well. The area under the curve (AUC) of a receiver operating characteristic (ROC) curve has often been used as an indicator for resolving this trade-off. The ROC curve is a curve on a graph plotting the correspondence between the TPR and the FPR, i.e., between the probability of correctly classifying positive example data as a positive example and the probability of incorrectly classifying negative example data as a positive example. By maximizing the AUC, which is the area under the ROC curve, a well-balanced detector can be trained.
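As an illustration of the indicators described above, the following is a minimal sketch of how the points of an ROC curve and its AUC might be computed from detection scores. The function names `roc_curve_points` and `auc_trapezoid` are hypothetical, and the sketch assumes distinct scores (ties are not treated specially) and labels of 1 for positive examples and 0 for negative examples.

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by lowering the threshold past each score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels == 1)  # positives detected so far
    fp = np.cumsum(sorted_labels == 0)  # negatives detected so far
    tpr = np.concatenate(([0.0], tp / tp[-1]))
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    widths = fpr[1:] - fpr[:-1]
    heights = (tpr[1:] + tpr[:-1]) / 2.0
    return float(np.sum(widths * heights))

fpr, tpr = roc_curve_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = auc_trapezoid(fpr, tpr)
```

In this small example three of the four positive/negative pairs are ordered correctly, so the area equals 0.75.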
Non Patent Literature 1: Naonori Ueda and Akinori Fujino, “Partial AUC Maximization via Nonlinear Scoring Functions.” arXiv preprint arXiv:1806.04838 (2018).
However, in actually utilizing the detector for a particular purpose, a detector that ensures particular performance rather than the well-balanced detector may be required. For example, given that parts produced at the factory are inspected using images to detect defective products, it is necessary to set a sufficiently high TPR so as not to pass any defective product, but for the FPR, error detection may be acceptable to some extent. Maximizing partial AUC (pAUC) has been proposed as an indicator for increasing detection performance assuming a certain TPR (Non Patent Literature 1). This is an approach that maximizes the detection performance in the corresponding TPR or FPR by maximizing a part of the area indicated by the AUC, as illustrated in
To address this problem, according to the present invention, the detection performance at the desired TPR or FPR is maximized by using an approach that narrows the target region in a stepwise manner while maximizing the pAUC.
The present invention is made in light of the foregoing circumstances, and its object is to provide detection learning apparatus, method, and program that can train a well-balanced detector in the vicinity of a desired TPR or FPR.
To achieve the above-described object, a detection learning apparatus according to a first aspect of the invention includes: a region-to-be-maximized setting unit configured to make a setting so as to narrow, at each repetition, a range determined by an upper limit and a lower limit of a true positive rate or false positive rate for defining a part of an area under a Receiver Operating Characteristic (ROC) curve on a graph representing a correspondence between the true positive rate that is probability of correctly classifying positive example data as a positive example and the false positive rate that is probability of incorrectly classifying negative example data as the positive example; a maximization learning unit configured to train a score function so as to optimize an objective function represented using positive example data selected from ranked positive example data, negative example data, and the score function that calculates a score representing likelihood of a positive example according to the set range between the upper limit and the lower limit of the true positive rate or the false positive rate; a ranking unit configured to rank the positive example data based on the score calculated using the score function; and a determination unit configured to cause the maximization learning unit and the ranking unit to repeat the processing until the objective function is converged, and the region-to-be-maximized setting unit to repeat setting until the range between the upper limit and the lower limit of the true positive rate or the false positive rate becomes a predetermined size.
Further, in the detection learning apparatus according to the first aspect of the invention, the maximization learning unit may select positive example data included in the range between the upper limit and the lower limit when the ranking is indicated as a percentage of the total positive example data, from the ranked positive example data.
A detection learning apparatus according to a second aspect of the invention includes: a region-to-be-maximized setting unit configured to make a setting so as to narrow, at each repetition, a range determined by an upper limit and a lower limit of a false positive rate for defining a part of an area under a Receiver Operating Characteristic (ROC) curve on a graph representing a correspondence between a true positive rate that is probability of correctly classifying positive example data as a positive example and the false positive rate that is probability of incorrectly classifying negative example data as the positive example; a maximization learning unit configured to train a score function so as to optimize an objective function represented using negative example data selected from ranked negative example data, positive example data, and the score function that calculates a score representing likelihood of a positive example according to the set range between the upper limit and the lower limit of the false positive rate; a ranking unit configured to rank the negative example data based on the score calculated using the score function; and a determination unit configured to cause the maximization learning unit and the ranking unit to repeat the processing until the objective function is converged, and the region-to-be-maximized setting unit to repeat setting until the range between the upper limit and the lower limit of the false positive rate becomes a predetermined size.
A detection learning method according to a third aspect of the present invention includes: at a region-to-be-maximized setting unit, making setting so as to narrow, at each repetition, a range determined by an upper limit and a lower limit of a true positive rate or false positive rate for defining a part of an area under a Receiver Operating Characteristic (ROC) curve on a graph representing a correspondence between the true positive rate that is probability of correctly classifying positive example data as a positive example and a false positive rate that is probability of incorrectly classifying negative example data as the positive example; at a maximization learning unit, training a score function so as to optimize an objective function represented using positive example data selected from ranked positive example data, negative example data, and the score function that calculates a score representing likelihood of a positive example according to the set range between the upper limit and the lower limit of the true positive rate or the false positive rate; at a ranking unit, ranking the positive example data based on the score calculated using the score function; and at a determination unit, causing the maximization learning unit and the ranking unit to repeat the processing until the objective function is converged, and the region-to-be-maximized setting unit to repeat setting until the range between the upper limit and the lower limit of the true positive rate or the false positive rate becomes a predetermined size.
Further, in the detection learning method according to the third aspect of the invention, the maximization learning unit may select positive example data included in the range between the upper limit and the lower limit when the ranking is indicated as a percentage of the total positive example data, from the ranked positive example data.
A detection learning method according to a fourth aspect of the present invention includes: at a region-to-be-maximized setting unit, making setting so as to narrow, at each repetition, a range determined by an upper limit and a lower limit of a false positive rate for defining a part of an area under a Receiver Operating Characteristic (ROC) curve on a graph representing a correspondence between a true positive rate that is probability of correctly classifying positive example data as a positive example and the false positive rate that is probability of incorrectly classifying negative example data as the positive example; at a maximization learning unit, training a score function so as to optimize an objective function represented using negative example data selected from ranked negative example data, positive example data, and the score function that calculates a score representing likelihood of a positive example according to the set range between the upper limit and the lower limit of the false positive rate; at a ranking unit, ranking the negative example data based on the score calculated using the score function; and at a determination unit, causing the maximization learning unit and the ranking unit to repeat the processing until the objective function is converged, and the region-to-be-maximized setting unit to repeat setting until the range between the upper limit and the lower limit of the false positive rate becomes a predetermined size.
A program according to a fifth aspect of the invention is a program that causes a computer to function as each unit of the detection learning apparatus described in the first aspect of the invention.
The detection learning apparatus, method, and program of the present invention can advantageously train a well-balanced detector in the vicinity of a desired TPR or FPR.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
A detector is trained by maximization of the pAUC in the vicinity of a desired TPR or FPR. In the embodiment of the present invention, a case of training the detector by the pAUC maximization in the vicinity of the TPR is described as an example. In this case, a narrow pAUC region easily leads to a local solution and rarely achieves high performance, while a wide pAUC region cannot achieve performance specific to the desired TPR. In the embodiment of the present invention, the target region of the pAUC is initially set wide and gradually narrowed, thereby facilitating learning that is optimized for the specific TPR.
Configuration of Detection Learning Apparatus According to Embodiment of Present Invention
Next, a configuration of a detection learning apparatus according to the embodiment of the present invention will be described. As illustrated in
The detection learning apparatus 100 receives the training data 10 with positive and negative examples.
The operation unit 20 includes a region-to-be-maximized setting unit 30, a maximization learning unit 32, a ranking unit 34, and a determination unit 36. The operation unit 20 is configured to include a region 21 to be maximized set by the region-to-be-maximized setting unit 30, a detector parameter 22 learned by the maximization learning unit 32, and score ranking 23 found by the ranking unit 34.
The region-to-be-maximized setting unit 30 determines a partial region of the AUC to be maximized. For the received training data 10, the maximization learning unit 32 trains the detector so as to maximize the pAUC for the set partial region. The ranking unit 34 executes processing of sorting the training data in the order of score according to the learned detector. The score ranking acquired by the ranking unit 34 is used in the maximization learning unit 32. The region 21 to be maximized is gradually narrowed while the determination unit 36 repeats three processes, such that the detector parameter 22 acquired at optimization in a sufficiently narrow region is output as a learning result.
Details of each processing unit will be described below.
The region-to-be-maximized setting unit 30 gradually narrows a range determined by an upper limit and a lower limit of the true positive rate for defining a part of the area under the ROC curve (region 21 to be maximized) at each repetition.
The region-to-be-maximized setting unit 30 sets, as the region 21 to be maximized, a partial region of the AUC to be maximized based on a value of the required TPR or FPR. In the present embodiment, it is assumed as an example that the required TPR is α. In this case, the FPR at a TPR of α can be minimized by maximizing the AUC in the vicinity of the region where TPR = α. However, to avoid a local solution, learning is performed while gradually narrowing the region 21 to be maximized.
A lower limit Rl and an upper limit Ru of the set region 21 to be maximized are expressed by the following equation (1).
[Math. 1]
$$R_l = \alpha - \delta_l^{(n)}, \qquad R_u = \alpha + \delta_u^{(n)} \tag{1}$$
Here, the superscript (n) on δ indicates the number of times the region-to-be-maximized setting unit 30 has made the setting. Because the entire region 0<TPR<1 is set as the target region at the initial setting, $\delta_l^{(0)} = \alpha$ and $\delta_u^{(0)} = 1 - \alpha$. For the second and subsequent times, the region 21 to be maximized is changed according to the following equation (2) each time the region-to-be-maximized setting unit 30 makes the setting.
[Math. 2]
$$\delta_l^{(n+1)} = \eta\,\delta_l^{(n)}, \qquad \delta_u^{(n+1)} = \eta\,\delta_u^{(n)} \tag{2}$$
Here, η is a parameter indicating an attenuation ratio of the region 21 to be maximized. η may be defined separately for each of l and u.
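The narrowing rule in equations (1) and (2) can be sketched as follows. This is only an illustrative sketch: the function name `region_schedule`, the stopping width `min_width` (standing in for the "predetermined size"), and the safety cap `max_steps` are assumptions introduced here, and a single η with 0 < η < 1 is used for both the lower and upper sides.

```python
def region_schedule(alpha, eta, min_width, max_steps=1000):
    """List the (R_l, R_u) regions produced by equations (1) and (2).

    alpha     : required TPR around which the region is narrowed
    eta       : attenuation ratio, assumed to satisfy 0 < eta < 1
    min_width : hypothetical stopping width for R_u - R_l
    """
    delta_l, delta_u = alpha, 1.0 - alpha  # initial setting: whole range 0..1
    regions = []
    for _ in range(max_steps):
        r_l = alpha - delta_l  # equation (1), lower limit
        r_u = alpha + delta_u  # equation (1), upper limit
        regions.append((r_l, r_u))
        if r_u - r_l <= min_width:
            break
        delta_l *= eta  # equation (2)
        delta_u *= eta
    return regions

regions = region_schedule(alpha=0.9, eta=0.5, min_width=0.2)
```

With α = 0.9 and η = 0.5, the region starts as the whole range from 0 to 1 and halves in width at each setting while always containing α.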
The maximization learning unit 32 trains a score function in accordance with the range between the upper limit and the lower limit of the true positive rate set by the region-to-be-maximized setting unit 30 (the region 21 to be maximized). The score function is trained so as to optimize an objective function represented using positive example data selected from ranked positive example data (the score ranking 23), negative example data, and the score function that calculates a score representing likelihood of a positive example.
The maximization learning unit 32 trains the detector parameter 22 so as to maximize the pAUC according to the set region 21 to be maximized. Here, the detector is a deep neural network (DNN), and the detector parameter 22 of the DNN is learned by the error back-propagation method based on a suitable objective function. L(Rl, Ru) described below is used as the objective function to be minimized.
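Equation (3) is not reproduced in this text. One form that is consistent with the symbols defined in the surrounding paragraphs, given here as an assumption rather than as the original expression, is the average pairwise loss over the selected positive example data and all negative example data:

$$L(R_l, R_u) = \frac{1}{m_p(R_l, R_u)\, m_n} \sum_{x_p \in X_p(R_l, R_u)} \sum_{x_n} l\bigl(f(x_p) - f(x_n)\bigr) \tag{3}$$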
Here, f(·) indicates an output value of the DNN, and l(·) is a function that imposes a loss when its argument is 0 or negative. For example, $l(z) = (1 - z)^2$ proposed in Reference 1 can be used, but other functions may be used.
[Reference 1] Gao, Wei, and Zhi-Hua Zhou. “On the Consistency of AUC Pairwise Optimization.” IJCAI. 2015.
xp and xn indicate the positive example data to be detected and the negative example data to be detected, respectively. In the case where all positive example data xp are sorted in descending order of their scores f(xp) and the ranking is expressed as a percentage of the total positive example data, Xp(Rl, Ru) indicates the set of positive example data whose ranking is larger than the lower limit Rl and smaller than the upper limit Ru. In other words, the maximization learning unit 32 selects, from the ranked positive example data (the score ranking 23), the positive example data Xp(Rl, Ru) included in the range between the upper limit and the lower limit when the ranking is expressed as the percentage of the total positive example data.
Similarly, mp(Rl, Ru) indicates the total number of positive example data included in Xp(Rl, Ru), and mn indicates the total number of negative example data. By minimizing the objective function in equation (3), a detector that outputs a high score for positive example data and a low score for negative example data can be obtained. In particular, limiting the positive example data to a subset according to the ranking of the detection scores allows for optimization comparable to the maximization of the pAUC.
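The selection of Xp(Rl, Ru) and the evaluation of the objective can be sketched as follows. This is a sketch under stated assumptions: the objective is taken to be the mean of l(f(xp) − f(xn)) over the selected positive examples and all negative examples, which matches the definitions of mp(Rl, Ru) and mn but is not confirmed by an equation in this text; the rank of each positive example is mapped to a fraction with a mid-point convention so that the initial region from 0 to 1 selects every example; and the function names are hypothetical.

```python
import numpy as np

def squared_pairwise_loss(z):
    """l(z) = (1 - z)^2, the example loss mentioned in the text."""
    return (1.0 - z) ** 2

def select_by_rank_fraction(pos_scores, r_l, r_u):
    """Indices of positives whose descending-score rank fraction lies in (r_l, r_u)."""
    pos_scores = np.asarray(pos_scores, dtype=float)
    m = len(pos_scores)
    order = np.argsort(-pos_scores, kind="stable")  # highest score first
    rank_fraction = np.empty(m)
    rank_fraction[order] = (np.arange(m) + 0.5) / m  # mid-point convention (assumption)
    return np.flatnonzero((rank_fraction > r_l) & (rank_fraction < r_u))

def partial_pairwise_objective(pos_scores, neg_scores, r_l, r_u):
    """Mean pairwise loss over selected positives and all negatives (assumed form)."""
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)
    idx = select_by_rank_fraction(pos_scores, r_l, r_u)
    if idx.size == 0:
        raise ValueError("no positive example falls inside the region")
    diffs = pos_scores[idx][:, None] - neg_scores[None, :]  # f(x_p) - f(x_n)
    return float(squared_pairwise_loss(diffs).mean())

pos = [0.9, 0.5, 0.1, 0.7]
neg = [0.0]
selected = select_by_rank_fraction(pos, 0.5, 1.0)
value = partial_pairwise_objective(pos, neg, 0.5, 1.0)
```

Restricting the region to the lower-ranked half keeps only the positive examples with scores 0.5 and 0.1, so the objective concentrates on the examples that are hardest to separate from the negative example data.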
The ranking unit 34 ranks the positive example data based on the scores calculated using the score function. The ranking unit 34 uses the learned detector parameter 22 to calculate the detection scores for all positive example data, and calculates the ranking of the detection scores in descending order as the score ranking 23. Because the ranking unit 34 is located in the stage subsequent to the maximization learning unit 32, no data on the score ranking 23 is present at the initial learning. However, because the region 21 to be maximized initially covers all data, learning can proceed without using ranking data.
The determination unit 36 causes the maximization learning unit 32 and the ranking unit 34 to repeat the processing until the objective function in the equation (3) is converged, and the region-to-be-maximized setting unit 30 to repeat setting until the range between the upper limit and the lower limit of the true positive rate (TPR) (the region 21 to be maximized) becomes a predetermined size.
An example of detection processing performed using the detector parameter 22 acquired by the detection learning apparatus 100 according to the embodiment of the present invention will be described. In the detection processing, the score f(x) is calculated for input data x using the detector parameter 22, and the data is detected as target data when the calculated score is larger than a threshold value θ. It is desirable to prepare verification data different from the training data used in the learning processing, and to set, as the threshold value θ, the threshold at which the TPR on the verification data is α.
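The threshold setting described above might be sketched as follows. The function names `threshold_for_tpr` and `detect` are hypothetical, and the sketch assumes distinct verification scores; it places θ midway between two adjacent positive scores so that at least a fraction α of the positive verification data scores strictly above θ.

```python
import numpy as np

def threshold_for_tpr(val_pos_scores, alpha):
    """Pick theta so that about a fraction alpha of positive verification scores exceed it."""
    s = np.sort(np.asarray(val_pos_scores, dtype=float))[::-1]  # descending
    m = len(s)
    k = int(np.ceil(alpha * m))  # number of positives that must be detected
    k = min(max(k, 1), m)
    if k == m:
        # Every positive must be detected: go just below the lowest score.
        return float(np.nextafter(s[-1], -np.inf))
    return float((s[k - 1] + s[k]) / 2.0)  # between the k-th and (k+1)-th scores

def detect(scores, theta):
    """Detect data as target data when the score is larger than theta."""
    return np.asarray(scores, dtype=float) > theta

val_pos = [0.9, 0.6, 0.4, 0.2]
theta = threshold_for_tpr(val_pos, alpha=0.75)
tpr_on_val = detect(val_pos, theta).mean()
```

Using verification data rather than the training data for this step keeps the chosen θ from being tuned to scores the detector has already fitted.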
Actions of Detection Learning Apparatus According to Embodiment of Present Invention
Next, actions of the detection learning apparatus 100 according to the embodiment of the present invention will be described. The detection learning apparatus 100 executes the detection learning processing routine illustrated in
In Step S100, the region-to-be-maximized setting unit 30 gradually narrows the range determined by the upper limit and the lower limit of the true positive rate for defining the area under the ROC curve (the region 21 to be maximized) according to the above equation (1) at each repetition.
In Step S102, the maximization learning unit 32 trains the score function in accordance with the range between the upper limit and the lower limit of the true positive rate set in Step S100 (the region 21 to be maximized). The score function is trained so as to optimize the objective function in the above equation (3) represented using the positive example data selected from ranked positive example data (the score ranking 23), the negative example data, and the score function that calculates a score representing likelihood of a positive example.
In Step S104, the ranking unit 34 ranks the positive example data and calculates the score ranking 23 based on the scores calculated using the score functions.
In Step S106, the determination unit 36 determines whether the objective function in the above equation (3) has converged. If it has converged, the processing proceeds to Step S108; if not, the processing returns to Step S102 and is repeated.
In Step S108, the determination unit 36 determines whether the range between the upper limit and the lower limit of the true positive rate (TPR) (the region 21 to be maximized) has been reduced to a predetermined size. If it has, the processing terminates; if not, the processing returns to Step S100 and is repeated.
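Steps S100 to S108 can be put together in a compact sketch. Several simplifications are assumed here and are not taken from the embodiment: a linear score function f(x) = w·x trained by plain gradient descent stands in for the DNN and error back-propagation; the objective is assumed to be the mean of (1 − (f(xp) − f(xn)))² over selected positive examples and all negative examples; and the names `train_detector`, `min_width`, `lr`, `tol`, and `max_inner` are hypothetical.

```python
import numpy as np

def rank_fractions(scores):
    """Descending-score rank of each item as a fraction in (0, 1)."""
    m = len(scores)
    order = np.argsort(-scores, kind="stable")
    frac = np.empty(m)
    frac[order] = (np.arange(m) + 0.5) / m  # mid-point convention (assumption)
    return frac

def loss_and_grad(w, x_pos, x_neg):
    """Mean squared pairwise loss and its gradient for a linear scorer."""
    z = (x_pos @ w)[:, None] - (x_neg @ w)[None, :]  # f(x_p) - f(x_n)
    coef = -2.0 * (1.0 - z)  # derivative of (1 - z)^2 with respect to z
    grad = (coef.sum(axis=1) @ x_pos - coef.sum(axis=0) @ x_neg) / z.size
    return float(np.mean((1.0 - z) ** 2)), grad

def train_detector(x_pos, x_neg, alpha=0.9, eta=0.5, min_width=0.2,
                   lr=0.05, tol=1e-6, max_inner=500):
    w = np.zeros(x_pos.shape[1])
    delta_l, delta_u = alpha, 1.0 - alpha  # initial region is the whole range
    frac = None  # no score ranking exists before the first training pass
    while True:
        r_l, r_u = alpha - delta_l, alpha + delta_u  # S100: set the region
        prev_loss = np.inf
        for _ in range(max_inner):
            if frac is None:
                mask = np.ones(len(x_pos), dtype=bool)
            else:
                mask = (frac > r_l) & (frac < r_u)
            if not mask.any():
                raise ValueError("no positive example falls inside the region")
            loss, grad = loss_and_grad(w, x_pos[mask], x_neg)  # S102: train
            w = w - lr * grad
            frac = rank_fractions(x_pos @ w)  # S104: re-rank the positives
            if abs(prev_loss - loss) < tol:  # S106: convergence check
                break
            prev_loss = loss
        if r_u - r_l <= min_width:  # S108: region small enough, finish
            return w
        delta_l *= eta  # narrow the region as in equation (2)
        delta_u *= eta

rng = np.random.default_rng(0)
x_pos = rng.normal([1.0, 0.0], 0.5, size=(50, 2))  # synthetic positive examples
x_neg = rng.normal([-1.0, 0.0], 0.5, size=(50, 2))  # synthetic negative examples
w = train_detector(x_pos, x_neg)
```

Because the selected positive examples change whenever the ranking changes, the convergence check in this sketch is only approximate, which is why the inner loop is also capped by `max_inner`.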
As described above, the detection learning apparatus according to the embodiment of the present invention can train a well-balanced detector in the vicinity of a desired TPR.
Note that the present invention is not limited to the above-described embodiments, and various modifications and applications may be made without departing from the gist of the present invention.
For example, in the embodiment described above, the score function is trained in the range determined by the upper limit and the lower limit of the true positive rate (TPR). However, the present invention is not limited thereto, and the score function may be trained in a range determined by the upper limit and the lower limit of the false positive rate (FPR) rather than the true positive rate. For example, in the embodiment described above, the positive example data is selected in the maximization learning unit 32, but in the case of using the false positive rate, the negative example data may be selected by replacing the positive example data with the negative example data and ranking the negative example data. In the case where total negative example data xn are sorted in descending order according to their score functions f(xn), when the ranking is expressed as the percentage of the total negative example data, a set of negative example data larger than the lower limit and smaller than the upper limit is selected.
Number | Date | Country | Kind |
---|---|---|---|
2018-231895 | Dec 2018 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/047006 | 12/2/2019 | WO | 00 |