Embodiments of the present invention relate to an information processing device, an information processing method, and a program.
Technologies for recognition processing in units of series are used. More specifically, a technique called connectionist temporal classification (CTC) is used. By performing recognition in units of series, an accuracy higher than that of a case in which recognition is performed in units of symbols constituting a series can be acquired. The reason for this is that recognition processing not only depending on features in units of symbols but also using features as a series can be performed. In a conventional technology for recognition processing in units of series, in a case in which the reliability of a result of the recognition processing is low or the like, rejection (discard) of the result of the recognition processing is performed. However, in the recognition processing in units of series, a target that is rejected is the entire series. In such a conventional technology, even in a case in which only a part of a series has a low reliability, the entire series needs to be rejected. In other words, there is a problem that a part of the information acquired in the process of recognition processing cannot be effectively utilized.
An object of the present invention is to provide an information processing device, an information processing method, and a program capable of rejecting only a part of a series in recognition processing performed in units of series.
An information processing device according to an embodiment includes a rejection class adding unit, a label sequence selecting unit, and an output unit. The rejection class adding unit adds a rejection class by acquiring a categorical distribution sequence formed by aligning a plurality of categorical distributions having scores for each class and acquiring a score of a rejection class on the basis of the categorical distribution for each of the categorical distributions included in the categorical distribution sequence. The label sequence selecting unit calculates likelihoods of label sequence candidates corresponding to the categorical distribution sequence on the basis of the categorical distribution sequence after addition of the rejection class and selects a label sequence among a plurality of the label sequence candidates in accordance with the likelihoods of the label sequence candidates. The output unit outputs the selected label sequence.
Hereinafter, an information processing device, an information processing method, and a program according to embodiments will be described with reference to the drawings.
An information processing device according to this embodiment acquires a result of recognition of a series recognition problem using a CTC. The CTC represents a connectionist temporal classification. Specific examples of the series recognition problem include recognition processes such as a speech recognition, a character string recognition, and a gesture recognition. The information processing device according to this embodiment is applicable to all such series recognition problems. For example, the speech recognition is a process of receiving speech as an input and outputting text (character string) corresponding to the speech. In addition, for example, the character string recognition is a process of receiving an image of a printed character string using handwritten characters or a predetermined font as an input and outputting text (character string) corresponding to the image. For example, the gesture recognition is a process of receiving signals of a time series detected by a sensor (for example, a touch panel, an acceleration sensor, or the like) detecting a person's gesture as an input and outputting a sequence of symbols corresponding to the gesture.
In this embodiment, a categorical distribution is a discrete probability distribution representing probabilities of a plurality of classes. In other words, the categorical distribution is a set of classes, and each of the classes has a probability value. The probability value corresponds to a value called a likelihood, a certainty factor, a score, or the like in this embodiment. The probability value is a normalized numerical value. Here, being normalized represents that, for a certain categorical distribution, a sum of numerical values of classes belonging to the categorical distribution is 1. A numerical value called a likelihood, a certainty factor, a score, or the like may be normalized or non-normalized in the meaning described above.
The categorical distribution sequence is a series of a categorical distribution, in other words, a set having a sequence.
The information processing device according to this embodiment takes in a categorical distribution sequence as an input and selects and outputs a maximum-likelihood label sequence on the basis of the categorical distribution sequence. An input categorical distribution sequence, for example, is calculated on the basis of feature quantities representing features of an image, speech, a quantity that is physically detected, a statistical value, and the like. The information processing device according to this embodiment may be configured to include a process of generating a categorical distribution sequence based on such feature quantities.
A label is associated with a class. An arrangement of labels corresponding to classes included in a categorical distribution within a categorical distribution sequence is a label sequence. A mapping function to be described below can map a label sequence onto another label sequence. A length of a label sequence may be changed by the mapping function. Classes include a normal class, a blank class, and a rejection class. In correspondence with such classes, a normal label, a blank label, and a rejection label are respectively present. The normal class corresponds to labels configuring a label sequence that is acquired as a result of recognition. The blank class corresponds to a blank. For example, in the process of handwritten character recognition, a blank area between a character and a character is a blank. In addition, for example, in the process of speech recognition, a section of muteness (or only noise) between a phoneme and a phoneme is a blank. Blanks in the other application areas are similar thereto. The rejection class is a class that is a target for rejection. For example, a case in which it is difficult to determine a class of a recognition target, a case in which a likelihood of a specific class is too low (for example, the likelihood is less than a predetermined threshold), and the like correspond to the rejection class.
The input unit 21 acquires a categorical distribution sequence. The categorical distribution sequence can be acquired on the basis of features of a target. The input unit 21 may obtain a categorical distribution sequence acquired by an external device or the like or may directly acquire a categorical distribution sequence on the basis of feature quantities of a target.
The rejection class adding unit 25 receives a categorical distribution sequence from the input unit 21. Then, the rejection class adding unit 25, for each categorical distribution included in a categorical distribution sequence, acquires a score of a rejection class on the basis of the categorical distribution. The rejection class adding unit 25 adds a rejection class having the acquired score to the categorical distribution. A categorical distribution sequence that is a target to be processed by the rejection class adding unit 25 is formed as a series by arranging a plurality of categorical distributions having scores for each class.
The label sequence selecting unit 27 calculates a likelihood of a label sequence candidate corresponding to a categorical distribution sequence on the basis of the categorical distribution sequence after addition of the rejection class. The label sequence selecting unit 27 selects a specific label sequence from among a plurality of label sequence candidates in accordance with calculated likelihoods of the label sequence candidates. The label sequence selecting unit 27, for example, selects one label sequence having the highest likelihood. For example, the label sequence selecting unit 27 may select label sequences of up to a higher n-th (here, n is a positive integer) rank. In addition, the label sequence selecting unit 27 may not necessarily select a label sequence having the highest likelihood.
The label sequence selecting unit 27 may use a mapping function. The mapping function is for mapping from a sequence of labels corresponding to each categorical distribution configuring a categorical distribution sequence (for the convenience of description, this will be referred to as a first label sequence) onto a sequence of labels to be finally output (for the convenience of description, this will be referred to as a second label sequence). Generally, the mapping function is many-to-one mapping. In other words, one identical output may be configured to be in correspondence with a plurality of different inputs for the mapping function. Although details will be described below, for example, the label sequence selecting unit 27 acquires a first label sequence likelihood for each first label sequence on the basis of likelihoods of classes included in a categorical distribution sequence (likelihoods of classes in each categorical distribution). Then, the label sequence selecting unit 27 sets a result of an application of a predetermined mapping function to the first label sequence as a second label sequence. The label sequence selecting unit 27 acquires a second label sequence likelihood for each second label sequence on the basis of the first label sequence likelihood of the first label sequence associated with the second label sequence using the mapping function. In other words, the label sequence selecting unit 27 can acquire a set of first label sequences as a result of an application of an inverse function of the mapping function described above to the second label sequence. Thus, a sum of likelihoods of the first label sequences (a first label sequence likelihood) is set as a likelihood of the second label sequence (a second label sequence likelihood). 
The label sequence selecting unit 27 uses the second label sequences as the label sequence candidates described above and selects a label sequence to be output from among such second label sequences on the basis of the second label sequence likelihood.
The output unit 29 outputs the label sequence selected by the label sequence selecting unit 27 to the outside.
In Step S1, the input unit 21 acquires a categorical distribution sequence p1, . . . , pL from the outside. This categorical distribution sequence p1, . . . , pL has been acquired on the basis of features of data that is a recognition target.
In Step S2, the information processing device 1 initializes t, which is a variable indicating a position within the categorical distribution sequence, to 1.
In processes of Steps S3 to S7, the rejection class adding unit 25 adds a rejection class to the categorical distribution pt indicated by the variable t. A specific process of each step is as follows.
In Step S3, the rejection class adding unit 25 directly sets a likelihood pt(k) of a class k included in input data as a score after insertion of the rejection class. Here, the class k is a normal class or a blank class.
In Step S4, the rejection class adding unit 25 determines whether or not the maximum value (maxk pt(k)) of the likelihood pt(k) at the time t is less than a threshold θ1. The threshold θ1 will be described below. In a case in which the maximum value of the likelihood pt(k) is less than the threshold θ1 (Step S4: Yes), the process proceeds to Step S6. In a case in which the maximum value of the likelihood pt(k) is equal to or more than the threshold θ1 (Step S4: No), the process proceeds to Step S5.
In Step S5, the rejection class adding unit 25 sets the score of the rejection class to be added to “0”. After end of this step, the process proceeds to the process of Step S7.
In Step S6, the rejection class adding unit 25 sets the score of the rejection class to be added to α1. Here, α1 is a positive value that is set appropriately. After end of this step, the process proceeds to the process of Step S7.
In Step S7, the rejection class adding unit 25 normalizes the categorical distribution p̃t after addition of the rejection class. By performing normalization, for example, the rejection class adding unit 25 sets a sum of scores of all the classes including the rejection class to 1 for the time t.
In Step S8, the information processing device 1 determines whether or not t≥L. In other words, in this step, the information processing device 1 determines whether or not the process for all the positions t in the input categorical distribution sequence has ended. In the case of t≥L (Step S8: Yes), the process proceeds to Step S10. In the case of t<L (Step S8: No), the process proceeds to Step S9.
In a case in which the process proceeds to Step S9, the information processing device 1 increments the value of t for advancing the position within the categorical distribution sequence. In other words, the value of (t+1) is substituted into the variable t. After the process of this step, the information processing device 1 returns to the process of Step S3.
In a case in which the process proceeds to Step S10, the label sequence selecting unit 27 selects a label sequence having the highest likelihood on the basis of the categorical distribution sequence after addition of the rejection class. Details of selection using the label sequence selecting unit 27 will be described below.
In Step S11, the output unit 29 outputs the label sequence selected in Step S10 to the outside. The label sequence output in this step is a result of recognition acquired by the information processing device 1 on the basis of the categorical distribution sequence acquired by the input unit 21. When the process of this step ends, the information processing device 1 ends the process of the entire flowchart.
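The flow of Steps S3 to S7 described above can be sketched as follows. This is a minimal illustrative sketch, not the embodiment itself; the function name add_rejection_class, the label "reject", and the example values of the threshold θ1 (THETA_1) and the score α1 (ALPHA_1) are assumptions introduced only for this example.

```python
THETA_1 = 0.5   # example value for the threshold θ1 (an assumption)
ALPHA_1 = 1.0   # example value for the rejection score α1 (an assumption)

def add_rejection_class(p_t):
    """p_t: dict mapping class -> score (normal and blank classes only).
    Returns a normalized distribution that also contains a 'reject' class."""
    # Step S3: carry over the existing scores unchanged.
    p_tilde = dict(p_t)
    # Steps S4 to S6: the rejection score is ALPHA_1 when the top-ranked
    # class is unreliable (its score is below THETA_1), and 0 otherwise.
    if max(p_t.values()) < THETA_1:
        p_tilde["reject"] = ALPHA_1
    else:
        p_tilde["reject"] = 0.0
    # Step S7: renormalize so that the scores again sum to 1.
    total = sum(p_tilde.values())
    return {k: v / total for k, v in p_tilde.items()}

# A confident distribution keeps a zero rejection score,
# while an ambiguous one receives a positive rejection score.
confident = add_rejection_class({"a": 0.8, "b": 0.1, "blank": 0.1})
ambiguous = add_rejection_class({"a": 0.4, "b": 0.4, "blank": 0.2})
```

In this sketch the per-distribution loop over t = 1, . . . , L (Steps S2, S8, S9) is left to the caller, which would simply apply add_rejection_class to each element of the sequence.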
Next, the internal process of the information processing device 1 will be described in more detail.
The CTC uses a processing method of receiving a categorical distribution sequence including a blank class in addition to the classes that are the original recognition targets as an input and calculating a likelihood of a specific label sequence. As described above, the input unit 21 acquires a categorical distribution sequence p1, . . . , pL of length L as an input.
For such a categorical distribution sequence p1, . . . , pL, a likelihood of a label sequence l is calculated using the following Equation (1).
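Although the body of Equation (1) is not reproduced here, it corresponds to the standard CTC likelihood; on the basis of the definitions in the surrounding description, its intended form can be sketched as follows (a hedged reconstruction, not a verbatim reproduction of the original equation):

```latex
% Eq. (1): likelihood of a label sequence l, summed over all paths
% that the mapping function B converts into l
p(l \mid p_1, \ldots, p_L)
  = \sum_{\pi \in B^{-1}(l)} \; \prod_{t=1}^{L} p_t(\pi_t)
```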
Here, B is a mapping function for deleting a blank label and deleting consecutive identical labels (here, one label among such consecutive identical labels is caused to remain).
B−1 in Equation (1) is an inverse function of the function B described above. In other words, B−1(l) represents a set of paths that can be converted into l by the function B. Such a path is a label sequence of a length L that may include a blank label. pt(πt) represents a probability of the t-th label πt of a path π in the categorical distribution pt. As a result of the CTC processing, generally, the label sequence having the highest calculated likelihood is output as a result of the prediction.
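The operation of the mapping function B (collapsing consecutive identical labels, then deleting blank labels) can be sketched as follows; using the underscore character to stand for the blank label is an assumption of this example.

```python
BLANK = "_"  # the blank label (an assumed notation for this sketch)

def B(path):
    """Map a path (list of labels) to its output label sequence:
    collapse runs of identical labels to one label, then delete blanks."""
    out = []
    for label in path:
        # Keep only one label out of each run of consecutive identical labels.
        if out and out[-1] == label:
            continue
        out.append(label)
    # Delete blank labels after collapsing.
    return [label for label in out if label != BLANK]

# Example: the path "aa_ab_" maps to the label sequence "aab",
# because the blank separates the two occurrences of "a".
```

Note that the order of the two operations matters: collapsing is performed before blank deletion, so two identical labels separated by a blank are not merged.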
The input unit 21 acquires a categorical distribution sequence p1, . . . , pL composed of L categorical distributions that do not include a rejection class but include a blank class (Step S1).
The rejection class adding unit 25 adds a rejection class to each of the L categorical distributions p1, . . . , pL. More specifically, in a case in which a score of a class that is a first-ranked candidate included in the categorical distribution pt (1≤t≤L) is less than a threshold θ1, the rejection class adding unit 25 adds a rejection class having a predetermined score α1 (Step S6).
In this way, the rejection class adding unit 25 acquires p̃1, . . . , p̃L, which are the categorical distributions after addition of the rejection class. In other words, pt to which a tilde has been assigned is the categorical distribution after addition of the rejection class.
As described above, the categorical distribution after addition of the rejection class to be added is as represented in the following Equations (2) and (3).
The rejection class adding unit 25 may perform normalization such that a sum of the scores of each categorical distribution after addition of the rejection class is “1” (Step S7).
By performing the normalization as such, p̃1, . . . , p̃L can be handled on a probability scale and can be calculated with the same scale as that of p1, . . . , pL.
Here, the normalization process is not essential, for example, in a case in which calculation with the same scale as that of the categorical distribution sequence p1, . . . , pL is not necessary.
The value of the score of the k-th class of the t-th categorical distribution after normalization is represented in the following Equation (4).
In addition, the rejection class adding unit 25 may configure α1 represented in Equation (2) as being infinite.
In a case in which α1 is configured as being infinite, the score of the first-ranked class within the categorical distribution is less than θ1, and the normalization represented in Equation (4) is performed, p̃1, . . . , p̃L are represented in the following Equation (5).
The reason for configuring α1 as being infinite is that the categorical distribution represented in Equation (5) can be acquired as a result of normalization using Equation (4). Alternatively, regardless of the value of α1, in a case in which the score of the first-ranked class within the categorical distribution is less than θ1, the normalization represented in Equation (5) may be adopted as a definition.
In other words, in this embodiment, in a case in which the score of a class that is the first-ranked candidate within the categorical distribution is equal to or more than a predetermined threshold (θ1), the rejection class adding unit 25 sets the score of the rejection class to a lowest value among scores of all the classes. “0” in a lower level of the right side represented in Equation (2) is a lowest value in a case in which the value of the score of the class takes a value equal to or more than 0 and equal to or less than 1. On the other hand, in a case in which the score of the class that is the first-ranked candidate within the categorical distribution is less than the predetermined threshold (θ1), the rejection class adding unit 25 sets the score of the rejection class to a predetermined value (α1) other than the lowest value. In other words, for example, in a case in which the value of the score of the class takes a value equal to or more than 0 and equal to or less than 1, it is set such that α1>0.
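The bodies of Equations (2) to (5) are not reproduced above; on the basis of the description, their intended forms can be sketched as follows. This is a hedged reconstruction: K denotes the number of classes before addition and class K+1 indexes the added rejection class, which is an assumed indexing for this sketch.

```latex
% Eq. (2): score of the rejection class before normalization
\tilde{p}_t(K+1) =
\begin{cases}
\alpha_1 & \text{if } \max_k p_t(k) < \theta_1,\\
0        & \text{otherwise.}
\end{cases}

% Eq. (3): scores of the existing classes are carried over
\tilde{p}_t(k) = p_t(k) \qquad (1 \le k \le K)

% Eq. (4): normalization so that the scores sum to 1
\tilde{p}_t(k) \leftarrow
  \frac{\tilde{p}_t(k)}{\sum_{k'=1}^{K+1} \tilde{p}_t(k')}

% Eq. (5): the limit when \alpha_1 \to \infty and \max_k p_t(k) < \theta_1,
% where all of the probability mass moves to the rejection class
\tilde{p}_t(k) =
\begin{cases}
1 & \text{if } k = K+1,\\
0 & \text{otherwise.}
\end{cases}
```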
The label sequence selecting unit 27 selects a label sequence lout having a highest likelihood on the basis of the categorical distribution sequence to which the rejection class has been added. The selected label sequence lout may include a rejection label. Calculation for determining a label sequence lout to be selected is as represented in Equations (6) and (7).
Equation (6) is a numerical equation for acquiring a score of a label sequence l. In other words, the score of the label sequence l is a sum of likelihoods of paths π from which the label sequence l can be acquired by applying the mapping function BM. A likelihood of a path π is calculated as a product of the likelihoods of the labels πt (1≤t≤L) composing the path π (based on the categorical distributions having a rejection class).
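On the basis of this description, the intended forms of Equations (6) and (7) can be sketched as follows (a hedged reconstruction; the score notation S(l) is introduced only for this sketch):

```latex
% Eq. (6): score of a label sequence l under the mapping function B_M
S(l) = \sum_{\pi \in B_M^{-1}(l)} \; \prod_{t=1}^{L} \tilde{p}_t(\pi_t)

% Eq. (7): the label sequence having the highest score is selected
l_{\mathrm{out}} = \operatorname*{argmax}_{l} \; S(l)
```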
In addition, as represented also in Equation (6), when a likelihood of a label sequence is acquired, the label sequence selecting unit 27 uses a mapping function BM. The mapping function B described above is a function of deleting a blank label and deleting consecutive identical labels. In contrast to this, the mapping function BM performs an operation of, in a case in which a normal label and a rejection label are adjacent to each other, deleting the rejection label at that position and causing the normal label to remain together with deleting a blank label and deleting consecutive identical labels.
Table 1 is a table for a comparison between the function B and the function BM. This table represents outputs of the function B and the function BM for predetermined examples of an input label sequence π. For the convenience of description, a row number is assigned to each row of the table.
In Table 1, rejection labels appearing in a first row and a second row are not adjacent to a normal label and are separated by a blank label. Thus, such rejection labels remain in an output label sequence not only in a case in which the function B is applied but also in a case in which the function BM is applied. On the other hand, rejection labels appearing in a third row and a fourth row are adjacent to a and b that are normal labels. Such rejection labels remain within the output label sequence in a case in which the function B is applied and are deleted in a case in which the function BM is applied.
In other words, by using a first label sequence as an argument, the mapping function BM performs an operation of (1) deleting a blank label from the first label sequence, (2) in a case in which consecutive identical labels are present within the first label sequence, substituting the consecutive identical labels with only one of the labels (only one of the labels is caused to remain), and (3) in a case in which there is a position at which a normal label and a rejection label are consecutive within the first label sequence, deleting the rejection label and causing the normal label to remain regardless of which one of the normal label and the rejection label precedes the other. A result of the operation is a second label sequence that is an output value of the mapping function BM. An inverse function of the mapping function BM uses the second label sequence described above as an argument and has a set of corresponding first label sequences as an output value.
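The three operations of the mapping function BM described above can be sketched as follows; using the underscore for the blank label and "?" for the rejection label, as in the tables, is an assumption of this example.

```python
BLANK, REJECT = "_", "?"  # assumed notations for blank and rejection labels

def B_M(path):
    """Map a path to its output label sequence using B_M."""
    # (2): collapse runs of consecutive identical labels into one label.
    collapsed = []
    for label in path:
        if not (collapsed and collapsed[-1] == label):
            collapsed.append(label)

    def is_normal(x):
        return x not in (BLANK, REJECT)

    # (3): delete a rejection label that is adjacent to a normal label
    # in the path, regardless of which of the two precedes the other.
    # A rejection label separated from normal labels by a blank remains.
    kept = []
    for i, label in enumerate(collapsed):
        prev_normal = i > 0 and is_normal(collapsed[i - 1])
        next_normal = i + 1 < len(collapsed) and is_normal(collapsed[i + 1])
        if label == REJECT and (prev_normal or next_normal):
            continue
        kept.append(label)

    # (1): finally delete blank labels.
    return [label for label in kept if label != BLANK]
```

As in Table 1, a rejection label directly adjacent to a normal label (such as in "a?b") is deleted, while one separated by blanks (such as in "a_?_b") remains in the output.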
According to this embodiment, the rejection class adding unit 25 adds rejection classes not to the entire categorical distribution sequence but to each categorical distribution among them and gives appropriate scores to such rejection classes. In accordance with this, not rejection of the entire label sequence but rejection in units of labels can be performed (a rejection label can be given within the label sequence to be output).
In addition, according to this embodiment, the BM is used as a mapping function. In this way, in a case in which there is a position at which a normal label and a rejection label are consecutive in the first label sequence, the rejection label is deleted, and the normal label is caused to remain regardless of which one of the normal label and the rejection label precedes the other. In other words, a more appropriate label sequence can be selected and output.
Next, a first modified example of this embodiment will be described. Here, special items of this modified example will be focused on in the description. Points that are not particularly mentioned here are similar to the items that have been described in this embodiment.
In this modified example, a mapping function BR is used instead of the mapping function BM. The mapping function BM performs the operation of, in a case in which a normal label and a rejection label are adjacent to each other, deleting the rejection label at the position thereof and causing the normal label to remain. Here, in a case in which there is a position at which a normal label and a rejection label are adjacent to each other, the mapping function BR converts the entire label sequence into a label sequence that is an exclusion target.
Table 2 is a table for a comparison among the function B, the function BM, and the function BR. This table represents outputs of the function B, the function BM, and the function BR for predetermined examples of an input label sequence π. For the convenience of description, a row number is assigned to each row of the table.
As illustrated in Table 2, the mapping function BR outputs “exclusion” for a label sequence including a position at which a or b, which are normal labels, and a rejection label are adjacent to each other. In other words, when an inverse function of the mapping function BR is applied to a certain label sequence l, a label sequence including a pattern in which a normal label and a rejection label are adjacent to each other, such as “a?__b_” and “a?_??b”, is not included in the set output by BR−1(l). This has an effect of reducing the amount of calculation that is necessary for calculating a likelihood of the label sequence l.
In other words, by using the first label sequence as an argument, the mapping function BR performs an operation of (1) deleting the blank labels described above from the first label sequence and (2), in a case in which consecutive identical labels are present within the first label sequence, substituting the consecutive identical labels with only one of the labels. A result of the operation is a second label sequence that is an output value of the mapping function BR. Here, in a case in which there is a position at which a normal label and a rejection label are consecutive within the first label sequence, an exclusion target label sequence (may be simply referred to as “exclusion”) is set as a second label sequence that is an output value of the mapping function BR regardless which one of the normal label and the rejection label precedes the other. An inverse function of the mapping function BR uses the second label sequence described above as an argument and has a set of corresponding first label sequences as an output value.
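The operation of the mapping function BR can be sketched as follows; using the underscore for the blank label, "?" for the rejection label, and None to represent the exclusion target label sequence are assumptions of this example.

```python
BLANK, REJECT = "_", "?"  # assumed notations for blank and rejection labels
EXCLUDED = None           # represents the exclusion target label sequence

def B_R(path):
    """Map a path to its output label sequence using B_R, or return
    EXCLUDED when a normal label and a rejection label are adjacent."""
    # Collapse runs of consecutive identical labels into one label.
    collapsed = []
    for label in path:
        if not (collapsed and collapsed[-1] == label):
            collapsed.append(label)

    def is_normal(x):
        return x not in (BLANK, REJECT)

    # If any rejection label is adjacent to a normal label, the entire
    # sequence becomes an exclusion target, whichever of the two precedes.
    for i, label in enumerate(collapsed):
        if label == REJECT:
            if (i > 0 and is_normal(collapsed[i - 1])) or \
               (i + 1 < len(collapsed) and is_normal(collapsed[i + 1])):
                return EXCLUDED

    # Otherwise, behave like B: delete blank labels.
    return [label for label in collapsed if label != BLANK]
```

Because excluded paths are dropped outright rather than being rewritten, the set BR−1(l) is smaller than BM−1(l), which is the source of the reduction in the amount of calculation mentioned above.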
In this modified example, the label sequence selecting unit 27 performs calculation using the following Equations (8) and (9) and selects a label sequence lout having a highest likelihood.
In other words, as represented in Equation (8), in this modified example, the label sequence selecting unit 27 uses the inverse function of the mapping function BR. In other words, the label sequence selecting unit 27 sets a sum of likelihoods of paths π that become a label sequence l when the mapping function BR is applied as a likelihood of the label sequence l.
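On the basis of this description, the intended forms of Equations (8) and (9) can be sketched as follows (a hedged reconstruction, analogous to the sketch for Equations (6) and (7) but using BR; the score notation S(l) is introduced only for this sketch):

```latex
% Eq. (8): score of a label sequence l under the mapping function B_R
S(l) = \sum_{\pi \in B_R^{-1}(l)} \; \prod_{t=1}^{L} \tilde{p}_t(\pi_t)

% Eq. (9): the label sequence having the highest score is selected
l_{\mathrm{out}} = \operatorname*{argmax}_{l} \; S(l)
```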
In this modified example, BR is used as the mapping function. In accordance with this, in a case in which there is a position at which a normal label and a rejection label are consecutive within the first label sequence, the corresponding second label sequence is set as an exclusion target label sequence regardless of which one of the normal label and the rejection label precedes the other. In this way, the process of calculating likelihoods of candidates for a label sequence can be simply implemented. In other words, the amount of calculation can be reduced.
Next, a second modified example of this embodiment will be described. Here, special items of this modified example will be focused on in the description. Points that are not particularly mentioned here are similar to the items that have been described in this embodiment (including other modified examples).
In this modified example, there is a feature in the sequence of a process when the label sequence selecting unit 27 selects a label sequence having a maximum likelihood. More specifically, in this modified example, the label sequence selecting unit 27 acquires a maximum likelihood path π* formed by aligning the label having a maximum likelihood in each categorical distribution p̃1, . . . , p̃L. Then, the label sequence selecting unit 27 selects B(π*), which is a result acquired by converting the maximum likelihood path π* using the mapping function B, as an output label sequence.
In other words, in this modified example, the label sequence selecting unit 27 selects an output label sequence lout using the following Equations (10), (11), and (12).
π*=″π1*π2* . . . πL*″ (10)
πt*=argmaxk p̃t(k) (11)
lout=B(π*) (12)
As represented in Equation (11), πt* (here, 1≤t≤L) is a label corresponding to a maximum likelihood class in the t-th categorical distribution (here, including the rejection class). In addition, “π1*π2* . . . πL*” is a label sequence (path) in which π1* to πL* are aligned in that order.
In other words, in this modified example, the label sequence selecting unit 27 acquires a first label sequence likelihood for each first label sequence on the basis of a likelihood of a class included in a categorical distribution sequence (a likelihood in each categorical distribution). The label sequence selecting unit 27 selects a predetermined number of (one or a plurality of) first label sequences in accordance with the first label sequence likelihood among a plurality of first label sequences. The label sequence selecting unit 27 selects a second label sequence that is a result of an application of a predetermined mapping function to the selected first label sequence as a label sequence to be output.
Here, the mapping function may be the mapping function B described above. In other words, the mapping function B performs an operation of (1) deleting a blank label from the first label sequence and (2), in a case in which consecutive identical labels are present within the first label sequence, substituting the consecutive identical labels with only one of the labels. The mapping function B sets a result of the operation as a second label sequence that is an output value.
The mapping function B used in Equation (12) may be substituted with the mapping function BM or BR described above.
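The selection process of Equations (10) to (12) can be sketched as follows; the function name best_path_decode and the underscore standing for the blank label are assumptions introduced only for this example.

```python
BLANK = "_"  # the blank label (an assumed notation for this sketch)

def B(path):
    """Mapping function B: collapse consecutive identical labels,
    then delete blank labels."""
    out = []
    for label in path:
        if not (out and out[-1] == label):
            out.append(label)
    return [label for label in out if label != BLANK]

def best_path_decode(dist_seq):
    """dist_seq: list of dicts mapping label -> score (with the
    rejection class already added). Returns the label sequence l_out."""
    # Equation (11): pick the maximum likelihood label of each distribution.
    pi_star = [max(p, key=p.get) for p in dist_seq]
    # Equations (10) and (12): align the labels into the path pi* and
    # convert it with the mapping function B.
    return B(pi_star)

# Example: pi* = ["a", "_", "b"], so l_out = ["a", "b"].
seq = [{"a": 0.6, "_": 0.4}, {"_": 0.6, "a": 0.4}, {"b": 0.7, "_": 0.3}]
result = best_path_decode(seq)
```

Because only one path is evaluated instead of summing over all paths in B−1(l), this reflects the reduction in the number of combinations described below; B may be substituted with BM or BR as noted in the text.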
According to this modified example, the number of cases of combinations at the time of acquiring a likelihood of a label sequence decreases. In other words, a similar effect can be acquired with simpler implementation, in other words, with a smaller amount of calculation.
Next, a third modified example of this embodiment will be described. Here, special items of this modified example will be focused on in the description. Points that are not particularly mentioned here are similar to the items that have been described in this embodiment (including other modified examples).
In this modified example, there is a feature in the method of adding a rejection class. In this modified example, the rejection class adding unit 25 assigns a score of a rejection class on the basis of a difference between likelihoods of a first-ranked candidate and a second-ranked candidate in each categorical distribution pt before addition of the rejection class.
More specifically, the rejection class adding unit 25 calculates p̃1, . . . , p̃L using the following Equations (13) and (14). Here, pt1st and pt2nd respectively represent the likelihood of the first-ranked candidate and the likelihood of the second-ranked candidate of pt. Here, 1≤t≤L.
As represented in Equation (13), in this modified example, the rejection class adding unit 25 adds a rejection class having a predetermined score α1 (α1>0) to the categorical distribution in which a difference between likelihoods of the first-ranked candidate and the second-ranked candidate of the categorical distribution pt (1≤t≤L) is less than a predetermined threshold θ2. In the other case (in other words, in a case in which the difference between the likelihoods of the first-ranked candidate and the second-ranked candidate is equal to or more than θ2), the rejection class adding unit 25 adds a rejection class having a score of “0”.
The value of α1 is as described above. In addition, the value of the threshold θ2, for example, may be a positive value given in advance.
The method of assigning a score of a rejection class may be further generalized and may be as below. In a case in which a difference between a first-ranked score of a class that is a first-ranked candidate within a categorical distribution and a second-ranked score of a class that is a second-ranked candidate within the categorical distribution is equal to or more than a predetermined threshold (θ2), the rejection class adding unit 25 sets the score of the rejection class to a lowest value among the scores of all the classes. One example of this lowest value is 0 represented in a lower level of the right side of Equation (13). For example, the range of a value of the score of a class may be set to be equal to or more than 0 and equal to or less than 1. On the other hand, in a case in which a difference between a first-ranked score of a class that is a first-ranked candidate within a categorical distribution and a second-ranked score of a class that is a second-ranked candidate within the categorical distribution is less than the predetermined threshold (θ2), the rejection class adding unit 25 sets the score of the rejection class to a predetermined value other than the lowest value described above. One example of this “predetermined value” is α1 represented in an upper level of the right side of Equation (13).
As a new modified example, in any one of a case in which the likelihood of the first-ranked candidate is less than the threshold θ1 described above and a case in which a difference between the likelihoods of the first-ranked candidate and the second-ranked candidate is less than the threshold θ2 described above, a rejection class of which a score is α1 may be assigned. In the other case, a rejection class of which a score is 0 is assigned.
In addition, also in this modified example, the rejection class adding unit 25 may be configured to normalize scores of the categorical distribution after addition of the rejection class using Equation (4) described above or the like.
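The gap-based rule described above can be sketched as follows. This is a minimal, non-authoritative illustration of Equation (13); the specific values of α1 and θ2 (here 0.7 and 0.3) and the class names are hypothetical and chosen only for the example.

```python
def add_rejection_by_gap(dist, alpha1=0.7, theta2=0.3):
    """Add a rejection class '?' to one categorical distribution
    (a dict mapping class label to score), before any normalization.

    alpha1 and theta2 are illustrative values, not from the source.
    """
    top_two = sorted(dist.values(), reverse=True)[:2]
    out = dict(dist)
    # Equation (13) sketch: a small gap between the first- and
    # second-ranked candidates signals ambiguity, so the positive
    # rejection score alpha1 is assigned; otherwise the score is 0.
    out["?"] = alpha1 if (top_two[0] - top_two[1]) < theta2 else 0.0
    return out

# Clear winner (gap 0.8 >= theta2): rejection score stays 0.
print(add_rejection_by_gap({"a": 0.9, "_": 0.1})["?"])  # 0.0
# Ambiguous (gap 0.2 < theta2): rejection score alpha1 is assigned.
print(add_rejection_by_gap({"a": 0.6, "_": 0.4})["?"])  # 0.7
```

After this step, the scores could be normalized using Equation (4) or the like, as described above.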
According to this modified example, the score of the rejection class is assigned on the basis of the relative relation between the score of the first-ranked candidate and the score of the second-ranked candidate. In other words, in this modified example, while the number of hyperparameters to be adjusted increases, an improvement in rejection performance can be expected.
Next, a fourth modified example of this embodiment will be described. Here, special items of this modified example will be focused on in the description. Points that are not particularly mentioned here are similar to the items that have been described in this embodiment (including other modified examples).
In this modified example, there is a feature in the method of adding a rejection class. In this modified example, in a case in which a likelihood of a first-ranked candidate is less than a threshold θ1 and the first-ranked candidate is a blank class in a categorical distribution pt (1≤t≤L) before addition of a rejection class, the rejection class adding unit 25 adds a rejection class having a predetermined score α1 (here, α1>0). On the other hand, in the other case, the rejection class adding unit 25 adds a rejection class having a score of “0”.
In other words, in this modified example, the rejection class adding unit 25 calculates the categorical distributions after addition of the rejection class, p̃1, . . . , p̃L, using the following Equations (15) and (16).
Here, the values of θ1 and α1 are as described above. In addition, also in this modified example, the rejection class adding unit 25 may be configured to perform normalization using Equation (4) or the like after addition of the rejection class.
As described above, in this modified example, a condition of the first-ranked candidate being a blank class in the categorical distribution is a necessary condition for adding a valid rejection. In this modified example, by setting only blanks as targets for rejection, a rejection function targeted only for label deletion can be realized.
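The blank-only rejection rule of Equations (15) and (16) can be sketched as follows. The values of θ1 and α1 and the class names are hypothetical, chosen only for illustration.

```python
def add_rejection_blank_only(dist, alpha1=0.7, theta1=0.5, blank="_"):
    """Equations (15)/(16) sketch: assign the rejection score alpha1
    only when the first-ranked candidate is the blank class AND its
    likelihood is below theta1; otherwise assign a score of 0.

    theta1 and alpha1 are illustrative values, not from the source.
    """
    top = max(dist, key=dist.get)
    out = dict(dist)
    out["?"] = alpha1 if (top == blank and dist[top] < theta1) else 0.0
    return out

# Low-confidence blank: rejected (a label-deletion candidate).
print(add_rejection_blank_only({"_": 0.45, "a": 0.30, "b": 0.25})["?"])  # 0.7
# Low-confidence non-blank top candidate: NOT rejected under this rule.
print(add_rejection_blank_only({"a": 0.45, "_": 0.30, "b": 0.25})["?"])  # 0.0
# Confident blank: also not rejected.
print(add_rejection_blank_only({"_": 0.9, "a": 0.1})["?"])  # 0.0
```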
A graph illustrated in
A condition for assigning a score α1 may be acquired by combining the basic type according to the first embodiment and the third modified example.
In other words, in this modified example, in a case in which the score of a class that is a first-ranked candidate within a categorical distribution is equal to or more than a predetermined threshold (θ1) or in a case in which the class that is the first-ranked candidate is not a blank class corresponding to a blank, the rejection class adding unit 25 sets the score of the rejection class to a lowest value among scores of all the classes. “0” illustrated in a lower level of the right side of Equation (15) is one example of this lowest value (for example, in a case in which the range of the value of the score is set to be equal to or more than 0 and equal to or less than 1).
For example, such graphs are examples of a case in which a handwritten character recognition process is performed.
In the graphs illustrated in
In
Also in
A first-ranked candidate at the position T2 is the rejection class. In other words, the first-ranked candidate at each of the positions T1 and T2 is similar to that of the case illustrated in
Next, a fifth modified example of this embodiment will be described. Here, special items of this modified example will be focused on in the description. Points that are not particularly mentioned here are similar to the items that have been described in this embodiment (including other modified examples).
A feature of this modified example is a value of the score assigned to a rejection class. In this modified example, the rejection class adding unit 25 applies a predetermined function f to the value of the maximum likelihood (a likelihood of the first-ranked candidate) in a categorical distribution pt (1≤t≤L) before addition of the rejection class and sets the function value thereof as the score of the rejection class. In other words, the rejection class adding unit 25 adds a rejection class having a score represented in the following Equation (17) to a t-th categorical distribution within the categorical distribution sequence.
In this way, the rejection class adding unit 25 calculates p̃1, . . . , p̃L after addition of the rejection class. In other words, the calculation is as represented in Equation (18).
The domain of the function f(x) includes the range of values that may be taken by the likelihood of a class included in pt. For example, the domain is 0≤x≤1. In addition, f(x) is a monotonically non-increasing function. In other words, when 0≤x1<x2≤1, f(x1)≥f(x2). Alternatively, f(x) may be configured to be strictly decreasing, that is, when 0≤x1<x2≤1, f(x1)>f(x2). By using such an f(x), the score of the rejection class becomes lower as the likelihood of the first-ranked candidate class becomes higher, and becomes higher as the likelihood of the first-ranked candidate class becomes lower.
For the domain 0≤x≤1, examples of the function f(x) include f(x)=1−x and f(x)=(x−1)^2.
In other words, in this modified example, the rejection class adding unit 25 sets, as the score of the rejection class in a categorical distribution, a value (given by the function f(x) described above) that monotonically decreases with respect to the score of the class that is the first-ranked candidate within the categorical distribution before addition of the rejection class.
According to this modified example, the score of the rejection class can be changed in accordance with the score of a class that is the first-ranked candidate.
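The score rule of Equation (17) can be sketched as follows; f(x)=1−x is one of the example functions named above, and the class names and distributions are hypothetical.

```python
def rejection_score(dist, f=lambda x: 1.0 - x):
    """Equation (17) sketch: apply a monotonically non-increasing
    function f to the maximum likelihood (the likelihood of the
    first-ranked candidate) and use the result as the rejection score."""
    return f(max(dist.values()))

# Confident first-ranked candidate -> low rejection score (about 0.1).
print(rejection_score({"a": 0.9, "_": 0.1}))
# Unconfident first-ranked candidate -> higher rejection score (about 0.6).
print(rejection_score({"a": 0.4, "_": 0.3, "b": 0.3}))
# The other example function, f(x) = (x - 1)^2, also satisfies the
# monotonicity requirement on [0, 1].
print(rejection_score({"a": 0.9, "_": 0.1}, f=lambda x: (x - 1.0) ** 2))
```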
Here, an example of the operation of the information processing device 1 according to the first embodiment will be described. Table 3 represented below illustrates an example of a categorical distribution sequence after addition of a rejection class performed by the rejection class adding unit 25. The values of likelihoods in this categorical distribution are normalized. In other words, a sum of likelihoods of classes in each time step is 1. In this example, the length of the categorical distribution sequence is 2. In other words, only 1 and 2 are included as time steps. As the classes, there are three types including a normal class a, a blank class, and a rejection class. Numerical values recorded in the table are likelihoods of classes in each time step.
As illustrated in Table 3, when the time step is 1, the likelihood of a, which is the first-ranked candidate, exceeds the threshold θ1, and thus the likelihood of the rejection class (?) is 0. When the time step is 2, the likelihood of a, which is the first-ranked candidate, is less than the threshold θ1, and thus 0.7 (an example of the value of α1 described above) is assigned as the likelihood of the rejection class (?).
For the categorical distribution sequence (including a rejection class) represented in Table 3, Tables 4, 5, and 6 represent the selection of a maximum likelihood label sequence in three different cases. Each of Tables 4, 5, and 6 represents the process of calculating the likelihood of each label sequence candidate. Table 4 corresponds to a case in which B is used as the mapping function (the case of a conventional technology).
Table 5 corresponds to a case in which BM is used as a mapping function (in the case of the basic type of this embodiment). Table 6 corresponds to a case in which BR is used as a mapping function (in the case of the first modified example of this embodiment).
In Table 4, for the convenience of description, row numbers are assigned. l of the first column is a candidate label sequence. B−1(l) of the second column is the set of sequences mapped to the label sequence l by the mapping function B (the inverse image of l). p(l) of the third column represents the likelihood of the label sequence l. From the first row to the fifth row, the label sequences are respectively a blank, "a", "?", "a?", and "?a".
B−1(l) corresponding to a label sequence of the blank represented in the first row is only “__”. A likelihood corresponding to this “__” is 0.1×0.1=0.01 from Table 3. In other words, the likelihood of the label sequence of the blank is 0.01.
B−1(l) corresponding to a label sequence “a” of the second row is “aa”, “a_”, and “_a”. The likelihoods of these “aa”, “a_”, and “_a” can be calculated from Table 3 and are respectively 0.18, 0.09, and 0.02. 0.29 that is a sum thereof is a likelihood of the label sequence “a”.
B−1(l) corresponding to the label sequence "?" of the third row is "??", "_?", and "?_". The likelihoods of these "??", "_?", and "?_" can be calculated from Table 3 and are respectively 0.00, 0.07, and 0.00. The sum thereof, 0.07, is the likelihood of the label sequence "?".
B−1(l) corresponding to a label sequence “a?” of the fourth row is only “a?”. A likelihood of this “a?” is 0.9×0.7=0.63 from Table 3. In other words, the likelihood of the label sequence “a?” is 0.63.
B−1(l) corresponding to a label sequence “?a” of the fifth row is only “?a”. A likelihood of this “?a” is 0.0×0.2=0.00 from Table 3. In other words, the likelihood of the label sequence “?a” is 0.00.
In Table 5 (a case in which the mapping function BM is used), for the convenience of description, row numbers are assigned. As illustrated in the table, from the first row to the fifth row, the candidates for the label sequence are respectively a blank, "a", "?", "a?", and "?a". Since a mapping function different from that of the case of Table 4 is used, BM−1(l) corresponding to a candidate l for a label sequence is different from that of the case of Table 4.
BM−1(l) corresponding to a label sequence of a blank of the first row is only “__”. A likelihood of this “__” is 0.01 from Table 3. In other words, the likelihood of the blank label sequence is 0.01.
BM−1(l) corresponding to a label sequence “a” of the second row is “aa”, “a_”, “_a”, “a?”, and “?a”. A sum of the likelihoods of these “aa”, “a_”, “_a”, “a?”, and “?a” is 0.92. In other words, the likelihood of the label sequence “a” is 0.92.
BM−1(l) corresponding to the label sequence "?" of the third row is "??", "_?", and "?_". A sum of the likelihoods of these "??", "_?", and "?_" is 0.07. In other words, the likelihood of the label sequence "?" is 0.07.
For each of the label sequence “a?” of the fourth row and the label sequence “?a” of the fifth row, a corresponding BM−1(l) is an empty set. In other words, both the likelihood of the label sequence “a?” of the fourth row and the likelihood of the label sequence “?a” of the fifth row are 0.00.
As above, in Table 5, the label sequence having the highest likelihood among the five candidate label sequences is "a". In addition, the likelihood p(l) thereof is 0.92. In the case illustrated in Table 5, the label sequence selecting unit 27 selects "a", the label sequence having the highest likelihood, and transfers the selected label sequence to the output unit 29.
In other words, while “a?” is selected as an output label sequence in a case in which a conventional technology is used (Table 4), “a” is selected as an output label sequence in a case in which the basic type of this embodiment is used (Table 5). This is in accordance with the use of BM as a mapping function.
In Table 6 (a case in which the mapping function BR is used), for the convenience of description, row numbers are assigned. As illustrated in the table, from the first row to the sixth row, the candidates for the label sequence are respectively a blank, "a", "?", "a?", "?a", and exclusion. Since a mapping function different from those of the cases of Tables 4 and 5 is used, BR−1(l) corresponding to a candidate l for a label sequence also differs in this example.
BR−1(l) corresponding to a label sequence of a blank of the first row is only “__”. A likelihood of this “__” is 0.01 from Table 3. In other words, the likelihood of the blank label sequence is 0.01.
BR−1(l) corresponding to a label sequence “a” of the second row is “aa”, “a_”, and “_a”. A sum of the likelihoods of these “aa”, “a_”, and “_a” is 0.29. In other words, the likelihood of the label sequence “a” is 0.29.
BR−1(l) corresponding to a label sequence “?” of the third row is “??”, “_?”, and “?_”. A sum of the likelihoods of these “??”, “_?”, and “?_” is 0.07. In other words, the likelihood of the label sequence “?” is 0.07.
For each of the label sequence “a?” of the fourth row and the label sequence “?a” of the fifth row, a corresponding BR−1(l) is an empty set. In other words, both the likelihood of the label sequence “a?” of the fourth row and the likelihood of the label sequence “?a” of the fifth row are 0.00.
The label sequence of the sixth row is an exclusion. BR−1(l) corresponding to this is “a?” and “?a”. From Table 3, the likelihood of “a?” is 0.63, and the likelihood of “?a” is 0.00. In other words, the likelihood of the exclusion is a sum of such likelihoods, which is 0.63.
“exclusion” represented in the sixth row of Table 6 cannot be selected as the maximum likelihood label sequence by the label sequence selecting unit 27. In other words, the label sequence selecting unit 27 selects “a” (the second row) that is a label sequence having a highest likelihood among the first row to the fifth row of Table 6. The likelihood is 0.29. The label sequence selecting unit 27 transfers the selected label sequence “a” to the output unit 29.
In other words, while “a?” is selected as an output label sequence in a case in which a conventional technology is used (Table 4), in a case in which the first modified example of this embodiment is used (Table 6), “a” is selected as an output label sequence. This is in accordance with use of BR as a mapping function.
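The computations of Tables 4, 5, and 6 can be reproduced by enumerating every length-2 path over the classes of Table 3 and summing path likelihoods under each mapping. In the sketch below, the behaviors of BM and BR are inferred from Tables 5 and 6 (absorbing or excluding rejection labels), so treat the two functions as illustrative reconstructions rather than the definitive definitions.

```python
from itertools import product

# Table 3 ('_' is the blank class, '?' the rejection class).
P = [{"a": 0.9, "_": 0.1, "?": 0.0},
     {"a": 0.2, "_": 0.1, "?": 0.7}]

def collapse_B(path):
    """Standard CTC mapping B: merge repeated symbols, then drop blanks."""
    merged = [c for i, c in enumerate(path) if i == 0 or c != path[i - 1]]
    return "".join(c for c in merged if c != "_")

def collapse_BM(path):
    """B_M as inferred from Table 5: rejection labels are absorbed
    (deleted) unless the collapsed sequence consists only of rejections."""
    label = collapse_B(path)
    stripped = label.replace("?", "")
    return label if stripped == "" else stripped

def collapse_BR(path):
    """B_R as inferred from Table 6: sequences mixing rejection and
    ordinary labels fall into the non-selectable 'exclusion' bucket."""
    label = collapse_B(path)
    if "?" in label and label.replace("?", ""):
        return "exclusion"
    return label

def likelihoods(collapse):
    """Sum path likelihoods per collapsed label sequence."""
    totals = {}
    for path in product("a_?", repeat=len(P)):
        prob = 1.0
        for t, c in enumerate(path):
            prob *= P[t][c]
        key = collapse(path)
        totals[key] = totals.get(key, 0.0) + prob
    return totals

# Table 4: B selects "a?" (0.63); Table 5: B_M selects "a" (0.92);
# Table 6: B_R selects "a" (0.29), since "exclusion" is not selectable.
for name, fn in [("B", collapse_B), ("BM", collapse_BM), ("BR", collapse_BR)]:
    lk = likelihoods(fn)
    best = max((l for l in lk if l != "exclusion"), key=lk.get)
    print(name, best, round(lk[best], 2))
```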
Next, a second embodiment will be described. Hereinafter, description of items that have already been described in the previous embodiment (including the modified examples) may be omitted. Here, items that are features of this embodiment will be focused on in description.
The smoothing unit 23 performs smoothing such that changes between adjacent elements of the categorical distribution sequence acquired by the input unit 21 decrease. In other words, the smoothing unit 23 smoothens changes between adjacent categorical distributions included in the categorical distribution sequence. As an example of the method, the smoothing unit 23 applies a Gaussian filter for each class. In other words, for the score values of each class, the smoothing unit 23 applies a Gaussian filter along the sequence direction of the categorical distribution sequence. In this case, the smoothing unit 23 substitutes the value of pt(k) with the value represented by the following Equation (19).
In this Equation (19), s is a positive integer that is appropriately set. In addition, wr is a coefficient that is appropriately set.
For example, as s=1, it may be set such that w−1=1, w0=1, and w1=1.
In addition, for example, as s=1, it may be set such that w−1=1, w0=2, and w1=1.
Furthermore, for example, as s=2, it may be set such that w−2=1, w−1=2, w0=4, w1=2, and w2=1.
In addition, wr may take values other than those illustrated above.
Here, in a case in which the value of (t+r) is outside the range of the categorical distribution sequence, in other words, in a case in which (t+r)<1 or (t+r)>L, the corresponding terms are deleted from both the numerator and the denominator of Equation (19). In other words, in this case, the value of the corresponding wr may be set to 0.
The smoothing unit 23 may or may not normalize the likelihood values of the results smoothed by the process of Equation (19) or the like. In the case of normalization, for example, the smoothing unit 23 performs normalization such that a sum of likelihoods over the classes at the position of each t (1≤t≤L) is 1. In addition, the rejection class adding unit 25 may subsequently normalize scores after addition of the rejection class regardless of whether or not the smoothing unit 23 performs a normalization process.
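A minimal sketch of the smoothing of Equation (19) follows, assuming the weights are given as a mapping from offset r to w_r; out-of-range terms are dropped from both the numerator and the denominator as described above. The data shapes and names are illustrative.

```python
def smooth(seq, w):
    """Smooth a categorical distribution sequence class-by-class along
    the sequence direction (Equation (19) sketch).

    seq: list of dicts mapping class label to score (p_1 .. p_L).
    w:   dict mapping offset r to weight w_r, e.g. {-1: 1, 0: 2, 1: 1}
         for the s = 1 example given above.
    """
    L = len(seq)
    smoothed = []
    for t in range(L):
        dist = {}
        for k in seq[t]:
            num = den = 0.0
            for r, wr in w.items():
                if 0 <= t + r < L:  # drop terms outside the sequence
                    num += wr * seq[t + r][k]
                    den += wr
            dist[k] = num / den
        smoothed.append(dist)
    return smoothed

# A noisy one-step dip in class 'a' is damped by the (1, 2, 1) kernel.
seq = [{"a": 1.0, "_": 0.0}, {"a": 0.0, "_": 1.0}, {"a": 1.0, "_": 0.0}]
print(smooth(seq, {-1: 1, 0: 2, 1: 1}))
```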
The function of the rejection class adding unit 25 according to this embodiment is similar to that according to the first embodiment. In this embodiment, the rejection class adding unit 25 performs the subsequent process on the basis of the categorical distribution sequence smoothed by the smoothing unit 23.
In Step S51, the input unit 21 acquires categorical distribution sequences p1, . . . , pL from the outside. The process of this step corresponds to Step S1 in the flowchart (the first embodiment) illustrated in
In Step S52, the smoothing unit 23 performs a process of smoothing a categorical distribution sequence, which is acquired by the input unit 21, in a series direction. As described above, the smoothing unit 23 may perform smoothing using a Gaussian filter as an example. The process of this step is a process that is a feature of this embodiment. The smoothing unit 23 transfers the categorical distribution sequence after smoothing to the rejection class adding unit 25.
Processes of Steps S53 to S62 respectively correspond to the processes of Steps S2 to S11 in the flowchart illustrated in
In addition, any one of the first modified example to the fifth modified example of the first embodiment may be combined with this embodiment for the process. Also in such a case, the process of each case is performed using a result of the smoothing process performed by the smoothing unit 23.
According to this embodiment, since the smoothing unit 23 smoothens the categorical distribution sequence, rejection that is robust to noise can be expected to be realized.
In each of the embodiments described above, the label sequence selecting unit 27 selects a label sequence having a highest (in other words, first-ranked) likelihood among candidates. As a modified example, the label sequence selecting unit 27 may be configured to select a label sequence other than a first-ranked label sequence (in other words, for example, a second-ranked or third-ranked label sequence or the like) among candidates. In addition, the label sequence selecting unit 27 may be configured to select a plurality of label sequences (for example, up to the n-th ranked (here, n is a positive integer) label sequences) in accordance with likelihoods among candidates. Also in such a case, the output unit 29 outputs the label sequence selected by the label sequence selecting unit 27.
In the embodiments, a plurality of modified examples have been described. Here, a plurality of modified examples may be combined as long as the combination can be performed.
According to at least one of the embodiments described above, instead of rejecting the entire series or assigning a score of rejection to the entire series, the rejection class adding unit 25 adds a rejection class having a score for each categorical distribution within the categorical distribution sequence. By including such a rejection class adding unit 25, a rejection label can be included at a specific label position within a label sequence (series) to be output.
In other words, according to at least one of the embodiments, in a recognition process performed in units of series, only a part of the series can be rejected.
As one example, in a character string recognition problem in which an image in which "abc" is written is recognized as characters, if the certainty factors of "a" and "c" are high and the certainty factor of "b" is low, only "b" can be rejected in units of characters, as in "a?c".
The function of the information processing device according to the embodiment described above may be realized by a computer. In such a case, by recording a program used for realizing the function on a computer-readable recording medium and causing the computer system to read and execute the program recorded on this recording medium, the function may be realized. The “computer system” described here includes an OS and hardware such as peripherals. The “computer-readable recording medium” represents a portable medium such as a flexible disc, a magneto-optical disk, a ROM, a CD-ROM, a DVD-ROM, or a USB memory or a storage device such as a hard disk built in the computer system. In other words, a “computer-readable recording medium” may be a non-transitory computer-readable recording medium. Furthermore, the “computer-readable recording medium” may include a medium dynamically storing the program for a short time such as a communication line of a case in which the program is transmitted through a network such as the Internet or a communication circuit line such as a telephone line and a medium storing the program for a predetermined time such as an internal volatile memory of the computer system that becomes a server or a client in such a case. The program described above may be a program used for realizing a part of the function described above or a program that can realize the function described above in combination with a program that is already recorded in the computer system.
While several preferred embodiments of the invention have been described and illustrated above, it should be understood that these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments may be performed in various other forms, and additions, omissions, substitutions, and other modifications can be made without departing from the spirit or scope of the present invention. These embodiments and modifications thereof are included in the scope or spirit of the present invention and, similarly, are included in the scope of the invention defined in the claims and the scope of equivalency thereof. Accordingly, the invention is not to be considered as being limited by the foregoing description, and is only limited by the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2019-228171 | Dec 2019 | JP | national |