This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2022-147323, filed Sep. 15, 2022, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a representation learning apparatus, a method, and a non-transitory computer readable medium.
In recent machine learning, methods of representation learning that represent complex data such as image data, audio data, or time-series data by a low-dimensional feature amount vector have been proposed. As an example, a representation learning method suitable for clustering has been proposed. In this method, since a feature amount for clustering samples of complex or abstract information that can be grouped is learned, the learned feature amount reflects both features that should be focused on and features that should not be focused on.
A representation learning apparatus according to the embodiment includes a first acquisition unit, a second acquisition unit, a first vector calculation unit, a second vector calculation unit, a similarity calculation unit, a loss function calculation unit, and an updating unit. The first acquisition unit acquires target data. The second acquisition unit acquires non-interest data similar to a non-interest feature included in the target data. The first vector calculation unit calculates a latent vector in the latent space of the target data using a first model parameter concerning a first machine learning model of a training target. The second vector calculation unit calculates a first non-interest latent vector in the latent space of the non-interest feature in the target data and a second non-interest latent vector in the latent space of the non-interest data using a second model parameter concerning a second machine learning model of the training target. The similarity calculation unit calculates a first similarity obtained by correcting the similarity between the latent vector and a first representative value of the latent vector by the similarity between the first non-interest latent vector and a second representative value of the first non-interest latent vector, and a second similarity between the second non-interest latent vector and a third representative value of the second non-interest latent vector. The loss function calculation unit calculates a loss function including the first similarity and the second similarity. The updating unit updates the first model parameter and/or the second model parameter based on the loss function.
A representation learning apparatus, a method, and a non-transitory computer readable medium according to this embodiment will now be described with reference to the accompanying drawings.
The processing circuit 1 includes a processor such as a CPU (Central Processing Unit), and a memory such as a RAM (Random Access Memory). The processing circuit 1 includes a first acquisition unit 11, a second acquisition unit 12, a first vector calculation unit 13, a second vector calculation unit 14, a similarity calculation unit 15, a loss function calculation unit 16, an updating unit 17, a learning control unit 18, a post-processing unit 19, and a display control unit 20. The processing circuit 1 executes a representation learning program, thereby implementing the functions of the units 11 to 20. The representation learning program is stored in a non-transitory computer readable medium such as the storage device 2. The representation learning program may be implemented as a single program that describes all the functions of the units 11 to 20, or may be implemented as a plurality of modules divided into several functional units. In addition, the units 11 to 20 may be implemented by an integrated circuit such as an Application Specific Integrated Circuit (ASIC). In this case, the units may be implemented on a single integrated circuit, or may be implemented individually on a plurality of integrated circuits.
The first acquisition unit 11 acquires processing target learning data (to be referred to as target data hereinafter). The target data means data of a classification target by a machine learning model. The target data has a feature that should be focused (to be referred to as an interest feature hereinafter), and a feature that should not be focused (to be referred to as a non-interest feature hereinafter). The target data is not particularly limited as long as it can be classified and, for example, image data, audio data, character data, waveform data, and the like are used.
Detailed examples of target data will be described here with reference to
The first vector calculation unit 13 calculates a latent vector in the latent space of target data using a first model parameter of a first machine learning model of a training target. The latent vector is a vector representing data obtained by compressing the dimensions of the target data. The latent space means a space established by the latent vector. The first machine learning model is an encoder network that converts the target data into the latent vector. The first model parameter is a parameter of the training target assigned to the first machine learning model. Typically, the first model parameter is a weight or a bias. The first model parameter is stored in the storage device 2.
The second vector calculation unit 14 calculates a latent vector (to be referred to as a first non-interest latent vector hereinafter) in the latent space of a non-interest feature of target data using a second model parameter of a second machine learning model of the training target. Also, the second vector calculation unit 14 calculates a second non-interest latent vector in the latent space of the non-interest data using the second model parameter. When the first non-interest latent vector and the second non-interest latent vector need not be distinguished, these will simply be referred to as non-interest latent vectors hereinafter. The latent space concerning the second machine learning model means a space established by the non-interest latent vectors. The second machine learning model is an encoder network that converts the non-interest feature of the target data or the non-interest data into the first non-interest latent vector and the second non-interest latent vector. The second model parameter is a parameter of the training target assigned to the second machine learning model. Typically, the second model parameter is a weight or a bias. The second model parameter is stored in the storage device 2. Note that the first machine learning model and the second machine learning model may be of the same type or different types.
The similarity calculation unit 15 calculates a first similarity obtained by correcting the similarity between a latent vector and a first representative value of the latent vector by the similarity between a first non-interest latent vector and a second representative value of the first non-interest latent vector. The first representative value is a value representing a plurality of latent vectors obtained until the preceding iteration count in representation learning processing. Similarly, the second representative value is a value representing a plurality of first non-interest latent vectors obtained until the preceding iteration count in representation learning processing. In addition, the similarity calculation unit 15 calculates a second similarity between a second non-interest latent vector and a third representative value of the second non-interest latent vector. The third representative value is a value representing a plurality of second non-interest latent vectors obtained until the preceding iteration count in representation learning processing. Furthermore, the similarity calculation unit 15 may calculate a third similarity between the first non-interest latent vector and the third representative value.
The loss function calculation unit 16 calculates a loss function including at least the first similarity and the second similarity. If the third similarity is calculated, the loss function calculation unit 16 may calculate a loss function including the first similarity, the second similarity, and the third similarity.
The updating unit 17 updates the first model parameter and/or the second model parameter based on the loss function. More specifically, the updating unit 17 updates the first model parameter and/or the second model parameter in accordance with the gradient of the loss function.
The learning control unit 18 controls representation learning processing. More specifically, the learning control unit 18 determines whether a stop condition of representation learning processing is satisfied, and iterates the representation learning processing until it is determined that the stop condition is satisfied. Upon determining that the stop condition of representation learning processing is satisfied, the learning control unit 18 outputs the first model parameter and/or the second model parameter in the current iteration count as a learned model parameter.
The post-processing unit 19 executes post-processing using an information resource obtained by representation learning processing. More specifically, clustering processing and search processing are executed as post-processing. In clustering processing, target data or new data are clustered using latent vectors and/or non-interest latent vectors obtained by representation learning processing. In search processing, other target data or new data similar to reference target data or new data is searched for using latent vectors and/or non-interest latent vectors obtained by representation learning processing.
The display control unit 20 displays various data on the display device 5. For example, the display control unit 20 displays a result of clustering using a machine learning model.
The storage device 2 is formed by a ROM (Read Only Memory), an HDD (Hard Disk Drive), an SSD (Solid State Drive), an integrated circuit storage device, or the like. The storage device 2 stores the representation learning program, and the like. Also, the storage device 2 stores the latent vector of the target data and the first representative value thereof, the latent vector of non-interest of the target data and the second representative value thereof, and the latent vector of non-interest of non-interest data and the third representative value thereof.
The input device 3 inputs various kinds of instructions from a user. As the input device 3, a keyboard, a mouse, various kinds of switches, a touch pad, a touch panel display, or the like can be used. An output signal from the input device 3 is supplied to the processing circuit 1. Note that the input device 3 may be an input device of a computer connected to the processing circuit 1 by wire or wirelessly.
The communication device 4 is an interface configured to perform data communication with an external device connected to the representation learning apparatus 100 via a network.
The display device 5 displays various kinds of information. For example, the display device 5 displays various kinds of data under the control of the display control unit 20. As the display device 5, a CRT (Cathode-Ray Tube) display, a liquid crystal display, an organic EL (Electro Luminescence) display, an LED (Light-Emitting Diode) display, a plasma display, or any other arbitrary display known in this technical field can appropriately be used. Also, the display device 5 may be a projector.
Representation learning processing according to this embodiment will be described below.
As shown in
In step S401, the first acquisition unit 11 may perform data extension for the target data xi. As an example, random image cropping or random changes to brightness, lightness, saturation, or the like are performed as the data extension.
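For illustration only, the following is a minimal sketch of such data extension for image-type target data. The use of torchvision transforms and the specific parameter values are assumptions for explanation, not a definitive implementation of the embodiment.

```python
# Minimal sketch of data extension (augmentation) for image target data.
# Assumption: torchvision-style transforms stand in for the random cropping
# and random brightness/saturation changes described above.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                       # random image cropping
    transforms.ColorJitter(brightness=0.4, saturation=0.4),  # random color changes
    transforms.ToTensor(),
])

# x_i_aug = augment(pil_image)  # applied to each target data sample x_i
```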
When step S401 is performed, the first vector calculation unit 13 calculates a latent vector Sxi of the target data xi using the first model parameter (step S402). The first vector calculation unit 13 reads out the first model parameter of the training target from the storage device 2, sets the readout first model parameter to the first machine learning model, and sequentially propagates the target data xi to the first machine learning model, thereby calculating the latent vector Sxi. The first machine learning model is an encoder network that receives the d-dimensional target data xi and outputs the d′-dimensional latent vector Sxi. d′ is smaller than d. The architecture of the encoder network is not particularly limited and, for example, a deep neural network such as a ResNet (Deep Residual Learning) is used. A model parameter in the initial updating count of the representation learning processing is set to an arbitrary value. The latent vector Sxi is stored in the Sx memory 21.
When step S402 is performed, the second vector calculation unit 14 calculates a latent vector Zxi of non-interest of the target data xi using the second model parameter (step S403). In step S403, the second vector calculation unit 14 reads out the second model parameter of the training target from the storage device 2, sets the readout second model parameter to the second machine learning model, and sequentially propagates the target data xi to the second machine learning model, thereby calculating the latent vector Zxi of non-interest. The second machine learning model is an encoder network that receives the d-dimensional target data xi and outputs the d″-dimensional latent vector Zxi of non-interest. d″ is smaller than d. A model parameter in the initial updating count of the representation learning processing is set to an arbitrary value. The latent vector Zxi of non-interest is stored in the Zx memory 22.
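As a hedged illustration of steps S402 and S403, the sketch below defines the first and second machine learning models as small encoder networks in PyTorch. The multilayer perceptron architecture, the layer sizes, and the flattened d-dimensional input are assumptions for explanation; as noted above, a deep neural network such as a ResNet may be used in practice.

```python
# Sketch of the two encoder networks (first and second machine learning models).
# Assumption: simple MLP encoders over flattened d-dimensional inputs.
import torch
import torch.nn as nn

d, d_s, d_z = 1024, 64, 16         # input dimension d, latent dimensions d' and d'' (both smaller than d)

def make_encoder(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(in_dim, 256),
        nn.ReLU(),
        nn.Linear(256, out_dim),
    )

encoder_s = make_encoder(d, d_s)   # first machine learning model  -> latent vector Sx
encoder_z = make_encoder(d, d_z)   # second machine learning model -> latent vector Zx of non-interest

x = torch.randn(8, d)              # a mini-batch of target data x_i
s_x = encoder_s(x)                 # latent vectors Sx_i (step S402)
z_x = encoder_z(x)                 # latent vectors Zx_i of non-interest (step S403)
```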
When step S403 is performed, the similarity calculation unit 15 calculates the similarity S1ij based on the latent vector Sxi and a representative value S′xj thereof, and the latent vector Zxi of non-interest and a representative value Z′xj thereof (step S404). The suffix “j” represents a jth representative value S′x or Z′x. In step S404, the similarity calculation unit 15 acquires the representative value S′xj from the Sx memory 21. The representative value S′xj is a vector representing latent vectors Sx calculated until the preceding iteration count. As an example, the representative value S′xj is the moving average value of the plurality of latent vectors Sx calculated until the preceding iteration count. Similarly, the similarity calculation unit 15 acquires the representative value Z′xj from the Zx memory 22. The representative value Z′xj is a vector representing a plurality of latent vectors Zxi of non-interest calculated until the preceding iteration count. As an example, the representative value Z′xj is the moving average value of the plurality of latent vectors Zxi of non-interest calculated until the preceding iteration count.
In step S404, the similarity calculation unit 15 calculates the similarity S1ij obtained by correcting the similarity between the latent vector Sxi and the representative value S′xj of the latent vector Sxi by the similarity between the latent vector Zxi of non-interest and the representative value Z′xj of the latent vector Zxi of non-interest. As an example, the similarity S1ij is calculated in accordance with equation (1) below. The numerator of the similarity S1ij represents the similarity between the latent vector Sxi and the representative value S′xj. The denominator represents the similarity between the latent vector Zxi of non-interest and the representative value Z′xj. τ is a parameter for controlling the degree of enhancement of the similarity.
Since the similarity S1ij need only be obtained by correcting the similarity between the latent vector Sxi and the representative value S′xj thereof by the similarity between the latent vector Zxi of non-interest and the representative value Z′xj thereof, the calculation method is not limited to equation (1). For example, the similarity S1ij may be calculated in accordance with equation (2) below. τ′ is a parameter for controlling the degree of enhancement of the similarity of the latent vector of non-interest.
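Since equations (1) and (2) themselves are not reproduced in this text, the following is only a plausible sketch consistent with the description above: an exponential dot-product similarity between Sxi and S′xj divided (corrected) by an exponential dot-product similarity between Zxi and Z′xj, with temperature parameters τ and τ′. The exact functional form used by the embodiment may differ.

```python
# Hedged sketch of the corrected similarity S1_ij of step S404.
# Assumption: exp(dot-product / tau) similarities, with the non-interest
# similarity placed in the denominator as described for equation (1).
import numpy as np

def similarity_s1(s_x, s_x_rep, z_x, z_x_rep, tau=0.5, tau_prime=0.5):
    """s_x: (N, d') latent vectors Sx_i; s_x_rep: (P, d') representative values S'x_j.
    z_x: (N, d'') latent vectors Zx_i of non-interest; z_x_rep: (P, d'') representative values Z'x_j.
    Returns an (N, P) matrix of corrected similarities S1_ij."""
    numerator = np.exp(s_x @ s_x_rep.T / tau)          # similarity between Sx_i and S'x_j
    denominator = np.exp(z_x @ z_x_rep.T / tau_prime)  # similarity between Zx_i and Z'x_j
    return numerator / denominator
```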
When step S404 is performed, the second acquisition unit 12 acquires the data bi of non-interest (step S405). In step S405, the second acquisition unit 12 acquires M data bi of non-interest. In step S405, the second acquisition unit 12 may perform data extension for the data bi of non-interest. The data extension may be the same as in step S401 or may be different processing.
When step S405 is performed, the second vector calculation unit 14 calculates a latent vector Zb of non-interest of the non-interest data b using the second model parameter (step S406). The second vector calculation unit 14 reads out the second model parameter of the training target from the storage device 2, sets the readout second model parameter to the second machine learning model, and sequentially propagates the data bi of non-interest to the second machine learning model, thereby calculating a latent vector Zbi of non-interest. The latent vector Zbi of non-interest is stored in the Zb memory 23. In step S406, the second vector calculation unit 14 uses the same model parameter as the second model parameter used in step S403.
When step S406 is performed, the similarity calculation unit 15 calculates a similarity S2ij based on the latent vector Zbi of non-interest and a representative value Z′bj thereof (step S407). In step S407, the similarity calculation unit 15 acquires the representative value Z′bj from the Zb memory 23. The representative value Z′bj is a vector representing the latent vectors Zb of non-interest calculated until the preceding iteration count. As an example, the representative value Z′bj is the moving average value of the plurality of latent vectors Zb of non-interest calculated until the preceding iteration count. The similarity S2ij is calculated in accordance with equation (3) below. The numerator of the similarity S2ij represents the similarity between the latent vector Zbi of non-interest and the representative value Z′bj. The denominator τ is a parameter for controlling the degree of enhancement of the similarity.
When step S407 is performed, the similarity calculation unit 15 calculates a similarity S3ij between the latent vector Zxi of non-interest and the representative value Z′bj of a latent vector Zb of non-interest (step S408). In step S408, the similarity calculation unit 15 acquires the representative value Z′bj from the Zb memory 23. The similarity S3ij is calculated in accordance with equation (4) below. A numerator ZxiZ′bj represents the similarity between the latent vector Zxi of non-interest and the representative value Z′bj of the latent vector Zb of non-interest. The denominator τ is a parameter for controlling the degree of enhancement of the similarity.
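Equations (3) and (4) are likewise not reproduced here; the sketch below assumes the same exponential dot-product form with temperature τ as in the sketch for S1ij. It is an illustrative assumption, not the exact equations of the embodiment.

```python
# Hedged sketch of the similarities S2_ij (step S407) and S3_ij (step S408).
# Assumption: exp(dot-product / tau) similarities as described for equations (3) and (4).
import numpy as np

def similarity_s2(z_b, z_b_rep, tau=0.5):
    # S2_ij: similarity between Zb_i of non-interest and the representative value Z'b_j
    return np.exp(z_b @ z_b_rep.T / tau)

def similarity_s3(z_x, z_b_rep, tau=0.5):
    # S3_ij: similarity between Zx_i of non-interest and the representative value Z'b_j
    return np.exp(z_x @ z_b_rep.T / tau)
```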
When step S408 is performed, the loss function calculation unit 16 calculates a loss function loss including a similarity S1, a similarity S2, and a similarity S3 (step S409). As an example, the loss function loss is calculated in accordance with equation (5) below. As indicated by equation (5), the loss function loss is defined by the sum of the similarity S1, the similarity S2, and the similarity S3. The first term of equation (5) corresponds to the similarity S1. The value of the first term becomes small when the latent vector Sxi of the target data x has a high similarity to the representative value S′xi of itself and a low similarity to the other representative values S′xj, and the degree is corrected by the similarity between the latent vector Zxi of non-interest of the target data x and the representative value Z′xj. The first term plays a role of correcting the loss function loss such that the latent vector Zx of non-interest is not included in the latent vector Sx. The second term of equation (5) corresponds to the similarity S2. The value of the second term becomes small when the latent vector Zbi of non-interest of the non-interest data b has a high similarity to the representative value Z′bi of itself and a low similarity to the other representative values Z′bj. The second term plays a role of making similar latent vectors Zb of non-interest close to each other and dissimilar latent vectors Zb of non-interest apart from each other. The third term of equation (5) corresponds to the similarity S3. The value of the third term becomes small when the latent vector Zxi of non-interest of the target data x has a high similarity to the most similar latent vector Zbk of non-interest of the non-interest data b and a low similarity to the other vectors. The third term plays a role of making the latent vector Zx of non-interest close to similar latent vectors Zb of non-interest and apart from dissimilar latent vectors Zb of non-interest.
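Equation (5) is also not shown in this text. Guided by the description that each term becomes small when a vector is highly similar to its own representative value and dissimilar to the others (an instance-discrimination style criterion in the spirit of Non-Patent Literature 1 cited below), the sketch assumes a normalized negative-log form for each term. The actual aggregation in the embodiment may differ; gradient-based updating in step S410 would in practice be written with an automatic differentiation framework.

```python
# Hedged sketch of assembling the loss function loss of step S409 from S1, S2, and S3.
# Assumption: each term is a normalized negative log ratio averaged over the mini-batch.
import numpy as np

def loss_from_similarities(s1, s2, s3, own_x, own_b):
    """s1: (N, P) corrected similarities S1_ij; own_x[i] indexes S'x_i (the sample's own representative).
    s2: (M, Q) similarities S2_ij; own_b[i] indexes Z'b_i (the sample's own representative).
    s3: (N, Q) similarities S3_ij between Zx_i and the representative values Z'b_j."""
    i_x, i_b = np.arange(s1.shape[0]), np.arange(s2.shape[0])
    term1 = -np.log(s1[i_x, own_x] / s1.sum(axis=1)).mean()  # corresponds to the similarity S1
    term2 = -np.log(s2[i_b, own_b] / s2.sum(axis=1)).mean()  # corresponds to the similarity S2
    term3 = -np.log(s3.max(axis=1) / s3.sum(axis=1)).mean()  # corresponds to the similarity S3 (most similar Zb_k)
    return term1 + term2 + term3
```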
The loss function loss is not limited to equation (5), and another term may be added as indicated by equation (6). The fourth, fifth, and sixth terms of equation (6) are feature decorrelation terms in Non-Patent Literature 1 ("Clustering-friendly Representation Learning via Instance Discrimination and Feature Decorrelation", Yaling Tao, Kentaro Takagi, Kouta Nakata. arXiv: 2106.00131 (ICLR2021)). A feature decorrelation term LfdSx represents a degree to which the latent vectors Sx are orthogonal to each other. A feature decorrelation term LfdZx represents a degree to which the latent vectors Zx of non-interest are orthogonal to each other. A feature decorrelation term LfdZb represents a degree to which the latent vectors Zb of non-interest are orthogonal to each other.
When step S409 is performed, the updating unit 17 updates the first model parameter and/or the second model parameter in accordance with the gradient of the loss function loss (step S410). In step S410, the updating unit 17 can update the model parameter using an arbitrary optimization method such as stochastic gradient descent or ADAM.
When step S410 is performed, the first vector calculation unit 13 updates the representative value S′x stored in the Sx memory 21 (step S411). In step S411, the first vector calculation unit 13 updates the representative value S′x based on the latent vector Sx calculated in step S402 of the current updating count. Typically, the representative value S′x is updated by a method such as a moving average method. As an example, when performing updating using an exponential moving average method, a representative value S′xnew after updating is calculated based on the latent vector Sx of the current updating count and a representative value S′xold before updating in accordance with the exponential moving average represented by equation (7) below. The representative value S′xnew is stored in the Sx memory 21.
S′xnew = αSx + (1 − α)S′xold   (7)
Note that the updating method is not limited only to the above-described method. As an example, the representative value S′xnew after updating may be obtained by replacing the representative value S′xold of the current updating count with a statistic value such as the average value of the latent vectors Sx calculated in step S402 of the current updating count.
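A minimal sketch of the representative-value update of equation (7) and of the alternative replacement by a batch statistic is shown below; the value of α is an assumption. The same form applies to the representative values Z′x and Z′b described next.

```python
# Sketch of updating a representative value (steps S411 to S413).
# Equation (7): S'x_new = alpha * Sx + (1 - alpha) * S'x_old, applied element-wise.
import numpy as np

def update_representative_ema(latent_batch, rep_old, alpha=0.5):
    # Exponential moving average update of equation (7).
    return alpha * latent_batch + (1.0 - alpha) * rep_old

def update_representative_replace(latent_batch):
    # Alternative: replace the old representative value with a statistic value
    # such as the average of the latent vectors computed in the current updating count.
    return latent_batch.mean(axis=0)
```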
When step S411 is performed, the second vector calculation unit 14 updates the representative value Z′x stored in the Zx memory 22 (step S412). The representative value Z′x after updating is stored in the Zx memory 22. In step S412, the second vector calculation unit 14 updates the representative value Z′x based on the latent vector Zx of non-interest calculated in step S403 of the current updating count. The representative value Z′x is updated by the same moving average method as in step S411.
When step S412 is performed, the second vector calculation unit 14 updates the representative value Z′b stored in the Zb memory 23 (step S413). The representative value Z′b after updating is stored in the Zb memory 23. In step S413, the second vector calculation unit 14 updates the representative value Z′b based on the latent vector Zb of non-interest calculated in step S406 of the current updating count. The representative value Z′b is updated by the same moving average method as in step S411.
When step S413 is performed, the learning control unit 18 determines whether the stop condition is satisfied (step S414). The stop condition is set to a condition that the updating count reaches an updating count set in advance, a condition that the value of the loss function is less than a first threshold, or a condition that the number of times the decrease of the value of the loss function is equal to or less than a second threshold reaches a third threshold. Upon determining that the stop condition is not satisfied (NO in step S414), steps S401 to S414 are iterated for new target data x and non-interest data b. By the iteration of steps S401 to S414, the first model parameter and/or the second model parameter can be trained such that the value of the loss function loss including the similarity S1, the similarity S2, and the similarity S3 becomes small.
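For illustration, one possible reading of the stop condition check of step S414 is sketched below; the threshold values and the decision to simply accumulate the count of small loss decreases are assumptions.

```python
# Hedged sketch of the stop condition of step S414 (thresholds are illustrative assumptions).
def stop_condition(update_count, max_updates, loss_value, prev_loss, small_decrease_count,
                   first_threshold=1e-3, second_threshold=1e-4, third_threshold=10):
    if update_count >= max_updates:                    # preset updating count reached
        return True, small_decrease_count
    if loss_value < first_threshold:                   # loss below the first threshold
        return True, small_decrease_count
    if prev_loss - loss_value <= second_threshold:     # decrease not larger than the second threshold
        small_decrease_count += 1
    return small_decrease_count >= third_threshold, small_decrease_count
```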
Upon determining in step S414 that the stop condition is satisfied (YES in step S414), the learning control unit 18 outputs the first model parameter and/or the second model parameter (step S415). The output first model parameter and/or the second model parameter is stored in the storage device 2.
When step S415 is performed, the representation learning processing according to this embodiment is ended.
The processing procedure of the above-described representation learning processing is merely an example, and addition, deletion and/or change of processing is possible without departing from the scope of the present invention.
As an example, the order of the steps shown in
As another example, the loss function need not include all the similarity S1, the similarity S2, and the similarity S3. For example, the loss function may include the similarity S1 and the similarity S2 but not the similarity S3. As still another example, the loss function may further include a mutual information amount between the latent vector Sx and the first latent vector Zx of non-interest.
According to the above-described representation learning processing, the first model parameter and/or the second model parameter is updated based on the loss function including the similarity S1 and the similarity S2. The similarity S1 is an index obtained by correcting the similarity between the latent vector Sx and the representative value S′x thereof by the similarity between the latent vector Zx of non-interest and the representative value Z′x thereof. The similarity S2 is an index representing the similarity between the latent vector Zb of non-interest and the representative value Z′b thereof. By using such a loss function, representation learning for enhancing an interest feature and suppressing a non-interest feature can be performed for target data including the interest feature and the non-interest feature.
Use examples of the information resource obtained by the above-described representation learning processing will be described below.
In Use Example 1, clustering based on the interest feature of the target data used in the representation learning processing according to this embodiment is executed. A post-processing unit 19 according to Use Example 1 clusters target data x based on a set of latent vectors Sx calculated by representation learning processing and stored in an Sx memory 21.
When step S801 is performed, the post-processing unit 19 clusters P target data x using the latent vectors Sx (step S802). Clustering is executed by unsupervised clustering, more specifically by the K-means method. In the K-means method, the post-processing unit 19 initially assigns one of a plurality of labels to each of the P latent vectors Sx in the latent space (step A). For each label, the post-processing unit 19 calculates the center-of-gravity point of the plurality of latent vectors Sx belonging to the label (step B). For each of the P latent vectors Sx, the post-processing unit 19 calculates the distances to the plurality of center-of-gravity points, selects the label to which the center-of-gravity point corresponding to the shortest of the plurality of distances belongs, and newly assigns the selected label to the latent vector Sx (step C). The post-processing unit 19 iterates the processes of steps A to C until the labels assigned in step C no longer change. Clustering of the latent vectors Sx is thus performed. The latent vectors Sx and the target data x are in a one-to-one correspondence. For this reason, when clustering of the latent vectors Sx is performed, clustering of the target data x is also performed.
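A minimal sketch of steps A to C over the latent vectors Sx is shown below. The number of labels and the random initial assignment are assumptions; an off-the-shelf K-means implementation could be used instead.

```python
# Sketch of the K-means clustering of steps A to C over the latent vectors Sx (step S802).
import numpy as np

def kmeans_latent(s_x, num_labels, seed=0):
    """s_x: (P, d') latent vectors Sx. Returns one label per latent vector,
    and therefore per target data x (one-to-one correspondence)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(num_labels, size=len(s_x))          # step A: initial label assignment
    while True:
        centers = []
        for k in range(num_labels):
            members = s_x[labels == k]
            # guard against an empty label by falling back to a random latent vector
            centers.append(members.mean(axis=0) if len(members) else s_x[rng.integers(len(s_x))])
        centers = np.stack(centers)                            # step B: center-of-gravity points
        dists = np.linalg.norm(s_x[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)                      # step C: nearest center-of-gravity point
        if np.array_equal(new_labels, labels):                 # iterate until labels no longer change
            return labels
        labels = new_labels
```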
Here, the difference between clustering according to this embodiment and clustering according to a comparative example will be described. In clustering, classification is performed using the distance between latent vectors or the distance between a latent vector and the cluster center. In representation learning processing according to the comparative example, only the first vector calculation unit 13 according to this embodiment is used, and the second vector calculation unit 14 is not used. For this reason, the machine learning model learns, without distinction, both an interest feature that should be focused and a non-interest feature that should not be focused. Hence, for example, when clustering first data and second data, which have the same interest feature and different non-interest features, the first data and the second data are classified into different classes.
In the representation learning processing according to this embodiment, both the first vector calculation unit 13 and the second vector calculation unit 14 are used. It is therefore possible to obtain the first machine learning model that extracts only the interest feature that should be focused and the second machine learning model that extracts only the non-interest feature that should not be focused. Hence, even when clustering first data and second data, which have the same interest feature and different non-interest features, the first data and the second data can be classified into the same class by placing focus only on the interest feature.
In Use Example 2, clustering based on an interest feature of new data x′ that is not used in the representation learning processing according to this embodiment is executed. A post-processing unit 19 according to Use Example 2 calculates a latent vector Sx′ in the latent space of the new data x′ using the first model parameter and clusters the new data x′ based on the calculated latent vector Sx′.
According to Use Example 2, even for the new data x′ that is not used in the representation learning processing, clustering can be executed using the first model parameter trained in the representation learning processing. It is therefore possible to execute accurate clustering as compared to the comparative example.
In Use Example 3, clustering based on a non-interest feature of target data x used in the representation learning processing according to this embodiment is executed. A post-processing unit 19 according to Use Example 3 clusters the target data x based on a set of latent vectors Zx of non-interest.
According to Use Example 3, clustering can be executed using the target data x or the latent vector Zx of non-interest used in the representation learning processing. It is therefore possible to execute accurate clustering as compared to the comparative example.
In Use Example 4, clustering based on a non-interest feature of new data x′ is executed. A post-processing unit 19 according to Use Example 4 calculates a latent vector Zx′ of non-interest in the latent space of the new data x′ using the second model parameter and clusters the new data x′ based on the calculated latent vector Zx′ of non-interest.
According to Use Example 4, even for the new data x′ that is not used in the representation learning processing, clustering can be executed using the second model parameter trained in the representation learning processing. It is therefore possible to execute accurate clustering as compared to the comparative example.
In Use Example 5, target data similar to new data is searched for based on an interest feature. A post-processing unit 19 according to Use Example 5 calculates a new latent vector Sx′ in the latent space of new data x′ using the first model parameter, and calculates the distance or similarity between the latent vector Sx′ and a latent vector Sx. A display control unit 20 displays target data x on a display device 5 in ascending order of distance or descending order of similarity.
When step S1404 is performed, the display control unit 20 presents the target data x similar to the new data x′ from the P target data x (step S1405). As an example, in step S1405, the display control unit 20 displays the target data x with a distance equal to or less than a threshold on the display device 5 as the target data x similar to the new data x′. At this time, the display control unit 20 displays the target data x similar to the new data x′ in a ranking format in the ascending order of distance. As another example, the display control unit 20 may display all or a predetermined number of target data x in the ascending order of distance. Note that in steps S1404 and S1405, a similarity, a cosine similarity, or another similarity used in the above-described representation learning processing may be used in place of the distance.
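A minimal sketch of the search of steps S1404 and S1405 is shown below, assuming Euclidean distance in the latent space; the threshold value is an assumption. As noted above, a cosine similarity or another similarity could be used instead, in which case the ranking would be in descending order of similarity.

```python
# Sketch of searching for target data x similar to new data x' (Use Example 5).
import numpy as np

def search_similar(s_x_new, s_x_all, threshold=1.0):
    """s_x_new: (d',) latent vector Sx' of the new data x'.
    s_x_all: (P, d') latent vectors Sx of the P target data.
    Returns indices of similar target data in ascending order of distance."""
    dists = np.linalg.norm(s_x_all - s_x_new[None, :], axis=1)   # distance to each target data
    order = np.argsort(dists)                                    # ranking in ascending order of distance
    return [int(i) for i in order if dists[i] <= threshold]      # keep distances not larger than the threshold
```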
According to Use Example 5, the target data x similar to the new data x′ can be searched for using the first model parameter and the latent vector Sx obtained by the representation learning processing. Hence, the accuracy of search processing is expected to improve.
In Use Example 6, other new data similar to new data is searched for based on an interest feature. A post-processing unit 19 according to Use Example 6 calculates a first new latent vector Sx1′ in the latent space of first new data x1′ and a plurality of second new latent vectors Sx2′ in the latent space of a plurality of second new data x2′ using the first model parameter, and calculates the distance or similarity between each of the plurality of second new latent vectors Sx2′ and the first new latent vector Sx1′. A display control unit 20 displays the plurality of second new data x2′ on a display device 5 in ascending order of distance or descending order of similarity.
According to Use Example 6, the new data x2′ similar to the new data x1′ can be searched for using the first model parameter obtained by the representation learning processing. Hence, the accuracy of search processing is expected to improve.
In Use Example 7, target data similar to new data is searched for based on a non-interest feature. A post-processing unit 19 according to Use Example 7 calculates a plurality of new latent vectors Zx′ of non-interest in the latent space of a plurality of new data x′ using the second model parameter, and calculates the distance or similarity between each of the plurality of new latent vectors Zx′ of non-interest and a latent vector Zx of non-interest of the target data x. A display control unit 20 displays the plurality of new data on a display device 5 in ascending order of distance or descending order of similarity.
According to Use Example 7, the target data x similar to the new data x′ can be searched for using the second model parameter and the latent vector Zx of non-interest obtained by the representation learning processing. Hence, the accuracy of search processing is expected to improve.
In Use Example 8, other new data similar to new data is searched for based on a non-interest feature. A post-processing unit 19 according to Use Example 8 calculates a first new latent vector Zx1′ of non-interest in the latent space of a non-interest feature in first new data x1′ and a plurality of second new latent vectors Zx2′ of non-interest in the latent space of a non-interest feature in a plurality of second new data x2′ using the second model parameter, and calculates the distance or similarity between each of the plurality of second new latent vectors Zx2′ of non-interest and the first new latent vector Zx1′ of non-interest. A display control unit 20 displays the plurality of second new data x2′ on a display device 5 in ascending order of distance or descending order of similarity.
According to Use Example 8, the new data x2′ similar to the new data x1′ can be searched for using the second model parameter obtained by the representation learning processing. Hence, the accuracy of search processing is expected to improve.
As described above in various embodiments, the representation learning apparatus 100 includes the first acquisition unit 11, the second acquisition unit 12, the first vector calculation unit 13, the second vector calculation unit 14, the similarity calculation unit 15, the loss function calculation unit 16, and the updating unit 17. The first acquisition unit 11 acquires target data x. The second acquisition unit 12 acquires non-interest data b similar to a non-interest feature included in the target data x. The first vector calculation unit 13 calculates the latent vector Sx in the latent space of the target data x using the first model parameter concerning the first machine learning model of the training target. The second vector calculation unit 14 calculates the latent vector Zx of non-interest in the latent space of the non-interest feature included in the target data x and the latent vector Zb of non-interest in the latent space of the non-interest data b using the second model parameter concerning the second machine learning model of the training target. The similarity calculation unit 15 calculates the similarity S1 obtained by correcting the similarity between the latent vector Sx and the representative value S′x thereof by the similarity between the latent vector Zx of non-interest and the representative value Z′x thereof, and the similarity S2 between the latent vector Zb of non-interest and the representative value Z′b thereof. The loss function calculation unit 16 calculates a loss function including the similarity S1 and the similarity S2. The updating unit 17 updates the first model parameter and the second model parameter based on the loss function.
According to the above-described configuration, the first model parameter and/or the second model parameter is updated based on the loss function including the similarity S1 and the similarity S2. The similarity S1 is an index obtained by correcting the similarity between the latent vector Sx and the representative value S′x thereof by the similarity between the latent vector Zx of non-interest and the representative value Z′x thereof. The similarity S2 is an index representing the similarity between the latent vector Zb of non-interest and the representative value Z′b thereof. By using such a loss function, representation learning for enhancing an interest feature and suppressing a non-interest feature can be performed for target data including the interest feature and the non-interest feature.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2022-147323 | Sep 2022 | JP | national |