This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2022-149134, filed on Sep. 20, 2022; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a method and an information processing device.
In the conventional art, there are methods or devices that perform information processing of searching for data similar to a query as input data and outputting the search result. Such methods or devices are required to provide both a fast query response and an accurate search. As a nearest neighbor search algorithm for achieving both the speed of the query response and the accuracy of the search, an approximate nearest neighbor search (ANNS) algorithm using a plurality of heterogeneous memories is known.
According to an embodiment, a method includes receiving a query, and selecting one of first objects on the basis of the query and a neural network model. Each of the first objects is associated with one or more pieces of first data in a group of first data stored on a first memory. The method further includes calculating a metric of a distance between the query and one or more pieces of second data. The one or more pieces of second data are one or more pieces of first data associated with a second object. The second object is the one of the first objects having been selected. The method further includes identifying third data on the basis of the metric of distance. The third data is first data closest to the query in the group of the first data.
A nearest neighbor search method according to one of embodiments is executed by, for example, an information processing device including a processor, a first memory, and a second memory. The first memory is a memory having a larger capacity than that of the second memory. The second memory is capable of operating at a higher speed than the first memory. Hereinafter, an example will be described, in which a nearest neighbor search according to one of embodiments is performed in a computer including a solid state drive (SSD) as the first memory and a dynamic random access memory (DRAM) as the second memory.
Note that the nearest neighbor search method according to one of embodiments may be implemented by collaboration of two or more information processing devices connected to each other via a network. Alternatively, the nearest neighbor search according to one of embodiments may be implemented by a storage device that includes a storage medium such as a NAND flash memory device as the first memory, a DRAM as the second memory, and a processor.
Hereinafter, the method and the information processing device according to one of embodiments will be described in detail by referring to the accompanying drawings. Note that the present invention is not limited by these embodiments.
An information processing device 1 is a computer including a processor 2, an SSD 3 that is an example of a first memory, a DRAM 4 that is an example of a second memory, an input and output (I/O) circuit 5, and a bus 6 that electrically connects these. Note that the first memory and the second memory are not limited to the above. For example, the first memory may be any storage memory. The first memory may be a universal flash storage (UFS) device or a magnetic disk device.
The processor 2 executes a specific operation in accordance with a computer program. The processor 2 is, for example, a central processing unit (CPU) or a graphics processing unit (GPU). When a query as input data is input to the information processing device 1, the processor 2 executes a specific operation based on the input query using the SSD 3 and the DRAM 4.
The SSD 3 is a storage memory having a larger capacity than the DRAM 4. The SSD 3 includes a NAND flash memory as a storage medium.
The DRAM 4 has a smaller capacity than that of the SSD 3 but can operate at a higher speed than the SSD 3. Here, the operation includes a data write operation and a data read operation.
The I/O circuit 5 is an interface device to which input and output devices can be connected. Input and output devices include, for example, an input device, a display device, a network device, a printer, or the like.
The SSD 3 stores a group of data D. The type of each piece of data D is not limited to a specific type. Each piece of data D has N (where N is an integer greater than or equal to 1) elements. In other words, each piece of data D is an N-dimensional vector. Each piece of data D is data of an image, data of a document, or any other type of data, or data generated from these types of data. In one example, each piece of data D is N features extracted from an image. In another example, each piece of data D is a purchase log of products classified into N categories by different users. The number of elements N is common to all the data D and a query Q to be described later.
In response to receiving a query input to the information processing device 1, the processor 2 searches for the data D closest to the input query from the group of data D stored on the SSD 3.
In the present specification, the distance is a scale representing similarity between pieces of data. Mathematically, the distance is, for example, a Euclidean distance. Note that the mathematical definition of the distance is not limited to the Euclidean distance. Additionally, the metric used for evaluating the distance is not limited to the Euclidean distance itself; any metric can be used as long as it represents the distance. Herein, as an example, the inner product value is used as the metric of the distance. The shorter the distance, the larger the inner product value.
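For illustration only, the use of the inner product value as the metric of the distance may be sketched as follows; the function name and the sample vectors are illustrative assumptions and not part of the embodiments:

```python
def inner_product(a, b):
    # Metric of the distance: the shorter the distance between two
    # vectors, the larger their inner product value (for data of
    # comparable norm).
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
near = [0.9, 0.1]   # nearly parallel to the query
far = [0.0, 1.0]    # orthogonal to the query
score_near = inner_product(query, near)   # larger: closer to the query
score_far = inner_product(query, far)     # smaller: farther from the query
```

In this sketch, `near` yields the larger inner product value and is therefore judged closer to the query than `far`.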
In the first embodiment, the group of data D in the SSD 3 is categorized into a plurality of clusters. Each of the clusters is a sub-group of data D obtained by grouping two or more pieces of data D that are close to each other. Note that there may be a cluster including only one piece of data D. Each of the clusters is an example of a first object of the first embodiment.
In the data D0 to D21, the data D0 to D9 are grouped as a cluster #0. The data D7, D8, and D10 to D17 are grouped as a cluster #1. The data D8, D9, and D18 to D21 are grouped as a cluster #2.
The grouping may be implemented in any manner but is typically implemented on the basis of the distance between pieces of data D. For example, the N-dimensional space in which the data D is present may be divided into a lattice, and the set of pieces of data D in each unit cell may form one list. This enables grouping two or more pieces of data D close to each other as one list. Hereinafter, in order to facilitate understanding of the description, it is assumed that the pieces of data D in each group are close to each other; however, this is not necessarily required for the present invention.
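For illustration only, the lattice-based grouping described above may be sketched as follows; the function name, the cell size, and the sample data are illustrative assumptions and not part of the embodiments:

```python
def group_by_lattice(data, cell_size):
    # Divide the N-dimensional space into a lattice and collect the
    # pieces of data falling into the same unit cell as one list.
    clusters = {}
    for vec in data:
        cell = tuple(int(x // cell_size) for x in vec)
        clusters.setdefault(cell, []).append(vec)
    return clusters

data = [[0.1, 0.2], [0.3, 0.1], [2.5, 2.6]]
clusters = group_by_lattice(data, cell_size=1.0)
```

Here the first two vectors fall into the same unit cell and form one list, while the third vector forms a list of its own, so that pieces of data close to each other are grouped together.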
Note that one piece of data D may belong to only one cluster. Moreover, as with each of the data D7 to D9, one piece of data D may belong to two or more clusters.
When a query is input, the processor 2 first identifies, from among all clusters, a cluster to which data closest to the query belongs. Then, the processor 2 transfers all data D belonging to the identified cluster from the SSD 3 to the DRAM 4. Then, the processor 2 calculates the metric of the distance between each piece of data D and the query in the DRAM 4 and identifies data D closest to the query on the basis of each metric.
As technology to be compared with the first embodiment, there is a method of performing a graph-based nearest neighbor search for data in a storage memory, such as an SSD, which takes a long time to access. Such a method is referred to as a first comparative example. According to the first comparative example, in order to identify the data closest to a query, candidate data is selected along the graph, and the metric of the distance between the candidate data and the query is calculated every time new candidate data is selected. The processing of selecting new candidate data along the graph is also referred to as a "hop".
However, according to the first comparative example, every time new candidate data is selected, in other words, at every hop, there occurs processing of transferring the selected candidate data from a storage memory such as an SSD to a high-speed memory such as a DRAM, which is a work area. Therefore, it takes a lot of time to identify the data closest to the query.
According to the first embodiment, all the data D included in a cluster, to which the data D closest to the query Q belongs, is collectively transferred from the SSD 3 to the DRAM 4. The time required for accessing the DRAM 4 is shorter than the time required for accessing the SSD 3. Therefore, the time required for identifying the data D closest to the query Q is shortened. Thus, according to the first embodiment, the speed of the query response is enhanced.
In the example illustrated in
Meanwhile, a second comparative example will be described as other technology to be compared with the first embodiment. According to the second comparative example, one representative point is set for each of the clusters. The representative point may be one piece of data selected from all the data belonging to the cluster or may be new data corresponding to the center or the center of gravity of the cluster. The cluster to which the data closest to the query belongs is identified on the basis of the distances from the representative points of the respective clusters to the query. For example, a cluster in which the representative point closest to the query has been set is identified as the cluster to which the data closest to the query belongs.
However, there may be a case where the cluster, in which the representative point closest to the query has been set, is different from the cluster to which the data closest to the query belongs. If a cluster different from the cluster to which the data closest to the query belongs is erroneously identified as the cluster to which the data closest to the query belongs, it is not possible to identify the data closest to the query.
In the example illustrated in
In the first embodiment of the present disclosure, a neural network model (neural network model 43 to be described later), which has been trained, is used to identify the cluster #1, to which the data D10 closest to the query Q in the group of data D belongs, as the cluster to which the data closest to the query belongs. By identifying the cluster, to which the data D10 closest to the query Q belongs, with the trained neural network model, the estimation error described in the description of the second comparative example is reduced, and the accuracy of the search is enhanced.
A search program 41 and model information 42 are loaded into the DRAM 4. The search program 41 and the model information 42 are stored in advance on a desired non-volatile memory (for example, the SSD 3). The processor 2 loads the search program 41 and the model information 42 into the DRAM 4 in accordance with a specific processing (for example, an instruction to start the search program 41).
The processor 2 searches for data closest to the query in accordance with the search program 41 loaded into the DRAM 4.
The model information 42 is information in which the structure of the neural network model 43 is recorded. The model information 42 includes, for example, definitions of nodes, definitions of connection relationships among the nodes, and biases. In the model information 42, each of the nodes is associated with an activation function and a trained weight. At the time of a search, the processor 2 performs operation as the neural network model 43 by using the model information 42 loaded into the DRAM 4, thereby identifying a cluster to which the data closest to the query Q belongs.
In the neural network model 43 illustrated in
The input layer includes nodes whose number corresponds to the number of elements constituting the query Q, namely, four nodes. Each of four nodes of the input layer is associated with one of four elements q0 to q3 of the query Q on a one-to-one basis and accepts input of the associated element.
The output layer includes nodes whose number corresponds to the number of clusters, namely, four nodes. Each of the four nodes included in the output layer is associated with one of the clusters #0 to #3 on a one-to-one basis and outputs a score for the associated cluster.
A score represents a probability that a cluster corresponding to each node corresponds to the cluster to which the data closest to the query Q belongs. In one example, it is based on the premise that the higher the score is, the higher the probability is that the cluster corresponding to the node having output the score corresponds to the cluster to which the data closest to the query Q belongs. Note that the relationship between the score and the probability is not limited to the above.
The processor 2 inputs the query Q to the input layer. Then, in each node of the hidden layers and the output layer, the processor 2 multiplies the input values from the respective nodes of the preceding layer by weights, applies an activation function to the total sum of the weighted values and a bias, and outputs the value obtained by applying the activation function.
The processor 2 acquires the scores for the clusters from the output layer and identifies the cluster corresponding to the node that outputs the highest score as the cluster to which the data closest to the query Q belongs.
Hereinafter, a score output from a node associated with a cluster #X is referred to as a score of a cluster #X (X in the example of
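For illustration only, the forward pass and the selection of the highest-scoring cluster described above may be sketched as follows; the toy weights, layer sizes, and function names are illustrative assumptions (the actual weights of the neural network model 43 are determined by training), not part of the embodiments:

```python
def relu(x):
    # An example of an activation function.
    return max(0.0, x)

def layer(inputs, weights, biases, activation):
    # Each node: weight the inputs from the preceding layer, add the
    # bias, and apply the activation function to the total sum.
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def select_cluster(query, model):
    # Forward pass through every layer, then pick the index of the
    # output node (cluster) with the highest score.
    values = query
    for weights, biases, activation in model:
        values = layer(values, weights, biases, activation)
    return max(range(len(values)), key=lambda i: values[i])

toy_model = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], relu),          # hidden layer
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], lambda v: v),   # output layer
]
chosen = select_cluster([0.2, 0.9], toy_model)   # cluster index 1
```

With these toy weights, the second output node yields the highest score, so cluster #1 is selected.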
Note that the neural network model 43 illustrated in
First, the processor 2 generates a large number of sample queries (S101). The processor 2 generates the sample queries from a given set of queries using, for example, a random number generating program. Note that the method of generating the sample queries is not limited to the above.
Subsequently, the processor 2 identifies a cluster to which data closest to the sample query belongs for each of the sample queries (S102). For example, the processor 2 calculates the distance between each of the sample queries and each piece of data D, thereby identifying data D closest to each of the sample queries. Then, the processor 2 identifies a cluster to which the identified data D belongs.
Note that the processing of S102 is processing for creating training data. Therefore, accuracy is required for the processing of S102. The series of processing illustrated in
Subsequent to S102, the processor 2 executes training of the neural network model 43 using a large number of obtained pairs of a sample query and a cluster to which data closest to the sample query belongs as training data (S103). As a result, the weight for each node is determined, and the model information 42 is generated. Then, the training is completed.
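For illustration only, the creation of one training pair in S102, in which the cluster containing the data closest to a sample query is identified by an exact scan, may be sketched as follows; the function name and the sample values are illustrative assumptions and not part of the embodiments:

```python
def nearest_cluster_label(sample_query, clusters):
    # S102: brute-force the piece of data closest to the sample query
    # (largest inner product) and record the cluster it belongs to.
    # This exact scan is slow, but it runs offline, before any search.
    best = None
    for cluster_id, members in clusters.items():
        for vec in members:
            ip = sum(q * x for q, x in zip(sample_query, vec))
            if best is None or ip > best[0]:
                best = (ip, cluster_id)
    return best[1]

clusters = {0: [[1.0, 0.0]], 1: [[0.0, 1.0]]}
training_pairs = [(q, nearest_cluster_label(q, clusters))
                  for q in ([0.9, 0.1], [0.2, 0.8])]
```

Each resulting pair of a sample query and its cluster label then serves as one item of training data for S103.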
Upon receiving a query Q input to the information processing device 1 (S201), the processor 2 inputs the query Q to the neural network model 43 (S202).
The processor 2 selects a cluster whose score is the highest, on the basis of the scores for the respective clusters output from the neural network model 43 (S203). Then, the processor 2 transfers all the data D belonging to the selected cluster from the SSD 3 to the DRAM 4 (S204).
The processor 2 calculates an inner product between the query Q and each piece of data D in the DRAM 4 (S205). Note that the inner product of two pieces of data is an example of the metric of the distance between the two pieces of data. The closer the distance between the two pieces of data is, the larger the value of the inner product of the two pieces of data is. Note that the metric of the distance between the two pieces of data is not limited to the inner product.
The processor 2 identifies a piece of data D closest to the query Q among the pieces of data D in the DRAM 4 on the basis of the inner product values between the query Q and each piece of data D in the DRAM 4, and outputs the identified data D as a search result (S206). For example, in a case where the inner product is used as the metric of the distance, the processor 2 outputs, as the search result, a piece of data D for which the largest inner product value has been obtained. Then, the series of processing of the nearest neighbor search ends.
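For illustration only, the flow of S201 to S206 may be sketched as follows; the function names and the stand-in for the trained model are illustrative assumptions and not part of the embodiments:

```python
def nearest_neighbor_search(query, clusters, select_cluster):
    # S202-S203: select one cluster on the basis of the model.
    cluster_id = select_cluster(query)
    # S204: transfer all data of the selected cluster to the work area
    # (standing in for the collective SSD-to-DRAM transfer).
    work_area = list(clusters[cluster_id])
    # S205-S206: inner product against every transferred piece of data;
    # the largest inner product value marks the closest data.
    return max(work_area,
               key=lambda d: sum(q * x for q, x in zip(query, d)))

# A stand-in for the trained model: route by the dominant element.
clusters = {0: [[1.0, 0.2], [0.8, 0.0]], 1: [[0.1, 1.0]]}
result = nearest_neighbor_search([0.9, 0.1], clusters,
                                 lambda q: 0 if q[0] >= q[1] else 1)
```

Because the whole cluster is moved to the work area at once, the per-piece distance calculations run against fast memory only.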
As described above, according to the first embodiment, the processor 2 receives the input of the query Q and selects one cluster on the basis of the neural network model 43. The processor 2 calculates the metric of the distance between each piece of data D included in the selected cluster and the query Q. The processor 2 identifies data D closest to the query Q among the data D included in the selected cluster as the data D closest to the query Q in the group of the data D in the SSD 3 on the basis of the metric of the distance between each piece of data D included in the selected cluster and the query Q.
Therefore, the speed of the query response and the accuracy of the search can be both enhanced.
In addition, in the first embodiment, the processor 2 transfers the cluster selected on the basis of the neural network model 43 from the SSD 3 to the DRAM 4 and calculates the metric of the distance between each piece of data D in the DRAM 4 and the query Q.
Therefore, the speed of the query response can be enhanced.
Moreover, in the first embodiment, the neural network model 43 outputs, for each cluster, a score representing the possibility of including the data D closest to the query Q in the group of the data D in the SSD 3 in a case where the query Q is input. The processor 2 inputs the query Q to the neural network model 43 and selects a cluster having the highest possibility of including the data D closest to the query Q on the basis of the score for each cluster output from the neural network model 43.
Therefore, the accuracy of the query response can be enhanced.
In the above-described first embodiment, all the data D belonging to one cluster is collectively transferred from the SSD 3 to the DRAM 4. In order to minimize the time required for transferring all the data D belonging to each cluster, the set of pieces of data D belonging to each cluster may be disposed in a continuous area in the address space provided by the SSD 3 to the processor 2.
Specifically,
With such a configuration, the processor 2 is able to acquire all the data D included in a desired cluster by issuing, to the SSD 3, a single read command specifying the position in the address space and the size. Therefore, it is possible to reduce the time required for transferring all the data D belonging to the desired cluster from the SSD 3 to the DRAM 4.
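For illustration only, laying the members of each cluster out contiguously and fetching a whole cluster with a single (offset, size) read may be sketched as follows; the function names, the record size, and the in-memory stand-in for the SSD are illustrative assumptions and not part of the embodiments:

```python
def build_layout(clusters, record_size):
    # Place the members of each cluster in a continuous area of the
    # address space and record each cluster's (offset, size).
    layout, flat, offset = {}, [], 0
    for cluster_id, members in clusters.items():
        size = len(members) * record_size
        layout[cluster_id] = (offset, size)
        flat.extend(members)
        offset += size
    return layout, flat

def read_cluster(flat, layout, cluster_id, record_size):
    # A single read of (offset, size) fetches the whole cluster.
    offset, size = layout[cluster_id]
    start = offset // record_size
    return flat[start:start + size // record_size]

clusters = {0: ["D0", "D1", "D2"], 1: ["D3", "D4"]}
layout, flat = build_layout(clusters, record_size=16)
```

Because one (offset, size) pair covers a whole cluster, one read command replaces one read per piece of data.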
As described above, in the graph-based nearest neighbor search using a group of data stored on a storage memory such as an SSD as a search range, a transfer from the storage memory to a volatile memory occurs at every hop. Therefore, the larger the hop count, the longer the time required for the search.
In a second embodiment, the time required for a search is suppressed by reducing the hop count required in a graph-based nearest neighbor search as much as possible. An information processing device according to the second embodiment will be referred to as an information processing device 1a. Moreover, in the description below, matters different from those of the first embodiment will be described, and description of the same matters as those of the first embodiment will be omitted or briefly described.
As in the first embodiment, a group of data D is stored on the SSD 3. The SSD 3 also stores graph information 31 that defines connection among the data D. The graph information 31 is generated in advance by a designer or a specific computer program.
As an example of the graph according to the second embodiment, a graph 32 is formed such that each of the pieces of data D0 to D20 is a node. Each of the pieces of data D0 to D20 is connected to every other piece of data D0 to D20 via one or more edges and zero or more intermediate pieces of data D. An edge is a path along which a hop is allowed.
The graph 32 includes two or more nodes each being depicted by a filled circle. Each filled circle indicates an entry point candidate. The entry point candidate refers to a node of an entry point, namely, a node that can be a starting point of a search. It is based on the premise here that, as an example, the data D3, the data D8, the data D11, and the data D18 are entry point candidates. For example, pieces of data D randomly selected from the group of data D are set as entry point candidates. Note that the method of setting the entry point candidate is not limited to the above.
In the second embodiment, the processor 2 selects an entry point from the entry point candidates in such a manner that the hop count required for the search becomes as small as possible. Note that an entry point candidate is an example of a first object according to the second embodiment.
In the example of
A third comparative example will be described as technology to be compared with the second embodiment. According to the third comparative example, an entry point candidate is selected on the basis of the distance between each of entry point candidates and a query Q.
In the example of
In contrast, according to the second embodiment, the processor 2 utilizes a trained neural network model (neural network model 43a to be described later) for identifying an entry point candidate by which the data D16 closest to the query Q can be identified with the minimum hop count.
A search program 41a and model information 42a are loaded into the DRAM 4.
The processor 2 searches for data closest to the query in accordance with the search program 41a loaded into the DRAM 4.
The model information 42a is information in which the structure of the neural network model 43a is recorded. At the time of a search, the processor 2 performs operation as the neural network model 43a by using the model information 42a loaded into the DRAM 4, thereby estimating an entry point candidate by which the data D16 closest to the query Q can be identified with the minimum hop count.
In the example illustrated in
The input layer includes nodes whose number corresponds to the number of elements constituting the query Q, namely, four nodes. Each of four nodes of the input layer is associated with one of four elements q0 to q3 of the query Q on a one-to-one basis and accepts input of the associated element.
The output layer includes nodes whose number corresponds to the number of the entry point candidates, namely, four nodes. Each of the four nodes included in the output layer is associated with one of the four entry point candidates, namely, the data D3, the data D8, the data D11, and the data D18, on a one-to-one basis and outputs a score for the associated entry point candidate.
In the second embodiment, a score represents a probability that an entry point candidate corresponding to each node corresponds to the entry point by which the data D16 closest to the query Q can be identified with the minimum hop count. As one example, it is based on the premise that the higher the score is, the higher the probability is that an entry point candidate corresponding to the node that has output the score corresponds to the entry point by which the data D16 closest to the query Q can be identified with the minimum hop count. Note that the relationship between the score and the probability is not limited to the above.
The processor 2 inputs the query Q to the input layer. Then, in each node of the hidden layers and the output layer, the processor 2 multiplies the input values from the respective nodes of the preceding layer by weights, applies an activation function to the total sum of the weighted values and a bias, and outputs the value obtained by applying the activation function. The processor 2 acquires the scores for the respective entry point candidates from the output layer and identifies the entry point candidate corresponding to the node that outputs the highest score as the entry point by which the data D16 closest to the query Q can be identified with the minimum hop count.
Hereinafter, the data D3, the data D8, the data D11, and the data D18 as the entry point candidates are referred to as an entry point candidate D3, an entry point candidate D8, an entry point candidate D11, and an entry point candidate D18, respectively. In addition, a score output from a node associated with an entry point candidate DX (X is an integer greater than or equal to 0; X in the example of
Note that the neural network model 43a illustrated in
First, the processor 2 generates a large number of sample queries in a similar manner to that of the first embodiment (S301).
Subsequently, the processor 2 identifies, for each sample query, an entry point candidate by which the data D closest to the sample query Q can be obtained with the minimum hop count (S302). The processor 2 is allowed to take time for accurately obtaining such an entry point candidate.
Subsequent to S302, the processor 2 executes training of the neural network model 43a by using, as training data, a large number of obtained pairs of a sample query and an entry point candidate by which the data D closest to this sample query can be obtained with the minimum hop count (S303). As a result, the weight for each node is determined, and the model information 42a is generated. Then, the training is completed.
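For illustration only, the labeling in S302, which picks the entry point candidate that reaches the true nearest neighbor of a sample query with the fewest hops, may be sketched as follows; here the minimum hop count is taken as the shortest-path length in the graph, which is a simplifying assumption, and the function names and sample graph are illustrative, not part of the embodiments:

```python
from collections import deque

def hop_count(graph, start, target):
    # Minimum number of hops along the graph from start to target
    # (breadth-first search over the adjacency lists).
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            return hops
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return None  # target unreachable from start

def best_entry_point(graph, candidates, target):
    # S302: the entry point candidate reaching the data closest to the
    # sample query (target) with the fewest hops.
    return min(candidates, key=lambda c: hop_count(graph, c, target))

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
label = best_entry_point(graph, [0, 3], 2)   # node 3 reaches node 2 in 1 hop
```

Each pair of a sample query and the entry point candidate selected this way then serves as one item of training data for S303.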
Upon receiving a query Q input to the information processing device 1a (S401), the processor 2 inputs the query Q to the neural network model 43a (S402).
The processor 2 selects an entry point candidate having the highest score as an entry point, on the basis of the scores of the respective entry point candidates output from the neural network model 43a (S403). Then, the processor 2 executes the graph-based nearest neighbor search with the group of the data D in the SSD 3 used as a search range and the selected entry point as the starting point. As a result, the processor 2 searches for the data D closest to the query Q in the group of the data D in the SSD 3 (S404).
Specifically, the processor 2 selects the selected entry point as the first candidate data and calculates the metric of the distance between the candidate data and the query Q. Then, the processor 2 performs a hop along the graph starting from the selected entry point and calculates the metric of the distance between new candidate data identified by the hop and the query Q. Subsequently, the processor 2 compares the metrics of the distance between the candidate data and the query Q before and after the hop and determines whether or not the distance to the query Q has become shorter by the hop. The processor 2 searches for the data D closest to the query Q in the group of the data D in the SSD 3 by repeating the hop, the calculation of the metric of the distance between the candidate data and the query Q, and the comparison of the metrics of the distance before and after the hop.
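For illustration only, the repetition of hop, metric calculation, and comparison described above may be sketched as a greedy search as follows; the function name and the sample graph are illustrative assumptions and not part of the embodiments:

```python
def greedy_graph_search(query, vectors, graph, entry_point):
    # Repeat hop / metric calculation / comparison until no neighbor
    # is closer to the query than the current candidate data.
    def metric(i):
        # Inner product as the metric of the distance: larger is closer.
        return sum(q * x for q, x in zip(query, vectors[i]))
    current = entry_point
    while True:
        best_neighbor = max(graph[current], key=metric, default=current)
        if metric(best_neighbor) > metric(current):
            current = best_neighbor   # the hop shortened the distance
        else:
            return current            # no neighbor improves: stop

vectors = {0: [0.1, 0.0], 1: [0.5, 0.1], 2: [0.9, 0.2]}
graph = {0: [1], 1: [0, 2], 2: [1]}
found = greedy_graph_search([1.0, 0.0], vectors, graph, 0)
```

Starting from node 0, the search hops to node 1 and then to node 2, after which no neighbor yields a larger inner product value, so node 2 is returned. A well-chosen entry point shortens exactly this chain of hops.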
The processor 2 outputs the data D obtained by the search as a search result (S405). Then, the series of processing of the nearest neighbor search ends.
As described above, according to the second embodiment, the processor 2 receives the input of the query Q and selects one entry point on the basis of the neural network model 43a. Then, the processor 2 executes the graph-based nearest neighbor search with the group of the data D in the SSD 3 used as a search range and the selected entry point as the starting point. The processor 2 identifies the data D closest to the query Q in the group of the data D in the SSD 3 by the nearest neighbor search.
Therefore, the speed of the query response and the accuracy of the search can be both enhanced.
Moreover, according to the second embodiment, the neural network model 43a is configured, in response to receiving the query Q, to output, for each entry point candidate, a score representing a possibility that the data D closest to the query Q in the group of the data D stored in the SSD 3 can be obtained with the minimum hop count. The processor 2 inputs the query Q to the neural network model 43a and selects, as the entry point, an entry point candidate with the highest possibility that the data D closest to the query Q can be obtained with the minimum hop count, on the basis of the score for each entry point candidate output from the neural network model 43a.
Therefore, the speed of the query response can be enhanced.
Note that, in the above-described second embodiment, the processor 2 selects, as the entry point, the entry point candidate having the minimum hop count to the data D closest to the query Q. The processor 2 may select, as the entry point, any one of entry point candidates by which the data D closest to the query Q can be obtained with a predetermined hop count or less.
Specifically, the neural network model 43a is configured, in response to receiving the query Q, to output, for each entry point candidate, a score representing the possibility that the data D closest to the query Q in the group of the data D in the SSD 3 can be reached with the predetermined hop count or less. The processor 2 inputs the query Q to the neural network model 43a and selects, as the entry point, the entry point candidate having the highest possibility that the data D closest to the query Q can be reached with the predetermined hop count or less, on the basis of the score for each entry point candidate output from the neural network model 43a.
In addition, the processor 2 may select, as the entry point, an entry point candidate such that the data found by executing the nearest neighbor search with a predetermined hop count is the closest to the query Q.
Specifically, the neural network model 43a is configured, in response to receiving the query Q, to output, for each entry point candidate, a score representing the possibility that the data D, which is found by a search with the predetermined hop count, is the closest to the query Q in the group of the data D in the SSD 3. The processor 2 inputs the query Q to the neural network model 43a and selects, as the entry point, an entry point candidate having the highest possibility that the data D found as a result of a search with the predetermined hop count is the closest to the query Q, on the basis of the score for each entry point candidate output from the neural network model 43a.
As described in the first embodiment, the modification of the first embodiment, and the second embodiment, a nearest neighbor search method includes receiving input of the query Q and selecting one of the first objects on the basis of the query Q and the neural network model 43 or 43a. Each of the first objects is associated with one or more pieces of data D in the group of pieces of data D stored on the SSD 3. The nearest neighbor search method further includes calculating the metric of the distance between the query Q and each of the one or more pieces of data D associated with the selected first object. The method further includes identifying the data D closest to the query Q in the group of the data D on the basis of the metric of the distance.
Therefore, the speed of the query response and the accuracy of the search can be both enhanced.
While some embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; moreover, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.