METHOD AND APPARATUS FOR GENERATING TWO-DIMENSIONAL MATRIX, AND METHOD AND APPARATUS FOR QUERYING KEY VALUE ELEMENT

Information

  • Patent Application
  • Publication Number
    20170170968
  • Date Filed
    February 27, 2017
  • Date Published
    June 15, 2017
Abstract
A generation method, a query method, and an apparatus for a two-dimensional filter are provided. In this solution, a two-dimensional filter includes a two-dimensional matrix that may be linked to multiple key value element groups, which improves flexibility of the filter. Further, to determine whether a key value element is included in multiple key value element groups, only a query based on the two-dimensional filter is needed; a Bloom filter does not need to be generated for each key value element group, and queries do not need to be performed one by one against each of multiple Bloom filters. Therefore, the current problem of low query efficiency is also resolved.
Description
TECHNICAL FIELD

The present invention relates to the field of element querying and matching technologies, and in particular, to a method and an apparatus for generating a two-dimensional filter, and a method and an apparatus for querying a key value element.


BACKGROUND

In computer software design, it is often necessary to determine whether an element is in a set. For example, word processing software needs to check whether an English word is spelled correctly (that is, whether the English word is in a known dictionary); as another example, URL (Uniform Resource Locator) filtering software determines whether a URL is in a filtering list. The most direct method is to store all elements of a set in a computer and, when a new element appears, to compare the new element directly with the elements in the set. To improve the search speed, a hash table is generally used to store the set. A hash table is a data structure that quickly maps an element to its storage location according to a key code value of the element, where the mapping function is called a hash function. A structure of a hash table is shown in FIG. 1A. First, a hash location of an element in a set is obtained by using a hash function, and then the element is recorded in a hash linked list at that location. In FIG. 1A, it is assumed that the hash function is HASH, and A1, A2, . . . , and A8 are elements in the set; then, it can be seen from the diagram that HASH (A1)=HASH (A2)=H1, HASH (A3)=HASH (A4)=H2, HASH (A5)=HASH (A6)=H3, and HASH (A7)=HASH (A8)=H4.
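The chained hash table described above can be sketched as follows. This is only a minimal illustration: the class name `ChainedHashTable`, the choice of four hash locations, and the use of Python's built-in `hash` as a stand-in for HASH are assumptions made for the example and are not taken from FIG. 1A.

```python
class ChainedHashTable:
    """Set membership using one chain (list) per hash location."""

    def __init__(self, num_locations=4):
        # One chain per hash location (assumed count, standing in for H1..H4).
        self.chains = [[] for _ in range(num_locations)]

    def _location(self, element):
        # Stand-in for the HASH function; any deterministic hash would do.
        return hash(element) % len(self.chains)

    def add(self, element):
        chain = self.chains[self._location(element)]
        if element not in chain:
            chain.append(element)

    def contains(self, element):
        # Only the chain at the element's hash location is compared.
        return element in self.chains[self._location(element)]


table = ChainedHashTable()
for element in ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]:
    table.add(element)
```

Every element is stored in full, which is why the lookup is exact but the storage cost is relatively large.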


An advantage of a hash table is that it can be quickly and accurately determined whether an element is in a set; a disadvantage is that relatively large storage space is required. To reduce storage space, Burton Bloom proposed the one-dimensional Bloom filter in 1970. Its principle is as follows: a one-dimensional Bloom filter is formed by k mutually independent hash functions h1, h2, . . . , and hk, and a bit vector whose length is m. The value range of each hash function is {0, 1, . . . , m−1}. Because one byte has eight bits, the memory space actually occupied by the bit vector is m/8 bytes. All bits of the bit vector are initialized to 0. For a set S={s1, s2, . . . , sn}, a hash sequence (h1 (s), h2 (s), . . . , hk(s)) is calculated for each element s in set S by using the k hash functions, and the bits of the bit vector at the positions given by the hash sequence are set to 1; data element set S is then said to be loaded into the Bloom filter, or the Bloom filter is said to represent data element set S. For example, if h1 (s1)=5, the 6th bit of the bit vector is set to 1; if h2 (s1)=10, the 11th bit of the bit vector is set to 1; and so on, until, if hk(s1)=j, the (j+1)th bit of the bit vector is set to 1; data element s1 is then said to be loaded into the Bloom filter. When all data elements in set S are loaded into the Bloom filter, the Bloom filter is said to represent data element set S. To query whether a data element is in set S, a hash sequence is calculated for the data element by using the same k hash functions. If every bit of the bit vector at the positions given by the hash sequence is 1, it is considered that the data element belongs to S; otherwise, it is considered that the data element does not belong to S. Compared with storing the data completely, the Bloom filter reduces storage space, and an element that belongs to the set is never missed when the Bloom filter is used.
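The loading and querying procedure of a one-dimensional Bloom filter can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `BloomFilter`, the method names, and the derivation of the k hash functions by salting SHA-256 with an index are illustrative choices, and such salted hashes only approximate mutually independent hash functions.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m                              # length of the bit vector
        self.k = k                              # number of hash functions
        self.bits = bytearray((m + 7) // 8)     # about m/8 bytes, all bits 0

    def _positions(self, item):
        # Assumed construction: h_i(item) = SHA-256(i:item) mod m, in {0..m-1}.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Load one element: set the bit at every position of its hash sequence.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # The element is considered to belong to S only if every bit is 1.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bloom = BloomFilter(m=1024, k=3)
for s in ["s1", "s2", "s3"]:
    bloom.add(s)
```

A loaded element always answers `True`, which matches the statement that an element of the set is never missed. The method is named `might_contain` because a `True` answer for an element that was never loaded is possible when other elements happen to have set the same bits.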


In the following, a Bloom filter is briefly described with reference to an example of a junk Email address.


It is assumed that the quantity of junk Email addresses is one hundred million. First, a bit vector whose length is 1.6 billion bits, that is, a vector of two hundred million bytes, is established, and all the 1.6 billion binary bits are initialized to 0. For each known junk Email address, eight different hash functions (F1, F2, . . . , and F8) are used to generate eight hash values (f1, f2, . . . , and f8), and the locations of the bit vector that correspond to the eight hash values are all set to 1. After all the one hundred million junk Email addresses are processed in this way, a Bloom filter is generated for these junk Email addresses. FIG. 1B is an exemplary diagram of a junk Email address represented by using the Bloom filter. For a junk Email address XXX@163.com, eight different hash functions (F1, F2, . . . , and F8) are used to generate eight hash values (f1, f2, . . . , and f8), and the locations of the bit vector that correspond to the eight hash values are all set to 1; the junk Email address is then said to be loaded into the Bloom filter.


The foregoing generated Bloom filter is for one key value element group, that is, one Bloom filter can be linked to only one key value element group, and a quantity of Bloom filters that need to be generated depends on a quantity of key value element groups. Therefore, there exists a deficiency that flexibility of the Bloom filter is relatively poor.


SUMMARY

Embodiments of the present invention provide a method and an apparatus for generating a two-dimensional filter, and a method and an apparatus for querying a key value element, so as to improve flexibility of a Bloom filter.


Specific technical solutions provided in the embodiments of the present invention are as follows:


According to a first aspect, a method for querying a key value element is provided, including:


determining, from a hash function set, a hash function subset corresponding to each key value element group;


for any key value element in each key value element group, calculating a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and setting an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier;


for a to-be-queried key value element, determining a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculating a hash value of the to-be-queried key value element according to the corresponding hash function subset;


acquiring an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix; and


when the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determining that the to-be-queried key value element belongs to a key value element set represented by a two-dimensional filter.


With reference to the first aspect, in a first possible implementation manner of the first aspect, hash functions included in hash function subsets respectively corresponding to any two different key value element groups are different; or hash functions included in hash function subsets respectively corresponding to any two different key value element groups are the same, but the hash functions differ in an arrangement manner.


With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the calculating a hash value of the to-be-queried key value element specifically includes:


obtaining a first hash value by performing calculation on a first sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and


obtaining a second hash value by performing calculation on a second sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.


With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the acquiring an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix specifically includes:


acquiring, from the two-dimensional matrix, an element that uses the first hash value as a row and uses the second hash value as a column; or acquiring, from the two-dimensional matrix, an element that uses the second hash value as a row and uses the first hash value as a column.


According to a second aspect, a method for generating a two-dimensional filter is provided, including:


establishing a two-dimensional matrix that includes at least two row vectors and at least two column vectors;


determining a hash function set, where each hash function in the hash function set is corresponding to at least one key value element group; and obtaining a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtaining a second hash value by performing hash calculation on a second sub key value element of the any key value element, where the first hash value is a positive integer that is less than or equal to a length of the row vectors, and the second hash value is a positive integer that is less than or equal to a length of the column vectors; and


generating a two-dimensional filter that includes the two-dimensional matrix and the hash function set.


With reference to the second aspect, in a first possible implementation manner of the second aspect, both the length of the row vectors and the length of the column vectors are greater than or equal to √(Sr), where


Sr is a quantity of all key value elements included in all key value element groups; or Sr is a quantity of key value elements obtained after all key value elements included in all key value element groups are filtered by using a query condition.


With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the first sub key value element includes a key value element formed by all odd bits of the any key value element when the any key value element is represented in binary, and the second sub key value element includes a key value element formed by all even bits of the any key value element when the any key value element is represented in binary; or


the first sub key value element includes a key value element formed by the 1st bit to the Kth bit of the any key value element when the any key value element is represented in binary, and the second sub key value element includes a key value element formed by the (K+1)th bit to the Nth bit of the any key value element when the any key value element is represented in binary, where N is a quantity of bits of the any key value element when the any key value element is represented in binary, 1≦K≦N, and K is a positive integer.


With reference to the second aspect or the first to the second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the method further includes:


initializing an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.


According to a third aspect, an apparatus for querying a key value element is provided, including:


at least one processor;


memory in electronic communication with the processor; and


program code stored in the memory, wherein the program code is executable by the processor to:


determine, from a hash function set, a hash function subset corresponding to each key value element group;


for any key value element in each key value element group, calculate a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and set an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier;


for a to-be-queried key value element, determine a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculate a hash value of the to-be-queried key value element according to the corresponding hash function subset;


acquire an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix; and


when the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determine that the to-be-queried key value element belongs to a key value element set represented by a two-dimensional filter.


With reference to the third aspect, in a first possible implementation manner of the third aspect, hash functions included in hash function subsets that are respectively corresponding to any two different key value element groups and determined by the processor are different; or


hash functions included in hash function subsets that are respectively corresponding to any two different key value element groups and determined by the processor are the same, but the hash functions differ in an arrangement manner.


With reference to the third aspect or the first possible implementation manner of the third aspect, in a second possible implementation manner of the third aspect, the processor is specifically configured to:


obtain a first hash value by performing calculation on a first sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and


obtain a second hash value by performing calculation on a second sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.


With reference to the second possible implementation manner of the third aspect, in a third possible implementation manner of the third aspect, the processor is specifically configured to:


acquire, from the two-dimensional matrix, an element that uses the first hash value as a row and uses the second hash value as a column; or acquire, from the two-dimensional matrix, an element that uses the second hash value as a row and uses the first hash value as a column.


According to a fourth aspect, an apparatus for generating a two-dimensional filter is provided, including:


at least one processor;


memory in electronic communication with the processor; and


program code stored in the memory, wherein the program code is executable by the processor to:


establish a two-dimensional matrix that includes at least two row vectors and at least two column vectors;


determine a hash function set, where each hash function in the hash function set is corresponding to at least one key value element group; and obtain a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtain a second hash value by performing hash calculation on a second sub key value element of the any key value element, where the first hash value is a positive integer that is less than or equal to a length of the row vectors, and the second hash value is a positive integer that is less than or equal to a length of the column vectors; and


generate a two-dimensional filter that includes the two-dimensional matrix and the hash function set.


With reference to the fourth aspect, in a first possible implementation manner of the fourth aspect, both the length of the row vectors and the length of the column vectors are greater than or equal to √(Sr), where the row vectors and the column vectors are included in the two-dimensional matrix generated by the processor, and where


Sr is a quantity of all key value elements included in all key value element groups; or Sr is a quantity of key value elements obtained after all key value elements included in all key value element groups are filtered by using a query condition.


With reference to the fourth aspect or the first possible implementation manner of the fourth aspect, in a second possible implementation manner of the fourth aspect, the first sub key value element obtained by the processor includes a key value element formed by all odd bits of the any key value element when the any key value element is represented in binary, and the second sub key value element obtained by the processor includes a key value element formed by all even bits of the any key value element when the any key value element is represented in binary; or


the first sub key value element obtained by the processor includes a key value element formed by the 1st bit to the Kth bit of the any key value element when the any key value element is represented in binary, and the second sub key value element obtained by the processor includes a key value element formed by the (K+1)th bit to the Nth bit of the any key value element when the any key value element is represented in binary, where N is a quantity of bits of the any key value element when the any key value element is represented in binary, 1≦K≦N and K is a positive integer.


With reference to the fourth aspect or the first to the second possible implementation manner of the fourth aspect, in a third possible implementation manner of the fourth aspect, the processor is further configured to initialize an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.


In the embodiments of the present invention, a two-dimensional filter includes a two-dimensional matrix, where the two-dimensional matrix may be linked to multiple key value element groups, and therefore flexibility of the filter is improved.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1A is a schematic structural diagram of a hash table in the prior art;



FIG. 1B is an exemplary diagram of a junk Email address represented by using a filter in the prior art;



FIG. 2A is a flowchart of generating a two-dimensional filter according to an embodiment of the present invention;



FIG. 2B is a schematic diagram of a two-dimensional matrix according to an embodiment of the present invention;



FIG. 3 is a flowchart of querying a key value element according to an embodiment of the present invention;



FIG. 4 is a flowchart of generating a two-dimensional filter and querying a key value element according to an embodiment of the present invention;



FIG. 5 is a schematic diagram of a functional structure of an apparatus for generating a two-dimensional filter according to an embodiment of the present invention;



FIG. 6 is a schematic diagram of an entity structure of an apparatus for generating a two-dimensional filter according to an embodiment of the present invention;



FIG. 7 is a schematic diagram of a functional structure of an apparatus for querying a key value element according to an embodiment of the present invention; and



FIG. 8 is a schematic diagram of an entity structure of an apparatus for querying a key value element according to an embodiment of the present invention.





DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some but not all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.


In addition, the terms “system” and “network” may be used interchangeably in this specification. The term “and/or” in this specification describes only an association relationship for describing associated objects and represents that three relationships may exist. For example, A and/or B may represent the following three cases: Only A exists, both A and B exist, and only B exists. In addition, the character “/” in this specification generally indicates an “or” relationship between the associated objects.


In the following, exemplary implementation manners of the present invention are described in detail with reference to the accompanying drawings. It should be understood that the exemplary embodiments described herein are merely used to illustrate and explain the present invention, but are not intended to limit the present invention. In addition, the embodiments of the present application and features in the embodiments may be mutually combined in a case in which they do not conflict with each other.




Embodiment 1

Referring to FIG. 2A, in this embodiment of the present invention, a detailed process of generating a two-dimensional filter is as follows:


Step 200: Establish a two-dimensional matrix that includes at least two row vectors and at least two column vectors.


Step 210: Determine a hash function set, where each hash function in the hash function set is corresponding to at least one key value element group; and obtain a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtain a second hash value by performing hash calculation on a second sub key value element of any key value element in the corresponding key value element group, where each first hash value is a positive integer that is less than or equal to a length of the row vectors, and each second hash value is a positive integer that is less than or equal to a length of the column vectors.


Step 220: Generate a two-dimensional filter that includes the two-dimensional matrix and the hash function set.


In this embodiment of the present invention, the two-dimensional matrix is shown in FIG. 2B.


In this embodiment of the present invention, a quantity of storage data units of the two-dimensional matrix is a product of a quantity of row vectors and a quantity of column vectors. As shown in FIG. 2B, in the two-dimensional matrix, the quantity of row vectors is 9, and the quantity of column vectors is 9; then, the quantity of storage data units of the two-dimensional matrix is 81.


In this embodiment of the present invention, if both the length of the row vectors of the established two-dimensional matrix and the length of the column vectors of the established two-dimensional matrix are less than √(Sr), there is a high probability that, when key value elements in a key value element group are loaded into a filter, different key value elements are loaded at a same location, thereby affecting query accuracy. Therefore, in order to improve query accuracy, in this embodiment of the present invention, both the length of the row vectors of the established two-dimensional matrix and the length of the column vectors of the established two-dimensional matrix are greater than or equal to √(Sr), where Sr is a quantity of all key value elements included in all key value element groups; or Sr is a quantity of key value elements obtained after all key value elements included in all key value element groups are filtered by using a query condition.


However, the larger the length of the row vectors and the length of the column vectors, the larger the storage space that is required. Therefore, in this embodiment of the present invention, in order to improve a utilization rate of storage space, both the length of the row vectors of the established two-dimensional matrix and the length of the column vectors of the established two-dimensional matrix are equal to √(Sr).
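The sizing rule above can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function name `establish_matrix`, the value `0` for the first preset identifier, and the rounding of the square root of Sr up to the next integer when Sr is not a perfect square are all assumptions made for the example.

```python
import math

FIRST_PRESET_IDENTIFIER = 0   # assumed value meaning "nothing loaded here"


def establish_matrix(sr):
    """Build a square matrix whose side is the smallest integer >= sqrt(sr).

    The side is never below 2, so the matrix has at least two row vectors
    and at least two column vectors.
    """
    if sr < 1:
        raise ValueError("sr must be at least 1")
    side = max(2, math.isqrt(sr - 1) + 1)     # integer ceiling of sqrt(sr)
    # Every element starts as the first preset identifier.
    return [[FIRST_PRESET_IDENTIFIER] * side for _ in range(side)]


matrix = establish_matrix(81)   # 9 row vectors and 9 column vectors
```

With Sr = 81, the matrix has 9 × 9 = 81 storage data units, the same dimensions as the matrix of FIG. 2B.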


In this embodiment of the present invention, a first sub key value element and a second sub key value element may be in various forms. Optionally, the following several forms may be used:


The first sub key value element includes a key value element formed by all odd bits of any key value element when the any key value element is represented in binary, and the second sub key value element includes a key value element formed by all even bits of any key value element when the any key value element is represented in binary.


The key value element formed by all the odd bits may be in decimal, and the key value element formed by all the even bits may be in decimal. Certainly, the key value element formed by all the odd bits and the key value element formed by all the even bits may also be in another number system, and details are not described herein again.


For example, a key value element is 37348, and when represented in binary, 37348 is 1001000111100100. Counting bit positions from 0 at the leftmost bit, all odd bits are 01011010, and all even bits are 10001100; a decimal number represented by all the odd bits is 90 (a first sub key value element), and a decimal number represented by all the even bits is 140 (a second sub key value element).
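The odd/even split in this example can be reproduced with a short sketch. The function name `split_odd_even` and the fixed width of 16 bits are assumptions for illustration; bit positions are counted from 0 at the leftmost bit, which is the convention that yields the values 90 and 140 given above.

```python
def split_odd_even(key, width=16):
    """Return (first_sub, second_sub) formed from the odd and even bits."""
    bits = format(key, f"0{width}b")   # e.g. '1001000111100100'
    odd_bits = bits[1::2]              # positions 1, 3, 5, ...
    even_bits = bits[0::2]             # positions 0, 2, 4, ...
    return int(odd_bits, 2), int(even_bits, 2)


first_sub, second_sub = split_odd_even(37348)
print(first_sub, second_sub)   # prints: 90 140
```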


Alternatively, the first sub key value element and the second sub key value element may also be in the following form:


The first sub key value element includes a key value element formed by the 1st bit to the Kth bit of any key value element when the any key value element is represented in binary, and the second sub key value element includes a key value element formed by the (K+1)th bit to the Nth bit of any key value element when the any key value element is represented in binary, where N is a quantity of bits of any key value element when the any key value element is represented in binary, and K is a positive integer.


The key value element formed by the 1st bit to the Kth bit may be in decimal, and the key value element formed by the (K+1)th bit to the Nth bit may be in decimal. Certainly, the key value element formed by the 1st bit to the Kth bit and the key value element formed by the (K+1)th bit to the Nth bit may also be in another number system, and details are not described herein again.


For example, a key value element is 37348, and when represented in binary, 37348 is 1001000111100100; the 0th bit to the 7th bit are 10010001, and the 8th bit to the 15th bit are 11100100; a decimal number represented by the 0th bit to the 7th bit is 145 (a first sub key value element), and a decimal number represented by the 8th bit to the 15th bit is 228 (a second sub key value element).
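The split into leading and trailing bits can be sketched in the same way. The function name `split_at` and the 16-bit width are assumptions for illustration; the sketch also assumes K is smaller than N so that the second sub key value element is not empty. Read directly as binary numbers, 10010001 is 145 and 11100100 is 228.

```python
def split_at(key, k, width=16):
    """Return (first_sub, second_sub) from the first k bits and the rest."""
    if not 1 <= k < width:
        raise ValueError("k must satisfy 1 <= k < width")
    bits = format(key, f"0{width}b")   # leftmost bit comes first
    return int(bits[:k], 2), int(bits[k:], 2)


first_sub, second_sub = split_at(37348, 8)
print(first_sub, second_sub)   # prints: 145 228
```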


In this embodiment of the present invention, after a two-dimensional matrix that includes a row vector and a column vector is established, the process further includes: initializing an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.


Embodiment 2

Referring to FIG. 3, in this embodiment of the present invention, a detailed process of querying a key value element by using the two-dimensional filter generated in FIG. 2A is as follows:


Step 300: Determine, from a hash function set, a hash function subset corresponding to each key value element group.


Step 310: For any key value element in each key value element group, calculate a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and set an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier.


Step 320: For a to-be-queried key value element, determine a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculate a hash value of the to-be-queried key value element according to the corresponding hash function subset.


Step 330: Acquire an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix.


Step 340: When the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determine that the to-be-queried key value element belongs to a key value element set represented by the two-dimensional filter.
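Steps 300 to 340 can be sketched end to end as follows. This is a sketch under several assumptions rather than the claimed implementation: the identifiers `0` and `1` for the first and second preset identifiers, the seeded SHA-256 hash standing in for each hash function, the 0-based matrix indices, the odd/even bit split, and the rule that every location of a key value element must hold the second preset identifier for a query to succeed are all illustrative choices. Locations are formed by combining every first hash value with every second hash value, following the worked example of this embodiment.

```python
import hashlib
from itertools import product

FIRST_ID, SECOND_ID = 0, 1   # assumed values of the two preset identifiers


def seeded_hash(seed, value, modulus):
    # Hypothetical hash function family: one function per seed.
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def split_key(key, width=16):
    bits = format(key, f"0{width}b")
    return int(bits[1::2], 2), int(bits[0::2], 2)   # odd bits, even bits


class TwoDimensionalFilter:
    def __init__(self, side, group_seeds):
        self.side = side
        self.group_seeds = group_seeds   # group -> hash function subset
        self.matrix = [[FIRST_ID] * side for _ in range(side)]

    def _locations(self, group, key):
        seeds = self.group_seeds[group]              # Steps 300 and 320
        first_sub, second_sub = split_key(key)
        rows = [seeded_hash(s, first_sub, self.side) for s in seeds]
        cols = [seeded_hash(s, second_sub, self.side) for s in seeds]
        return list(product(rows, cols))

    def load(self, group, key):                      # Step 310
        for row, col in self._locations(group, key):
            self.matrix[row][col] = SECOND_ID

    def query(self, group, key):                     # Steps 320 to 340
        return all(self.matrix[row][col] == SECOND_ID
                   for row, col in self._locations(group, key))


groups = {"regions": (1, 2, 3), "months": (4, 5, 6)}
two_d_filter = TwoDimensionalFilter(side=64, group_seeds=groups)
two_d_filter.load("regions", 37348)
two_d_filter.load("months", 1205)
```

A single matrix serves both groups here; only the hash function subset changes with the key value element group, so no separate filter is built per group.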


If hash function subsets corresponding to two different types of key value elements are the same, locations at which the key value elements are loaded into the two-dimensional filter are the same, and then different key value elements are corresponding to a same location in the two-dimensional filter. In this case, query accuracy is relatively low. In order to improve the query accuracy, in this embodiment of the present invention, hash functions included in hash function subsets respectively corresponding to any two different key value element groups are different; or


hash functions included in hash function subsets respectively corresponding to any two different key value element groups are the same, but the hash functions differ in an arrangement manner.


For example, a first key value element group is a sales chart related to regions, and a second key value element group is a sales chart related to months; then, a hash function subset corresponding to the first key value element group is different from a hash function subset corresponding to the second key value element group.


In this embodiment of the present invention, there are multiple manners for calculating the hash value of the to-be-queried key value element. Optionally, the following manner may be used:


obtaining a first hash value by performing calculation on a first sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and


obtaining a second hash value by performing calculation on a second sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.


Certainly, the first sub key value element herein may also include a key value element formed by all odd bits of any key value element when the any key value element is represented in binary, and the second sub key value element includes a key value element formed by all even bits of any key value element when the any key value element is represented in binary; or


the first sub key value element includes a key value element formed by the 1st bit to the Kth bit of any key value element when the any key value element is represented in binary, and the second sub key value element includes a key value element formed by the (K+1)th bit to the Nth bit of any key value element when the any key value element is represented in binary, where N is a quantity of bits of any key value element when the any key value element is represented in binary, 1≦K≦N, and K is a positive integer.


A specific representation form of the first sub key value element herein is the same as the representation form of the first sub key value element in Embodiment 1.
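The two manners of forming the sub key value elements can be sketched in Python as follows. This is only an illustration under stated assumptions: the key value element is treated as a non-negative integer with a fixed width of N bits, the 1st bit is taken to be the most significant bit, and the function names split_odd_even and split_at are hypothetical.

```python
from typing import Tuple


def _to_bits(key: int, n_bits: int) -> str:
    """Return the N-bit binary string of key; bit 1 is the leftmost character."""
    if key < 0 or key >= 1 << n_bits:
        raise ValueError("key does not fit in n_bits")
    return format(key, f"0{n_bits}b")


def split_odd_even(key: int, n_bits: int) -> Tuple[int, int]:
    """First sub key: all odd bits (1st, 3rd, ...); second sub key: all even bits."""
    bits = _to_bits(key, n_bits)
    odd, even = bits[0::2], bits[1::2]
    return int(odd, 2), (int(even, 2) if even else 0)


def split_at(key: int, n_bits: int, k: int) -> Tuple[int, int]:
    """First sub key: bits 1..K; second sub key: bits K+1..N."""
    if not 1 <= k <= n_bits:
        raise ValueError("k must satisfy 1 <= k <= n_bits")
    bits = _to_bits(key, n_bits)
    first, second = bits[:k], bits[k:]
    # When K equals N the second part is empty and is treated as 0 here.
    return int(first, 2), (int(second, 2) if second else 0)
```

For the 8-bit key 10110100, split_odd_even yields the sub keys 1100 and 0110 (12 and 6), and split_at with K=3 yields 101 and 10100 (5 and 20).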


In this embodiment of the present invention, there are multiple manners for acquiring the element corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix. Optionally, the following manner may be used:


acquiring, from the two-dimensional matrix, an element that uses the first hash value as a row and uses the second hash value as a column; or acquiring, from the two-dimensional matrix, an element that uses the second hash value as a row and uses the first hash value as a column.


For example, a first sub key value element of a key value element is 90, and a second sub key value element of the key value element is 140; a hash function subset corresponding to the key value element is (h1, h2, h3); and first hash values obtained by performing calculation on 90 by using (h1, h2, h3) are respectively 6, 128, and 55, and second hash values obtained by performing calculation on 140 by using (h1, h2, h3) are respectively 0, 101, and 46. Then, locations that are of the key value element and in the two-dimensional matrix are (6, 0), (6, 101), (6, 46), (128, 0), (128, 101), (128, 46), (55, 0), (55, 101), and (55, 46), and elements corresponding to these locations are all set to second preset identifiers; or locations that are of the key value element and in the two-dimensional matrix are (0, 6), (101, 6), (46, 6), (0, 128), (101, 128), (46, 128), (0, 55), (101, 55), and (46, 55), and elements corresponding to these locations are all set to second preset identifiers.
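The enumeration of locations in the foregoing example can be reproduced with a short Python sketch. The function name locations and the flag first_is_row are illustrative assumptions; the hash values are copied from the example and are not recomputed from real hash functions.

```python
from itertools import product
from typing import List, Sequence, Tuple


def locations(first_hashes: Sequence[int],
              second_hashes: Sequence[int],
              first_is_row: bool = True) -> List[Tuple[int, int]]:
    """Pair every first hash value with every second hash value.

    With first_is_row=True each location is (first, second); otherwise the
    roles of row and column are swapped and each location is (second, first).
    """
    pairs = product(first_hashes, second_hashes)
    return [(f, s) if first_is_row else (s, f) for f, s in pairs]


first_hashes = [6, 128, 55]    # hash values of the first sub key value element, 90
second_hashes = [0, 101, 46]   # hash values of the second sub key value element, 140

row_major = locations(first_hashes, second_hashes)
col_major = locations(first_hashes, second_hashes, first_is_row=False)
```

Three first hash values and three second hash values give nine locations in either orientation; the second orientation is the first with the row and the column of every location exchanged.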


In Embodiment 2, to query whether a key value element is a key value element included in multiple key value element groups, only a query based on the two-dimensional filter needs to be performed, and a Bloom filter corresponding to each key value element group does not need to be generated. In addition, queries do not need to be performed one by one based on each of multiple Bloom filters. Therefore, the current problem of low query efficiency is resolved.


In Embodiment 1 and Embodiment 2, a two-dimensional matrix is used as an example for description. Certainly, a multidimensional matrix such as a three-dimensional matrix and a four-dimensional matrix may also be used. A process of generating a multidimensional matrix is similar to a process of generating a two-dimensional matrix, and a query process based on a multidimensional matrix is similar to a query process based on a two-dimensional matrix; details are not described herein again.
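The text does not detail the multidimensional case. One plausible reading, shown below as a hedged Python sketch, is that a key value element is split into one sub key value element per dimension and that a location is formed from one hash value per dimension. The class name NDFilter, the sparse storage as a set of marked cells, the zero-based indices, and the rule that every computed cell must be marked for a positive query result are all assumptions of this sketch.

```python
from itertools import product
from typing import Iterator, Sequence, Set, Tuple


class NDFilter:
    """Sparse d-dimensional matrix; a marked cell holds the second preset identifier."""

    def __init__(self, shape: Sequence[int]) -> None:
        self.shape = tuple(shape)
        self.marked: Set[Tuple[int, ...]] = set()

    def _cells(self, hashes_per_dim: Sequence[Sequence[int]]) -> Iterator[Tuple[int, ...]]:
        if len(hashes_per_dim) != len(self.shape):
            raise ValueError("one list of hash values is needed per dimension")
        # Every combination of one hash value per dimension is a location.
        for cell in product(*hashes_per_dim):
            if any(not 0 <= i < n for i, n in zip(cell, self.shape)):
                raise IndexError("hash value outside the matrix")
            yield cell

    def add(self, hashes_per_dim: Sequence[Sequence[int]]) -> None:
        self.marked.update(self._cells(hashes_per_dim))

    def might_contain(self, hashes_per_dim: Sequence[Sequence[int]]) -> bool:
        return all(cell in self.marked for cell in self._cells(hashes_per_dim))
```

With two dimensions, this sketch reduces to the pairing of first and second hash values used for the two-dimensional matrix.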


To better understand this embodiment of the present invention, the following provides a specific application scenario and further describes in detail the process of querying a key value element, as shown in FIG. 4.


Embodiment 3

Step 400: Establish a two-dimensional matrix that includes three row vectors and three column vectors.


Step 410: Determine a hash function set, and generate a two-dimensional filter that includes the two-dimensional matrix and the hash function set.


Each hash function in the hash function set is corresponding to at least one key value element group. A first hash value is obtained by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and a second hash value is obtained by performing hash calculation on a second sub key value element of any key value element in the corresponding key value element group, where each first hash value is a positive integer that is less than or equal to a length of the row vectors, and each second hash value is a positive integer that is less than or equal to a length of the column vectors.


In addition, the hash function set determined in this step includes 10 hash functions: h1, h2, h3, h4, h5, h6, h7, h8, h9, and h10.


Step 420: Initialize an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.


Step 430: Determine, from the determined hash function set, hash function subsets respectively corresponding to two key value element groups.


Step 440: For any key value element in the two key value element groups, calculate a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and set an element corresponding to a location that is of the calculated hash value and in the two-dimensional matrix to a second preset identifier.


Step 450: For a to-be-queried key value element, determine a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculate a hash value of the to-be-queried key value element according to the corresponding hash function subset.


Step 460: Acquire an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix.


Step 470: Determine whether the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, and if the acquired element is the second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determine that the to-be-queried key value element belongs to a key value element set represented by the two-dimensional filter; otherwise, determine that the to-be-queried key value element does not belong to a key value element set represented by the two-dimensional filter.
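Steps 400 to 470 can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the hash functions h1 to h10 are simulated by seeding SHA-256, key value elements are 8-bit integers split into their upper and lower 4 bits, the group names and subset assignments are hypothetical, the identifiers are 0 and 1, and a query is treated as positive only when every computed location holds the second preset identifier.

```python
import hashlib
from itertools import product
from typing import Callable, Dict, List, Tuple

ROWS, COLS = 3, 3            # step 400: three row vectors, three column vectors
FIRST_ID, SECOND_ID = 0, 1   # assumed values of the two preset identifiers


def make_hash(seed: int) -> Callable[[int, int], int]:
    """Build a seeded hash function returning a positive integer <= size."""
    def h(value: int, size: int) -> int:
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % size + 1
    return h


# Step 410: a hash function set of ten functions, h1 to h10.
hash_set: Dict[str, Callable[[int, int], int]] = {
    f"h{i}": make_hash(i) for i in range(1, 11)
}
# Step 420: initialize every element to the first preset identifier.
matrix: List[List[int]] = [[FIRST_ID] * COLS for _ in range(ROWS)]
# Step 430: hypothetical subsets for two key value element groups.
subsets = {"regions": ("h1", "h2", "h3"), "months": ("h4", "h5", "h6")}


def cells(group: str, key: int) -> List[Tuple[int, int]]:
    """Zero-based (row, column) locations of key under its group's subset."""
    first, second = key >> 4, key & 0xF          # assumed split of an 8-bit key
    rows = [hash_set[name](first, ROWS) for name in subsets[group]]
    cols = [hash_set[name](second, COLS) for name in subsets[group]]
    return [(r - 1, c - 1) for r, c in product(rows, cols)]


def insert(group: str, key: int) -> None:
    """Step 440: set every location of key to the second preset identifier."""
    for r, c in cells(group, key):
        matrix[r][c] = SECOND_ID


def query(group: str, key: int) -> bool:
    """Steps 450 to 470: check the locations of the to-be-queried key."""
    return all(matrix[r][c] == SECOND_ID for r, c in cells(group, key))
```

After a call such as insert("regions", 0x5A), the call query("regions", 0x5A) returns True. Because a 3×3 matrix has only nine elements, a query for an element that was never inserted may also return True; a larger matrix would reduce such false positives.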


Based on the foregoing technical solutions, and referring to FIG. 5, this embodiment of the present invention provides an apparatus for generating a two-dimensional filter, where the generation apparatus includes an establishing unit 50, a determining unit 51, and a generating unit 52.


The establishing unit 50 is configured to establish a two-dimensional matrix that includes at least two row vectors and at least two column vectors.


The determining unit 51 is configured to determine a hash function set, where each hash function in the hash function set is corresponding to at least one key value element group; and obtain a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtain a second hash value by performing hash calculation on a second sub key value element of any key value element in the corresponding key value element group, where each first hash value is a positive integer that is less than or equal to a length of the row vectors, and each second hash value is a positive integer that is less than or equal to a length of the column vectors.


The generating unit 52 is configured to generate a two-dimensional filter that includes the two-dimensional matrix and the hash function set.


In this embodiment of the present invention, optionally, both the length of the row vectors and the length of the column vectors are greater than or equal to √Sr, where the row vectors and the column vectors are included in the two-dimensional matrix generated by the establishing unit 50.


Sr is a quantity of all key value elements included in all key value element groups; or Sr is a quantity of key value elements obtained after all key value elements included in all key value element groups are filtered by using a query condition.
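The size constraint can be illustrated with a small Python sketch that computes the smallest integer vector length satisfying it. The function name min_side_length is hypothetical, and combining the square-root bound with the requirement of at least two row vectors and two column vectors is an assumption of this sketch.

```python
import math


def min_side_length(sr: int) -> int:
    """Smallest integer length that is >= sqrt(sr) and at least 2."""
    if sr < 0:
        raise ValueError("sr must be non-negative")
    root = math.isqrt(sr)            # floor of the square root
    if root * root < sr:
        root += 1                    # round up when sr is not a perfect square
    return max(2, root)
```

A square matrix with this side length has at least Sr elements; for example, Sr = 10 gives a side length of 4 and therefore 16 elements.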


In this embodiment of the present invention, optionally, the first sub key value element obtained by the determining unit 51 includes a key value element formed by all odd bits of any key value element when the any key value element is represented in binary, and the second sub key value element obtained by the determining unit 51 includes a key value element formed by all even bits of any key value element when the any key value element is represented in binary; or


the first sub key value element obtained by the determining unit 51 includes a key value element formed by the 1st bit to the Kth bit of any key value element when the any key value element is represented in binary, and the second sub key value element obtained by the determining unit 51 includes a key value element formed by the (K+1)th bit to the Nth bit of any key value element when the any key value element is represented in binary, where N is a quantity of bits of any key value element when the any key value element is represented in binary, 1≦K≦N, and K is a positive integer.


In this embodiment of the present invention, the apparatus further includes an initializing unit 53, configured to initialize an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.


FIG. 6 is a diagram of a physical apparatus for generating a two-dimensional filter according to the present invention. As shown in FIG. 6, the apparatus for generating a two-dimensional filter includes at least one processor 601, a communications bus 602, a memory 603, and at least one communications interface 604.


The communications bus 602 is configured to implement connection and communication among the foregoing components, and the communications interface 604 is configured to connect to and communicate with an external device.


The memory 603 is configured to store program code that needs to be executed, and when executing the program code in the memory 603, the processor 601 implements the following functions:


establishing a two-dimensional matrix that includes at least two row vectors and at least two column vectors;


determining a hash function set, where each hash function in the hash function set is corresponding to at least one key value element group; and obtaining a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtaining a second hash value by performing hash calculation on a second sub key value element of any key value element in the corresponding key value element group, where each first hash value is a positive integer that is less than or equal to a length of the row vectors, and each second hash value is a positive integer that is less than or equal to a length of the column vectors; and


generating a two-dimensional filter that includes the two-dimensional matrix and the hash function set.


Based on the foregoing technical solutions, and referring to FIG. 7, this embodiment of the present invention provides an apparatus for querying a key value element, where the apparatus for querying a key value element includes a determining unit 70, a setting unit 71, a calculating unit 72, an acquiring unit 73, and a querying unit 74.


The determining unit 70 is configured to determine, from a hash function set, a hash function subset corresponding to each key value element group.


The setting unit 71 is configured to: for any key value element in each key value element group, calculate a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and set an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier.


The calculating unit 72 is configured to: for a to-be-queried key value element, determine a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculate a hash value of the to-be-queried key value element according to the corresponding hash function subset.


The acquiring unit 73 is configured to acquire an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix.


The querying unit 74 is configured to: when the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determine that the to-be-queried key value element belongs to a key value element set represented by a two-dimensional filter.


In this embodiment of the present invention, optionally, hash functions included in hash function subsets that are respectively corresponding to any two different key value element groups and determined by the determining unit 70 are different; or


hash functions included in hash function subsets that are respectively corresponding to any two different key value element groups and determined by the determining unit 70 are the same, but the hash functions differ in an arrangement manner.


In this embodiment of the present invention, optionally, the calculating unit 72 is specifically configured to:


obtain a first hash value by performing calculation on a first sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and


obtain a second hash value by performing calculation on a second sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.


In this embodiment of the present invention, optionally, the acquiring unit 73 is specifically configured to:


acquire, from the two-dimensional matrix, an element that uses the first hash value as a row and uses the second hash value as a column; or acquire, from the two-dimensional matrix, an element that uses the second hash value as a row and uses the first hash value as a column.


FIG. 8 is a diagram of a physical apparatus for querying a key value element according to the present invention. As shown in FIG. 8, the apparatus for querying a key value element includes at least one processor 801, a communications bus 802, a memory 803, and at least one communications interface 804.


The communications bus 802 is configured to implement connection and communication among the foregoing components, and the communications interface 804 is configured to connect to and communicate with an external device.


The memory 803 is configured to store program code that needs to be executed, and when executing the program code in the memory 803, the processor 801 implements the following functions:


determining, from a hash function set, a hash function subset corresponding to each key value element group;


for any key value element in each key value element group, calculating a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and setting an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier;


for a to-be-queried key value element, determining a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculating a hash value of the to-be-queried key value element according to the corresponding hash function subset;


acquiring an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix; and


when the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determining that the to-be-queried key value element belongs to a key value element set represented by a two-dimensional filter.


In conclusion, in this embodiment of the present invention, a two-dimensional filter includes a two-dimensional matrix, where the two-dimensional matrix may be linked to multiple key value element groups, and therefore flexibility of the filter is improved.


Further, to query whether a key value element is a key value element included in multiple key value element groups, only a query based on the two-dimensional filter needs to be performed, and a Bloom filter corresponding to each key value element group does not need to be generated. In addition, queries do not need to be performed one by one based on each of multiple Bloom filters. Therefore, the current problem of low query efficiency is resolved.


The present invention is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that computer program instructions may be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams. These computer program instructions may be provided for a general-purpose computer, a dedicated computer, an embedded processor, or a processor of any other programmable data processing device to generate a machine, so that the instructions executed by a computer or a processor of any other programmable data processing device generate an apparatus for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.


These computer program instructions may also be stored in a computer readable memory that can instruct the computer or any other programmable data processing device to work in a specific manner, so that the instructions stored in the computer readable memory generate an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.


These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operations and steps are performed on the computer or the another programmable device, thereby generating computer-implemented processing. Therefore, the instructions executed on the computer or another programmable device provide steps for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.


Although some exemplary embodiments of the present invention have been described, persons skilled in the art can make changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the following claims are intended to be construed to cover the exemplary embodiments and all changes and modifications falling within the scope of the present invention.


Obviously, persons skilled in the art can make various modifications and variations to the embodiments of the present invention without departing from the scope of the embodiments of the present invention. The present invention is intended to cover these modifications and variations provided that they fall within the scope of protection defined by the following claims and their equivalent technologies.

Claims
  • 1. A method for querying a key value element, the method comprising: determining, from a hash function set, a hash function subset corresponding to each key value element group; for any key value element in each key value element group, calculating a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and setting an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier; for a to-be-queried key value element, determining a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculating a hash value of the to-be-queried key value element according to the corresponding hash function subset; acquiring an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix; and when the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determining that the to-be-queried key value element belongs to a key value element set represented by a two-dimensional filter.
  • 2. The method according to claim 1, wherein: hash functions comprised in hash function subsets respectively corresponding to any two different key value element groups are different; or hash functions comprised in hash function subsets respectively corresponding to any two different key value element groups are the same, but the hash functions differ in an arrangement manner.
  • 3. The method according to claim 1, wherein calculating a hash value of the to-be-queried key value element comprises: obtaining a first hash value by performing calculation on a first sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and obtaining a second hash value by performing calculation on a second sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.
  • 4. The method according to claim 3, wherein acquiring an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix comprises: acquiring, from the two-dimensional matrix, an element that uses the first hash value as a row and uses the second hash value as a column; or acquiring, from the two-dimensional matrix, an element that uses the second hash value as a row and uses the first hash value as a column.
  • 5. A method for generating a two-dimensional filter, the method comprising: establishing a two-dimensional matrix comprising at least two row vectors and at least two column vectors; determining a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group; obtaining a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtaining a second hash value by performing hash calculation on a second sub key value element of the any key value element, wherein the first hash value is a positive integer that is less than or equal to a length of the row vectors, and the second hash value is a positive integer that is less than or equal to a length of the column vectors; and generating a two-dimensional filter comprising the two-dimensional matrix and the hash function set.
  • 6. The method according to claim 5, wherein both the length of the row vectors and the length of the column vectors are greater than or equal to √Sr, wherein Sr is a quantity of all key value elements comprised in all key value element groups, or Sr is a quantity of key value elements obtained after all key value elements comprised in all key value element groups are filtered by using a query condition.
  • 7. The method according to claim 5, wherein: the first sub key value element comprises a key value element formed by all odd bits of the any key value element when the any key value element is represented in binary, and the second sub key value element comprises a key value element formed by all even bits of the any key value element when the any key value element is represented in binary; or the first sub key value element comprises a key value element formed by the 1st bit to the Kth bit of the any key value element when the any key value element is represented in binary, and the second sub key value element comprises a key value element formed by the (K+1)th bit to the Nth bit of the any key value element when the any key value element is represented in binary, wherein N is a quantity of bits of the any key value element when the any key value element is represented in binary, 1≦K≦N, and K is a positive integer.
  • 8. The method according to claim 5, further comprising: initializing an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.
  • 9. An apparatus for querying a key value element, the apparatus comprising: at least one processor; memory in electronic communication with the processor; and program code stored in the memory which, when executed by the processor, cause the processor to: determine, from a hash function set, a hash function subset corresponding to each key value element group, for any key value element in each key value element group, calculate a hash value according to a hash function subset corresponding to a key value element group to which the key value element belongs, and set an element corresponding to a location that is of the calculated hash value and in a two-dimensional matrix to a second preset identifier, for a to-be-queried key value element, determine a hash function subset corresponding to a key value element group to which the to-be-queried key value element belongs, and calculate a hash value of the to-be-queried key value element according to the corresponding hash function subset, acquire an element corresponding to a location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, and when the acquired element is a second preset identifier corresponding to the location that is of the hash value of the to-be-queried key value element and in the two-dimensional matrix, determine that the to-be-queried key value element belongs to a key value element set represented by a two-dimensional filter.
  • 10. The apparatus according to claim 9, wherein: hash functions comprised in hash function subsets that are respectively corresponding to any two different key value element groups and determined by the processor are different; or hash functions comprised in hash function subsets that are respectively corresponding to any two different key value element groups and determined by the processor are the same, but the hash functions differ in an arrangement manner.
  • 11. The apparatus according to claim 9, wherein the processor is configured to: obtain a first hash value by performing calculation on a first sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs; and obtain a second hash value by performing calculation on a second sub key value element of the to-be-queried key value element based on the hash function subset corresponding to the key value element group to which the to-be-queried key value element belongs.
  • 12. The apparatus according to claim 11, wherein the processor is configured to: acquire, from the two-dimensional matrix, an element that uses the first hash value as a row and uses the second hash value as a column; or acquire, from the two-dimensional matrix, an element that uses the second hash value as a row and uses the first hash value as a column.
  • 13. An apparatus for generating a two-dimensional filter, the apparatus comprising: at least one processor; memory in electronic communication with the processor; and program code stored in the memory which, when executed by the processor, cause the processor to: establish a two-dimensional matrix that comprises at least two row vectors and at least two column vectors, determine a hash function set, wherein each hash function in the hash function set corresponds to at least one key value element group, obtain a first hash value by performing hash calculation on a first sub key value element of any key value element in at least one corresponding key value element group by using any hash function in the hash function set, and obtain a second hash value by performing hash calculation on a second sub key value element of the any key value element, wherein the first hash value is a positive integer that is less than or equal to a length of the row vectors, and the second hash value is a positive integer that is less than or equal to a length of the column vectors, and generate a two-dimensional filter that comprises the two-dimensional matrix and the hash function set.
  • 14. The apparatus according to claim 13, wherein both the length of the row vectors and the length of the column vectors are greater than or equal to √Sr, wherein the row vectors and the column vectors are comprised in the two-dimensional matrix generated by the processor, and wherein Sr is a quantity of all key value elements comprised in all key value element groups, or Sr is a quantity of key value elements obtained after all key value elements comprised in all key value element groups are filtered by using a query condition.
  • 15. The apparatus according to claim 13, wherein: the first sub key value element obtained by the processor comprises a key value element formed by all odd bits of the any key value element when the any key value element is represented in binary, and the second sub key value element obtained by the processor comprises a key value element formed by all even bits of the any key value element when the any key value element is represented in binary; or the first sub key value element obtained by the processor comprises a key value element formed by the 1st bit to the Kth bit of the any key value element when the any key value element is represented in binary, and the second sub key value element obtained by the processor comprises a key value element formed by the (K+1)th bit to the Nth bit of the any key value element when the any key value element is represented in binary, wherein N is a quantity of bits of the any key value element when the any key value element is represented in binary, 1≦K≦N, and K is a positive integer.
  • 16. The apparatus according to claim 13, wherein the processor is further configured to initialize an element determined by any row vector and any column vector that are in the two-dimensional matrix to a first preset identifier.
Priority Claims (1)
Number            Date        Country   Kind
201410431085.9    Aug 2014    CN        national
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2015/072915, filed on Feb. 12, 2015, which claims priority to Chinese Patent Application No. 201410431085.9, filed on Aug. 28, 2014. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.

Continuations (1)
          Number               Date        Country
Parent    PCT/CN2015/072915    Feb 2015    US
Child     15443997                         US