Stores of data are increasing in size at a rapid pace. To utilize these data stores, effective and efficient means of searching the stores and providing basic maintenance to keep the stores up to date and valid may be desirable. In addition, it may be desirable to have the ability to use plain language text to identify pieces of data as opposed to technical details of the data. As a result, a process for searching both the plain language text identifications and technical details to obtain a resulting file may be desirable.
In one embodiment, the disclosure includes an apparatus for processing queries in a heterogeneous index. The apparatus comprises a receiver configured to receive a query from a user, wherein the query comprises at least one desired attribute of a desired file, and a processor coupled to the receiver and configured to search the heterogeneous index. The processor is configured to search the heterogeneous index by receiving the query from the receiver, testing a bloom filter of a storage partition in the heterogeneous index for existence of the desired attribute after receipt of the query, ignoring the storage partition and proceeding to a next storage partition in the heterogeneous index when the bloom filter indicates that the desired attribute is not present in the storage partition, and searching the storage partition to determine which one or more files of the storage partition have the desired attribute when the bloom filter indicates that the desired attribute is present in the storage partition.
In another embodiment, the disclosure includes a method for updating a heterogeneous search index for a storage partition. The method comprises receiving an update message from a user, wherein the update message indicates an operation to be performed on the heterogeneous search index that comprises attributes comprising metadata and tags, recording a log entry indicating receipt of the update message from the user, determining the operation that is to be performed according to the update message, updating the heterogeneous search index according to the update message, and recording a log entry indicating that the update message received from the user was executed successfully.
In yet another embodiment, the disclosure includes a method of recovering from a system failure in a heterogeneous search index. The method comprises entering a plurality of actions to be performed into a log at a time of receipt prior to execution of the actions, wherein the actions to be performed comprise at least two of updating a bloom filter of the heterogeneous search index that indicates an existence of a tag or metadata in the heterogeneous search index, updating a k-dimensional tree of the heterogeneous search index, and updating a key-value store of the heterogeneous search index, and entering the actions performed into the log at a time of completion to indicate successful execution of a first of the actions and a progression to a second of the actions.
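The embodiment above logs each action once when it is received and again when it completes, but it does not spell out the replay procedure. The following is a minimal Python sketch, assuming that recovery re-runs every action that has a receipt entry but no completion entry, and that each action is idempotent. `LogEntry`, `pending_actions`, `recover`, and the action labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    msg_id: int
    kind: str         # "RECEIVED" (logged before execution) or "COMMITTED" (logged after)
    action: str = ""  # hypothetical labels, e.g. "bloom", "kdtree", "kvstore"


def pending_actions(log):
    """Return (msg_id, action) pairs logged at receipt but never logged as completed."""
    received = []
    committed = set()
    for entry in log:
        if entry.kind == "RECEIVED":
            received.append((entry.msg_id, entry.action))
        elif entry.kind == "COMMITTED":
            committed.add((entry.msg_id, entry.action))
    # Keep the original log order so actions are replayed in sequence.
    return [item for item in received if item not in committed]


def recover(log, apply_action):
    """Re-run each unfinished action in log order, then log its completion."""
    for msg_id, action in pending_actions(log):
        apply_action(msg_id, action)  # assumed idempotent, so re-running is safe
        log.append(LogEntry(msg_id, "COMMITTED", action))
```

For example, if a failure occurs after the bloom filter update for message 1 has been committed but before the kd-tree and kv-store updates finish, `recover` replays only the kd-tree and kv-store actions.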
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
Disclosed herein is a technique for establishing an index of file attributes that includes both machine-readable metadata and semantic tags. The disclosed embodiments facilitate searching the index according to queries received from a user. File storage space is divided into a plurality of partitions for storing files and their accompanying attribute indexes. Each partition includes one or more bloom filters for indicating whether a given attribute exists in the partition, a k-dimensional (kd) tree for indexing fixed categories of metadata, and a plurality of key-value (kv) stores that each index one category of tag. Using hash tables that record the presence of a file in a partition, the kd-tree and kv-store indexes may be updated and maintained according to update messages received from a user. By logging both the update messages received from the user and the update messages that are successfully executed, a log-based recovery process may be established.
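The disclosure does not specify how the kd-tree over fixed metadata categories is built or queried. As a rough illustration only, the following Python sketch indexes a fixed tuple of numeric metadata fields per file and answers axis-aligned range queries. The field choice (for example, size and modification time) and the names `KDTree`, `insert`, and `range_search` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    file_id: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class KDTree:
    """Minimal k-d tree keyed on a fixed tuple of numeric metadata fields."""

    def __init__(self, k: int):
        self.k = k
        self.root: Optional[Node] = None

    def insert(self, point: Point, file_id: str) -> None:
        if len(point) != self.k:
            raise ValueError("point must have exactly k metadata values")
        new = Node(point, file_id)
        if self.root is None:
            self.root = new
            return
        node, depth = self.root, 0
        while True:
            axis = depth % self.k  # cycle through the metadata fields by depth
            if point[axis] < node.point[axis]:
                if node.left is None:
                    node.left = new
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = new
                    return
                node = node.right
            depth += 1

    def range_search(self, lo: Point, hi: Point) -> List[str]:
        """Return ids of files whose metadata lies in the box lo <= p <= hi."""
        out: List[str] = []

        def visit(node: Optional[Node], depth: int) -> None:
            if node is None:
                return
            axis = depth % self.k
            if all(l <= v <= h for l, v, h in zip(lo, node.point, hi)):
                out.append(node.file_id)
            # Left subtree holds smaller values on this axis, right holds >=.
            # Descend only into subtrees that can intersect the query box.
            if lo[axis] < node.point[axis]:
                visit(node.left, depth + 1)
            if hi[axis] >= node.point[axis]:
                visit(node.right, depth + 1)

        visit(self.root, 0)
        return out
```

Because every stored point has the same fixed fields, a range query can prune whole subtrees; this is one reason a kd-tree suits fixed metadata categories, while open-ended tag categories go to kv-stores.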
Tags 104 illustrate another example of labeling for a network element readable file. In some embodiments, tags 104 may be referred to as human-readable file attributes and comprise semantic details about the network element readable file that are introduced by a user. For a network element readable file that is, for example, a movie, tags 104 include a title, director, list of one or more actors, genre, country of origin, language, release date, length, comments, and/or other semantic details of a like nature. For a network element readable file that is, for example, an audio file, tags 104 include a song name, one or more singer names, an album name, one or more producer names, a track number, and/or other semantic details of a like nature.
When a network element readable file having metadata and/or tags associated with the file is added to a partition 202, the file is added to a hash table within the partition 202 to record the presence of the file in that partition 202. Additionally, the metadata of the file is indexed in the kd-tree index 206 of the partition 202, and the tags of the file are indexed in the kv-stores 208 that correspond to the respective tag category.
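The three indexing steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `Partition`, `add_file`, and `files_with_tag` are hypothetical names, a plain dictionary serves as the hash table, and a list of metadata points stands in for kd-tree index 206 so the example stays short.

```python
from collections import defaultdict


class Partition:
    """Sketch of one storage partition (hypothetical structure)."""

    def __init__(self, metadata_fields):
        self.metadata_fields = tuple(metadata_fields)  # fixed metadata categories
        self.files = {}                                # hash table: file id -> record
        self.metadata_index = []                       # stand-in for the kd-tree
        # One kv-store per tag category: category -> tag value -> set of file ids.
        self.kv_stores = defaultdict(lambda: defaultdict(set))

    def add_file(self, file_id, metadata, tags):
        # 1. Record the presence of the file in this partition's hash table.
        self.files[file_id] = {"metadata": metadata, "tags": tags}
        # 2. Index the fixed metadata fields as one point (kd-tree stand-in).
        point = tuple(metadata[f] for f in self.metadata_fields)
        self.metadata_index.append((point, file_id))
        # 3. Index each tag value in the kv-store for its category.
        for category, values in tags.items():
            for value in values:
                self.kv_stores[category][value].add(file_id)

    def files_with_tag(self, category, value):
        # .get() avoids creating empty entries for unknown categories or values.
        return set(self.kv_stores.get(category, {}).get(value, set()))
```

A movie with two actors, for example, is added once to the hash table and once to the metadata index, but its id appears under both actor names in the "actor" kv-store.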
Query processor 210 receives a query comprising one or more query attributes from a user. The query attributes may be any combination of metadata and/or tags that identify a network element readable file for which a search is occurring. The query processor 210 parses the query and tests each bloom filter 204 of each partition 202 for the presence of the query attributes. In one embodiment, each partition 202 comprises one bloom filter 204 for each file attribute, for example metadata and/or tag, which is indexed in that partition 202. For example, in a server 200 in which each partition 202 indexes twenty-seven combined metadata and tag file attributes, each partition 202 will comprise twenty-seven bloom filters 204. Generally, where each partition 202 indexes N file attributes, each partition 202 will comprise N bloom filters 204.
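The one-filter-per-attribute arrangement and the partition-skipping test can be sketched in Python as follows. The bit-array size, hash count, salted SHA-256 hashing, and the names `BloomFilter`, `build_filters`, and `candidate_partitions` are assumptions for illustration. The sketch also assumes that a query requires every attribute to match and that a partition with no filter for a queried attribute does not contain it.

```python
import hashlib


class BloomFilter:
    """Bit-array bloom filter; the bits are stored in a Python int for brevity."""

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k, self.bits = m_bits, k_hashes, 0

    def _positions(self, value):
        # Derive k bit positions from salted SHA-256 digests of the value.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means definitely absent; True means probably present.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_filters(files, attributes):
    """One bloom filter per indexed attribute: N attributes -> N filters."""
    filters = {name: BloomFilter() for name in attributes}
    for attrs in files.values():
        for name, value in attrs.items():
            if name in filters:
                filters[name].add(value)
    return filters


def candidate_partitions(partitions, query):
    """Yield ids of partitions whose filters report every query attribute as possibly present."""
    for pid, filters in partitions.items():
        if all(name in filters and filters[name].might_contain(value)
               for name, value in query.items()):
            yield pid
```

Only the partitions yielded by `candidate_partitions` would go on to the kd-tree and kv-store search; every other partition is ignored without touching its indexes.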
Each bloom filter 204 comprises a plurality of bits that together indicate whether a particular file attribute is present in the partition 202 in which the bloom filter 204 is located. For example, when a query comprising one or more query attributes is tested against bloom filters 204 by query processor 210, each query attribute is hashed to a set of bit positions, and the corresponding bits of the bloom filter 204 are checked to determine whether a file having the query attributes may be present in that partition 202. If any of those bits is unset, the attribute is definitely absent and the partition 202 may be skipped. When the query processor 210 receives a positive response from a bloom filter 204, indicating a high probability that a file having the desired query attributes is present in the partition 202, the query processor 210 searches the kd-tree index 206 and kv-stores 208 to identify the files having the desired query attributes and returns those files to the user.
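The "high probability" of a positive response can be quantified with the standard textbook approximation for a bloom filter of m bits and k hash functions after n values have been inserted. This formula comes from general bloom filter analysis rather than from the disclosure, and it assumes independent, uniformly distributed hash functions:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad k_{\mathrm{opt}} = \frac{m}{n}\ln 2
% Worked example: m/n = 10 bits per indexed value, k = 7 hash functions
p \approx \left(1 - e^{-0.7}\right)^{7} \approx (0.5034)^{7} \approx 0.0082
```

With roughly ten bits per indexed value, a partition that lacks the queried attribute is searched anyway only about 0.8% of the time per attribute test.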
Network element readable files stored in a partition 202 may be deleted from the partition 202, additional network element readable files may be inserted into the partition 202, and/or existing network element readable files in the partition 202 may be updated with one or more modified metadata fields and/or tags. In an embodiment, update processor 212 receives, from a user, a request comprising one or more actions to be performed in a partition 202. As described above, an action may be the insertion of a network element readable file into the partition 202, the deletion of a network element readable file from the partition 202, or the update of metadata or tags in an already existing network element readable file in the partition 202. When an action is taken in the partition 202 by update processor 212, corresponding updates are made to bloom filters 204, kd-tree index 206, and kv-stores 208 to reflect changes in the metadata and/or tags that are present in the partition 202 after the action is performed.
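The paragraph above says bloom filters 204 are updated when a file is deleted. A standard bloom filter cannot safely clear a bit, however, because other values may share that bit, and the disclosure does not say how deletion is handled. One widely used option, swapped in here as an assumption, is a counting bloom filter, which keeps a small counter per position instead of a single bit. The following is a minimal Python sketch; `CountingBloomFilter` and its parameters are hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant with counters instead of bits, so values can be removed."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, value):
        # Same salted-hash scheme as a plain bloom filter.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.counts[pos] += 1

    def remove(self, value):
        # Only safe for values that were previously added; removing anything else
        # would decrement counters that belong to other values.
        positions = list(self._positions(value))
        if not all(self.counts[p] > 0 for p in positions):
            raise KeyError(value)
        for pos in positions:
            self.counts[pos] -= 1

    def might_contain(self, value):
        return all(self.counts[p] > 0 for p in self._positions(value))
```

Counters cost more memory than single bits. An alternative is to periodically rebuild a partition's plain bloom filters from its hash table after deletions.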
It is understood that in one embodiment the query processor 210, the update processor 212, and the partitions 202 are co-located on the same device, for example a single network element as described in further detail below. It is also understood that alternative embodiments exist such that the query processor 210, the update processor 212, and the partitions 202 are distributed among a plurality of devices, for example in a cloud computing environment. For example, in one embodiment, the query processor 210 and update processor 212 may be located on a first device and the partitions 202 may be located on a second device, for example a network attached storage device.
When the query processor receives a response from the bloom filters indicating that the desired attributes probably exist in the partition, at step 308 the query processor tests the partition's kd-tree index, for example kd-tree index 206, shown in
When tags matching kv-store keys are found, at step 316 the query processor searches the kv-store indexes to identify the particular network element readable files having the tags indicated by the query. After searching the kv-store index, or if tags matching kv-store keys are not found at step 310, the query processor determines at step 314 whether any attributes from the query were found in neither the kd-tree index at step 308 nor the kv-store index at step 310. When such attributes exist, at step 320 the query processor scans all files in the partition to find any that match the query. At step 318, the query processor joins the results of the kd-tree search at step 312, the kv-store index search at step 316, and the scan of all files at step 320 prior to returning the results to the user at step 322.
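Steps 308 through 322 can be sketched in Python for a single partition that has already passed the bloom filter test. The sketch rests on these assumptions: the partition is a plain dictionary with hypothetical keys, a value-to-ids dictionary stands in for the kd-tree, a query requires every attribute to match, and "joining" the partial results means intersecting them. `process_query` is a hypothetical name.

```python
def process_query(partition, query, kd_fields):
    """Sketch of steps 308-322 for one partition.

    partition: dict with hypothetical keys
      "files":    file id -> {attribute: value}
      "metadata": metadata field -> value -> set of ids (kd-tree stand-in)
      "kv":       tag category -> tag value -> set of ids
    query: {attribute: value}; every attribute must match (assumed AND semantics)
    """
    meta_terms = {a: v for a, v in query.items() if a in kd_fields}
    tag_terms = {a: v for a, v in query.items() if a in partition["kv"]}
    unindexed = {a: v for a, v in query.items()
                 if a not in meta_terms and a not in tag_terms}

    candidate_sets = []
    # Steps 308/312: metadata attributes answered by the metadata index.
    for a, v in meta_terms.items():
        candidate_sets.append(partition["metadata"].get(a, {}).get(v, set()))
    # Steps 310/316: tag attributes answered by the kv-stores.
    for a, v in tag_terms.items():
        candidate_sets.append(partition["kv"][a].get(v, set()))
    # Steps 314/320: attributes neither index covers fall back to a full scan.
    if unindexed:
        candidate_sets.append({fid for fid, attrs in partition["files"].items()
                               if all(attrs.get(a) == v for a, v in unindexed.items())})
    # Step 318: join (intersect) the partial results before returning them (step 322).
    if not candidate_sets:
        return set()
    return set.intersection(*map(set, candidate_sets))
```

The full scan runs only when some query attribute is missing from both indexes, so queries built entirely from indexed metadata and tags never touch every file.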
In an alternative embodiment of process 300, the kv-store is searched prior to the kd-tree, such that one or both of step 310 and step 316 may be performed before one or both of step 308 and step 312. In another alternative embodiment of process 300, the kd-tree is searched prior to the kv-store. In another alternative embodiment of process 300, the kv-store and the kd-tree are searched substantially simultaneously, e.g., on a network element having a plurality of processors and/or a plurality of cores, such that the search of the kv-store and the search of the kd-tree begin and/or end at approximately the same time.
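The "substantially simultaneous" variant can be sketched with Python's standard `concurrent.futures` module, which submits the metadata lookup and the kv-store lookup together and waits for both. `match_all` and `parallel_search` are hypothetical names, and both indexes are modeled as attribute-to-value-to-ids dictionaries for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def match_all(index, terms):
    """Intersect the id sets for every (attribute, value) term; no terms -> empty set."""
    result = None
    for attr, value in terms.items():
        ids = index.get(attr, {}).get(value, set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()


def parallel_search(metadata_index, kv_stores, meta_terms, tag_terms):
    """Start the metadata (kd-tree stand-in) and kv-store lookups at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        kd_future = pool.submit(match_all, metadata_index, meta_terms)
        kv_future = pool.submit(match_all, kv_stores, tag_terms)
        return kd_future.result(), kv_future.result()
```

In CPython, threads overlap I/O-bound lookups well. For CPU-bound in-memory searches spread across multiple cores, `ProcessPoolExecutor` offers the same interface.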
At step 404, the update processor writes the update message to a message log. The message log records the contents of the update message and is maintained for future use or reference, for example, in a backup system as described below. At step 406, the update processor determines what operation is specified by the update message. If the update message indicates that a file is to be inserted into the partition or that an existing file in the partition is to be updated with new metadata and/or tags, at step 408 the update processor determines whether the file is present in the partition's hash table, as described above. If the file is not in the partition's hash table, at step 410 the update processor determines whether the current partition has space available for the file or is full. When the partition is full, at step 412 the update processor creates a new partition and designates that partition as the current partition before updating the hash table at step 414 to indicate that the file has been placed in the newly created partition. After updating the hash table, or if the partition was determined at step 410 to have space available for the file, at step 416 the update processor uses the currently designated partition for further action.
If, at step 408, the file was found in the hash table and therefore will have its metadata and/or tags updated, at step 418 the update processor finds the file in the partition. At step 420, the update processor inserts the metadata and/or tags associated with the file for insertion into the partition determined in steps 416 or 418, and updates the partition's bloom filters, kd-tree, and kv-stores to reflect the new file and its associated metadata and/or tags. At step 422, the update processor writes a commit message indicating that the tasks in the update message that were noted in the message log at step 404 have been completed prior to returning at step 424.
If, at step 406, the update processor determines that the update message indicates that a file is to be deleted from the partition, at step 426 the update processor determines whether the file is present in the partition's hash table, as described above. If the file is not in the partition's hash table, at step 428 the update processor notes that the file cannot be found and returns at step 424. If the file is found in the hash table, at step 430 the update processor finds the partition in which the file is located. At step 432, the update processor deletes the metadata and/or tags associated with the file for deletion and updates the partition's bloom filters, kd-tree, and kv-stores. At step 434, the update processor writes a commit message indicating that the tasks in the update message that were noted in the message log at step 404 have been completed prior to returning at step 424.
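The insert, update, and delete branches of process 400 can be sketched in Python as follows. `UpdateProcessor`, its `reindex` callback, and the file-count capacity rule are hypothetical simplifications. The sketch also records a hash table entry for every newly placed file, not only for files placed in a newly created partition, because steps 408 and 426 rely on the hash table to locate files.

```python
import itertools


class UpdateProcessor:
    """Sketch of process 400 with a message log (step 404) and a commit log.

    Hypothetical simplifications: a partition is a dict of file id -> attributes,
    capacity is counted in files, and index maintenance is a callback.
    """

    def __init__(self, capacity, reindex):
        self.capacity = capacity
        self.partitions = [{}]   # the last partition is the current one
        self.location = {}       # hash table: file id -> partition number
        self.message_log, self.commit_log = [], []
        self.reindex = reindex   # refreshes bloom filters, kd-tree, kv-stores
        self._ids = itertools.count(1)

    def handle(self, op, file_id, attrs=None):
        msg_id = next(self._ids)
        self.message_log.append((msg_id, op, file_id, attrs))     # step 404
        if op in ("insert", "update"):                             # step 406
            if file_id in self.location:                           # step 408
                pnum = self.location[file_id]                      # step 418
            else:
                if len(self.partitions[-1]) >= self.capacity:      # step 410
                    self.partitions.append({})                     # step 412
                pnum = len(self.partitions) - 1                    # step 416
                self.location[file_id] = pnum                      # hash table (step 414)
            # Step 420: store the attributes and refresh the partition's indexes.
            self.partitions[pnum].setdefault(file_id, {}).update(attrs or {})
            self.reindex(pnum)
            self.commit_log.append(msg_id)                         # step 422
            return True
        if op == "delete":
            if file_id not in self.location:                       # step 426
                return False                                       # step 428
            pnum = self.location.pop(file_id)                      # step 430
            del self.partitions[pnum][file_id]                     # step 432
            self.reindex(pnum)
            self.commit_log.append(msg_id)                         # step 434
            return True
        raise ValueError(f"unknown operation: {op}")
```

Messages that never reach a commit entry (for example, because of a crash between step 404 and step 422) are the ones a log-based recovery pass would need to revisit.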
In an embodiment, as discussed in further detail below, the combination of the message log of step 404 and the commit log of steps 422 and 434 is used to implement a system backup. For example, one or more update messages are passed to an index server, for example server 200 in
Cluster manager 504 directs the functions of each cluster of system 500 according to queries received from the query dispatcher 502. For example, after receiving a query from query dispatcher 502, the cluster manager 504 passes the query to the index server 508 for processing according to processes 300 and 400, disclosed above (e.g., searching a file server 510 for the existence of a file having certain metadata and/or tag attributes and/or updating the metadata and/or tag attributes of a file). A plurality of clusters, each comprising an index server 508, are implemented in parallel, and each query is transmitted to the cluster manager 504 of each cluster. In one embodiment, a query may be executed by a particularly designated index server 508. In other embodiments, a query may be executed by an available index server 508 that is determined by the query dispatcher 502.
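The two dispatch modes above, broadcasting a query to every cluster or routing it to one chosen index server, can be sketched in Python as follows. `IndexServer`, `ClusterManager`, `QueryDispatcher`, `broadcast`, and `route` are hypothetical names. The linear scan in `IndexServer.search` stands in for process 300, and the routing policy is supplied by the caller because the disclosure does not say how an available server is chosen.

```python
from concurrent.futures import ThreadPoolExecutor


class IndexServer:
    def __init__(self, name, files):
        self.name = name
        self.files = files  # file id -> {attribute: value}

    def search(self, query):
        # Stand-in for process 300: every query attribute must match.
        return {fid for fid, attrs in self.files.items()
                if all(attrs.get(a) == v for a, v in query.items())}


class ClusterManager:
    def __init__(self, index_server):
        self.index_server = index_server

    def handle(self, query):
        return self.index_server.search(query)


class QueryDispatcher:
    def __init__(self, clusters):
        self.clusters = clusters  # must be non-empty for broadcast()

    def broadcast(self, query):
        """Send the query to every cluster in parallel and merge the answers."""
        with ThreadPoolExecutor(max_workers=len(self.clusters)) as pool:
            results = pool.map(lambda c: c.handle(query), self.clusters)
            return set().union(*results)

    def route(self, query, pick):
        """Send the query to one cluster chosen by a caller-supplied policy."""
        return pick(self.clusters).handle(query)
```

Broadcasting suits queries whose matching files may live anywhere. Routing suits cases where the dispatcher already knows, or can choose, which index server should answer.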
Recovery manager 506 is configured to aid system 500 in recovering from a system failure by utilizing message and commit logs, as described in process 400, shown in
At least some of the features/methods described in this disclosure may be implemented in a network element (NE) 600. For instance, the features/methods of this disclosure may be implemented using hardware, firmware, and/or software installed to run on hardware. The network element may be any device that transports data through a network, e.g., a switch, router, bridge, server, client, etc.
The network element 600 may comprise one or more downstream ports 610 coupled to a transceiver (Tx/Rx) 620, which may be transmitters, receivers, or combinations thereof. The Tx/Rx 620 may transmit and/or receive frames from other network nodes via the downstream ports 610. Similarly, the network element 600 may comprise another Tx/Rx 620 coupled to a plurality of upstream ports 640, wherein the Tx/Rx 620 may transmit and/or receive frames from other nodes via the upstream ports 640. The downstream ports 610 and/or the upstream ports 640 may include electrical and/or optical transmitting and/or receiving components. In another embodiment, the network element 600 may comprise one or more antennas coupled to the Tx/Rx 620. The Tx/Rx 620 may transmit and/or receive data (e.g., packets) from other network elements wirelessly via one or more antennas.
A processor 630 may be coupled to the Tx/Rx 620 and may be configured to process the frames and/or determine to which nodes to send (e.g., transmit) the packets. In an embodiment, the processor 630 may comprise one or more multi-core processors and/or memory modules 650, which may function as data stores, buffers, etc. The processor 630 may be implemented as a general processor or may be part of one or more application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or digital signal processors (DSPs). Although illustrated as a single processor, the processor 630 is not so limited and may comprise multiple processors. The processor 630 may be configured to communicate and/or process multi-destination frames.
The memory module 650 may be used to house the instructions for carrying out the various embodiments described herein. In one embodiment, memory module 650 may comprise an index server query process 660 which may be implemented on processor 630 and configured to search an index of a partition of a data storage device according to process 300, discussed above and shown in
It is understood that by programming and/or loading executable instructions onto the network element 600, at least one of the processor 630 and/or the memory 650 are changed, transforming the network element 600 in part into a particular machine or apparatus, for example, a multi-core forwarding architecture having the novel functionality taught by the present disclosure. It is fundamental to the electrical engineering and software engineering arts that functionality that can be implemented by loading executable software into a computer can be converted to a hardware implementation by well-known design rules known in the art. Decisions between implementing a concept in software versus hardware typically hinge on considerations of stability of the design and number of units to be produced rather than any issues involved in translating from the software domain to the hardware domain. Generally, a design that is still subject to frequent change may be preferred to be implemented in software, because re-spinning a hardware implementation is more expensive than re-spinning a software design. Generally, a design that is stable and will be produced in large volume may be preferred to be implemented in hardware (e.g., in an ASIC) because for large production runs the hardware implementation may be less expensive than software implementations. Often a design may be developed and tested in a software form and then later transformed, by well-known design rules known in the art, to an equivalent hardware implementation in an ASIC that hardwires the instructions of the software. In the same manner as a machine controlled by a new ASIC is a particular machine or apparatus, likewise a computer that has been programmed and/or loaded with executable instructions may be viewed as a particular machine or apparatus.
Any processing of the present disclosure may be implemented by causing a processor (e.g., a general purpose multi-core processor) to execute a computer program. In this case, a computer program product can be provided to a computer or a network device using any type of non-transitory computer readable media. The computer program product may be stored in a non-transitory computer readable medium in the computer or the network device. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), compact disc read-only memory (CD-ROM), compact disc recordable (CD-R), compact disc rewritable (CD-R/W), digital versatile disc (DVD), Blu-ray (registered trademark) disc (BD), and semiconductor memories (such as mask ROM, programmable ROM (PROM), erasable PROM, flash ROM, and RAM). The computer program product may also be provided to a computer or a network device using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g. electric wires, and optical fibers) or a wireless communication line.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.