Various techniques exist that enable a user to search for files in a file system. A typical file lookup technique searches the file system directly, which requires physical traversal of the file system tree. This adds load to the file system, especially when the file system is large and contains a large number of files. Increasing the load on a large file system may result in low Input/Output (I/O) speeds and long access times.
Certain examples are described in the following detailed description and in reference to the drawings, in which:
The present disclosure relates to techniques that enable file lookup on a file system by querying a database associated with the file system. Using a database integrated with the file system, the lookup of files can be based on several different search tokens without physical traversal of the file system tree. The file system stores the metadata and user-defined custom metadata associated with each file in a database such as an Express Query database. The lookup for files on these file systems is done by querying the database instead of directly searching the file systems. In this way, a file lookup can be accomplished using additional attributes such as retention time and custom metadata, which may not be possible using traditional techniques. In some examples, a file lookup can be implemented on multiple file systems at the same time. Furthermore, because the lookup does not traverse the file system tree, the lookup load is removed from the file system itself. In some examples, the file system is a scale-out file system or a cloud computing system. In some examples, the database is a pipelined database.
As illustrated in
In some examples, the device 102 includes a network interface controller (NIC) 118 for connecting the device 102 to a network 120. In some examples, the network 120 may be an enterprise network, which is a large private network of an entity such as a business organization. The network 120 may be configured, for example, as a Storage Area Network (SAN), a Serial Attached SCSI (SAS) network, or other network configuration. The network 120 may be implemented through a local area network (LAN), a wide-area network (WAN), or another network configuration. The network 120 may include a variety of coupled devices that are capable of storing files, such as storage arrays 122 and other client machines 124, which may be similar to the computing device 102. Through the network 120, the computing device 102 can access other networks, such as the Internet 126. The computing device 102 may be coupled through the Internet 126 to a cloud computing system 128. The cloud computing system 128 provides a large pool of compute and storage resources that can be dynamically allocated to client computing systems such as the computing device 102. In some examples, the cloud computing system 128 and the network 120 can each include several petabytes of storage space.
The computing device 102, the network 120, the client machines 124, and the cloud computing system 128 may each have their own separate file systems. Some or all of these file systems may be associated with a database that stores information related to files in the corresponding file system. For example, a database 130 coupled to the network 120 can include information regarding files stored in the storage arrays 122 of the network 120. Additionally, a separate database 130 associated with the cloud computing system 128 can include information regarding files stored in the cloud computing system 128. Furthermore, although not shown, separate databases can be maintained for the computing device 102 and for each of the client machines 124 coupled to the network 120.
Each database 130 can include an entry for each file in the corresponding file system. Each entry can include any number of file attributes, some of which may correspond to metadata tags associated with the file. For example, file attributes may include file name, file type, location, creation date, modification date, retention time, expiration time, retention state, tier, user ID, group ID, custom metadata, and other file attributes. The custom metadata can include any number of custom metadata tags, which may be created to satisfy specific needs of the entity generating or using the files. For example, if the file is a medical record such as an X-ray image, the custom tags could include a patient name, identification of the area being imaged, date that the X-ray was performed, doctor name, and the like. A file lookup operation can be performed by generating a query that uses these file attributes as filtering parameters.
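As an illustration only, the following minimal sketch shows how one entry per file and an attribute-based lookup might look. It uses Python's built-in sqlite3 module as a stand-in for the database 130; the table name `files`, the column names, and the helper functions are hypothetical and are not taken from the disclosure.

```python
import sqlite3

# Hypothetical schema: one row per file; columns mirror a few of the
# file attributes listed above, plus one custom metadata tag.
SCHEMA = """
CREATE TABLE files (
    path            TEXT PRIMARY KEY,
    file_name       TEXT NOT NULL,
    file_type       TEXT,
    size_bytes      INTEGER,
    modified        TEXT,
    retention_state TEXT,
    patient_name    TEXT
)
"""


def build_example_db():
    """Create an in-memory database holding three example file entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("/xray/a.dcm", "a.dcm", "dcm", 2048, "2013-01-10", "retained", "Alice"),
            ("/xray/b.dcm", "b.dcm", "dcm", 4096, "2013-02-11", "expired", "Bob"),
            ("/docs/c.txt", "c.txt", "txt", 100, "2013-03-12", "retained", None),
        ],
    )
    return conn


def lookup_retained_by_patient(conn, patient):
    """Return paths of retained files tagged with the given patient name."""
    rows = conn.execute(
        "SELECT path FROM files "
        "WHERE retention_state = ? AND patient_name = ? "
        "ORDER BY path",
        ("retained", patient),
    )
    return [row[0] for row in rows]
```

In this sketch the lookup combines a standard attribute (retention state) with a custom metadata tag (patient name), and no directory tree is walked to answer it.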
The database can be maintained dynamically. For example, each time a change occurs to the file system, such as deleting, updating, or renaming a file, the corresponding database can be updated to reflect the current state of the file system. In some examples, the database 130 is a pipeline database, such as an Express Query database. In some examples, the database can also be a relational database. The database 130 includes file metadata and custom metadata information, which is continuously added and updated in response to events that are produced by the file system, such as changes to the files. These file system events are converted to database records that are inserted into the database, so that the database maintains a correct mapping of the file information and the file lookup can produce accurate results.
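The conversion of file system events into database records can be sketched as follows. This is an assumption-laden illustration, not the disclosed implementation: the event dictionaries, their `kind` values, and the two-column `files` table are hypothetical, and sqlite3 again stands in for the database 130.

```python
import sqlite3


def open_db():
    """Create an in-memory stand-in for a file system database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, size_bytes INTEGER)")
    return conn


def apply_event(conn, event):
    """Apply one hypothetical file system event to the database."""
    kind = event["kind"]
    if kind in ("create", "update"):
        # Insert a new entry, or overwrite the existing entry for this path.
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size_bytes) VALUES (?, ?)",
            (event["path"], event["size_bytes"]),
        )
    elif kind == "rename":
        conn.execute(
            "UPDATE files SET path = ? WHERE path = ?",
            (event["new_path"], event["path"]),
        )
    elif kind == "delete":
        conn.execute("DELETE FROM files WHERE path = ?", (event["path"],))
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
```

Applying events in the order they are produced keeps the table aligned with the file system, which is what allows a later lookup to rely on the database alone.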
The computing device 102 can access a number of file systems, including the local file system of the computing device 102, the file system of the network 120, the file system of the cloud computing system 128, and the file systems of other client machines 124. The computing device 102 can include a file lookup utility 134, which may be included in a file browser interface, for example. The file lookup utility 134 enables a user to perform a file lookup on one or more of the file systems within the system 100. The file lookup can be accomplished by querying the corresponding database 130 instead of traversing the file system tree of the specified file system.
To facilitate the file lookup, each database 130 may be coupled to a corresponding reporting framework 136. The system 100 shows a reporting framework 136 coupled to the database 130 of network 120 and a separate reporting framework 136 coupled to the database 130 of the cloud computing system 128. Any additional file systems in the system 100 may also have a separate reporting framework 136. In some examples, a single combined reporting framework 136 may be used for two or more of the file systems, wherein the combined reporting framework 136 has access to each of the corresponding databases 130. To initiate a file lookup, the client device 102 can provide search inputs to a specified reporting framework 136. The reporting framework 136 queries one or more of the databases 130 in accordance with the search input and returns a search report to the client computer 102.
The reporting framework 136 may include a query generator 202, a database connection driver 204, and a report generator 206. The query generator 202 is used to generate a query based on the search criteria received from the client 102. For example, the query generator 202 may generate a Structured Query Language (SQL) query. In some examples, the query is a complex query. As used herein, the term “complex query” refers to a query that includes two or more filtering parameters joined by one or more Boolean operators.
The database connection driver 204 is used to establish a connection to the appropriate file system database 130 and execute the query on the database 130. The report generator 206 generates the search report based on the search results and sends the search report to the client computer 102. In some examples, the report generator 206 can use a reporting tool such as JasperReports to convert the search results into a standard file type such as Portable Document Format (PDF), HyperText Markup Language (HTML), a spreadsheet, Rich Text Format (RTF), OpenDocument Text (ODT), Comma-Separated Values (CSV), or Extensible Markup Language (XML), among others. An example of a method for performing a file lookup is explained in more detail below with reference to
To initiate a file lookup, the user can specify various search inputs to be used for the file lookup command. Some or all of the search inputs can be specified by a user through the file lookup utility 134. Additionally, some search inputs may be specified as default values that are preprogrammed into the file lookup utility or configured by an administrator, for example. The search inputs can include one or more file system names on which to execute the lookup, the search criteria used for the lookup, and other search parameters. The search inputs can be used to generate a lookup command file that can be sent to a reporting framework corresponding to the specified file system or file systems. The lookup command file includes the search inputs and can be generated by the file lookup utility. In some examples, the lookup command file is an XML file.
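By way of illustration, a lookup command file in XML form could be produced and read back as sketched below. The disclosure does not specify the XML layout, so the element names (`lookup`, `fileSystems`, `fileSystem`, `criteria`, `sortBy`, `outputFormat`) and both function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_lookup_command(file_systems, criteria, sort_by=None, output_format=None):
    """Serialize search inputs into a hypothetical XML lookup command."""
    root = ET.Element("lookup")
    fs_parent = ET.SubElement(root, "fileSystems")
    for name in file_systems:
        ET.SubElement(fs_parent, "fileSystem").text = name
    ET.SubElement(root, "criteria").text = criteria
    # Optional search parameters are written only when supplied.
    if sort_by is not None:
        ET.SubElement(root, "sortBy").text = sort_by
    if output_format is not None:
        ET.SubElement(root, "outputFormat").text = output_format
    return ET.tostring(root, encoding="unicode")


def parse_lookup_command(xml_text):
    """Recover the search inputs on the reporting framework side."""
    root = ET.fromstring(xml_text)
    return {
        "file_systems": [e.text for e in root.findall("./fileSystems/fileSystem")],
        "criteria": root.findtext("criteria"),
        "sort_by": root.findtext("sortBy"),
        "output_format": root.findtext("outputFormat"),
    }
```

Characters such as `<` and `>` in the criteria text are escaped on output and restored on parsing by the XML library, so comparison operators survive the round trip.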
In some examples, the search criteria can include a single search token, such as a filename or folder name. In some examples, the search criteria can include multiple search tokens, which can be combined using Boolean operators such as “AND” and “OR” and grouped using parentheses. The search inputs can also include various search parameters used to affect how the search is conducted or how the search results are presented. For example, one search parameter can indicate that the results should be sorted in ascending or descending order based on file name or file size. Another search parameter can indicate whether results are shown on a display such as display 108 or sent to a printer. Another search parameter can indicate a file type for an output file to which the search results are to be exported.
At block 302, the lookup command file, including the search input, is received by the reporting framework. The lookup command file may be processed to obtain the search criteria and other search parameters. If the reporting framework is used for more than one file system, the lookup command file may also have the file system names that the user has specified.
At block 304, a query is generated based on the search criteria. As explained above, the query may be a complex query that includes two or more search tokens, Boolean operators, and parentheses. Generating the query may include obtaining the appropriate table name to query, generating a “Where” clause from the search inputs, generating a “Group” clause from the group criteria, generating an “OrderBy” clause from the sort criteria parameter, and generating a “Select Statement” query using the table name and the above clauses. In some examples, more than one file system is specified and a corresponding query is generated for each of the file system databases.
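The query generation at block 304 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed query generator: the nested-tuple representation of the search criteria, the allow-lists, and the function names are hypothetical, and the “Group” clause is omitted for brevity.

```python
# Hypothetical allow-lists; identifiers cannot be bound as SQL parameters,
# so they are validated before being placed into the query text.
ALLOWED_COLUMNS = {"file_name", "file_type", "size_bytes", "retention_state"}
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">=", "LIKE"}


def build_where(node, params):
    """Build a WHERE fragment from a criteria tree.

    A node is either a leaf (column, op, value) or a group
    ("AND" | "OR", [child, ...]). Values are appended to params.
    """
    if node[0] in ("AND", "OR"):
        parts = [build_where(child, params) for child in node[1]]
        return "(" + f" {node[0]} ".join(parts) + ")"
    column, op, value = node
    if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
        raise ValueError(f"unsupported filter: {column!r} {op!r}")
    params.append(value)
    return f"{column} {op} ?"


def build_query(table, criteria, order_by=None, descending=False):
    """Assemble a parameterized SELECT statement and its parameter list."""
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    params = []
    sql = f"SELECT * FROM {table} WHERE {build_where(criteria, params)}"
    if order_by is not None:
        if order_by not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported sort column: {order_by!r}")
        sql += f" ORDER BY {order_by} {'DESC' if descending else 'ASC'}"
    return sql, params
```

Each group is wrapped in parentheses, so the nesting the user specified is preserved in the generated “Where” clause, and the search values travel separately from the SQL text as bound parameters.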
At block 306, a connection to the database of the specified file system is established and the query is executed on the database. In some examples, if more than one file system is specified in the lookup command file, then the query is executed on each of the corresponding file system databases.
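Executing one query against several file system databases might look like the sketch below. The mapping from file system name to connection, the helper names, and the two-column table are assumptions for illustration; in-memory sqlite3 databases stand in for the file system databases.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory stand-in for one file system database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT, file_type TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    return conn


def run_on_all(connections, sql, params=()):
    """Run the same query on every database and key results by file system.

    connections maps a file system name to an open sqlite3 connection.
    """
    results = {}
    for fs_name, conn in connections.items():
        results[fs_name] = conn.execute(sql, params).fetchall()
    return results
```

Keeping the results keyed by file system name lets a later reporting step indicate which file system each matching file belongs to.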
At block 308, search results are received from the database. The search results may contain the rows and columns of the database that satisfy the search criteria. The search results may also be organized in accordance with the search parameters. In some examples, the rows and columns of the database that satisfy the search criteria are referred to herein as the “ResultSet Object.” The ResultSet Object can be returned from the database to the reporting framework.
At block 310, a search report is generated based on the search results. For example, the search report may be generated by the report generator 206 of
At block 312, the reporting framework then sends the generated report back to the client computer 102 that initiated the file lookup. Upon receipt of the report, the client computer 102 may automatically save the report, send the report to a display 108, or print the report, for example. The report can list one or more files that match the search input provided by the user. In some examples, additional information about each file may be obtained from the database and used in the report, such as file size, and any other metadata associated with the file, including custom metadata.
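As a simple illustration of turning search results into a report, the sketch below renders rows as CSV text using Python's standard csv module. This is a stand-in chosen for brevity and is not JasperReports or the disclosed report generator; the function name and the column names used in the example are hypothetical.

```python
import csv
import io


def render_csv_report(column_names, rows):
    """Render search results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `render_csv_report(["path", "size_bytes"], [("/a.pdf", 10)])` yields a header line followed by one line per matching file; fields containing commas are quoted by the csv module.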
In some examples, the file lookup can be performed on multiple file systems. For example, the client machine can send two or more lookup command files to two or more file systems, each of which has its own database and search module. In some examples, reports may be generated automatically. For example, reports can be generated according to a specified schedule.
The various software components discussed above may be stored on the tangible, non-transitory, computer-readable medium 400. For example, a region 406 on the computer-readable medium 400 can include a file lookup utility that enables a user to specify search input for a file lookup. A region 408 can include a query generator that generates a complex query based on the search input. A region 410 can include a report generator that generates a report based on the search results returned by the database. Although shown as contiguous blocks, the software components can be stored in any order or configuration. For example, if the tangible, non-transitory, computer-readable medium is a hard drive, the software components can be stored in non-contiguous, or even overlapping, sectors.
While the present techniques may be susceptible to various modifications and alternative forms, the examples discussed above have been shown only by way of example. It is to be understood that the techniques are not intended to be limited to the particular examples disclosed herein. Indeed, the present techniques include all alternatives, modifications, and equivalents falling within the true spirit and scope of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IN2013/000754 | 12/6/2013 | WO | 00 |