The present invention generally relates to data security and more specifically to a method and system for performing programmatic search on securely stored data.
Methods for programmatic search, namely identification of the presence and location of specific elements within a corpus of information stored in any manner of computer memory, have been extensively developed and refined. One notable limitation of programmatic search is encountered when the corpus of information to be searched is encrypted by conventional methods. In conventional methods for encryption, raw data is converted by means of one or more cryptographic keys and an algorithm that creates an encrypted version of raw data that is typically not the same size as the raw data in memory. If a user or system intends to perform programmatic search on a corpus of encrypted information, the entire corpus must be decrypted to the raw (unencrypted) form of the corpus, prior to algorithmic search. For a large corpus of information in computer memory or distributed solutions, such as web- or cloud-based computing, the process of decryption for search is time- and resource-intensive. For a corpus of information on a computer system, cloud system, electronic communication system, or social media system that is encrypted for privacy purposes, it is not possible to perform functions such as matching advertisements to content because the content is encrypted by conventional methods that prevent the necessary search functions required to make such a match. One solution in common use that allows for algorithmic search of data stored in cloud systems is to store data at rest in unencrypted form, with password-protection applied to user access to the cloud system. This approach suffers from the known security vulnerabilities inherent to any system with data stored in unencrypted form.
There is thus a need in the art for a method and system that provide for efficient and secure searching of secured information.
In one aspect of the invention, a method of securely storing information is provided, where the method includes accepting information; forming the information into a plurality of packets of uniform size; determining an algorithm to modify packets of uniform size; modifying information by applying the algorithm to each of the packets of the plurality of packets; and storing the modified information. In various embodiments the method: 1) is for searching modified information, and includes accepting a search parameter; using the algorithm as a lookup table to determine the presence and location of the search parameter in the accepted information, and providing the presence and location of the search parameter in the accepted information; 2) encompasses encrypting and decrypting the algorithm; 3) is for searching for files and the method includes accepting a request for a data file, using the algorithm as a lookup table to recover the requested data file from the stored modified one or more data files, and providing the requested one or more data files; and 4) determines an algorithm that includes one or more of a) a first pad of values which are appended to the examples, b) a plurality of perturbation functions applied to each example; and c) an index shuffling applied to each example.
In another aspect of the invention, a system is provided for securely storing information, said system including networked memory and processors programmed to: accept information; form the information into a plurality of packets of uniform size; determine an algorithm to modify packets of uniform size; modify information by applying the algorithm to each of the packets of the plurality of packets; and store the modified information. In various embodiments the processors are further programmed to: 1) search for modified information, including: accept a search parameter; using the algorithm as a lookup table to determine the presence and location of the search parameter in the accepted information, and provide the presence and location of the search parameter in the accepted information; 2) encrypt and decrypt the algorithm; 3) accept a request for a data file, using the algorithm as a lookup table to recover the requested data file from the stored modified one or more data files, and providing the requested one or more data files; and 4) where the algorithm includes one or more of a) a first pad of values which are appended to the examples, b) a plurality of perturbation functions applied to each example; and c) an index shuffling applied to each example.
These features together with the various ancillary provisions and features which will become apparent to those skilled in the art from the following detailed description, are attained by the method and system of the present invention, preferred embodiments thereof being shown with reference to the accompanying drawings, by way of example only, wherein:
Reference symbols are used in the figures to indicate certain components, aspects or features shown therein, with reference symbols common to more than one figure indicating like components, aspects or features shown therein.
The following discussion illustrates various techniques for securely storing data files, for performing programmatic search of the data files, and for retrieving securely stored data files. Various aspects of this invention are also described in co-pending U.S. patent application Ser. No. 17/652,743, filed on Feb. 28, 2022, the contents of which are incorporated herein in their entirety.
Securely storing data files may, for example, be accomplished by index shuffling, which rearranges the data in a manner that is difficult to deduce from the rearranged data, but which can be efficiently reversed to retrieve or search the secured data file. Thus, for example,
Programmatic search (613) is then performed by means of the inverse of shuffled indices array (612) to yield a search result (614) in the original data. For example, using the values shown in the illustration, a search for sequence ‘14, 86, 21’ in the fixed shuffled packetized data (610) could first determine the locations of the initial value ‘14’ in the fixed shuffled packetized data (610), then convert these locations to the corresponding locations in the original packetized data (608) using the inverse of shuffled indices lookup table (612), then look ahead for adjacent values ‘86’ and ‘21’ by the same lookup process. By this process, the location of the search sequence ‘14, 86, 21’ in the fixed shuffled packetized data (610) will be determined to start at the 0th (initial) position of the 0th (initial) packet (row) of the original packetized data (608), corresponding to the 0th (initial) position in the original data (606). The process of looking ahead for adjacent values in this example takes into account the relationship of the end of one packet (row) in the unshuffled packetized data (608) with the beginning of the next packet (row) in the unshuffled packetized data (608), determined using the lookup via the inverse of the shuffled indices array (612), so search sequences crossing packet boundaries (for example, ‘99, 68, 14’) are identified and the location of the search sequence is determined in the original data (606).
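By way of non-limiting illustration only, the lookup-based search described above may be sketched in Python as follows. The sketch assumes a single permutation over the flattened packet array, a packet size of 4, and sample values chosen only so that the two sequences quoted above appear in them; the function names and the seeded `random.Random` generator are hypothetical and do not reproduce any particular embodiment.

```python
import random

def make_shuffle(n, seed):
    """Return (perm, inv): perm[i] is the shuffled slot of original position i,
    and inv[j] is the original position of the value held in shuffled slot j."""
    perm = list(range(n))
    # Seeded generator for reproducibility only; not a cryptographic source.
    random.Random(seed).shuffle(perm)
    inv = [0] * n
    for original_pos, shuffled_pos in enumerate(perm):
        inv[shuffled_pos] = original_pos
    return perm, inv

def apply_shuffle(flat, perm):
    """Scatter each original value into its shuffled slot."""
    shuffled = [None] * len(flat)
    for original_pos, shuffled_pos in enumerate(perm):
        shuffled[shuffled_pos] = flat[original_pos]
    return shuffled

def search_shuffled(shuffled, perm, inv, query):
    """Return original start positions of query without unshuffling the data."""
    hits = []
    if not query:
        return hits
    n = len(shuffled)
    for shuffled_pos, value in enumerate(shuffled):
        if value != query[0]:
            continue
        start = inv[shuffled_pos]  # location in the original data
        if start + len(query) > n:
            continue
        # Look ahead: original position start + k lives in slot perm[start + k].
        if all(shuffled[perm[start + k]] == query[k] for k in range(1, len(query))):
            hits.append(start)
    return sorted(hits)

PACKET_SIZE = 4  # assumed for illustration
# Three packets (rows) of four values; trailing zeroes fill the last packet.
original = [14, 86, 21, 7, 5, 33, 99, 68, 14, 2, 0, 0]
perm, inv = make_shuffle(len(original), seed=2021)
shuffled = apply_shuffle(original, perm)

first = search_shuffled(shuffled, perm, inv, [14, 86, 21])
crossing = search_shuffled(shuffled, perm, inv, [99, 68, 14])
print(first, crossing)  # [0] [6]
print(divmod(crossing[0], PACKET_SIZE))  # (1, 2): packet 1, offset 2
```

Because positions are tracked in the flattened original order, a sequence that ends one packet and begins the next is found by the same look-ahead, and `divmod` recovers the packet (row) and offset of each hit.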
In various embodiments, the shuffled indices (601) subjected to encryption by standard methods (602) are accompanied prior to encryption by a collection of corresponding data elements representing a fixed padding element and a fixed perturbation array. In various embodiments, a fixed padding element is stored for the original data shape prior to fixed shuffling by the shuffled indices (601) along with data stored to indicate the position of the fixed padding element relative to the original data. In various embodiments, a fixed perturbation array is stored for the data shape prior to fixed shuffling by the shuffled indices (601).
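By way of non-limiting illustration only, one possible arrangement of a fixed padding element, a fixed perturbation array, and shuffled indices may be sketched in Python as follows. The sketch assumes byte-valued data, an additive perturbation modulo 256, and a pad spliced in at a recorded position; these choices, the dictionary layout, and the function names are hypothetical and are not taken from any particular embodiment.

```python
import random

def make_params(n, pad_len, seed):
    """Build the collection of elements that would accompany the shuffled indices."""
    rng = random.Random(seed)  # seeded for reproducibility; not cryptographic
    total = n + pad_len
    pad = [rng.randrange(256) for _ in range(pad_len)]
    pad_at = rng.randrange(n + 1)  # position of the pad relative to the original data
    perturb = [rng.randrange(256) for _ in range(total)]
    perm = list(range(total))
    rng.shuffle(perm)
    return {"pad": pad, "pad_at": pad_at, "perturb": perturb, "perm": perm}

def secure(data, p):
    """Apply fixed padding, then fixed perturbation, then fixed shuffling."""
    padded = data[:p["pad_at"]] + p["pad"] + data[p["pad_at"]:]
    perturbed = [(v + d) % 256 for v, d in zip(padded, p["perturb"])]
    out = [0] * len(perturbed)
    for i, j in enumerate(p["perm"]):
        out[j] = perturbed[i]
    return out

def restore(stored, p):
    """Reverse the three steps in the opposite order."""
    perturbed = [stored[j] for j in p["perm"]]
    padded = [(v - d) % 256 for v, d in zip(perturbed, p["perturb"])]
    start, length = p["pad_at"], len(p["pad"])
    return padded[:start] + padded[start + length:]

data = [14, 86, 21, 99, 68, 14]
params = make_params(len(data), pad_len=2, seed=7)
secured = secure(data, params)
recovered = restore(secured, params)
```

In this sketch the `params` dictionary stands in for the collection of data elements that would be encrypted together by standard methods; the encryption step itself is omitted.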
In various embodiments, data are prepared for programmatic search using packetization and fixed shuffling alone; in other embodiments, data are prepared for programmatic search using packetization, fixed padding, and fixed shuffling; in other embodiments, data are prepared for programmatic search using packetization, fixed perturbation, and fixed shuffling; and in other embodiments, data are prepared for programmatic search using packetization, fixed padding, fixed perturbation, and fixed shuffling. In various embodiments, following packetization, the three methods of fixed padding, fixed perturbation, and fixed shuffling are applied in various combinations, and optionally, iterative application of two or more rounds of any of the three methods. In various embodiments, the trailing zeroes shown for purposes of illustration in
On a computer system or cloud service (801), an unencrypted memory storage (802) contains any number of files (803) stored in memory in conventional unencrypted form. As depicted for purposes of graphical illustration by the height of the rectangles representing individual data files (803), the unencrypted files may be of arbitrary size. An initial transformation step is performed on the files (803) in memory storage (802) to subdivide a copy of the files (803) into packetized form (805) in memory storage (804), with packets of uniform size as illustrated graphically (805). The uniform file packets (805) in memory storage (804) are then transformed into data arrays (807) with the shape determined by ([packet size]×[number of packets]) in memory storage (806). In a preferred embodiment, fixed padding data and fixed shuffle indices (808) are generated that correspond to the data shape of the arrays (807), in the fashion described in detail above, and stored temporarily for use in transforming the data arrays (807) in memory storage (806) into fixed padded and fixed shuffled data arrays (809). In addition, the fixed padding and fixed shuffling indices (808) are encrypted (810) using a first key of a key pair, Key 1 (811) to yield encrypted fixed padding and fixed shuffling indices (812). After generation of the encrypted fixed padding and fixed shuffling indices (812), the original fixed padding and fixed shuffling indices (808) are deleted from memory.
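By way of non-limiting illustration only, the initial subdivision of files of arbitrary size into packets of uniform size may be sketched in Python as follows. The file names, the packet size of 4, the zero fill of the final packet, and the function names are assumptions made for the sketch; generation and encryption of the fixed padding and fixed shuffling indices are omitted.

```python
def packetize(data: bytes, packet_size: int) -> list[list[int]]:
    """Split data into rows of exactly packet_size values, zero-filling the last row."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    values = list(data)
    values += [0] * (-len(values) % packet_size)
    return [values[i:i + packet_size] for i in range(0, len(values), packet_size)]

def array_shape(packets: list[list[int]]) -> tuple[int, int]:
    """Return (packet size, number of packets) for a packetized file."""
    return (len(packets[0]) if packets else 0, len(packets))

# Hypothetical unencrypted files of arbitrary size.
files = {"a.txt": b"hello world", "b.bin": bytes(range(10))}
arrays = {name: packetize(content, 4) for name, content in files.items()}
shapes = {name: array_shape(packets) for name, packets in arrays.items()}
```

Each file yields its own array whose shape depends only on the packet size and the file length, so padding and shuffling indices can be generated to match that shape.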
When a search query (815) is produced by a credentialed user, a second key of a key pair, Key 2 (814) is used to decrypt (813) the encrypted fixed padding and fixed shuffling indices (812) to yield a decrypted version of the fixed padding and fixed shuffling indices (816). The fixed padding and fixed shuffling indices (816) are then applied to the search query and the result is used to perform a search (817) on the fixed padded and fixed shuffled data arrays (809) to yield a search result in the original data (818) using the method described in detail above. After the search is performed, the decrypted fixed padding and fixed shuffling indices (816) are deleted from memory.
When a file request (819) is produced by a credentialed user, a second key of a key pair, Key 2 (814) is used to decrypt (813) the encrypted fixed padding and fixed shuffling indices (812) to yield a decrypted version of the fixed padding and fixed shuffling indices (816). The file request (819) is then addressed using the decrypted fixed padding and fixed shuffling indices (816) to restore the requested file (820) from the fixed padded and fixed shuffled data arrays (809), to yield an unpadded and unshuffled data array for the requested file (821). The unpadded and unshuffled data array for the requested file (821) is then transformed into the corresponding data packets of fixed size (822), which are then rejoined to yield the original file (823). After restoration of the original file (823), the decrypted fixed padding and fixed shuffling indices (816) are deleted from memory. In alternative embodiments for implementation of secure search and file restoration, fixed perturbation is performed between the fixed padding and fixed shuffling, as described in detail above.
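By way of non-limiting illustration only, restoration of a requested file from a shuffled data array may be sketched in Python as follows. The sketch assumes fixed shuffling alone (no padding element or perturbation), a packet size of 4, and a recorded original length used to trim the zero fill; the `params` dictionary and the function names are hypothetical, and decryption and deletion of the indices are omitted.

```python
import random

PACKET_SIZE = 4  # assumed for illustration

def store_file(data: bytes, seed: int):
    """Packetize, then shuffle; return the stored array and the lookup parameters."""
    values = list(data)
    values += [0] * (-len(values) % PACKET_SIZE)  # fill the final packet
    perm = list(range(len(values)))
    random.Random(seed).shuffle(perm)  # seeded for reproducibility; not cryptographic
    shuffled = [0] * len(values)
    for original_pos, shuffled_pos in enumerate(perm):
        shuffled[shuffled_pos] = values[original_pos]
    return shuffled, {"perm": perm, "length": len(data)}

def restore_file(shuffled, params) -> bytes:
    """Unshuffle, regroup into packets of fixed size, rejoin, and trim the fill."""
    flat = [shuffled[j] for j in params["perm"]]
    packets = [flat[i:i + PACKET_SIZE] for i in range(0, len(flat), PACKET_SIZE)]
    joined = [value for packet in packets for value in packet]
    return bytes(joined[:params["length"]])

stored, params = store_file(b"hello world", seed=7)
restored = restore_file(stored, params)
```

In a fuller sketch, `params` would exist in memory only between decryption with Key 2 and deletion after the file is restored.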
The present invention is not intended to be limited to a system or method which must satisfy one or more of any stated or implied objects or features of the invention. Modifications and substitutions by one of ordinary skill in the art are considered to be within the scope of the present invention. Numerous details are provided to convey an understanding of the embodiments described herein. It will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the embodiments described herein. The present invention is not limited to the preferred, exemplary, or primary embodiment or embodiments described herein.
In addition, while the invention has been described in terms of a number of different functions or steps, it will be appreciated by those skilled in the art that the functions or steps may be performed in a different order than described herein, and that certain functions or steps may be combined into a fewer number of steps or divided into a greater number of steps to achieve the same effect as is described herein.
It will also be understood by one of ordinary skill in the art that the systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein. Any unit, component, computer, module, server, terminal, or device described or exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable, volatile and/or non-volatile) such as, for example, CD-ROM, diskette, RAM, ROM, EEPROM, flash memory, computer hard drive, magnetic disks, optical disks, tape, or other memory technology implemented in any method for storage or transmission of information, such as computer readable instructions, data structures, program modules, or other data. Any such computer storage media may be part of the device or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable and/or executable instructions that may be stored or otherwise held by such computer-readable media.
With respect to the appended claims, unless stated otherwise, the term “first” does not, by itself, require that there also be a “second.” Moreover, reference to only “a first” and “a second” does not exclude additional items. While the particular computer-based systems and methods described herein and described in detail are fully capable of attaining the above-described objects and advantages of the invention, it is to be understood that these are the presently preferred embodiments of the invention and are thus representative of the subject matter which is broadly contemplated by the present invention, that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular means “one or more” and not “one and only one,” unless otherwise so recited in the claim.
Although the invention has been described relative to specific embodiments thereof, there are numerous variations and modifications that will be readily apparent to those skilled in the art in light of the teachings presented herein.
The appended drawings are diagrammatic, showing features of the invention and their relation to other features and structures, and are not made to scale.
This application claims the benefit of U.S. Provisional Application No. 63/155,210, filed Mar. 1, 2021, the contents of which are hereby incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
20220277098 A1 | Sep 2022 | US |
Number | Date | Country | |
---|---|---|---|
63155210 | Mar 2021 | US |