At least some embodiments disclosed herein relate to protocol processing for incoming data using one or more systolic arrays in general (e.g., for processing the incoming data prior to sending to secure data storage).
An encryption appliance can be inserted in-line into an existing network and used to encrypt full duplex data at, for example, multigigabit line speeds. This can reduce performance penalties imposed by, for example, encryption software running on a general-purpose server, and can allow encryption of data in flight to storage at local or remote locations.
Data can be stored remotely using various protocols. One example of a storage protocol is Internet Small Computer Systems Interface (iSCSI), which is an Internet Protocol (IP)-based storage networking standard for linking data storage facilities. For example, iSCSI can be used to transmit data over local area networks (LANs), wide area networks (WANs), or the Internet and can enable location-independent data storage and retrieval. The iSCSI protocol can allow client devices to send, for example, SCSI commands to storage devices on remote servers. In one example, iSCSI is used as a storage area network protocol, allowing consolidation of storage into storage arrays for clients (e.g., database and web servers).
In some cases, data to be stored remotely requires Transmission Control Protocol (TCP) processing prior to being stored. The Transmission Control Protocol provides, for example, a communication service between an application and the Internet Protocol (e.g., host-to-host connectivity at the transport layer of the Internet model). An application does not need to know the particular mechanisms for sending data via a link to another host. At the transport layer, TCP handles handshaking and transmission details and presents an abstraction of the network connection to the application (e.g., through a network socket interface).
At lower levels of the protocol stack, due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets may be lost, duplicated, or delivered out of order. TCP detects these problems, requests re-transmission of lost data, and rearranges out-of-order data. If the data still remains undelivered, the source is notified of this failure. Once a TCP receiver has reassembled a sequence of data originally transmitted, it passes the data to the receiving application.
TCP is used, for example, by many applications available via the Internet, including the World Wide Web (WWW), e-mail, File Transfer Protocol, Secure Shell, peer-to-peer file sharing, and streaming media applications. TCP is, for example, designed for accurate delivery rather than timely delivery and can incur relatively long delays while waiting for out-of-order messages or re-transmissions of lost messages. TCP is, for example, a stream delivery service which guarantees that all bytes received will be identical to the bytes sent and will be in the correct order.
In some cases, hardware implementations known as TCP offload engines (TOEs) are used to handle TCP processing. One problem of TOEs is that they are difficult to integrate into computing systems, sometimes requiring extensive changes in the operating system of the computer or device.
Systems and methods for protocol processing for incoming data (e.g., packets in an incoming data stream) are described herein. Some embodiments are summarized in this section.
The present embodiments include the realization that data in transit is wrapped in layers of additional protocol information. For instance, Ethernet, Fibre Channel, Network File System (NFS), SATA, and iSCSI all have a protocol layer around the data payload. This additional information needs to be stripped off and processed for a receiving system (e.g., an encryption appliance) to understand how to handle the underlying data stream.
Typically, there are many types of protocols to parse and they must be parsed quickly on an incoming data stream. Traditionally, protocol parsing has been done as a serial process on a central processing unit (CPU). The traditional approach is not fast enough for systems requiring modern network line speed data processing.
In contrast to the above, the protocol processing according to various embodiments described herein provides a technical solution to the technical problems caused by prior approaches as described above. In various embodiments, a packet input engine is configured as a protocol parser in a systolic array. The systolic array is, for example, a two-dimensional array of processing units that can take in data, fan out the data to multiple parallel processing units, and move the data through a pipeline of parallel processing units on every clock tick. This pipelined architecture enables processing multiple packets simultaneously, providing much higher throughput.
Each protocol has different characteristics, so the protocol parser is designed for a target protocol. In various embodiments, a systolic protocol parser is built with an array of protocol field decoders and interconnects to other decoders. These decoders identify where in the data stream there is data that needs to be processed, and further what type of processing needs to be done for the target protocol.
In one embodiment, systolic array components including protocol parsing can be compiled into a field-programmable gate array (FPGA). In this embodiment, the systolic array parsing fans out decision making during protocol parsing, enabling faster decisions based on the information parsed out of the data stream. This embodiment also allows generating the required FPGA logic based on the targeted protocol.
In various embodiments, the protocol processing that is performed is based on one or more protocol fields corresponding to portions of data in an incoming data stream. A protocol field can include, for example, a command (e.g., an op-code), an offset, or a byte count. In some cases, a protocol field includes, for example, an identification of a specific protocol. The protocol processing is performed in one or more systolic arrays. The systolic arrays can be implemented, for example, in one or more field-programmable gate arrays (FPGAs). The path of processing through the systolic array for each data portion is dependent on the one or more protocol fields. In various embodiments, different data portions are processed in a parallel processing pipeline. Timing for the pipeline is provided by a system clock, and each data portion advances from one stage to another in the pipeline on each tick of the system clock.
In one embodiment, a method includes: receiving a data stream comprising data portions; parsing, in a systolic array, at least one protocol field for each respective data portion; and based on the parsed at least one protocol field for each respective data portion, performing at least one of encrypting the data portion, decrypting the data portion, or guarding the data portion.
In one embodiment, a method includes: receiving a data stream; separating the data stream into a plurality of data portions; providing the data portions as one or more inputs to a plurality of processing units in at least one systolic array; providing at least one output from the at least one systolic array; and identifying, based on the at least one output, at least one portion of the data stream for processing.
In one embodiment, a system includes: a packet engine including at least one first field-programmable gate array (FPGA), the packet engine configured to receive incoming packets, and the packet engine comprising at least one systolic array configured to parse a protocol field for each incoming packet, and further configured to process, route, or switch each packet for at least one of encryption or authentication processing based on a result from parsing the protocol field for the respective packet; and a cryptographic engine including at least one second FPGA, the cryptographic engine configured to receive each packet from the packet engine, and the cryptographic engine comprising at least one systolic array configured to perform at least one of encryption or authentication processing for the packet.
In one embodiment, a system includes: at least one processor or FPGA; and memory storing instructions configured to instruct or program the at least one processor or FPGA to: receive, from a local device, a data stream comprising data portions; parse, using a systolic array, at least one protocol field for each respective data portion; encrypt, based on the respective parsed at least one protocol field, each data portion to provide encrypted data; and send the encrypted data to a remote storage device.
In some cases, data coming from a network to a device (such as data to be stored in a remote storage device) may require protocol processing prior to cryptographic processing (e.g., encryption). In one embodiment, protocol processing is performed to keep up with high-speed network data rates such as 100 Gigabit per second networks. Various embodiments herein may perform protocol processing to network line speed rates by using one or more systolic arrays (e.g., programmed into one or more field-programmable gate arrays (FPGAs)). In one embodiment, the protocol processing is performed by a processor configured to include a systolic array of processing units. For example, the processor may be included in an encryption appliance.
In some embodiments, a systolic array uses multiple in-line single-function processing units to process incoming data in a pipelined manner. Data passing through each processing unit can be spread out to dedicated hardware resources, with each processing unit having all the resources needed to process and move data to the next pipeline step on each clock tick. For example, a protocol processor processes incoming data packets to identify a protocol associated with each packet. In one example, after protocol processing, the data is assembled into the correct blocks required by the underlying data type (e.g., Network File System (NFS) or Amazon Web Services (AWS)). In one example, when protocol processing of the data is finished, the data is sent on to a next systolic processing unit such as an encryption engine.
The disclosure includes methods and apparatuses which perform these methods, including data processing systems which perform these methods, and computer readable media containing instructions which when executed on data processing systems cause the systems to perform these methods.
Other features will be apparent from the accompanying drawings and from the detailed description which follows.
The embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding. However, in certain instances, well known or conventional details are not described in order to avoid obscuring the description. References to “one embodiment” or “an embodiment” in the present disclosure are not necessarily references to the same embodiment; and, such references mean at least one.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
Data encryptors (e.g., an encryption appliance) can perform in-line encryption for data storage protocols between clients and remote storage devices. The present embodiments include the realization that, in a typical software solution implemented with general purpose CPUs, an input data stream is handled serially byte-by-byte. For example, a CPU may take one word at a time and take several clocks to perform processing of the word. This serial processing is slower than what is needed for modern high-speed networks.
In many cases, data in transit is wrapped in layers of additional information that identify how the data is communicated while in transit, which blocks of data belong together, the format of data to be stored, etc. This additional information needs to be stripped off and processed for a receiving system (e.g., an encryption appliance) to understand how to handle the underlying data stream. Data-at-rest protocols communicate using a series of protocol fields (e.g., commands such as op-codes and/or other fields that provide information).
The protocol fields differentiate types of information such as data, meta-data, and overhead, and/or define how bits in an input data stream should be interpreted. For example, commands can be either fixed or variable length. An incoming data stream is parsed to identify each command, determine the command length, identify where in the stream the data begins and ends, and find where the next command starts. For example, a protocol field may identify that the packet is of type NFS and that all data between two byte offsets needs to be encrypted. One significant problem is that there are many types of protocols to parse, and the incoming data stream needs to be parsed quickly so the data can be extracted or separated out for further processing (e.g., encryption, etc.).
Many protocols, such as Network File System (NFS), were designed to be processed by central processing units (CPUs), which process data in a serial manner. To process a data stream faster, a CPU must run at a higher clock rate, but CPU clock rates largely stopped improving years ago. Thus, a new parallelizable approach is needed to run protocol parsing in real time and at modern network line speeds.
Various embodiments described herein provide a technical solution to the above technical problems. When using a systolic array according to these various embodiments, several processing units look at several bytes in the input data stream at a time (e.g., on each clock tick). The systolic array as described herein provides a technical solution to the technical problems caused by prior approaches as described above. Various embodiments herein use a high throughput systolic protocol processor to decompose the network data stream to discover which data portions need to be encrypted or decrypted.
In various embodiments, each processing unit of the systolic array is programmed to generate the correct results for the next column of the systolic array, and the results are propagated across time so that multiple bytes from the input data stream can be processed simultaneously. The systolic array fans out the data from the input data stream across multiple rows, and on each clock cycle the row outputs generate inputs for the next column in the systolic array. The first column is then free to take in a new set of words (e.g., from the input data stream) on the next clock while the next column is processing the previous set of words.
In one embodiment, the systolic array protocol parser has a pipelined architecture such that multiple packets can be processed simultaneously, providing much higher throughput. This enables high-speed “bump in the wire” data processing for systems requiring high speed such as a data-at-rest encryptor.
The protocol parsing provides the ability to generate the FPGA logic needed based on the protocol targeted in a way so that decision making is fanned out during protocol parsing. In one embodiment, the FPGA logic is generated by enumerating all expected combinations of sets of protocol fields for protocols supported by an encryption appliance. This logic is then broken into appropriately sized systolic array processing units and interconnected according to the enumeration. In operation, the FPGA logic compares the incoming data against the enumeration of all expected sets of protocol fields (e.g., all possible sets for data expected in the incoming data stream). If a set of protocol fields matches a particular set of protocol fields, then a protocol specific operation is performed, and the result is sent to the next processing unit in the systolic array.
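The enumerate-and-match approach described above can be illustrated with a small software sketch. The field combinations and operation names below are illustrative assumptions for the sketch, not the actual FPGA logic:

```python
# Illustrative enumeration of expected protocol-field combinations.
# In hardware this table would be compiled into systolic array
# processing units; the entries here are assumed examples only.
ENUMERATED_FIELD_SETS = {
    ("ethernet", "ipv4", "tcp", "nfs"): "parse_nfs",
    ("ethernet", "ipv4", "tcp", "iscsi"): "parse_iscsi",
    ("ethernet", "ipv4", "udp", "dns"): "pass_through",
}

def match_field_set(parsed_fields):
    """Return the protocol-specific operation for a parsed combination
    of protocol fields, or None when the combination is not in the
    enumeration (i.e., unexpected data)."""
    return ENUMERATED_FIELD_SETS.get(tuple(parsed_fields))
```

In hardware, a match would cause the protocol-specific operation to run and the result to be forwarded to the next processing unit in the array; a non-match corresponds to unexpected traffic.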
In one embodiment, a packet input engine is configured as a protocol parser in a systolic array. A systolic array is, for example, a two-dimensional array of processing units that can take in data, fan out the data to multiple parallel processing units, and push the data through a pipeline of parallel processing units on every clock tick. This creates a two-dimensional pipeline assembled in rows and columns that spreads processing out over space and time with each stage doing a different piece of the processing for a continuous stream of incoming data. This systolic array approach allows for a pipelined architecture such that multiple packets can be processed simultaneously, providing much higher throughput. This enables high-speed “bump in the wire” data processing for systems such as a high-speed data-at-rest encryptor. It also allows for much faster implementation of new protocols.
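As a rough software model of this pipelined movement, the sketch below treats each column as one pipeline stage: on every "clock tick" each in-flight item advances one column while a new item enters the first column. The three single-function stages are toy stand-ins, not real protocol decoders:

```python
# Simplified one-row model of the systolic pipeline: one latch per
# column, with every value advancing exactly one column per tick.
def make_pipeline(stages):
    """stages: list of single-function processing units (callables)."""
    latches = [None] * len(stages)  # one result latch per column

    def tick(new_input):
        output = latches[-1]  # result completed on the previous tick
        # Update from the last column backward so each value moves
        # exactly one stage per clock.
        for i in range(len(stages) - 1, 0, -1):
            prev = latches[i - 1]
            latches[i] = stages[i](prev) if prev is not None else None
        latches[0] = stages[0](new_input) if new_input is not None else None
        return output  # None while the pipeline is still filling

    return tick

# Three toy single-function stages; a new word can enter every tick,
# so three words are in flight at once.
tick = make_pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
results = [tick(w) for w in [10, 20, 30, None, None, None]]
# After the fill latency, one completed result emerges per tick.
```

Note that after the initial three-tick fill latency, the model produces one completed result per tick even though each individual word takes three ticks to process, which is the throughput advantage described above.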
Computer systems use various protocols operating at different levels within a software stack to transmit and store data. For example, at the link layer the Ethernet Media Access Control (MAC) protocol may be used. The Internet Protocol (IP) layer may use Internet Protocol version 4 (IPv4) or Internet Protocol version 6 (IPv6). The transport layer may use the Transmission Control Protocol (TCP). The storage protocol may be implemented using Network File System (NFS) or a cloud storage protocol like Amazon Web Services (AWS) S3. These are all standard protocols defined for computing systems. A systolic protocol parser can be designed to understand and parse various different protocols, such as the foregoing and/or others.
In one embodiment, a systolic array protocol parser includes one or more protocol decoding units programmed to understand protocols. The protocol decoding unit is used to decode the protocol on an incoming stream, for example an AWS S3 or NFS storage protocol. The protocol itself will determine which systolic array processing units are implemented as instances in the array, and further how these units are connected.
In one exemplary implementation for secure data storage, a user system writes out data blocks to a remote storage device, and an encryption appliance performs protocol processing as described herein on the incoming data blocks, and then stores the data associated with the writing of the data blocks (e.g., an identifier can be stored for each data block). When the data blocks are later retrieved from the remote storage device, the encryption appliance can access the stored data (e.g., access the stored data block identifiers). In one example, the encryption appliance can use protocol processing by parsing protocol fields as described in the various embodiments below.
In one embodiment, each data portion (e.g., a data block) from the input data stream 102 is provided as an input to a processing unit in the systolic array. Several of the processing units (e.g., protocol decoding units 104 or 106) are used to perform decoding of one or more protocol fields in the data portion. In one embodiment, protocol fields are parsed and decoded one at a time as data moves through the systolic array (e.g., a first protocol field for a first data wrapper is parsed, which determines routing of the data through the systolic array to a next processing unit, which parses a second protocol field exposed by removing the first data wrapper).
After each data portion undergoes protocol processing from propagating through the systolic array, one or more outputs 114, 116 are provided from the systolic array. In one embodiment, the outputs 114 and/or 116 are used to indicate one or more portions of the data stream for further processing. For example, the outputs 114 or 116 can indicate starting and ending bytes of data to be encrypted.
In one embodiment, the outputs from the systolic array are provided to data manipulation engine 108. Data manipulation engine 108 can perform further processing of data from the input data stream 102 (e.g., prior to encryption). After any further processing, data is provided from data manipulation engine 108 to one of the cryptographic engines 110-112.
In one embodiment, the systolic array is configured as a systolic protocol parser that separates out the data portion of the input data stream 102 from the protocol headers. In one example, this is performed by a systolic protocol parser that uses an array of protocol field partial decoders (e.g., protocol decoding units 104, 106) and interconnects to other similar decoders. These partial protocol field decoders can be created from an enumeration of all possible sets of supported protocol fields and an identification of where in the data stream there is data that needs to be further processed (e.g., encrypted or decrypted), and what type of processing needs to be done for that protocol.
When input data comes in, the input data is broken into several different parts (e.g., first byte, second byte, third byte, etc.) and fanned out across multiple decoding units. For example, the output from one processing unit can move to at least two other processing units based on results from the first processing unit (e.g., a result from decoding a first protocol field, or from matching a combination of protocol fields for incoming data to one of many possible sets of protocol fields). Each decoder processes as much information as it can within a clock tick (e.g., of a system clock of the array) and then passes the partial results on to the next stages of the systolic array. The initial column does its processing in one clock, then the second column processes in one clock and sends its output to the next column, and so on. Each column is like a stage in a parallel pipeline. Each column performs as much processing as the underlying FPGA technology allows in one clock tick. For example, some processes may take more than one clock, but most processing for each stage is typically done in one clock. The total amount of work that needs to be done is fixed by the protocol that is being implemented. The work is spread across the number of columns required by the protocol. Slower FPGA technologies will require more columns or more clocks per column. A significant advantage of the present embodiments is provided by systolic array processing with splitting of the input into parallel processing streams.
In one embodiment, all data for an incoming stream (e.g., input data stream 102) comes in one end of the systolic array and results in a decision (e.g., output 114 or 116). However, there are many paths through the logic of the systolic array. Pipeline pieces may feed different follow-on units. The route an incoming data stream takes is determined by the algorithm and logic results for that data stream. For example, a processing unit in the array may detect a current state or the presence of some text string, which may drive the data to one or more follow-on array processing units for further protocol parsing or data processing. A systolic array may also include processing units (e.g., delay unit 118) that are primarily for, or do nothing other than, delay the input data stream 102 so that it arrives at a data manipulation engine 108 (e.g., a TCP processing unit) at the same time as the protocol decode results.
In one embodiment, when an input data stream is received by the array, the parser is configured to handle the first protocol field as well as all of the other possible protocols that may appear after the first field. For example, an Ethernet connection always starts with an Ethernet header that contains an Ethernet type field. The Ethernet type field indicates what type of data comes next, for instance IPv4 or ARP. So, once the Ethernet type field is understood, the data can be sent to an appropriate downstream processing unit in the systolic array.
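The Ethernet type dispatch described above can be sketched as follows. The EtherType values (0x0800 for IPv4, 0x0806 for ARP, 0x86DD for IPv6) are the standard assignments; the downstream unit names are illustrative:

```python
# Dispatch table from EtherType value to downstream processing unit.
# The unit names are assumed placeholders for this sketch.
ETHERTYPE_DISPATCH = {
    0x0800: "ipv4_decoder",
    0x0806: "arp_decoder",
    0x86DD: "ipv6_decoder",
}

def route_ethernet_frame(frame: bytes):
    """Read the 2-byte EtherType at offset 12 of an untagged Ethernet
    header and select the downstream processing unit; frames with an
    unrecognized type are not passed along."""
    ethertype = int.from_bytes(frame[12:14], "big")
    return ETHERTYPE_DISPATCH.get(ethertype, "discard")
```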
In another example, when processing an NFS stream, some data packets may have commands to create a file, set a timestamp, or return information that is stored in a directory. The protocol parser simply passes those commands on through since they do not contain any file content. If the protocol parser detects an operation command to write a file, for example, that is a flag that the data will need further processing (e.g., to start encrypting at a specified byte in the data stream, end encrypting at a specified byte, and encrypt everything in between the start and end points).
In one embodiment, to set the flags that indicate where to start and end processing (e.g., encryption when writing a file for secure storage), the protocol parser monitors the stream as it is moving through the array. The stream is parsed and key information identified that is needed to decide how to process the stream (e.g., processing by cryptographic engine 110 or 112). Each protocol has different characteristics, so the protocol parser needs to be protocol aware in order to make fast decisions based on the information parsed out of the stream. Protocol decoding units may also request insertion of other protocol fields such as protocol fields for later retrieving data from a data storage unit (e.g., remote storage in a cloud).
In one embodiment, when needed, each protocol decoding unit can alternatively and/or additionally check the integrity of the protocol fields and associated protocol field arguments parsed out of the input data stream 102 to see if the protocol fields and/or field arguments meet certain requirements (e.g., a predetermined rule set). For example, a decoder (e.g., protocol decoding unit 104 or 106) may only allow access through a specific TCP port. This checking against requirements is sometimes referred to herein as “guarding”. Guarding prevents unexpected traffic from passing through the device (e.g., an encryption appliance). In one example, unexpected traffic is discarded and/or reported (e.g., reported to a local user device that has requested that data blocks be written to storage).
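A minimal sketch of the guarding check is given below. The rule set, allowing only the standard NFS TCP port 2049, is an illustrative assumption; a real rule set would cover whatever fields and arguments the deployment requires:

```python
# Assumed predetermined rule set: only the standard NFS port is expected.
ALLOWED_TCP_PORTS = {2049}

def guard(parsed_fields):
    """parsed_fields: dict of protocol fields pulled from the stream.
    Returns 'pass' for expected traffic; unexpected traffic is
    discarded (and could additionally be reported)."""
    if parsed_fields.get("tcp_dst_port") not in ALLOWED_TCP_PORTS:
        return "discard"  # unexpected traffic does not pass through
    return "pass"
```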
In one embodiment, the systolic array components are compiled into a field-programmable gate array (FPGA). In other embodiments, the array can be implemented in other ways such as by using an Application Specific Integrated Circuit (ASIC), or by splitting processing across multiple FPGAs. Protocols are complex and typically require a significant number of FPGA gates to implement. Therefore, each FPGA sometimes may decode only a single protocol at a time. FPGAs can also be reprogrammed to support different protocols at different times. For example, the FPGA can be programmed to support one or more protocols and later reprogrammed with a different protocol or protocols.
If there are enough logic elements in an FPGA, it may be possible to support multiple protocols in one FPGA. In this case, all enumerations of possible combinations of sets of protocol fields are combined to determine the processing that is needed for incoming data. Then, protocol specific actions are contained in separate systolic processing units. For example, a data encryptor may have Ethernet, TCP, and IP layers parsed in a combined set of systolic processing units. The result is then sent to a TCP processing unit that reassembles multiple packets into an NFS stream. The NFS stream is then parsed, and the result is sent to, for example, encryption engines. In this example, the systolic processing units doing the initial parsing are based upon the enumeration of all possible sets of Ethernet, TCP and IP protocol headers before the result is sent to systolic processing units dedicated to specific protocols.
In one embodiment, a packet engine includes an FPGA that is programmed to implement processing units (e.g., protocol decoding units 104, 106) in a systolic array. Based on results from parsing one or more protocol fields of data (e.g., a combination of protocol fields for an incoming packet), the data is routed to a cryptographic engine (e.g., cryptographic engine 110) for encryption processing. The cryptographic engine includes, for example, an FPGA programmed to implement a systolic array for encryption of the data from the packet engine.
In alternative embodiments, a two-dimensional systolic array can be implemented in other ways such as with an array of general purpose CPUs. However, other implementations would likely require more hardware and more power and also be slower. General purpose CPUs must fetch instructions to identify what to do next. By contrast, each processing unit in an FPGA implementation of a systolic array is built to do one task quickly. If an input is entered into a systolic array processing unit on one clock cycle there will quickly be an output, usually within one clock cycle. Hence, an improved pipelined circuit is generated for that protocol.
Data 204 from the input data stream is provided to the data processing engine 210. In one embodiment, all of the input data stream is provided as data 204. In another embodiment, a portion of the input data stream is provided as data 204.
In one embodiment, data processing by data processing engine 210 is controlled based on output decision(s) 206. In one example, data processing engine 210 includes data manipulation engine 108.
Processed data from data processing engine 210 is provided to one of cryptographic engines 212. In some embodiments, a single cryptographic engine 212 can be used. Cryptographic engines 212 can be implemented, for example, using one or more FPGAs. The selected cryptographic engine 212 performs cryptographic processing on identified data from the input data stream (e.g., encryption to provide encrypted data). In one embodiment, the identified data is encrypted based on starting and ending points provided as part of the output decision(s) 206.
In some embodiments, one or more systolic arrays can be used that have a wide variety of processing units such as protocol processing, network protocol, protocol delineation, data encryption, etc. For example, the systolic array can integrate functions as used in packet input engine 208, data processing engine 210, and/or cryptographic engines 212. Each processing unit is, for example, a small, single function processor. As data comes in, it is processed by the first processing unit, and is forwarded to the next processing unit, then on to the next processing unit, etc., until all processing is complete. Each processing unit is part of a data processing pipeline. Within each processing unit, data is also pipelined so that each processing unit can take in new data on each clock tick. In one embodiment, each processing unit performs just one function and has the specific resources needed to be able to process data on each clock tick.
In one embodiment, incoming data packets may be processed by a TCP packet processor, move on to a protocol processor (e.g., packet input engine 208), then move on to an encryption processor (e.g., cryptographic engine 212), etc., before being sent to storage.
In one embodiment, the TCP packet processor converts incoming packets to a byte stream for further processing by another engine or other layer. For example, incoming TCP packets can be processed by identifying the connection with which the packet is associated, stripping out the TCP headers, and assembling data into the correct blocks required by the underlying data type (e.g. Network File System or Amazon Web Services).
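A simplified software model of this reassembly step is shown below. Representing each packet as a (connection, sequence, payload) tuple is an assumption of the sketch; it ignores retransmissions, overlaps, and header stripping, which the real unit would handle:

```python
# Group segments by connection and emit payload bytes in sequence
# order, modeling the conversion of TCP packets to a byte stream.
def reassemble(packets):
    """packets: iterable of (connection_id, seq, payload) tuples,
    possibly arriving out of order.
    Returns {connection_id: reassembled byte stream}."""
    streams = {}
    for conn, seq, payload in packets:
        streams.setdefault(conn, []).append((seq, payload))
    return {
        conn: b"".join(payload for _, payload in sorted(segments))
        for conn, segments in streams.items()
    }
```

The reassembled byte stream per connection is what gets handed to the next layer (e.g., the protocol delineator) for assembly into protocol-specific blocks.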
In one embodiment, each protocol processor is composed of several processing units. Each processing unit is an independent processor. When a packet first comes into the protocol processor (e.g., packet input engine 208), the header of the packet is examined to determine whether the packet should be handled by the TCP packet processor, whether the packet uses a supported network protocol, and where the packet should be routed next.
In one embodiment, a next step in a TCP receive pipeline is a protocol processor (e.g., packet input engine 208). The protocol processor separates data into blocks based on the protocol for which a device has been built (e.g., Network File System (NFS), Internet Small Computer Systems Interface (iSCSI), Amazon Web Services (AWS), Hypertext Transfer Protocol (HTTP), Transport Layer Security (TLS), etc.). Each protocol processor is compiled for a specific protocol. For example, one TCP engine can support many protocol delineator engines. For each protocol, the data block size is either known or may be identified from data in the input data stream. The protocol processor is used to find the boundaries of each data block. In the TCP byte stream, the protocol processor will find the beginning of the data block, separate out one block of data, and send that block of data to the next section. For example, with an NFS block, the protocol delineator would need to find a first command (e.g., an op-code), and then determine that the next 500 bytes (or whatever block size is correct for that particular NFS block) constitute the block.
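A protocol delineator can be sketched under a simplifying assumption: a hypothetical protocol whose records begin with a 1-byte op-code followed by a 2-byte big-endian length (real NFS or iSCSI framing is more involved, but the boundary-finding logic has the same shape). A partial record at the end of the stream is held back until more data arrives.

```python
import struct

def delineate(stream):
    """Split a reassembled TCP byte stream into (opcode, payload) blocks."""
    blocks = []
    offset = 0
    while offset + 3 <= len(stream):
        opcode = stream[offset]
        (length,) = struct.unpack_from(">H", stream, offset + 1)
        start = offset + 3
        if start + length > len(stream):
            break                          # partial block: wait for more data
        blocks.append((opcode, stream[start:start + length]))
        offset = start + length
    return blocks, stream[offset:]         # blocks plus any unconsumed tail

stream = bytes([0x01]) + struct.pack(">H", 3) + b"abc" \
       + bytes([0x02]) + struct.pack(">H", 2) + b"xy"
blocks, tail = delineate(stream)
```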
In one embodiment, after protocol processing, a received packet is sent to the next systolic processing engine. There are several processors in the path before data is sent to storage. The next processing block can be the encryption engine, which identifies what data needs to be encrypted and performs the encryption.
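A minimal sketch of the encryption engine's role follows. The selection rule and block representation are assumptions, and the XOR transform is a stand-in for a real cipher such as AES; the point shown is only the identify-then-encrypt behavior.

```python
def needs_encryption(block):
    """Illustrative policy: encrypt data payloads, pass metadata through."""
    return block["type"] == "data"

def encrypt_engine(blocks, key):
    out = []
    for block in blocks:
        if needs_encryption(block):
            body = bytes(b ^ key for b in block["body"])  # toy cipher only
            out.append({**block, "body": body, "encrypted": True})
        else:
            out.append({**block, "encrypted": False})
    return out

blocks = [{"type": "metadata", "body": b"hdr"},
          {"type": "data", "body": b"secret"}]
result = encrypt_engine(blocks, key=0x5A)
```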
In one embodiment, an Ethernet/IP/TCP packet encapsulator takes a packet of data and adds Ethernet headers such as source and destination MAC addresses, type fields, etc., so that the packet can be recognized by standard network devices. The packet is then sent out onto the network (e.g., remote network 306 of
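The outermost encapsulation step can be sketched directly: prepend a 14-byte Ethernet header (destination MAC, source MAC, EtherType) to the already-built IP/TCP bytes. The MAC addresses and payload below are placeholders.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # standard EtherType for IPv4

def ethernet_encapsulate(payload, dst_mac, src_mac, ethertype=ETHERTYPE_IPV4):
    """Build: dst MAC (6 bytes) + src MAC (6 bytes) + type (2 bytes) + payload."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    return header + payload

frame = ethernet_encapsulate(
    payload=b"ip-and-tcp-bytes",
    dst_mac=bytes.fromhex("aabbccddeeff"),
    src_mac=bytes.fromhex("112233445566"),
)
```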
In one embodiment, the process for creating and implementing a systolic array starts with creating an algorithm, identifying what array units are needed, and the data processing flow and interconnects needed between processing units. Then, a tool is used to compile the required systolic array processing units and interconnects, and then program the systolic array solution into an FPGA. Each engine in the systolic array is built from FPGA gates using digital logic (e.g., AND gates, OR gates, flip-flops, etc.). Each engine has all the dedicated resources needed to perform its dedicated function.
In one embodiment, encryption appliance 304 receives the data stream from local data device 302. The data stream includes data portions (e.g., data blocks). One or more protocol fields for each data portion are parsed using a systolic array (e.g., the systolic array of
In one example, encryption appliance 304 is used to write each data block to the remote storage device 308. A data block is retrieved that was previously written to the remote storage device 308. The encryption appliance 304 can be, for example, a hardware device that observes all data blocks being written out from a local data device's file system to a remote storage device, or read back into the local data device from a remote storage device. An example of this is an encryption device that is encrypting and decrypting data blocks to or from a remote storage provider such as Amazon Web Services (AWS) and transmitting the data through an Internet Small Computer Systems Interface (iSCSI).
In one embodiment, data blocks are stored using an iSCSI-based system or a system using another block storage protocol. The data blocks can also be stored on storage systems with self-encrypting drives. In one embodiment, TCP processing of incoming data is used to assemble the incoming data into data blocks corresponding to the storage protocol.
Variations
Without limiting the generality of the foregoing embodiments, various additional non-limiting embodiments and examples are now discussed below. In one embodiment, protocol processing as described above (e.g., protocol field decoding as illustrated in
In some embodiments, the network appliance or encryption appliance (e.g., encryption appliance 304) can be implemented by or use encryption/decryption and/or communication methods and systems as described in U.S. patent application Ser. No. 14/177,392, filed Feb. 11, 2014, entitled “SECURITY DEVICE WITH PROGRAMMABLE SYSTOLIC-MATRIX CRYPTOGRAPHIC MODULE AND PROGRAMMABLE INPUT/OUTPUT INTERFACE,” by Richard J. Takahashi, and/or as described in U.S. patent application Ser. No. 14/219,651, filed Mar. 19, 2014, entitled “SECURE END-TO-END COMMUNICATION SYSTEM,” by Richard J. Takahashi, and/or as described in U.S. patent application Ser. No. 15/688,743, filed Aug. 28, 2017, entitled “CLOUD STORAGE USING ENCRYPTION GATEWAY WITH CERTIFICATE AUTHORITY IDENTIFICATION,” by Jordan Anderson et al., the entire contents of which applications are incorporated by reference as if fully set forth herein. For example, the encryption appliance (e.g., encryption appliance 304 of
In one embodiment, data to be stored in remote storage is encrypted by the encryption appliance at a file or file object level, and at least one key is associated to a file object. Examples of an executable file include a complete program that can be run directly by an operating system (e.g., in conjunction with shared libraries and system calls). The file generally contains a table of contents, a number of code blocks and data blocks, ancillary data such as the memory addresses at which different blocks should be loaded, which shared libraries are needed, the entry point address, and sometimes a symbol table for debugging. An operating system can run an executable file by loading blocks of code and data into memory at the indicated addresses and jumping to the entry point address.
Examples of a file object include code that is logically divided into multiple source files. Each source file is compiled independently into a corresponding object file of partially-formed machine code known as object code. At a later time these object files are linked together to form an executable file. Object files have several features in common with executable files (table of contents, blocks of machine instructions and data, and debugging information). However, the code is not ready to run. For example, it has incomplete references to subroutines outside itself, and as such, many of the machine instructions have only placeholder addresses.
In one embodiment, the encryption appliance sets up a transport session with the remote cloud storage or server prior to receiving a payload from the client (e.g., from an application executing on the client), and the encryption appliance uses the transport session for sending or writing data from a plurality of client applications, including the client application, to the remote cloud storage or server.
In one embodiment, data received from a client for writing to remote storage includes a payload having a plurality of file objects, and a payload key is associated to each of the file objects. The payload key can be derived using metadata or file header information, as was described above. In either case, the metadata or file header contains information that is used to derive the payload cipher key with a KEK. The metadata or file header is maintained with the file/object for the life of the file/object so that it can be used at any time to derive the payload cipher key to decrypt the file/object (e.g., when it is read from remote cloud storage).
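The derivation of a payload cipher key from per-object metadata and a KEK can be sketched as follows. The source does not specify the key-derivation function; an HKDF-style extract-then-expand construction over HMAC-SHA256 is shown as one plausible choice, with the metadata acting as the salt so that different objects yield different payload keys under the same KEK.

```python
import hashlib
import hmac

def derive_payload_key(kek, metadata, length=32):
    """HKDF-style derivation: extract a PRK keyed by metadata, then expand."""
    prk = hmac.new(metadata, kek, hashlib.sha256).digest()    # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                  # expand step
        block = hmac.new(prk, block + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

kek = b"\x01" * 32  # placeholder key-encryption key
key_a = derive_payload_key(kek, b"object-metadata-A")
key_b = derive_payload_key(kek, b"object-metadata-B")
```

Because the metadata travels with the file object for its lifetime, the same derivation can be repeated at read time to recover the payload key for decryption.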
In one embodiment, the data received from the client comprises packets including a first packet, and a header is inserted into one or more of the packets (e.g., the first packet), wherein the header associates each packet to the client. The file object may be split among multiple packets. In the first packet of a file, identifying information is stored that is used to extract the correct key for decryption when the file is later read (this provides key association with the data).
In one embodiment, the payload key is associated to the client or an object in the data received from the client. The payload key association is made through an identifying feature of the cloud server protocol associated with the cloud or remote server. In Amazon Web Services (AWS), for example, a specific “bucket” (e.g., a folder) can have a key associated with it. The key to use is identified based on that information and uses that association.
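The bucket-based association can be sketched as a simple lookup. The bucket names and key identifiers below are hypothetical; the point is that an identifying feature of the cloud protocol (here, the bucket in the request) selects which payload key to use.

```python
# Hypothetical mapping from AWS-style bucket names to payload key IDs.
BUCKET_KEYS = {
    "finance-records": "key-finance-01",
    "web-assets": "key-public-02",
}

def key_for_request(bucket, default="key-default"):
    """Pick the payload key ID based on the bucket named in the request."""
    return BUCKET_KEYS.get(bucket, default)

chosen = key_for_request("finance-records")
```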
Various additional embodiments (embodiment numbers 1-49) are described below. Each embodiment is numbered merely for the sake of reference and convenience.
At least some aspects disclosed can be embodied, at least in part, in software. That is, the techniques may be carried out in a computer system or other data processing system in response to its processor(s), such as a microprocessor, executing sequences of instructions contained in a memory, such as ROM, volatile RAM, non-volatile memory, cache or a remote storage device.
In various embodiments, hardwired circuitry (e.g., one or more hardware processors or other computing devices) may be used in combination with software instructions to implement the techniques above (e.g., the protocol processing system may be implemented using one or more FPGAs and/or other hardware in various types of computing devices). Thus, the techniques are neither limited to any specific combination of hardware circuitry and software nor to any particular source for the instructions executed by the data processing system.
In one embodiment, a computing device may be used that comprises an inter-connect (e.g., bus and system core logic), which interconnects a microprocessor(s) and a memory. The microprocessor is coupled to cache memory in one example.
The inter-connect interconnects the microprocessor(s) and the memory together and also interconnects them to a display controller and display device and to peripheral devices such as input/output (I/O) devices through an input/output controller(s). Typical I/O devices include mice, keyboards, modems, network interfaces, printers, scanners, video cameras and other devices which are well known in the art.
The inter-connect may include one or more buses connected to one another through various bridges, controllers and/or adapters. In one embodiment the I/O controller includes a USB (Universal Serial Bus) adapter for controlling USB peripherals, and/or an IEEE-1394 bus adapter for controlling IEEE-1394 peripherals.
The memory may include ROM (Read Only Memory), and volatile RAM (Random Access Memory) and non-volatile memory, such as hard drive, flash memory, etc.
Volatile RAM is typically implemented as dynamic RAM (DRAM), which requires power continually in order to refresh or maintain the data in the memory. Non-volatile memory is typically a magnetic hard drive, a magneto-optical drive, or an optical drive (e.g., a DVD RAM), or other type of memory system which maintains data even after power is removed from the system. The non-volatile memory may also be a random access memory.
The non-volatile memory can be a local device coupled directly to the rest of the components in the data processing system. A non-volatile memory that is remote from the system, such as a network storage device coupled to the data processing system through a network interface such as a modem or Ethernet interface, can also be used.
In one embodiment, a data processing system such as the computing device above is used to implement one or more of the following: an encryption appliance or gateway, a router, a switch, a key manager, a client application, cloud storage, a load balancer, and a firewall.
In one embodiment, a data processing system such as the computing device above is used to implement a user terminal, which may provide a user interface for control of a computing device. For example, a user interface may permit configuration of the encryption appliance or gateway. A user terminal may be in the form of a personal digital assistant (PDA), a cellular phone or other mobile device, a notebook computer or a personal desktop computer.
In some embodiments, one or more servers of the data processing system can be replaced with the service of a peer-to-peer network of a plurality of data processing systems, or a network of distributed computing systems. The peer-to-peer network, or a distributed computing system, can be collectively viewed as a server data processing system.
Embodiments of the disclosure can be implemented via the microprocessor(s) and/or the memory above. For example, the functionalities described can be partially implemented via hardware logic in the microprocessor(s) and partially using the instructions stored in the memory. Some embodiments are implemented using the microprocessor(s) without additional instructions stored in the memory. Some embodiments are implemented using the instructions stored in the memory for execution by one or more general purpose microprocessor(s). Thus, the disclosure is not limited to a specific configuration of hardware and/or software.
In this description, various functions and operations may be described as being performed by or caused by software code to simplify description. However, those skilled in the art will recognize what is meant by such expressions is that the functions result from execution of the code by a processor, such as a microprocessor. Alternatively, or in combination, the functions and operations can be implemented using special purpose circuitry, with or without software instructions, such as using an Application-Specific Integrated Circuit (ASIC) or a Field-Programmable Gate Array (FPGA). For example, the encryption appliance can be implemented using one or more FPGAs.
Embodiments can be implemented using hardwired circuitry without software instructions, or in combination with software instructions.
While some embodiments can be implemented in fully functioning computers and computer systems, various embodiments are capable of being distributed as a computing product in a variety of forms and are capable of being applied regardless of the particular type of machine or computer-readable media used to actually effect the distribution.
Hardware and/or software may be used to implement the embodiments above. The software may be a sequence of instructions referred to as “computer programs.” The computer programs typically comprise one or more sets of instructions stored at various times in various memory and storage devices in a computer that, when read and executed by one or more processors in the computer, cause the computer to perform operations necessary to execute elements involving the various aspects.
Software used in an embodiment may be stored in a machine-readable medium. The executable software, when executed by a data processing system, causes the system to perform various methods. The executable software and data may be stored in various places including, for example, ROM, volatile RAM, non-volatile memory, and/or cache. Portions of this software and/or data may be stored in any one of these storage devices. Further, the data and instructions can be obtained from centralized servers or peer-to-peer networks. Different portions of the data and instructions can be obtained from different centralized servers and/or peer-to-peer networks at different times and in different communication sessions or in a same communication session. The data and instructions can be obtained in their entirety prior to the execution of the applications. Alternatively, portions of the data and instructions can be obtained dynamically, just in time, when needed for execution. Thus, it is not required that the data and instructions be on a machine-readable medium in their entirety at a particular instance of time.
Examples of computer-readable media include, but are not limited to, recordable and non-recordable type media such as volatile and non-volatile memory devices, read only memory (ROM), random access memory (RAM), flash memory devices, floppy and other removable disks, magnetic disk storage media, and optical storage media (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs), etc.), among others. The computer-readable media may store the instructions.
In general, a tangible machine-readable medium includes any mechanism that provides (e.g., stores) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.).
Although some of the drawings may illustrate a number of operations in a particular order, operations which are not order dependent may be reordered and other operations may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be apparent to those of ordinary skill in the art and so do not present an exhaustive list of alternatives. Moreover, it should be recognized that various stages or components could be implemented in hardware, firmware, software or any combination thereof.
Benefits, other advantages, and solutions to problems have been described herein with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any elements that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of the disclosure.
In the foregoing specification, the disclosure has been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
This application claims priority to U.S. Provisional Application Ser. No. 62/751,905 filed Oct. 29, 2018, entitled “SYSTOLIC PROTOCOL PROCESSOR,” by JORDAN ANDERSON et al., the entire contents of which application is incorporated by reference as if fully set forth herein.
Number | Date | Country
---|---|---
20200293487 A1 | Sep 2020 | US
Number | Date | Country
---|---|---
62751905 | Oct 2018 | US