The present application claims priority to United Kingdom Patent Application Serial No. 1115384.8 filed Sep. 6, 2011. The content of the above-identified patent document(s) is incorporated herein by reference.
The present application relates generally to memory systems and, more specifically, to minimizing memory latency when crossing a security engine.
Security (encryption) algorithms integrated with mass storage memory devices such as dynamic random access memories (DRAMs) improve data integrity, but can contribute significantly to memory latency and thus to overall processing latency.
There is, therefore, a need in the art for improved memory used with security engines.
A first engine and a memory controller are each configured to receive memory operation information in parallel. In response to receiving the memory operation information, the first engine is prepared to perform a function on memory data associated with the memory operation and the memory controller is configured to prepare the memory to cause the memory operation to be performed.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances such definitions apply to prior as well as future uses of such defined words and phrases.
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
The present disclosure relates to an arrangement which may comprise or be coupled to a memory, in particular (but not exclusively) a dynamic random access memory (DRAM). Modern System-on-Chip (SoC) or Network-on-Chip (NoC) designs in many application domains may require higher central processing unit (CPU) performance than was previously acceptable. However, CPU performance is impacted by memory latency, that is, by the number of clock cycles or the time delay for writing data into the memory and/or for reading data out of the memory. In particular, the read latency may differ from the write latency.
Still further, some applications require security engines for the encryption and/or decryption of data including, for example, data stored in the memory. A security engine provided in the path between the CPU and the memory may increase the time taken for the read and/or write operations to be completed.
The present disclosure relates to a versatile data processor embedded in a memory controller that masks memory encryption latency. According to one aspect of the present disclosure, an arrangement includes a first engine and a memory controller each configured to receive memory operation information in parallel (concurrently). In response to receiving such information, the first engine is prepared to perform a function on memory data associated with the memory operation and the memory controller prepares the memory to cause the memory operation to be performed.
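By way of illustration only, the following sketch (in C, with hypothetical names such as `mem_op`, `engine_prepare` and `controller_prepare`; the disclosure does not prescribe any particular implementation language or interface) shows the essence of this parallel dispatch: the same operation information reaches both units before either produces a result.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor for the memory operation information. */
typedef struct {
    uint64_t address;
    int      is_write; /* 1 = write operation, 0 = read operation */
} mem_op;

/* First engine (e.g., a security engine): starts preparing its
 * function on the memory data as soon as the operation is known. */
static void engine_prepare(const mem_op *op) {
    printf("engine: preparing for %s at 0x%llx\n",
           op->is_write ? "write" : "read",
           (unsigned long long)op->address);
}

/* Memory controller: prepares the memory so the operation can be
 * performed (activation, command scheduling, and so on). */
static void controller_prepare(const mem_op *op) {
    printf("controller: preparing memory for 0x%llx\n",
           (unsigned long long)op->address);
}

/* Both units receive the operation information in parallel: neither
 * call depends on the other's result. */
static void dispatch(const mem_op *op) {
    engine_prepare(op);
    controller_prepare(op);
}

int main(void) {
    mem_op op = { 0x1000, 1 };
    dispatch(&op);
    return 0;
}
```

In hardware, the two prepare steps would typically proceed concurrently; the sequential calls here merely indicate that neither step waits on the other's output.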
Each processing unit 103 is arranged to communicate with the memory 102 via a network-on-chip 104 and a memory controller 105. A processing unit 103 sends requests to the memory 102 via the network-on-chip 104 and receives responses from the memory 102 again via the network-on-chip 104. The network-on-chip 104 is arranged to communicate with the memory 102 via the memory controller 105. The network-on-chip 104 provides a routing function, while the memory controller 105 is arranged to control the storage (writing) of data to and/or retrieval (reading) of data from the memory 102. The communication channel 106 between the processing units 103 and the memory controller 105 can be considered to be between the processing units 103 and the network-on-chip 104, and between the network-on-chip 104 and the memory controller 105. The memory 102 contains data that is shared by the processing units 103.
As shown schematically, the processing units 103, network-on-chip 104 and memory controller 105 are provided in the SoC IC 101, with the memory 102 external to the SoC IC. However, it should be appreciated that in some embodiments the memory itself may be part of the SoC IC 101.
In an alternative arrangement, the security engine may be arranged between the NoC 104 and the processors 103.
Because SoCs are consuming more and more data and are requiring higher and higher memory bandwidth, some memories use a protocol with a pipelined command channel, carrying several commands per DRAM channel, together with a data channel. One example of a memory using such a protocol is the DRAM. These channels are provided, for example, between the memory controller 105 and the memory 102.
It will be appreciated that while communications on the command channel and data channel are shown for a write operation, a read operation will have a similar delay between a read command on the command channel and the read data on the data channel. Thus, the delays between the command and data channels are commonly known as the write and read latencies, respectively.
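As a simple illustration of these latencies, the sketch below (with purely illustrative cycle counts; actual values depend on the memory device and its configuration) computes the cycle at which the data phase occurs for a command issued at a given cycle:

```c
#include <stdio.h>

/* Illustrative latencies in clock cycles; actual values are
 * device- and configuration-dependent. */
#define WRITE_LATENCY 8
#define READ_LATENCY  11

/* Cycle at which the data appears on the data channel for a command
 * issued on the command channel at cmd_cycle. */
static int data_cycle(int cmd_cycle, int is_write) {
    return cmd_cycle + (is_write ? WRITE_LATENCY : READ_LATENCY);
}

int main(void) {
    printf("write command at cycle 100 -> data at cycle %d\n",
           data_cycle(100, 1));
    printf("read command at cycle 100 -> data at cycle %d\n",
           data_cycle(100, 0));
    return 0;
}
```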
The preamble and post-amble commands are used in at least some DRAMs, although those skilled in the art will appreciate that alternative memories may not require the preamble and/or post-amble commands or may have one or more different commands.
With the architecture of
In some scenarios, flexibility and/or scalability requirements and industry-standard protocols often lead to serialization and a functional split of the overall processing into multiple processing units. DRAM protocol complexity, with its various read and write latencies, may render placement of the security engine in the path between the DRAM controller and the DRAM itself difficult. For example, the DRAM and the associated controller may be provided in a single functional block while the security engine is implemented by a different block, to provide modularity in the design process. The result is that the DRAM and the associated controller do not need to be changed even when used in different products, and likewise the security engine does not need to be changed. However, this means that the DRAM and the associated controller need to interact with the security engine via their respective interfaces.
The embodiments described have a DRAM with a DRAM controller and a security engine. However, it should be appreciated that alternative embodiments may be used at other locations in the SoC 111 and/or with entities other than a memory, its controller and/or the security engine. Such alternatives may be used where the protocol used by the interfacing processing units manages command and data channels and some other manipulation also needs to be performed on the data. For example, some embodiments may be used where there is data manipulation and a check needs to be made to ascertain the probability that data has been read correctly. Some embodiments may be used where there is redundancy error correction. Some embodiments may be used where there is an application task performed on data.
Some embodiments may be used with a network using the AXI (Advanced eXtensible Interface) protocol. Of course, other embodiments may be used with other protocols which manage separate command and data channels.
Referring to
The interface 301 is arranged to provide the DRAM operation information 302 to a command delay compensation block 306 and to a first queue 305. The output 307 of the first queue 305 is a delayed version of the DRAM operation information 302. This output 307 is input to a pipeline scramble pattern engine 308. The DRAM operation information may relate to a read operation or a write operation. The DRAM operation information is received directly by the command delay compensation block 306, that is, not via the first queue 305.
The output of the command delay compensation block 306 is provided to a DRAM protocol converter 310. The DRAM protocol converter 310 is one example of a memory controller 105. The DRAM protocol converter 310 is arranged to receive the DRAM operation information and to output the DRAM command operation 311 to the DRAM 102.
The pipeline scramble pattern engine 308 provides an output to a second queue 312, the output of which is received by a data scrambling block 314, in the case of a write operation. The pipeline scramble pattern engine 308 also provides an output to a third queue 313, the output of which is received by a data descrambling block 315, in the case of a read operation. The pipeline scramble pattern engine 308, the data scrambling block 314 and the data descrambling block 315 (as well as the second and third queues 312 and 313) may be regarded as constituting the security engine 112. The output provided by the pipeline scramble pattern engine 308 to the data scrambling block and the data descrambling block comprises the scrambling pattern and the descrambling pattern, respectively. It should be appreciated that the data scrambling block 314 is configured to scramble data to be written to the DRAM 102 while the data descrambling block 315 is configured to descramble data received from the DRAM 102 (i.e., the read data). The DRAM operation information 302 is thus used to get the data scrambling block or the data descrambling block ready to carry out the respective operation on the data received by those blocks. The DRAM operation will comprise a read operation or a write operation, in some embodiments.
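A minimal sketch of this arrangement is given below. The pattern derivation (a multiplicative mix of a key and the address) and all of the identifiers are illustrative assumptions; the disclosure does not specify the scrambling function or the queue depths.

```c
#include <stdint.h>
#include <stdio.h>

#define QDEPTH 16

/* Hypothetical pattern queue (cf. second queue 312 and third queue 313). */
typedef struct {
    uint64_t buf[QDEPTH];
    unsigned head, tail;
} pat_queue;

static void q_push(pat_queue *q, uint64_t v) { q->buf[q->tail++ % QDEPTH] = v; }
static uint64_t q_pop(pat_queue *q)          { return q->buf[q->head++ % QDEPTH]; }

static pat_queue write_patterns; /* feeds data scrambling block 314   */
static pat_queue read_patterns;  /* feeds data descrambling block 315 */

/* Illustrative pattern derivation from the operation information;
 * the disclosure does not specify the actual scrambling function. */
static uint64_t make_pattern(uint64_t key, uint64_t address) {
    uint64_t x = key ^ (address * 0x9E3779B97F4A7C15ull);
    return x ^ (x >> 33);
}

/* The pattern engine consumes the (delayed) operation information and
 * queues a pattern for the matching direction. */
static void pattern_engine(uint64_t key, uint64_t addr, int is_write) {
    uint64_t p = make_pattern(key, addr);
    if (is_write) q_push(&write_patterns, p);
    else          q_push(&read_patterns, p);
}

int main(void) {
    pattern_engine(0x5EC, 0x1000, 1); /* a write operation */
    printf("write pattern: 0x%llx\n",
           (unsigned long long)q_pop(&write_patterns));
    return 0;
}
```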
The NoC protocol interface 301 is configured to provide the write data to be written to the DRAM to the data scrambling block 314 via path 316. The data scrambling block 314 scrambles the data using the pattern provided by the pipeline scramble pattern engine 308 via the second queue 312. The scrambled data is provided via path 317 to the DRAM protocol converter 310. This data 320 is then written to the DRAM.
For read data, the read data 320 is provided by the DRAM 102 to the DRAM protocol converter 310. The read data is then provided via path 318 by the DRAM protocol converter 310 to the data descrambling block 315, which descrambles the read data and provides the descrambled read data to the NoC protocol interface 301.
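Because the data descrambling block must exactly invert the data scrambling block, a self-inverse operation such as XOR is a natural way to illustrate the pair; this is an assumption of the sketch below, as the disclosure does not mandate any particular scrambling operation.

```c
#include <assert.h>
#include <stdint.h>

/* Scrambling and descrambling with the same pattern; XOR is its own
 * inverse, so one function serves both blocks 314 and 315. */
static uint64_t xor_scramble(uint64_t data, uint64_t pattern) {
    return data ^ pattern;
}

int main(void) {
    uint64_t data    = 0xDEADBEEFCAFEF00Dull;
    uint64_t pattern = 0x0123456789ABCDEFull;

    uint64_t stored   = xor_scramble(data, pattern);   /* write path */
    uint64_t restored = xor_scramble(stored, pattern); /* read path  */

    assert(restored == data); /* round trip recovers the original data */
    return 0;
}
```

With XOR, the same pattern that scrambled the data on the write path restores it on the read path, which is why the single pattern engine can serve both directions.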
Read latency and write latency information is fed back from the output of the DRAM protocol converter to the command delay compensation block 306. This feedback may be provided by a data analyzer, a snooper or any other suitable mechanism. The read or write latency is or includes the delay between the command channel and the data channel. This information may be determined by snooping the inputs and/or outputs of the DRAM protocol converter. In some embodiments, the information may alternatively be already known, which may be dependent on the configuration. If the information is already known, the information may be stored in the command delay compensation block and/or the protocol converter.
The function of the command delay compensation block 306 will be described in more detail below.
Referring to
The first information used by the command delay compensation block is the DRAM operation information 302, which is received from the output of the NoC protocol interface (not via the first queue 305). The second information received by the command delay compensation block 306 is the DRAM command output of the DRAM protocol converter 310, which is indicated by reference character 322. As mentioned previously, the output of the DRAM protocol converter 310 may be snooped and provided to the command delay compensation block 306. Alternatively or additionally, the second information may be provided by an internal signal of the DRAM protocol converter. This signal may have the same timing as the DRAM command output or may have a particular timing relationship with the DRAM command output. For example, the internal signal may have an earlier or a later timing than the DRAM command output. The internal signal may be output from the DRAM protocol converter to the command delay compensation block 306. The third information is taken from the input side of the second queue 312, identified by reference character 326a. The fourth information is taken from the input side of the third queue 313, identified by reference character 326b. The fifth information is taken from the output side of the second queue 312, identified by reference character 328a. The sixth information is taken from the output side of the third queue 313, identified by reference character 328b. The seventh information is taken from the output side of the first queue 305. The inputs and/or outputs of the queues may be snooped or monitored in any suitable way.
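For reference, these seven tap points could be named as in the following illustrative enumeration (the identifiers are hypothetical; the reference characters are those used above):

```c
#include <stdio.h>

/* Hypothetical names for the tap points feeding the command delay
 * compensation block 306; reference characters follow the text. */
typedef enum {
    TAP_DRAM_OP_INFO,  /* first: DRAM operation information 302 */
    TAP_DRAM_CMD_OUT,  /* second: DRAM command output 322       */
    TAP_WRITE_Q_IN,    /* third: input of second queue, 326a    */
    TAP_READ_Q_IN,     /* fourth: input of third queue, 326b    */
    TAP_WRITE_Q_OUT,   /* fifth: output of second queue, 328a   */
    TAP_READ_Q_OUT,    /* sixth: output of third queue, 328b    */
    TAP_FIRST_Q_OUT,   /* seventh: output of first queue 305    */
    TAP_COUNT
} tap_point;

int main(void) {
    printf("%d tap points monitored\n", TAP_COUNT);
    return 0;
}
```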
The command delay compensation block 306 is arranged to provide an output to the DRAM protocol converter. This output is the DRAM operation information 302, which is carried on the DRAM command channel. The command delay compensation block 306 is able to control the timing of the DRAM operation information and in particular of the DRAM commands. In particular, the timing of the provision of the DRAM operation signal to the DRAM protocol converter 310 controls the timing of the DRAM command 322.
In this regard, reference is made to
The command delay compensation block has a second time measure block 403. This measures a delay between the DRAM command at the DRAM 102 and the output of the scramble queue. In one embodiment, this is done by measuring the delay between the second information 322 and the fifth information 328a. This delay WL′ provides information relating to a measure of the write latency 404 and the scrambling delay. This information is provided to a decision block 402.
1. NoC protocol interface receives DRAM operation;
2. The first queue outputs the DRAM operation 302a;
3. Command delay compensation unit receives DRAM operation;
4. Scrambling pattern at input of queue;
5. DRAM command at DRAM;
6a. Write data at NoC protocol interface;
6b. Scramble pattern at output of queue;
7. Scrambled write data output by data scrambling block 314; and
8. Data written to DRAM.
Depending on the latencies, there may be some variation in the relative times of some of the steps. The relative positions of the events related to the command path with respect to the scrambling path may change. For example, step 5 may occur before step 4, or step 6b may occur before step 5. It should be appreciated that a measure of the write latency can be taken between the DRAM command at the output of the DRAM protocol converter 310 and the data 320 at the input of the DRAM.
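A time measure block of this kind can be sketched as the difference of two cycle-stamped events, as below (hypothetical; an actual block would use hardware counters on the snooped signals):

```c
#include <stdio.h>

/* Cycle stamps recorded when the snooped events are observed. */
static long t_cmd_at_dram;     /* second information 322 */
static long t_pattern_at_qout; /* fifth information 328a */

/* WL' = delay from the DRAM command to scramble pattern availability;
 * it reflects both the write latency and the scrambling delay. */
static long measure_wl_prime(void) {
    return t_pattern_at_qout - t_cmd_at_dram;
}

int main(void) {
    t_cmd_at_dram     = 100; /* illustrative cycle numbers */
    t_pattern_at_qout = 106;
    printf("WL' = %ld cycles\n", measure_wl_prime());
    return 0;
}
```

The read-side measure RL′ described below is obtained in the same way from the second information 322 and the sixth information 328b.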
The output of the second time measure block 403 is input to the decision block 402. Thus, the decision block 402 receives information which reflects the latency of the scramble pattern engine and also the DRAM write latency.
The output of the decision block 402 controls the delay applied to the DRAM operation. In particular, the output of the command delay compensation block 306 is used to control when the DRAM protocol converter outputs the DRAM command. This may be controlled by delaying when the DRAM protocol converter 310 receives the DRAM operation from the command delay compensation block 306.
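The decision can be sketched as follows (illustrative names; the two quantities correspond to the pattern-engine latency and the latency already masked by the DRAM protocol):

```c
#include <stdio.h>

/* Decision sketch: if the pattern engine needs more cycles than the
 * DRAM protocol already masks, the command must be delayed by the
 * difference so the pattern is ready when the data channel is used. */
static int extra_delay(int engine_latency, int masked_latency) {
    return (engine_latency > masked_latency)
               ? (engine_latency - masked_latency)
               : 0;
}

int main(void) {
    printf("engine 6, masked 4 -> delay %d\n", extra_delay(6, 4)); /* 2 */
    printf("engine 4, masked 6 -> delay %d\n", extra_delay(4, 6)); /* 0 */
    return 0;
}
```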
Referring to
The second time measure block 403 measures a delay between the DRAM command at the DRAM 102 and the output of the descramble queue. In one embodiment, this is done by measuring the delay between the second information 322 and the sixth information 328b. This delay RL′ provides information about the read latency 410 and the scrambling delay. This information is provided to the decision block 402.
1. NoC protocol interface receives DRAM operation;
2. The first queue outputs the DRAM operation 302a;
3. Command delay compensation unit receives DRAM operation;
4. Descrambling pattern at input of queue;
5. DRAM command at DRAM;
6. DRAM data read from DRAM;
7a. Descramble pattern at output of queue;
7b. Scrambled read data output from DRAM protocol converter; and
8. Read data at NoC protocol interface.
Depending on the latencies, there may be some variation in the relative times of some of the steps as discussed in relation to
Referring to
In the case of the architecture of
In some embodiments, the latency may be M+x, where x is used to mask the delay N. Generally, x is greater than or equal to N. Where x is not greater than or equal to N, the decision logic will add a delay y to satisfy the requirement, by delaying when the command is issued by the memory controller, such that x+y is greater than or equal to N.
The first time measure block 401 provides a measure of N and the second time measure block 403 provides a measure of x. The command delay compensation block may adjust the delay on the DRAM operation using an iterative algorithm which adjusts the delay and can learn over several DRAM operations. Some embodiments may improve the DRAM access latency of systems having a security engine. The latency required for the scramble pattern computation may be effectively hidden by taking advantage of the intrinsic latency of the DRAM protocol. Embodiments may permit the encryption of sensitive data stored in an external memory. The security engines have a latency associated therewith. The encryption latency can be masked fully or partially due to the latency present in a number of memory protocols supporting, for example, burst mode operation.
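An iterative adjustment of this kind might be sketched as below; the one-cycle-per-operation update rule is purely an assumption, as the disclosure only states that the algorithm adjusts the delay and can learn over several DRAM operations.

```c
#include <stdio.h>

/* Iterative sketch: re-measure N and x on each DRAM operation and
 * nudge the applied delay y toward the value satisfying x + y >= N. */
static int adjust(int current_delay, int n_measured, int x_measured) {
    int needed = n_measured - x_measured; /* may be negative */
    if (needed < 0) needed = 0;
    if (current_delay < needed) return current_delay + 1; /* ramp up */
    if (current_delay > needed) return current_delay - 1; /* relax   */
    return current_delay;
}

int main(void) {
    int delay = 0;
    /* e.g., the engine needs N = 6 cycles, the protocol masks x = 4 */
    for (int op = 0; op < 5; ++op) {
        delay = adjust(delay, 6, 4);
        printf("operation %d: delay = %d\n", op, delay);
    }
    return 0; /* converges to delay = 2, so x + y = 6 >= N */
}
```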
Some embodiments may have the advantage that a modular approach may be made with respect to the memory controller on the one hand and the scrambling engine on the other hand. This may reduce design time and effort.
The embodiments described have the first, second and third queues. One or more of these queues may be dispensed with. In alternative embodiments, one or more additional queues may be provided at any suitable location or locations. For example, one or more queues may be associated with the DRAM protocol converter 310. Some embodiments may even have no queues. In some embodiments, the number and position of the queues may be dependent on a required timing performance for a specific implementation.
The one or more queues may provide synchronization between different blocks. For example, the first queue may provide synchronization between one or more of the NoC protocol interface 301, the pipeline scramble pattern engine 308, the data scrambling block 314 and the data descrambling block 315. Similar synchronization may be provided by the second queue between, for example, the scramble pattern engine and the data scrambling block 314. Likewise, similar synchronization may be provided by the third queue between, for example, the scramble pattern engine and the data descrambling block 315.
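A sketch of such queue-based synchronization is given below (illustrative only): a producer block may run ahead, and a consumer block pops entries only when it is ready, so neither block needs the other's exact timing.

```c
#include <stdint.h>
#include <stdio.h>

#define DEPTH 8

/* A small FIFO decouples a producer block from a consumer block. */
typedef struct {
    uint32_t buf[DEPTH];
    unsigned head, tail;
} fifo;

static int fifo_push(fifo *f, uint32_t v) {
    if (f->tail - f->head == DEPTH) return 0; /* full */
    f->buf[f->tail++ % DEPTH] = v;
    return 1;
}

static int fifo_pop(fifo *f, uint32_t *v) {
    if (f->tail == f->head) return 0; /* empty */
    *v = f->buf[f->head++ % DEPTH];
    return 1;
}

int main(void) {
    fifo q = {0};
    uint32_t v;
    fifo_push(&q, 0xAB);  /* e.g., the pattern engine produces early */
    if (fifo_pop(&q, &v)) /* the scrambling block consumes later     */
        printf("consumed 0x%X\n", v);
    return 0;
}
```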
Some embodiments may be used with only one processing unit or with more processing units. Some embodiments may be used other than in systems-on-chip. Some embodiments may be provided in an integrated circuit, partly in an integrated circuit and partly off chip, or completely off chip. Some embodiments may be used in a set of two or more integrated circuits or in two or more modules in a common package. Some embodiments may be used with a routing mechanism different from the NoC routing described. For example, crossbar buses or other interconnects may be used.
The security engine has been described as performing scrambling and descrambling. Other embodiments may additionally or alternatively use other methods of applying security to data.
One or more of the queues may be provided by buffers, FIFOs or any other suitable circuitry. Alternative embodiments may use different reference points in order to provide a measure of a particular latency. By virtue of the command delay compensation block, some embodiments have the learning capability to measure unknown system delays as well as DRAM latencies.
Some embodiments have the adaptive capability to compensate system delays with respect to DRAM latencies and adjust the DRAM operation execution time to satisfy the operation requirements. While embodiments have been described in relation to a DRAM, it should be appreciated that embodiments may alternatively be used with any other memory.
The described embodiments have been in the context of a security engine with respect to read and write latency. It should be appreciated that alternative embodiments may be used with any other engine with an associated delay.
Although the present disclosure has been described with an exemplary embodiment, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims.