The subject disclosure relates to data transfer. Particularly, the subject disclosure pertains to a streaming protocol for improving data throughput between two transfer devices.
Bandwidth maximization is desired for high speed applications, such as solid-state drives (SSD), where increasing bandwidth usage can result in greatly improved performance. The instant disclosure provides a system and method for implementing a streaming protocol that improves data throughput and provides an efficient “quality of result” (QoR). In certain aspects, the protocol implementation can be configured such that data transfer can be made to occur between either synchronous or asynchronous devices.
A system for transferring data may comprise a hardware device and an input streaming interface operably connected to the hardware device. In this implementation, the hardware device may be configured to receive data and the input streaming interface is configured to determine that a receiving device will accept data transmitted by the hardware device, activate a receiver ready signal to inform a data source that the input streaming interface is ready to receive data, detect the activation of a source signal and a data initiation signal associated with the data source, receive, in response to the detection, source data transmitted by the data source over a data bus, and forward the source data to the hardware device.
The input streaming interface may be further configured to receive an indication that the data source has completed transmission of the source data, and complete the receiving of the source data at the input streaming interface on receiving the indication. In some aspects, the input streaming interface may be configured to detect that the source signal has been suspended before the transmission of the source data is completed, and suspend the receiving of the source data for the period of time that the source signal remains suspended. Additionally or in the alternative, the system may further comprise an output streaming interface operably connected to the hardware device and configured to determine that the receiving device will accept data transmitted by the hardware device, activate an output ready signal to inform the input streaming interface that the receiving device will accept data transmitted by the hardware device, the receiver ready signal being activated based on the output ready signal, receive the source data from the hardware device, and forward the source data to the receiving device.
In further aspects, the system may further comprise an output streaming interface operably connected to the data source. Accordingly, the output streaming interface may initiate a transmission of the source data from the output streaming interface to the input streaming interface over the data bus, inform the input streaming interface that the transmission has been initiated by activating the source signal and the data initiation signal, suspend the transmission of the source data for a period of time, and in conjunction with suspending the transmission, inform the input streaming interface that the transmission has been suspended by deactivating the source signal for the period of time.
In some implementations, the subject disclosure provides a method for transferring data, the method comprising on receiving a data initiation signal in conjunction with a source signal from a data source, reading source data from a data bus, and, on receiving a data completion signal from the data source after the data initiation signal, completing the reading of the source data.
In further implementations, a method for transferring data may comprise initiating a transmission of data to a receiving device, activating a source signal and a data initiation signal in connection with the initiation of the transmission of data, and transmitting the data only when a receiver ready signal is activated. The method may further comprise transmitting a first portion of data to the receiving device, determining that the receiver ready signal has been deactivated before the transmission of the first portion of data is complete, suspending the transmission of the first portion of data until the receiver ready signal is reactivated, receiving an indication that the receiver ready signal has been reactivated, completing, on the indication, the transmission of the first portion of data, and transmitting a second portion of data to the receiving device.
Additionally or in the alternative, the method may further comprise transmitting a first portion of data to the receiving device, deactivating, after transmitting the first portion of data, the source signal to signal a suspension of the transmission of data, waiting for a period of time, and transmitting, after the period of time, a subsequent portion of data to the receiving device.
It is understood that other configurations of the subject technology will become readily apparent to those skilled in the art from the following detailed description, wherein various configurations of the subject technology are shown and described by way of illustration. As will be realized, the subject technology is capable of other and different configurations and its several details are capable of modification in various other respects, all without departing from the scope of the subject technology. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology may be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject technology. However, the subject technology may be practiced without these specific details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology. Like components are labeled with identical element numbers for ease of understanding.
In various implementations, the subject technology includes an output streaming interface (“output streamer interface”) and an input streaming interface (“input streamer interface”) for transferring data between hardware devices using a minimal number of signals. The subject technology further implements a protocol that facilitates a high rate of data transfer by efficient management of data flow. Accordingly, data is transferred between interfaces in response to specific signaling patterns between the interfaces. For example, the input streamer interface informs the output streamer interface that it is ready to receive data over a data bus by activating a receiver ready signal. The input streamer interface then detects the activation of a source signal and a data initiation signal from the output streamer interface, receives, in response to the detection, source data transmitted by the output streamer interface over the data bus, and forwards the source data to a hardware device.
Controller 101 may be implemented with a general-purpose microprocessor, a microcontroller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device (PLD), a controller, a state machine, gated logic, discrete hardware components, or a combination of the foregoing. One or more sequences of instructions may be stored as firmware on ROM within the controller. One or more sequences of instructions also may be software stored and read from another storage medium, such as flash memory array 103, or received from a host 104 via host interface 102. ROM, storage media, and flash memory arrays represent examples of machine or computer readable media on which instructions/code executable by the controller can be stored. Machine or computer readable media may generally refer to any medium or media used to provide instructions to controller 101, including both volatile media, such as dynamic memory used for storage media or for buffers within the controller, and non-volatile media, such as electronic media, optical media, and magnetic media.
Host interface 102 may be configured to implement a standard interface, such as Serial-Attached SCSI (SAS), Fibre Channel, PCI Express (PCIe), SATA, USB, and the like. The host interface may be configured to implement only one interface. Alternatively, host interface 102 may be configured to implement multiple interfaces, which are individually selectable using a configuration parameter selected by a user or programmed at the time of assembly. Host interface 102 may include one or more buffers for buffering transmissions between a host and the controller. Host 104 may be any device configured to be coupled to the data storage system and to store data in the data storage system. The host may be, for example, a computing system such as a personal computer, a server, a workstation, a laptop computer, a PDA, a smart phone, and the like. Alternatively, the host may be an electronic device such as a digital camera, a digital audio player, a digital video recorder, and the like.
Flash memory array 103 represents non-volatile memory devices for storing data. According to one aspect of the subject technology, flash memory array 103 includes NAND flash memory. Each component of flash memory array 103 may include a single flash memory device or chip, or may include multiple flash memory devices or chips arranged in multiple channels, as depicted in
It is desirable for storage applications to increase bandwidth utilization, transfer data in each cycle without interruptions (stalls), and have the ability to transfer a precise number of byte bursts, irrespective of the data-bus size. Bandwidth maximization is particularly useful for high speed applications (e.g., for use in SSD), where increasing bandwidth usage can result in greatly improved performance.
Implemented measures or protocols for increasing bandwidth utilization should take into consideration other aspects of hardware implementation, for example static and dynamic power considerations, area and timing. Systems that utilize many hand-shake signals and buses result in a greater amount of power dissipation and increased area. In order to increase bandwidth, and decrease latency, it may be desirable to decrease the number of signals/connections between transfer devices.
The instant disclosure provides a system and method for increasing bandwidth usage while decreasing the number of hand-shake signals and data buses between configurable (synchronous or asynchronous) transfer devices. As a result, the subject disclosure provides for improved bandwidth utilization while also improving upon area and power parameters. More specifically, the instant disclosure describes a streaming protocol that improves throughput and provides an efficient “quality of result” (QoR). In certain aspects, the protocol implementation can be configured such that data transfer can be made to occur between either synchronous or asynchronous devices.
In the depicted example, system block A may be considered a “starting block.” System block A includes an output streamer interface 201, which is internally connected to a hardware device (e.g., a memory interface or a core logic interface), and operably connected to an input streamer interface 202 of system block B. Block B is in the middle of the chain and contains both input streamer interface 202 and an output streamer interface 203. Output streamer interface 203 of system block B is connected to an input streamer interface 204 of system block C, which represents an end device in the chain.
System block A may be considered a “data source” and the input side of system block B may be considered a “receiving device” (or “sink block” or “data sink”). Similarly, the output side of system block B may be considered a data source to the input side of system block C (e.g., a downstream receiving device). In this regard, each data source may include a hardware device such as a flash memory device or other hardware device capable of receiving and/or sending data, or, as depicted by
In certain aspects, data signals between a data source and a receiving device (e.g., between output streamer interface 201 and input streamer interface 202) include a receiver ready signal (“_sink”), a source signal (“_src”), a data bus (one or more “_data” signals), a data initiation signal (“_first”), a data completion signal (“_last”), and a byte valid signal (“_vbc”).
When activated (e.g., asserted, set to a binary high, or the like) the _sink signal indicates that a receiving device can accept data from a data source. In practice, the _sink signal can be used for back pressuring data flow from the data source (e.g., system block A) to a downstream receiving device (e.g., system block C). The _src signal is activated by a data source to inform a receiving device that the data source is ready to transmit data and that all other source-to-sink signals from the data source are valid. When the _sink and _src are activated, the data bus and other source-to-sink signals (for example, _first, _last, and _vbc) are sampled by the receiving device. Otherwise these source-to-sink signals are ignored. One or more _data signals are carried via the data bus from the data source to the receiving device. In certain aspects, the width of the data bus is defined according to manufacturer or user data transfer requirements. Although any number of bytes may be potentially transferred on the data bus, in some examples the data bus only supports an integer number of bytes and does not support fractional bytes.
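The sampling rule described above can be sketched in a few lines of Python. This is a minimal behavioral model, not a hardware implementation: the `Beat` class, its field names, and the `sample` function are hypothetical names introduced here for illustration, and the model assumes all signals are evaluated once per clock cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Beat:
    """Source-to-sink signals for one clock cycle (illustrative names)."""
    src: bool     # _src: source is ready and its other signals are valid
    first: bool   # _first: start of a payload
    last: bool    # _last: end of a payload
    vbc: int      # _vbc: byte valid bits
    data: bytes   # contents of the data bus


def sample(sink: bool, beat: Beat) -> Optional[Beat]:
    """Return the beat if the receiver samples it this cycle, else None.

    The source-to-sink signals are sampled only when both _sink and _src
    are active; otherwise they are ignored.
    """
    if sink and beat.src:
        return beat
    return None
```

In this sketch a return value of `None` stands for a cycle in which the receiving device ignores the bus, either because it applied back pressure (`sink` low) or because the source was not ready (`src` low).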
The _first signal is activated by the data source to inform the receiving device (e.g., input streamer interface 202 or 204) that a data transmission has been initiated. The data bus may be used to transfer one or more payloads of source data, including, for example, a block, page, or code word of data. Accordingly, activation of the _first signal may indicate the beginning of the payload. As described previously, the _src signal may inform the receiving device that the _first signal is valid. The _vbc signal provides an indication as to which bytes transmitted on the data bus are valid for the last data sample transfer. The _vbc signal indicates the total data transfer size of data transferred in a corresponding cycle. For example, _vbc may be 4 bits wide for a 4 byte wide data bus, with each bit representative of a byte transferred on the data bus. If all 4 bytes of the data bus are used in a current data transfer then all 4 bits of _vbc may be activated. If a final transmission only requires 3 bytes to be transferred then the fourth bit of _vbc may be set to 0 to inform the receiving device that the fourth byte should not be read.
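The byte valid example above can be illustrated with a short Python sketch. The function name `valid_bytes` is hypothetical, and the mapping of bit i of _vbc to byte i of the data bus (least significant bit to first byte) is an assumption made for illustration; the description only states that each bit represents one byte.

```python
def valid_bytes(data: bytes, vbc: int) -> bytes:
    """Keep byte i of `data` only if bit i of `vbc` is set.

    Assumes bit 0 of _vbc corresponds to the first byte on the bus.
    """
    return bytes(b for i, b in enumerate(data) if (vbc >> i) & 1)


# A 4 byte wide bus carrying a final 3 byte transfer: the fourth bit is 0,
# so the fourth byte is not read.
final_word = valid_bytes(b"\xaa\xbb\xcc\xdd", 0b0111)
```

Under these assumptions `final_word` holds the first three bytes only, while a mask of `0b1111` would return all four.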
During the data transmission, output streamer interface 201 may signal the suspension of data transmission and suspend the data transmission by deactivating the _src signal. For example, output streamer interface 201 may transmit a first portion of data to input streamer interface 202 (e.g., data0 and data1 of
As illustrated, in the first clock cycle, wherein Data0 is transmitted, the _sink, _src and _first signals are all high. Subsequently, in the second clock cycle, wherein Data1 is transmitted, the _first signal returns to the low state. The transfer break occurs between Data1 and Data2 in the third clock cycle. During the transfer break, the _sink signal is high, while the _src and _first signals are low. After the transfer break has ended (e.g., in the fourth clock cycle wherein Data2 is transmitted), the _sink and _src signals are both high, indicating both devices are ready to send and receive data, and wherein the _first and _last signals are low, indicating that data transmission is ongoing. Data transmission concludes with the activation of the _last signal and simultaneous transmission of DataN. On completion of the transmission, the _sink, _src and _last signals are all high.
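The source break sequence described above can be reproduced with a small receiver model in Python. This is a sketch under stated assumptions: the `Cycle` tuple, the `receive` function, and the `SOURCE_BREAK` trace are hypothetical names, one list entry stands for one clock cycle, and the payload is shortened to four samples with DataN as the last.

```python
from typing import List, NamedTuple, Optional


class Cycle(NamedTuple):
    """Signal levels during one clock cycle (illustrative model)."""
    sink: bool
    src: bool
    first: bool
    last: bool
    data: Optional[str]


def receive(cycles: List[Cycle]) -> List[str]:
    """Collect one payload: start on _first, stop on _last, skip breaks."""
    payload: List[str] = []
    started = False
    for c in cycles:
        if not (c.sink and c.src):
            continue  # transfer break: source-to-sink signals are ignored
        if c.first:
            started = True
        if started:
            payload.append(c.data)
            if c.last:
                break
    return payload


SOURCE_BREAK = [
    Cycle(True, True, True, False, "Data0"),   # cycle 1: _first high
    Cycle(True, True, False, False, "Data1"),  # cycle 2
    Cycle(True, False, False, False, None),    # cycle 3: _src low (break)
    Cycle(True, True, False, False, "Data2"),  # cycle 4: transfer resumes
    Cycle(True, True, False, True, "DataN"),   # final cycle: _last high
]

received = receive(SOURCE_BREAK)
```

In this model the break cycle contributes nothing to `received`, so the payload arrives intact and in order even though the source paused for one cycle.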
For example, output streamer interface 201 may transmit a first portion of data to input streamer interface 202, and then determine that the _sink signal has been deactivated before the first portion of data is completed (e.g., accepted by input streamer interface 202). Output streamer interface 201 then suspends the transmission of the first portion of data until the _sink signal is reactivated. Suspending the transmission of the first portion may include holding the current values of the _data signals, letting them float, or the like. Concurrently, input streamer interface 202 suspends reading of the _data signals (e.g., ignores them) while the _sink signal is deactivated.
Input streamer interface 202 may have deactivated the _sink signal to suspend reading of data because a read buffer satisfied a threshold (e.g., was at capacity) or because a downstream receiving device has indicated that the device would not receive further data. Interface 202 may wait a period of time (e.g., one or more clock cycles) until it receives an indication that data flow may continue. Accordingly, interface 202 may reactivate the _sink signal after a period of time to inform output streamer interface 201 that data transmission may continue. Concurrent with reactivation of the _sink signal, input streamer interface 202 reads the first portion of data (that was previously left unread) and any subsequent portions of data placed on the _data signals in subsequent clock cycles.
As illustrated, in a first clock cycle, Data0 is transmitted, while the _sink, _src and _first signals are high and the _last signal is low. In the second clock cycle, Data1 is transmitted, and the _sink and _src signals are high and the _first and _last signals are low. The transfer is suspended by deactivating the _sink signal for the third clock cycle and reactivating it for the fourth clock cycle. Accordingly, Data2 is transmitted over the third and fourth clock cycles. As illustrated, in the third clock cycle, the _sink, _first and _last signals are low and the _src signal is high. In the fourth clock cycle, the _sink signal is returned to a high state. Data transmission concludes with the activation of the _last signal and simultaneous transmission of DataN. On completion of the transmission, the _sink, _src and _last signals are all high.
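The sink break sequence described above can be modeled from the transmitter side in Python. This is a minimal sketch, not the disclosed hardware: the `drive` and `accepted` functions are hypothetical, the source is assumed to hold the current value on the data bus while _sink is low (one of the options mentioned earlier), and the payload is shortened to four samples.

```python
from typing import List, Tuple

# One entry per clock cycle: (_sink, _src, _first, _last, _data)
Cycle = Tuple[bool, bool, bool, bool, str]


def drive(words: List[str], sink_pattern: List[bool]) -> List[Cycle]:
    """Present one word per cycle and advance only when _sink is high."""
    trace: List[Cycle] = []
    i = 0
    for sink in sink_pattern:
        if i == len(words):
            break  # payload fully transferred
        trace.append((sink, True, i == 0, i == len(words) - 1, words[i]))
        if sink:
            i += 1  # word accepted; otherwise it is held for the next cycle
    return trace


def accepted(trace: List[Cycle]) -> List[str]:
    """Words the receiver reads: cycles where _sink and _src are both high."""
    return [data for sink, src, _first, _last, data in trace if sink and src]


# _sink is deactivated for the third clock cycle only.
TRACE = drive(["Data0", "Data1", "Data2", "DataN"],
              [True, True, False, True, True])
```

With this pattern, Data2 appears on the bus in both the third and fourth cycles but is read only once, in the fourth cycle, when _sink is high again.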
Input streaming hardware component 801 also includes combinational and sequential logic 804 for receiving the FIFO data from buffer 803 and arranging the data in a format used by the second protocol. Hardware component 801 is then configured to connect to one or more receiving devices using the second protocol, and to forward the data received from buffer 803 to the one or more receiving devices using the second protocol. In some aspects, an output streamer component may be implemented in a manner similar to the input streamer depicted by
System 1000 further includes a Core/Memory Logic block (“core logic block”) 1003 between the IS and OS blocks. Core logic block 1003 may include a flash memory device, processor, or other device configured to send and/or receive data to and/or from another device. In some implementations, the IS and OS streamer blocks may sit between two cores (or two memory devices). In these implementations, a determination as to which of the IS/OS blocks are used may depend upon the direction of data flow. According to various aspects of the subject technology, IS block 1001, OS block 1002, and core logic block 1003 may be implemented on multiple dies or the same die.
In the depicted example, core logic block 1003 is situated between the IS and OS blocks and interfaces with five signals at the input side and three signals at the output side. Core logic block 1003 is configured to receive and send data. When configured between IS block 1001 and OS block 1002, core logic block 1003 operates to forward data from the IS to the OS, after internally processing the input data. In this example, the IS and OS blocks do not force data movement through core logic block 1003. Instead, core logic block 1003 functions to stop and start the transfer at every clock cycle. Core logic block 1003 may have a latency that depends on the pipeline delay involved.
In some implementations, IS block 1001 is operably connected to core logic block 1003 on an input side of core logic block 1003. In one aspect, IS block 1001 is configured to determine that a receiving device will receive data transmitted by core logic block 1003. The determination may be made by detection of an output streamer ready signal provided by OS block 1002 and a core ready signal provided by core logic block 1003. The output streamer ready signal and core ready signal are provided when there is no back pressure indicated from downstream devices. In the depicted example, the output streamer ready signal and core ready signal are fed into an AND gate 1004, the output of which is read by IS block 1001 to determine that the downstream devices are ready to receive data. AND gate 1004 may be part of IS block 1001 or implemented as a separate component.
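The readiness determination described above reduces to a logical AND evaluated every clock cycle, which the following Python sketch illustrates. The function names and the sample waveforms are hypothetical, and the model ignores any propagation delay between blocks.

```python
from typing import List


def downstream_ready(oss_ready: bool, core_ready: bool) -> bool:
    """Model of AND gate 1004: high only when both inputs are high."""
    return oss_ready and core_ready


def ready_trace(oss: List[bool], core: List[bool]) -> List[bool]:
    """Evaluate the gate once per clock cycle over two input waveforms."""
    return [downstream_ready(o, c) for o, c in zip(oss, core)]


# Back pressure from either the OS block or the core logic block is enough
# to tell the IS block that downstream devices are not ready.
gate_output = ready_trace([True, True, False, True],
                          [True, False, True, True])
```

In this example the gate output is high only in the first and fourth cycles, when neither block indicates back pressure.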
Based on the previously described determination, IS block 1001 is configured to inform a data source operably connected to IS block 1001 that the IS block is ready to receive data. IS block 1001 then monitors the data signals for the activation of a source signal and a data initiation signal. On detecting both of these signals, IS block 1001 is configured to receive source data transmitted by the data source. In some aspects, the source data is transmitted over a data bus that includes one or more of the previously described _data signals. Once the source data is received, IS block 1001 is configured to forward the source data to core logic block 1003 (e.g., over one or more of the depicted iss_data signals).
In some implementations, OS block 1002 is operably connected to core logic block 1003 on an output side of core logic block 1003. In one aspect, OS block 1002 is configured to determine that a downstream receiving device 1005 will accept data transmitted by core logic block 1003. Such determination may be made using any of the previously described techniques. OS block 1002 is further configured to inform IS block 1001 that receiving device 1005 will accept and/or receive data transmitted by the hardware device. In this regard, OS block 1002 may activate the previously described output streamer ready signal to AND gate 1004. OS block 1002 then receives the source data from core logic block 1003 in the manner previously described with reference to
During the fourth and fifth clock cycles, data D0 and D2 are transferred from IS block 1001 over an internal data bus (represented by iss_data) to a buffer of core logic block 1003 (represented by core_data). IS block 1001 is signaled on the sixth clock cycle that OS block 1002 is not ready to receive data, by the oss_ready signal falling to low. Accordingly, on the seventh cycle, IS block 1001 stores the current data D2 on the data bus to a local buffer. In implementations such as that of the depicted example, signaling between adjacent components in a chain may take one or more clock cycles to propagate. For example, AND gate 1004 receives, on the sixth clock cycle, an indication (via the oss_ready signal) that OS block 1002 is not ready to receive data. On the seventh clock cycle that indication is communicated to IS block 1001 via the core_ready signal. In this example, core logic block 1003 is one cycle behind OS block 1002. After a two-cycle delay, on the ninth cycle, IS block 1001 transfers the data stored in its local buffer to core logic block 1003.
The objective is to drive the core_ready signal high to obtain a next data sample from the IS block when core logic block 1003 and OS block 1002 are ready to receive a new sample. In some implementations, the input signal iss_data_valid is used to drive a data pipeline of core logic block 1003 for simplicity, but this is not mandatory and a user may choose another method of operation.
In the previously described example, issues may arise when OS block 1002 experiences back pressure (e.g., when the _sink signal goes low) or when core logic block 1003 is busy and unable to accept new input data. The first situation is depicted in the “Sink Break” waveforms illustrated in
If the _src signal from IS block 1001 goes low, then the iss_data_valid signal will also go low after a few clock cycles. In this situation, the _src signal breaks and core logic block 1003 simply waits for the iss_data_valid signal to go high. When core logic block 1003 needs additional cycles and cannot accept new input data, it drives the core_ready signal low and stores the next sample in the local buffer. This local buffer is used to immediately send the sample to the OS block when core logic block 1003 is ready.
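The local buffer behavior described above resembles a single-entry holding register, which can be sketched in Python as follows. This is an assumption-laden illustration rather than the disclosed design: the `LocalBuffer` class and its `step` method are hypothetical, the buffer is assumed to hold exactly one sample, and one call to `step` stands for one clock cycle.

```python
from typing import Optional


class LocalBuffer:
    """Single-entry holding register (illustrative model of the local buffer)."""

    def __init__(self) -> None:
        self._held: Optional[str] = None

    def step(self, incoming: Optional[str], ready: bool) -> Optional[str]:
        """Advance one cycle; return the sample sent downstream, if any."""
        if not ready:
            if incoming is not None:
                if self._held is not None:
                    raise OverflowError("local buffer already holds a sample")
                self._held = incoming  # park the sample until ready returns
            return None
        if self._held is not None:
            out = self._held          # send the parked sample first
            self._held = incoming     # park any sample arriving this cycle
            return out
        return incoming               # nothing parked: pass straight through


buf = LocalBuffer()
outputs = [
    buf.step("D0", True),   # passes through
    buf.step("D1", True),   # passes through
    buf.step("D2", False),  # not ready: D2 is stored locally
    buf.step(None, False),  # still not ready
    buf.step(None, True),   # ready again: D2 is sent immediately
    buf.step("D3", True),   # normal flow resumes
]
```

The point of the sketch is that the sample arriving while the downstream side is busy is not lost and is the first one sent once readiness returns.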
Block 1203 determines whether a data initiation signal has been detected. The data initiation signal indicates that a data transfer is underway. Until the initiation signal is detected, the input streamer continues to monitor the data source. The data initiation signal may be a pulse signal wherein the signal is activated only for a brief amount of time (e.g., one clock cycle), after which detection of the data initiation signal is no longer required. In block 1204, on receiving the data initiation signal in conjunction with the source signal, source data is read from a data bus. In the depicted example of
Those of skill in the art would appreciate that the various illustrative blocks, modules, elements, components, methods, and algorithms described herein may be implemented as electronic hardware, computer software, or combinations of both. To illustrate this interchangeability of hardware and software, various illustrative blocks, modules, elements, components, methods, and algorithms have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application. Various components and blocks may be arranged differently (e.g., arranged in a different order, or partitioned in a different way) all without departing from the scope of the subject technology.
It is understood that the specific order or hierarchy of steps in the processes disclosed is presented as an illustration of some exemplary approaches. Based upon design preferences and/or other considerations, it is understood that the specific order or hierarchy of steps in the processes can be rearranged. For example, in some implementations some of the steps can be performed simultaneously.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. The previous description provides various examples of the subject technology, and the subject technology is not limited to these examples. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Pronouns in the masculine (e.g., his) include the feminine and neuter gender (e.g., her and its) and vice versa. Headings and subheadings, if any, are used for convenience only and do not limit the invention.
The predicate words “configured to”, “operable to”, and “programmed to” do not imply any particular tangible or intangible modification of a subject, but, rather, are intended to be used interchangeably. For example, a processor configured to monitor and control an operation or a component may also mean the processor being programmed to monitor and control the operation or the processor being operable to monitor and control the operation. Likewise, a processor configured to execute code can be construed as a processor programmed to execute code or operable to execute code.
A phrase such as an “aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or one or more configurations. An aspect may provide one or more examples. A phrase such as an aspect may refer to one or more aspects and vice versa. A phrase such as an “embodiment” does not imply that such embodiment is essential to the subject technology or that such embodiment applies to all configurations of the subject technology. A disclosure relating to an embodiment may apply to all embodiments, or one or more embodiments. An embodiment may provide one or more examples. A phrase such as an “embodiment” may refer to one or more embodiments and vice versa. A phrase such as a “configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or one or more configurations. A configuration may provide one or more examples. A phrase such as a “configuration” may refer to one or more configurations and vice versa.
The word “exemplary” is used herein to mean “serving as an example or illustration.” Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.
This application claims priority from U.S. Provisional Application No. 61/580,113, entitled “STREAMING PROTOCOL FOR SSD APPLICATION” and filed Dec. 23, 2011, the subject matter of which is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
61580113 | Dec 2011 | US |