This disclosure relates to data flow control in a data storage system.
A conventional data storage system may include one device capable of bidirectional communication with another device. One device may include a computer node having a host bus adapter (HBA). The other device may be a mass storage device. A variety of intermediate devices such as expanders, bridges, routers, and switches may also be utilized in the data storage system to facilitate coupling and communication between a plurality of HBAs and mass storage devices. The HBA and mass storage device may each function as a transmitting and receiving device in order to exchange data and/or commands with each other using one or more of a variety of communication protocols. A protocol engine having a transmitting and receiving portion may be utilized to facilitate such communication. The receiving portion of the protocol engine may include a receive buffer that accepts data from any variety of transmitting devices and provides such data to memory.
In one prior art embodiment, the receive buffer may have one fixed threshold level, e.g., a fixed high threshold level. When the total amount of data in the receive buffer exceeds the fixed high threshold level, a hold type command may be sent from the receiving device to the transmitting device instructing the transmitting device to hold transmission of additional data. In response to such a hold command, the transmitting device may send a command acknowledging such command. A certain amount of time may expire, and a certain amount of data may be received, in an interim time interval from when the receiving device sends the hold type command until an acknowledgement of such command is received by the receiving device. The fixed high threshold level of the receive buffer may be fixed at a level to allow enough remaining space in the receive buffer to accept a worst case or largest amount of data as defined by the communication protocol during this interim time interval. This may lead to wasted space in the receive buffer since the worst case amount of data received during this interim time may rarely happen in actual data storage systems.
In addition, the receiving device may issue a command to the transmitting device to start sending data again as soon as the data level in the receive buffer is less than the fixed high threshold level. However, the data accumulated in the receive buffer may then quickly exceed the fixed high threshold level causing another hold command to be sent by the receiving device. The receiving device may then issue conflicting commands to hold transmission of additional data and to send additional data as accumulated data in the receive buffer varies from a level slightly below the fixed high threshold level to the fixed high threshold level resulting in data flow inefficiencies.
Features and advantages of embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the Drawings, where like numerals depict like parts, and in which:
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art. Accordingly, it is intended that the claimed subject matter be viewed broadly.
Such communication between the HBA 120 and mass storage 104 may take place by transmission of one or more frames. As used herein in any embodiment, a “frame” may comprise one or more symbols and/or values. Both the HBA 120 and mass storage 104 may act as a receiving device that receives data and/or commands from the other. Each of the HBA 120 and mass storage 104 may have protocol engine circuitry 150a, 150b to facilitate such communication. As used herein, “circuitry” may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry.
The data storage system 100 may also generally include a host processor 112, a bus 122, a user interface system 116, a chipset 114, system memory 121, a circuit card slot 130, and a circuit card 120 capable of communicating with mass storage 104. The host processor 112 may include one or more processors known in the art such as an Intel® Pentium® IV processor commercially available from the Assignee of the subject application. The bus 122 may include various bus types to transfer data and commands. For instance, the bus 122 may comply with the Peripheral Component Interconnect (PCI) Express™ Base Specification Revision 1.0, published Jul. 22, 2002, available from the PCI Special Interest Group, Portland, Oreg., U.S.A. (hereinafter referred to as a “PCI Express™ bus”). The bus 122 may alternatively comply with the PCI-X Specification Rev. 1.0a, Jul. 24, 2000, available from the aforesaid PCI Special Interest Group, Portland, Oreg., U.S.A. (hereinafter referred to as a “PCI-X bus”).
The user interface system 116 may include one or more devices for a human user to input commands and/or data and/or to monitor the system 100 such as, for example, a keyboard, pointing device, and/or video display. The chipset 114 may include a host bridge/hub system (not shown) that couples the processor 112, system memory 121, and user interface system 116 to each other and to the bus 122. Chipset 114 may include one or more integrated circuit chips, such as those selected from integrated circuit chipsets commercially available from the assignee of the subject application (e.g., graphics memory and I/O controller hub chipsets), although other integrated circuit chips may also, or alternatively, be used. The processor 112, system memory 121, chipset 114, bus 122, and circuit card slot 130 may be on one circuit board 132 such as a system motherboard.
The circuit card 120 may be constructed to permit it to be inserted into the circuit card slot 130. When the circuit card 120 is properly inserted into the slot 130, connectors 134 and 137 become electrically and mechanically coupled to each other. When connectors 134 and 137 are so coupled to each other, the card 120 becomes electrically coupled to bus 122 and may exchange data and/or commands with system memory 121, host processor 112, and/or user interface system 116 via bus 122 and chipset 114.
Alternatively, without departing from this embodiment, the operative circuitry of the circuit card 120 may be included in other structures, systems, and/or devices. These other structures, systems, and/or devices may be, for example, in the motherboard 132, and coupled to the bus 122. These other structures, systems, and/or devices may also be, for example, comprised in chipset 114.
The circuit card 120 may communicate with mass storage 104 via one or more communication links 106 using one or more communication protocols. One exemplary communication protocol may include Serial Advanced Technology Attachment (S-ATA). If a S-ATA protocol is used by circuit card 120 to exchange data and/or commands with mass storage 104, it may comply or be compatible with the protocol described in “Serial ATA: High Speed Serialized AT Attachment,” Revision 1.0a, published on Jan. 7, 2003 by the Serial ATA Working Group and/or later-published versions. Another exemplary protocol may include the Serial Attached Small Computer Systems Interface (SAS) protocol. If a SAS protocol is used, it may comply or be compatible with the protocol described in “Information Technology-Serial Attached SCSI-1.1 (SAS),” Working Draft American National Standard of International Committee For Information Technology Standards (INCITS) T10 Technical Committee, Project T10/1562-D, Revision 1, published Sep. 18, 2003, by American National Standards Institute (hereinafter termed the “SAS Standard”) and/or later-published versions of the SAS Standard.
To accomplish such communication, the circuit card 120 may have protocol engine circuitry 150a. The protocol engine circuitry 150a may exchange data and commands with mass storage 104 by transmission and reception of one or more frames, e.g., frames 170, 172. A large number of frames from many different devices such as mass storage devices and HBAs may be transmitted via communication links 106. The protocol engine circuitry 150a may be included in an integrated circuit (IC) 140. As used herein, an “integrated circuit” or IC means a semiconductor device and/or microelectronic device, such as, for example, a semiconductor integrated circuit chip.
The protocol engine circuitry 150a may include a receive buffer 208, buffer control circuitry 206, link layer circuitry 214, and PHY layer circuitry 209. The protocol engine circuitry 150a may also include other circuitry such as data transport layer circuitry and port layer circuitry (not illustrated) to further facilitate communication using the appropriate protocol. The receive buffer 208 may be considered a mid-point holding place for data and the buffer control circuitry 206 may control storage of data in, and retrieval of data from, the receive buffer 208. In one embodiment, the receive buffer 208 may be a first-in, first-out (FIFO) buffer.
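The relationship between the receive buffer and its control logic can be illustrated with a minimal software sketch. It is only a model under stated assumptions: the class name ReceiveBuffer, its methods, and the choice of Dwords as the unit of capacity are hypothetical and are not taken from any embodiment described herein.

```python
from collections import deque


class ReceiveBuffer:
    """Hypothetical FIFO model of a receive buffer (capacity in Dwords)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._fifo = deque()

    @property
    def level(self):
        # Current amount of data held in the buffer.
        return len(self._fifo)

    def store(self, dword):
        # Store incoming data; refuse it when the buffer is already full.
        if self.level >= self.capacity:
            raise OverflowError("receive buffer full")
        self._fifo.append(dword)

    def retrieve(self):
        # Retrieve the oldest stored item (first in, first out).
        return self._fifo.popleft()
```

The level property is the quantity that control logic would compare against a threshold when deciding whether to request a hold.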
Data output from the receive buffer 208 may be provided to memory 210. The memory 210 may include one or more machine readable storage media such as random-access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), magnetic disk (e.g., floppy disk and hard drive) memory, optical disk (e.g., CD-ROM) memory, and/or any other device that can store information. The PHY layer circuitry 209 may comprise a physical PHY containing transceiver circuitry to interface to the applicable communication link. The PHY layer circuitry 209 may alternatively and/or additionally comprise a virtual PHY to interface to another virtual PHY or to a physical PHY.
Processor circuitry 212 may include processor core circuitry that may comprise a plurality of processor cores. As used herein, a “processor core” may comprise hardwired circuitry, programmable circuitry, and/or state machine circuitry. Machine readable program instructions may be stored in any variety of machine readable media, e.g., the processor core may have a set of micro-code program instructions that, when executed by the processor circuitry 212, result in the processor circuitry 212 performing operations described herein. In addition, such program instructions, e.g., machine-readable firmware program instructions, may be stored in other memory locations that may be accessed and executed by the integrated circuit 140 to perform operations described herein.
Processor bus 216 may allow exchange of data and/or commands between at least the processor circuitry 212 and the buffer control circuitry 206. Additional components (not illustrated) may also be coupled to the processor bus 216. The integrated circuit 140 may also include additional components (not illustrated) such as bridge circuitry to bridge the processor bus 216 with an I/O bus. Host interface circuitry (not illustrated) may couple the I/O bus with the bus 122 of the system of
The hold type command takes time to reach the remote transmitting node based, at least in part, on the transmission rate and the location of the transmitting node. There may also be a further delay from the time the remote node receives the hold command until the remote node responds to the hold command by sending an acknowledgement command which suspends transmission of additional data. For example, in S-ATA such acknowledgement may be the HOLDA primitive, which the remote transmitting node may send for as long as the HOLD primitive is received from the receiving node. During this interval, data may therefore continue to accumulate in the receive buffer 208 as indicated by arrow 421.
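The amount of data that may accumulate while the hold command is in flight can be estimated from the line rate and the latency. The following is a rough sketch only. The function name headroom_dwords is hypothetical, the 1000 ns latency in the usage note is a made-up illustration value, and the default of 40 line bits per Dword assumes a 32-bit Dword carried with 8b/10b encoding. Draining of the buffer during the interval is ignored, so the result is a conservative upper bound.

```python
def headroom_dwords(latency_ns, line_rate_bps, line_bits_per_dword=40):
    """Upper bound on Dwords that may arrive during latency_ns.

    Assumption: each 32-bit Dword occupies 40 bits on the wire
    (8b/10b encoding). Integer arithmetic is used throughout, and the
    result is rounded up so that a partially received Dword still
    counts toward the space that must be reserved.
    """
    line_bits = latency_ns * line_rate_bps          # scaled by 1e9
    bits_per_dword_scaled = line_bits_per_dword * 10**9
    return -(-line_bits // bits_per_dword_scaled)   # ceiling division
```

For a hypothetical latency of 1000 ns, this gives 38 Dwords at 1.5 Gbps and 75 Dwords at 3.0 Gbps, which shows how strongly the required headroom depends on the actual round trip delay.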
Once the hold acknowledge command is received as illustrated in
Eventually, the data level in the receive buffer 208 may decrease until it reaches the adjustable low threshold level 304 (
The adjustable high and low threshold levels 302, 304 may be adjusted manually or automatically depending on any variety of factors. For a manual adjustment, a user may utilize the user interface system 116 to input commands to set the adjustable high and/or low threshold levels 302, 304 at desired levels. To accomplish such a manual adjustment, a program may be written and stored in any variety of storage media that, based upon commands entered by the user, adjusts the high and/or low threshold levels 302, 304. The buffer control circuitry 206 may then be responsive to such a program to instruct the link layer circuitry 214 to issue a hold type command when the buffer control circuitry 206 recognizes that the data level in the receive buffer 208 has reached the level specified by the user as the high threshold level 302.
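A program of the kind described for manual adjustment might look like the following minimal sketch. The dictionary layout and the function names set_thresholds and hold_required are hypothetical. The validation rule (low below high, high no greater than capacity) is an assumption added for illustration rather than a requirement stated for any embodiment.

```python
def set_thresholds(config, high=None, low=None):
    """Return a copy of config with user-supplied thresholds applied.

    config is a dict with the keys "capacity", "high", and "low",
    all expressed in the same unit (for example, Dwords).
    """
    updated = dict(config)
    if high is not None:
        updated["high"] = high
    if low is not None:
        updated["low"] = low
    # Assumed sanity check: 0 <= low < high <= capacity.
    if not 0 <= updated["low"] < updated["high"] <= updated["capacity"]:
        raise ValueError("thresholds must satisfy 0 <= low < high <= capacity")
    return updated


def hold_required(level, config):
    # A hold type command is due once the level reaches the high threshold.
    return level >= config["high"]
```

Returning a new dictionary rather than modifying the old one keeps the previous settings available if the user's input is rejected.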
The adjustable high and low threshold levels 302, 304 may also be adjusted automatically based on an automatic adjustment algorithm. A user may select the automatic adjustment option to allow the algorithm to set the high and/or low threshold levels 302, 304. In general, the automatic adjustment algorithm may consider any variety of factors to dynamically adjust the high and/or low threshold levels 302, 304.
The high threshold level 302 may be dynamically adjusted based on any factor that may impact the overall latency period from the time the receiving node sends a hold type command to the transmitting node until the receiving node receives an acknowledgement command from the transmitting node, e.g., time interval Δt1 between
The low threshold level 304 may also be dynamically adjusted based on any variety of factors. The low threshold level may be adjusted to a level so that the receiving device is delayed from sending a receive type command to the remote transmitting node until the adjustable low threshold level is reached. Therefore, the low threshold level may be adjusted to a level less than the adjustable high threshold level. The factors that may be considered in selecting the low threshold level may include, but not be limited to, actual history of round trip delay times for particular transmitting nodes and/or actual amounts of data received during those times, transmission rates, distance of the transmitting node from the receiving node, and the status of whether data is being emptied from the receive buffer 208 and at what rate.
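One possible heuristic for automatic adjustment, based only on the history of data actually received during past hold latencies, is sketched below. It is an assumption made for illustration and is not the automatic adjustment algorithm of any embodiment. The function name auto_thresholds, the margin and hysteresis parameters, and the fallback used when no history exists are all hypothetical.

```python
def auto_thresholds(capacity, observed_overrun, margin=2, hysteresis=8):
    """Derive (high, low) thresholds from past observations.

    observed_overrun lists, for earlier hold commands, how much data
    arrived between sending the hold command and receiving its
    acknowledgement. All quantities share one unit (for example, Dwords).
    """
    if observed_overrun:
        worst_seen = max(observed_overrun)
    else:
        worst_seen = capacity // 2   # assumed fallback with no history
    # Reserve room above the high threshold for the worst case seen so far.
    high = max(1, capacity - (worst_seen + margin))
    # Keep the low threshold strictly below the high threshold.
    low = max(0, high - max(1, hysteresis))
    return high, low
```

With a 64-Dword buffer and observed overruns of 6, 9, and 7 Dwords, the sketch returns a high threshold of 53 and a low threshold of 45. Other factors named above, such as transmission rate or the rate at which the buffer is emptied, could be folded into the same calculation.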
It will be appreciated that the functionality described for all the embodiments described herein, including the automatic adjustment algorithm, may be implemented using hardware, firmware, software, or a combination thereof.
Thus, in summary, one embodiment may comprise an apparatus. The apparatus may comprise circuitry capable of receiving data in a receive buffer, and sending a hold command to a transmitting node currently sending data to hold transmission of additional data when a level of the data in the receive buffer reaches an adjustable high threshold level.
Another embodiment may comprise an article. The article may comprise circuitry comprising a receive buffer to receive data, the receive buffer having a high threshold level. The circuitry is capable of sending a hold command to a transmitting node sending data to hold transmission of additional data when a level of the data in the receive buffer reaches the high threshold level. The article may further comprise a storage medium having stored therein instructions that when executed by a machine result in the following: adjusting the high threshold level.
A system embodiment may comprise a circuit card comprising an integrated circuit. The integrated circuit may comprise circuitry capable of receiving data in a receive buffer, and sending a hold command to a transmitting node currently sending data to hold transmission of additional data when a level of the data in said receive buffer reaches an adjustable high threshold level.
Advantageously, in these embodiments, the adjustable high threshold level 302 of the receive buffer 208 enables a system designer to tune any particular system to improve data flow control performance. For example, the adjustable high threshold level 302 may be raised compared to a prior art embodiment having a lower fixed high threshold level such that the probability of entering a hold type state, e.g., transmission of a HOLD primitive and receipt of a HOLDA primitive, is reduced and hence line utilization and efficiency are improved. For instance, the current SAS standard for Serial Advanced Technology Attachment (ATA) Tunneled Protocol (STP) flow control specifies that a fixed high threshold level should be set to allow 24 Dwords of data at 1.5 gigabits per second (Gbps) and 28 Dwords of data at 3.0 Gbps to be received during the elapsed time interval Δt1 (see
In addition, a low threshold level 304 may be added to the receive buffer 208. This solves the problem that a receive buffer with only a fixed high threshold level may encounter if its data level fluctuates over short time periods from a level slightly below the fixed high threshold level to the fixed high threshold level. In such a situation for the receive buffer with only a fixed high threshold level, the link layer circuitry 214 would quickly flip-flop between providing hold and receive type commands, resulting in reduced link efficiency. The adjustable nature of the low threshold level 304 provides additional tuning ability to a system designer.
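The effect of a separate low threshold level can be illustrated by counting the commands issued for the same sequence of buffer levels with and without it. This is a minimal sketch under stated assumptions: the function name count_commands and the trace of levels are hypothetical, and a buffer with only a single threshold is modeled by placing the low threshold one unit below the high threshold.

```python
def count_commands(levels, high, low):
    """Count hold and receive type commands for a trace of buffer levels.

    A hold type command is issued when the level reaches `high`; a
    receive type command is issued when the level later falls to `low`.
    """
    holding = False
    commands = 0
    for level in levels:
        if not holding and level >= high:
            holding = True
            commands += 1      # hold type command
        elif holding and level <= low:
            holding = False
            commands += 1      # receive type command
    return commands


# Illustrative trace hovering just below and at a high threshold of 32.
trace = [30, 32, 31, 32, 31, 32, 31, 20]
single_threshold = count_commands(trace, high=32, low=31)
two_thresholds = count_commands(trace, high=32, low=24)
```

For this trace, the single-threshold model issues six commands, while the model with a low threshold of 24 issues only two: one hold type command and one receive type command.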
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims are intended to cover all such equivalents.