Embodiments described herein relate generally to a storage system, in particular, a storage system that includes a plurality of routing circuits and a plurality of node modules connected thereto.
A storage system of one type is connected to a plurality of clients and stores data in accordance with requests received from the clients. The storage system may include a plurality of non-volatile memories such as flash memories for the data storage. However, if a plurality of accesses is concentrated on a particular one of the non-volatile memories, congestion of data traffic may occur in a communication path from an interface which receives a request from the client to the non-volatile memory, and the writing performance of the storage system may be compromised.
According to an embodiment, a storage device includes a storage unit having a plurality of routing circuits networked with each other, each of the routing circuits configured to route packets to a plurality of node modules that are connected thereto, each of the node modules including nonvolatile memory, and a plurality of connection units, each connected to one or more of the routing circuits for communication therewith, and configured to access each of the node modules through one or more of the routing circuits. Each of the connection units is configured to transmit an inquiry to a target node module, to initiate a write operation, and determine whether or not to transmit a write command based on a notice returned by the target node module in response to the inquiry.
Embodiments of a storage system will be described below, with reference to the drawings.
The storage system 100 may include a system manager 110, a power supplying unit (PSU) 120, a battery backup unit (BBU) 130, connection units (CUs) 140-1 to 140-n (n: arbitrary natural number), node modules (NMs) 150, a routing circuit (RC) 160, and an interface 170, but is not limited thereto. Hereinafter, when the CUs need not be distinguished from one another, each of them is simply referred to as a CU 140.
The system manager 110 may be implemented by a processor such as a CPU (central processing unit) which executes a program stored in a program memory. The system manager 110 may also be implemented in hardware such as a large scale integration (LSI) or an application specific integrated circuit (ASIC) which has the same function as the processor which executes the program. For example, the system manager 110 records a status of the CU 140, resets, and manages a power source.
The PSU 120 converts an external power voltage, which is supplied from an external power source, to a predetermined direct voltage, and the PSU 120 supplies the direct voltage to components of the storage system 100. For example, the external power source is an alternating-current power source of which voltage is 100 [V] or 200 [V].
The BBU 130 includes a secondary battery, and accumulates electric power which is supplied from the PSU 120. If the storage system 100 is electrically disconnected from the external power source, the BBU 130 supplies an auxiliary power voltage to components of the storage system 100. A node controller (NC) 151 of the NM 150, which will be described below, performs backup for protecting data using the auxiliary power voltage.
The CU 140 is a connector which is connectable to one or more clients 200-1 to 200-n (n: arbitrary natural number). Hereinafter, when the clients need not be distinguished from one another, each of them is simply referred to as a client 200. The client 200 is used by a user of the storage system 100. The client 200 transmits, to a CU 140, a command such as a read command, a write command, or a remove command with respect to the storage system 100. The CU 140 receives these commands, and transmits a request, which corresponds to a received command, to the NM 150 whose address corresponds to address information included in the command, via a communication network of the RCs 160, which will be described below. The CU 140 obtains data, which are requested by a read request, from the NM 150, and transmits the obtained data to the client 200.
The NM 150 includes a non-volatile memory. The NM 150 is a storage which stores data in accordance with an instruction from the client 200. A configuration of the NM 150 will be described below.
For example, the storage system 100 includes a plurality of RCs 160 arranged in a matrix configuration. The matrix is an arrangement in which the composition elements are arranged in a first direction and a second direction which is perpendicular to the first direction. A torus routing is an arrangement, described below, in which the NMs 150 are connected in a torus form.
The RC 160 transmits a packet, which includes data transmitted from the CU 140 or another RC 160, by using a mesh-shaped network. The mesh-shaped network is a network which is formed into a mesh shape or a grid shape. Specifically, the mesh-shaped network is a network in which the RCs 160 are arranged at intersections where vertical lines and horizontal lines intersect. The vertical lines and horizontal lines are communication paths. Each of the RCs 160 includes two or more RC interfaces 161. The RC 160 is electrically connected to each of one or more adjacent RCs 160 via the RC interface 161.
The system manager 110 is electrically connected to the CUs 140 and the RCs 160 of desired number. Each of the NMs 150 is electrically connected to adjacent NMs 150 via the RC 160 and a packet management unit (PMU) 180, which will be described below, and configures the NMs 150 as a RAID (redundant array of inexpensive disks).
Each of the NMs 150 is connected to NMs 150 adjacent in two or more directions. For example, the NM 150 (0, 0) positioned at the upper-left corner is connected, via the RC 160, to the NM 150 (1,0) which is adjacent in the X direction, the NM 150 (0,1) which is adjacent in the Y direction different from the X direction, and the NM 150 (1,1) which is adjacent in a diagonal direction.
The torus form is a connection form in which the NMs 150 are circularly connected and at least two paths exist as paths from one NM 150 to another NM 150. The two paths include a first path in a first direction and a second path in a direction opposite to the first direction.
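The two paths of the torus form can be illustrated with a short sketch. It assumes a ring of `size` node positions along one direction and simple integer coordinates; the function names `ring_hops` and `torus_distance` are hypothetical and are not taken from the embodiment.

```python
from typing import Tuple


def ring_hops(src: int, dst: int, size: int) -> Tuple[int, int]:
    """Return (hops along the first direction, hops along the opposite direction)."""
    first = (dst - src) % size      # path in the first direction
    opposite = (src - dst) % size   # path in the opposite direction
    return first, opposite


def torus_distance(src: Tuple[int, int], dst: Tuple[int, int],
                   width: int, height: int) -> int:
    """Smallest number of hops when both the X and Y directions wrap around."""
    dx = min(ring_hops(src[0], dst[0], width))
    dy = min(ring_hops(src[1], dst[1], height))
    return dx + dy


if __name__ == "__main__":
    print(ring_hops(0, 3, 4))
    print(torus_distance((0, 0), (3, 3), 4, 4))
```

In this sketch, the wrap-around link makes the opposite direction the shorter of the two paths whenever the destination lies more than half the ring away.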
The number of the CUs 140 can be arbitrarily selected. Each of the CUs 140 may be connected to a plurality of the RCs 160, and each of the RCs 160 may be connected to a plurality of the CUs 140.
The interface 170 connects the system manager 110 and a manager terminal 300. The manager terminal 300 is a terminal device used by an administrator that manages the storage system 100. The manager terminal 300 provides an interface such as a GUI (Graphical User Interface) to the administrator. The manager terminal 300 transmits, to the system manager 110, an instruction with respect to the storage system 100.
The processor 141 performs various types of processes by executing an application program, using the CU memory 144 as a work area. The first network interface 142 is a connection interface which is connected to the client 200. The second network interface 143 is a connection interface which is connected to the system manager 110. The CU memory 144 is a memory which temporarily stores data. For example, the CU memory 144 is a RAM, but various types of memories may be used. The CU memory 144 may include a plurality of memories. The PCIe interface 145 is a connection interface which is connected to the RC 160.
For example, the addresses of the four FPGAs 0 to 3 are represented as (000, 000), (010, 000), (000, 010), and (010, 010), respectively, using binary numbers.
One RC 160 and four NMs, which are in each of the FPGAs, are electrically connected to the RC interface 161 via the PMU 180 which will be described below. During a data transmission operation, the RC 160 performs routing with reference to addresses x and y of an FPGA address.
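One way to read the example addresses is sketched below. It assumes, purely for illustration, that the four NMs of each FPGA form a 2 x 2 block and that the FPGA address is the node address with the least significant bit of x and y cleared; the embodiment does not state this mapping, and the names `fpga_address` and `as_binary` are hypothetical.

```python
from typing import Tuple


def fpga_address(node: Tuple[int, int]) -> Tuple[int, int]:
    """Map a node address (x, y) to the address of the FPGA assumed to contain it."""
    x, y = node
    # Assumption: clearing the lowest bit groups nodes into 2 x 2 blocks.
    return (x & ~1, y & ~1)


def as_binary(address: Tuple[int, int], bits: int = 3) -> Tuple[str, str]:
    """Format an address as fixed-width binary strings, as in the example above."""
    x, y = address
    return (format(x, "0{}b".format(bits)), format(y, "0{}b".format(bits)))


if __name__ == "__main__":
    for node in [(0, 0), (3, 0), (1, 2), (3, 3)]:
        print(node, as_binary(fpga_address(node)))
```

Under this assumption, a 4 x 4 arrangement of NMs yields exactly the four FPGA addresses listed in the example.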
Four PMUs 180 are disposed with respect to the four NMs 150, and one PMU 180 is disposed with respect to the PCIe interface 181. Each of the four PMUs 180 analyzes a packet which is transmitted from the CU 140 and the RC 160. Each of the four PMUs 180 determines whether or not a coordinate (relative node address) included in the packet corresponds to an own coordinate (relative node address). If the coordinate included in the packet corresponds to the own coordinate, the PMU 180 directly transmits the packet to the corresponding NM 150. On the other hand, if the coordinate included in the packet does not correspond to the own coordinate (in a case of another coordinate), the PMU 180 transmits the determination to the RC 160.
For example, if a node address of a final destination is (3, 3), the PMU 180, which is connected to the node address (3, 3), determines that the coordinate (3, 3) described in the analyzed packet corresponds to the own coordinate (3, 3). Then, the PMU 180, which is connected to the node address (3, 3), transmits the analyzed packet to the NM 150 of the node address (3, 3) which is connected thereto. The transmitted packet is analyzed by the NC 151 (described below) of the NM 150. Thereby, the FPGA performs processing in accordance with a request described in the packet. For example, the FPGA stores the data in the non-volatile memory disposed in the NM 150 by using the NC 151.
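The address comparison performed by the PMU 180 can be sketched as follows. The `Packet` class, the `pmu_dispatch` function, and the returned labels are hypothetical names chosen for the sketch; only the comparison between the destination coordinate and the PMU's own coordinate comes from the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Packet:
    to_x: int        # destination coordinate in the x direction
    to_y: int        # destination coordinate in the y direction
    payload: bytes   # command and data (contents not interpreted here)


def pmu_dispatch(own: Tuple[int, int], packet: Packet) -> str:
    """Decide what a PMU does with a packet, based on its own coordinate."""
    if (packet.to_x, packet.to_y) == own:
        # The packet is addressed to the NM connected to this PMU.
        return "deliver_to_nm"
    # Otherwise the result is handed to the RC, which handles onward routing.
    return "notify_rc"


if __name__ == "__main__":
    print(pmu_dispatch((3, 3), Packet(3, 3, b"write")))
    print(pmu_dispatch((3, 3), Packet(1, 2, b"write")))
```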
The PCIe interface 181 transmits a request and a packet, which are from the CU 140, to the PMU 180. The RC 160 analyzes the request and the packet stored in the PMU 180. The RC 160 may transmit the request and the packet to another RC 160 in accordance with a result of the analysis.
The NC 151 is electrically connected to the PMU 180. The NC 151 receives a packet from the CU 140 or another NM 150 via the PMU 180. The NC 151 transmits a packet to the CU 140 or another NM 150 via the PMU 180. The NC 151 performs processing in accordance with a request included in the packet which is received from the PMU 180. For example, if the request included in the packet is an access request (read request or write request), the NC 151 accesses the NM first memory 152.
For example, the NM first memory 152 may be a NAND-type flash memory, a bit cost scalable memory (BiCS), a magnetoresistive random access memory (MRAM), a phase change random access memory (PcRAM), a resistance random access memory (RRAM®), or a combination thereof.
The NM second memory 153 is not a non-volatile memory, and temporarily stores data. The NM second memory 153 may be any of various types of RAM such as a dynamic random access memory (DRAM). If the NM first memory 152 functions as a working area, the NM second memory 153 may not be disposed in the NM 150.
In general, the NM first memory 152 is non-volatile memory and the NM second memory 153 is volatile memory. Further, in one embodiment, the read/write performance of the NM second memory 153 is better than that of the NM first memory 152.
In this way, the RC 160 is connected to the RC interface 161, and the RC 160 is connected to the NM 150 via the PMU 180. Thereby, the communication network of the RCs 160 is formed, but the configuration is not limited thereto. For example, the communication network may be formed by directly connecting each of the NMs 150 without using the RC 160.
An interface standard used in the storage system according to the present embodiment is described below. In the present embodiment, the following standards can be employed for the interfaces which electrically connect the components described above.
First, a low voltage differential signaling (LVDS) standard can be employed for the RC interface 161 which connects the RCs 160. A PCIe (PCI Express) standard can be employed for the RC interface 161 which electrically connects the RC 160 and the CU 140. These interface standards are examples. If necessary, another interface standard can be employed.
In the header area HA, for example, an address (from_x, from_y) of the x and y directions of a source and an address (to_x, to_y) of the x and y directions of a destination are described. In the payload area PA, for example, a command and data are described. A data size of the payload area PA is changeable. In the redundant area RA, for example, a CRC (Cyclic Redundancy Check) code is described. The CRC code is a code (information) for detecting an error of data in the payload area PA.
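The three areas of the packet can be sketched as a byte layout. The field widths (16 bits per address component), the byte order, and the use of CRC-32 from Python's `zlib` module are assumptions made for the sketch; the embodiment specifies only that the header carries source and destination addresses, the payload is variable in size, and the redundant area carries a CRC code computed over the payload area.

```python
import struct
import zlib
from typing import Tuple

# Header area HA: from_x, from_y, to_x, to_y (16-bit fields are an assumption).
HEADER = struct.Struct(">4H")
# Redundant area RA: a 32-bit CRC (CRC-32 is an assumption).
TRAILER = struct.Struct(">I")


def encode_packet(src: Tuple[int, int], dst: Tuple[int, int], payload: bytes) -> bytes:
    """Build header area + payload area + redundant area."""
    header = HEADER.pack(src[0], src[1], dst[0], dst[1])
    trailer = TRAILER.pack(zlib.crc32(payload))
    return header + payload + trailer


def decode_packet(raw: bytes):
    """Split a packet into its areas and verify the CRC of the payload area."""
    from_x, from_y, to_x, to_y = HEADER.unpack_from(raw, 0)
    payload = raw[HEADER.size:len(raw) - TRAILER.size]
    (crc,) = TRAILER.unpack(raw[len(raw) - TRAILER.size:])
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch in payload area")
    return (from_x, from_y), (to_x, to_y), payload


if __name__ == "__main__":
    raw = encode_packet((0, 0), (3, 3), b"write")
    print(decode_packet(raw))
```

Because the payload length is not stored in this sketch, the decoder derives it from the total packet length, which is one simple way to accommodate a changeable payload size.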
The RC 160, which receives the packet having the components shown in
For example, in accordance with the transfer algorithm, the RC 160 determines, as a transfer destination, a NM 150 which is positioned along a path through which a number of transfer of the packet from the own NM 150 to the final destination is minimum. In accordance with the transfer algorithm, if there is a plurality of paths along which the number of transfer of the packet from the own NM 150 to the final destination is minimum, the RC 160 selects one of the paths using an arbitrary method. If a NM 150 positioned along the path through which the number of transfer is minimum is broken down or busy, the RC 160 changes the transfer destination to another NM 150.
Because the NMs 150 are logically connected to form the mesh-shaped network, a plurality of paths through which the number of transfers of the packet is minimum may exist. In this case, if a plurality of packets whose destination is the same particular NM 150 is output, the output packets are distributed over different ones of the plurality of paths in accordance with the transfer algorithm. Therefore, concentration of access on a particular NM 150 can be avoided, and reduction of throughput of the entire storage system 100 can be suppressed.
The NM 150-8 writes data in the NM first memory 152 thereof based on a write request W1 which is transmitted from the CU 140-3. If the NM 150-8 receives a new write request, the NM 150-8 temporarily stores the received write request in the NM second memory 153 thereof. If a plurality of write requests is stored in the NM second memory 153 of the NM 150-8 and the NM 150-8 cannot receive further write requests, the write requests are stored in the PMU 180 of the FPGA which is adjacent to the NM 150-8. The non-received requests may cause congestion in communication paths from each of the CUs 140 to the NM 150-8, and the writing performance of the storage system 100 may be compromised.
For this reason, in the present embodiment, for example, if each of the CUs 140-1, 140-2, 140-4, and 140-5 is to transmit a write request to the NM 150-8, each of the CUs 140-1, 140-2, 140-4, and 140-5 transmits, to the NM 150-8, a verification packet P1 for verifying a load of the NM 150-8 before transmitting the write request. The verification packet P1 contains content shown in
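The transfer algorithm described above can be sketched as a next-hop choice on a grid. The sketch assumes a bounded mesh of `width` x `height` positions with links only in the X and Y directions, uses a random pick as the "arbitrary method", and treats a broken-down or busy node as a member of a hypothetical `unavailable` set; none of these details are fixed by the embodiment.

```python
import random
from typing import FrozenSet, Optional, Tuple

Coord = Tuple[int, int]


def next_hop(current: Coord, dest: Coord, width: int, height: int,
             unavailable: FrozenSet[Coord] = frozenset(),
             rng=random) -> Optional[Coord]:
    """Pick the next transfer destination toward dest, or None if no neighbour is usable."""
    if current == dest:
        return current
    cx, cy = current
    tx, ty = dest
    # Neighbours that lie on a path with the minimum number of transfers.
    minimal = []
    if tx != cx:
        minimal.append((cx + (1 if tx > cx else -1), cy))
    if ty != cy:
        minimal.append((cx, cy + (1 if ty > cy else -1)))
    usable = [n for n in minimal if n not in unavailable]
    if usable:
        # Several minimum paths may exist; choose one by an arbitrary method.
        return rng.choice(usable)
    # All minimum-path neighbours are broken down or busy: change to another neighbour.
    neighbours = [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    detours = [
        (x, y) for (x, y) in neighbours
        if 0 <= x < width and 0 <= y < height
        and (x, y) not in unavailable and (x, y) not in minimal
    ]
    return rng.choice(detours) if detours else None


if __name__ == "__main__":
    print(next_hop((0, 0), (2, 2), 4, 4))
    print(next_hop((0, 0), (2, 0), 4, 4, unavailable=frozenset({(1, 0)})))
```

Choosing randomly among the minimum-path neighbours is what spreads packets bound for the same destination over different paths in this sketch.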
If the NM 150-8 determines that the number of the write requests, which are stored in the NM second memory 153, is less than a reference value (if the load of the NM 150-8 is less than a reference value), the NM 150-8 generates a response packet P2 which indicates that a transmission of the write request is accepted (OK). On the other hand, if the NM 150-8 determines that the number of the write requests, which are stored in the NM second memory 153, is equal to or more than the reference value (if the load of the NM 150-8 is equal to or more than the reference value), the NM 150-8 generates a response packet P2 which indicates that a transmission of the write request is not accepted (NG).
The NM 150-8 transmits the generated response packets P2 to the CUs 140-1, 140-2, 140-4, and 140-5, which are sources of the verification packet P1. Each of these CUs 140 verifies the load of the NM 150-8 in accordance with the response packet P2 which is received from the NM 150-8.
The response packet P2 has the data components shown in
If each of the CUs 140-1, 140-2, 140-4, and 140-5 receives the response packet P2 which indicates that a transmission of a write request is accepted (OK), each of the CUs 140-1, 140-2, 140-4, and 140-5 transmits a write request. For example, the CU 140-1 generates a write request and transmits the write request to the NM 150-1. If the NM 150-1 received the write request, the NM 150-1 transmits, to the NM 150-6, the write request having a destination address of the NM 150-8. If the NM 150-6 received the write request from the NM 150-1, the NM 150-6 transmits the write request to the NM 150-7. If the NM 150-7 received the write request from the NM 150-6, the NM 150-7 transmits the write request to the NM 150-8. If the NM 150-8 receives the write request from the NM 150-7, the NM 150-8 stores the data into the NM first memory 152 of the NM 150-8.
The verification packet P1 and the response packet P2 are smaller in data size than the write request. Each of the NMs 150 has a storage area for storing data having the destination address, and each of the NMs 150 has a limited number of write requests that each of the NMs 150 accepts, in order to reserve an area for storing the verification packet P1 and the response packet P2 in the storage area. Thereby, even if congestion occurs in the communication network of the RCs 160, the NM 150 can transmit the verification packet P1 and the response packet P2 without delay.
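The relationship between the storage area and the limit on accepted write requests can be illustrated with simple arithmetic. All sizes and the helper name `max_write_requests` below are hypothetical; the embodiment states only that the number of accepted write requests is limited so that room remains for the smaller packets P1 and P2.

```python
def max_write_requests(buffer_bytes: int, write_request_bytes: int,
                       control_packet_bytes: int, reserved_control_packets: int) -> int:
    """Number of write requests that fit once space for control packets is set aside."""
    reserved = control_packet_bytes * reserved_control_packets
    if reserved > buffer_bytes:
        raise ValueError("reserved area exceeds the storage area")
    # Only the remainder of the storage area is available to write requests.
    return (buffer_bytes - reserved) // write_request_bytes


if __name__ == "__main__":
    # Example figures only: 4 KiB area, 1 KiB write requests, 32-byte control packets.
    print(max_write_requests(4096, 1024, 32, 16))
```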
If the CU 140-1 receives a write command for writing data from the client 200, the CU 140-1 transmits a verification packet P1 to the NM 150 which is a destination of the data (step S10). If the NM 150 receives the verification packet P1 from the CU 140-1, the NM 150 determines whether or not the number of write requests, which are stored in the NM second memory 153 of the NM 150, is less than the reference value (whether or not the load of the NM 150 is less than the reference value).
If the NM 150 determines that the number of write requests, which are stored in the NM second memory 153 of the NM 150, is less than the reference value, the NM 150 generates the response packet P2 which indicates that the write request is accepted (OK). Thereafter, the NM 150 transmits the generated response packet P2 to the CU 140-1 (step S11).
If the CU 140-1 receives the response packet P2 which indicates that the write request is accepted (OK), from the NM 150, the CU 140-1 generates a write request for instructing the NM 150 to write the data. Thereafter, the CU 140-1 transmits the generated write request to the NM 150 via the communication network of the RCs 160 (step S12). The NM 150 stores the write request, which is received from the CU 140-1, in the NM second memory 153 thereof, which functions as a temporary memory. And, the NM 150 writes the data into the NM first memory 152 thereof, which functions as a non-volatile memory, in accordance with the write request stored in the NM second memory 153.
On the other hand, if the CU 140-2 receives a write command for writing data from the client 200, the CU 140-2 transmits a verification packet P1 to the NM 150 which is a destination of the data (step S13). If the NM 150 receives the verification packet P1 from the CU 140-2, the NM 150 determines whether or not the number of requests stored in the NM second memory 153 of the NM 150 is less than the reference value (whether or not the load of the NM 150 is less than the reference value).
If the NM 150 determines that the number of write requests in the NM second memory 153 is equal to or greater than the reference value, the NM 150 generates the response packet P2 which indicates that the write request is not accepted (NG). Thereafter, the NM 150 transmits the generated response packet P2 to the CU 140-2 (step S14).
If the CU 140-2 receives the response packet P2 which indicates that the write request is not accepted (NG), from the NM 150, the CU 140-2 does not transmit, to the NM 150, a write request for instructing the NM 150 to write the data. Instead, the CU 140-2 repeatedly transmits the verification packet P1 to the NM 150 until the CU 140-2 receives the response packet P2 which indicates that the write request is accepted (OK), from the NM 150.
If the NM 150 completes the data writing with respect to the write request received from the CU 140-1, the NM 150 transmits a write completion notice to the CU 140-1 (step S15). Thereafter, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153.
On the other hand, the CU 140-2 transmits the verification packet P1 again to the NM 150 (step S16). If the NM 150 receives the verification packet P1 from the CU 140-2, the NM 150 determines whether or not the number of write requests in the NM second memory 153 is less than the reference value (whether or not the load of the NM 150 is less than the reference value).
If the NM 150 determines that the number of write requests is less than the reference value, the NM 150 generates the response packet P2 which indicates that the write request is accepted (OK). Thereafter, the NM 150 transmits the generated response packet P2 to the CU 140-2 (step S17).
If the CU 140-2 receives the response packet P2 which indicates that the write request is accepted (OK) from the NM 150, the CU 140-2 generates a write request for instructing the NM 150 to write the data. Thereafter, the CU 140-2 transmits the generated write request to the NM 150 via the communication network of the RCs 160 (step S18). The NM 150 stores the write request received from the CU 140-2 in the NM second memory 153 thereof. Also, the NM 150 writes the data into the NM first memory 152 thereof, in accordance with the write request stored in the NM second memory 153.
If the NM 150 completes the data writing with respect to the write request received from the CU 140-2, the NM 150 transmits a write completion notice to the CU 140-2 (step S19). Thereafter, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153.
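From the side of the CU 140, the sequence of steps S10 to S19 amounts to a verify-then-write loop, sketched below. The callables `send_verification` and `send_write` stand in for transmission over the communication network of the RCs 160 and are hypothetical, as is the `max_attempts` bound; the description above simply repeats the verification packet P1 until an OK response is received.

```python
from typing import Callable


def cu_write(send_verification: Callable[[], str],
             send_write: Callable[[bytes], None],
             data: bytes,
             max_attempts: int = 100) -> int:
    """Send P1 until the NM answers OK, then send the write request.

    Returns the number of verification packets that were sent.
    """
    for attempt in range(1, max_attempts + 1):
        if send_verification() == "OK":
            # The load is below the reference value: the write request may be sent.
            send_write(data)
            return attempt
        # NG: do not send the write request; verify again on the next iteration.
    raise TimeoutError("target NM stayed busy for max_attempts verifications")


if __name__ == "__main__":
    responses = iter(["NG", "NG", "OK"])
    sent = []
    print(cu_write(lambda: next(responses), sent.append, b"data"), sent)
```

A practical implementation would probably wait between attempts; the sketch omits any delay because the embodiment does not describe one.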
If the NM 150 determines that the count value is not less than the upper limit value (No in step S22), the NM 150 generates the response packet P2 which indicates that the write request is not accepted (NG). Thereafter, the NM 150 transmits the generated response packet P2 to the CU 140 (step S23). On the other hand, if the NM 150 determines that the count value is less than the upper limit value, the NM 150 generates the response packet P2 which indicates that the write request is accepted (OK). Thereafter, the NM 150 transmits the generated response packet P2 to the CU 140 (step S24).
Thereafter, the NM 150 determines whether or not the NM 150 receives the write request from the CU 140 (step S25). If the NM 150 determines that the NM 150 does not receive the write request from the CU 140 (No in step S25), the process proceeds to the step S27. If the NM 150 determines that the NM 150 receives the write request from the CU 140 (Yes in step S25), the NM 150 adds 1 to the count value (step S26). The NM 150 stores the write request, which is received from the CU 140, in the NM second memory 153 which functions as a temporary memory. Also, the NM 150 writes the data into the NM first memory 152 which functions as a non-volatile memory, in accordance with the write request stored in the NM second memory 153.
Thereafter, the NM 150 determines whether or not the NM 150 completes the data writing to the NM first memory 152 (step S27). If the NM 150 determines that the NM 150 does not complete the data writing to the NM first memory 152 (No in step S27), the process returns to step S21. On the other hand, if the NM 150 determines that the NM 150 completes the data writing to the NM first memory 152 (Yes in step S27), the NM 150 transmits the write completion notice to the CU 140 (step S28). Next, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153, and the NM 150 subtracts 1 from the count value (step S29). Thereafter, the process returns to step S21.
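The count-value handling of steps S22 to S29 can be sketched as a small state holder. The class `NodeModule` and its method names are hypothetical, and Python lists stand in for the NM second memory 153 and the NM first memory 152; the point of the sketch is only when the count value is compared, incremented, and decremented.

```python
from typing import List


class NodeModule:
    """Minimal model of the NM-side count value in the first embodiment."""

    def __init__(self, upper_limit: int) -> None:
        self.upper_limit = upper_limit
        self.count = 0                    # number of write requests being held
        self.pending: List[bytes] = []    # stands in for the NM second memory 153
        self.stored: List[bytes] = []     # stands in for the NM first memory 152

    def on_verification(self) -> str:
        """Steps S22 to S24: answer a verification packet P1."""
        return "OK" if self.count < self.upper_limit else "NG"

    def on_write_request(self, data: bytes) -> None:
        """Steps S25 and S26: accept a write request and add 1 to the count value."""
        self.count += 1
        self.pending.append(data)

    def complete_one(self) -> str:
        """Steps S27 to S29: finish one write, notify, and subtract 1 from the count value."""
        data = self.pending.pop(0)
        self.stored.append(data)
        self.count -= 1
        return "write_complete"


if __name__ == "__main__":
    nm = NodeModule(upper_limit=1)
    print(nm.on_verification())
    nm.on_write_request(b"a")
    print(nm.on_verification())
    print(nm.complete_one(), nm.on_verification())
```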
As described above, in the first embodiment, the CU 140 verifies that a load of the NM 150 is less than the reference value, and the CU 140 generates a write request for writing the data into the NM first memory 152 of the NM 150. Specifically, the CU 140 generates the verification packet P1 for verifying the load of the NM 150. The NM 150 receives the verification packet P1, and generates a response packet P2 to the verification packet P1. The CU 140 generates the write request in response to the response packet (OK) P2 accepting the request. Thereby, writing performance of the storage system 100 may not be compromised.
In the first embodiment, the CU 140 verifies that the load of the NM 150 of the write destination is less than the reference value, and transmits the write request to the NM 150. In contrast, in a second embodiment, the CU 140 makes a data write reservation with the NM 150, and the NM 150 transmits, to the CU 140, information indicating whether or not the reservation is accepted. Only if the reservation is accepted does the CU 140 transmit a write request to the NM 150. The “reservation” in the second embodiment means sequential operations in which the CU 140 transmits a reservation packet to the NM 150 and the CU 140 receives a reservation completion notice. The second embodiment is described below in detail.
If the CU 140-1 receives a write command for writing data from the client 200, the CU 140-1 transmits a reservation packet P3 to the NM 150 (step S30). The reservation packet P3 contains the content shown in
The reservation packet P3 is smaller in data size than the write request. The NM 150 limits a number of write requests that can be stored in a storage area of the NM second memory 153, in order to reserve an area for storing the reservation packet P3 in the storage area. Thereby, even if congestion occurs in the communication network of the RCs 160, the NM 150 can transmit the reservation packet P3 without delay.
If the NM 150 receives the reservation packet P3 from the CU 140-1, the NM 150 determines whether or not the number of data write reservations is less than a reference value. If the NM 150 determines that the number of data write reservations is less than the reference value, the NM 150 transmits a reservation completion notice to the CU 140-1, and the NM 150 adds 1 to a count value which indicates the number of data write reservations (step S31).
If the CU 140-1 receives the reservation completion notice from the NM 150, the CU 140-1 generates a write request for instructing the NM 150 to write the data. Thereafter, the CU 140-1 transmits the generated write request to the NM 150 via the communication network of the RCs 160 (step S32). The NM 150 stores the write request received from the CU 140-1 in the NM second memory 153, which functions as a temporary memory. And, the NM 150 writes the data into the NM first memory 152 which functions as a non-volatile memory, in accordance with the write request stored in the NM second memory 153.
On the other hand, if the CU 140-2 receives a write command for writing data from the client 200, the CU 140-2 transmits the reservation packet P3 to the NM 150 which is a destination of the data (step S33). If the NM 150 receives the reservation packet P3 from the CU 140-2, the NM 150 determines whether or not the number of data write reservations is less than the reference value. If the NM 150 determines that the number of data write reservations is equal to or more than the reference value, the NM 150 transmits a reservation unacceptable notice to the CU 140-2 (step S34). The reservation unacceptable notice indicates that the reservation is not accepted.
If the CU 140-2 receives the reservation unacceptable notice from the NM 150, the CU 140-2 does not transmit the write request to the NM 150. Instead, the CU 140-2 repeatedly transmits the reservation packet P3 to the NM 150 until the CU 140-2 receives the reservation completion notice from the NM 150.
If the NM 150 completes the data writing corresponding to the write request which is received from the CU 140-1, the NM 150 transmits a write completion notice to the CU 140-1 (step S35). Thereafter, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150 subtracts 1 from the count value which indicates the number of data write reservations.
On the other hand, the CU 140-2 transmits the reservation packet P3 again to the NM 150 (step S36). If the NM 150 receives the reservation packet P3 from the CU 140-2, the NM 150 determines whether or not the number of data write reservations is less than the reference value. If the NM 150 determines that the number of data write reservations is less than the reference value, the NM 150 transmits the reservation completion notice to the CU 140-2, and the NM 150 adds 1 to the count value which indicates the number of data write reservations (step S37).
If the CU 140-2 receives the reservation completion notice from the NM 150, the CU 140-2 generates a write request for instructing the NM 150 to write data. Thereafter, the CU 140-2 transmits the generated write request to the NM 150 via the communication network of the RCs 160 (step S38). The NM 150 stores the write request received from the CU 140-2 in the NM second memory 153. And, the NM 150 writes the data into the NM first memory 152, in accordance with the write request which is stored in the NM second memory 153.
If the NM 150 completes the data writing corresponding to the write request received from the CU 140-2, the NM 150 transmits a write completion notice to the CU 140-2 (step S39). Thereafter, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150 subtracts 1 from the count value which indicates the number of data write reservations.
If the NM 150 determines that the count value is not less than the upper limit value (No in step S52), the NM 150 transmits the reservation unacceptable notice to the CU 140 (step S53). On the other hand, if the NM 150 determines that the count value is less than the upper limit value (Yes in step S52), the NM 150 transmits the reservation completion notice to the CU 140 (step S54). Thereafter, the NM 150 adds 1 to the count value (step S55).
If the CU 140 receives the reservation completion notice from the NM 150, the CU 140 generates a write request for instructing the NM 150 to write the data. Thereafter, the CU 140 transmits the generated write request to the NM 150 via the communication network of the RCs 160. The NM 150 stores the write request received from the CU 140 in the NM second memory 153. Also, the NM 150 writes the data into the NM first memory 152, in accordance with the write request stored in the NM second memory 153.
Thereafter, the NM 150 determines whether or not the NM 150 completes the data writing to the NM first memory 152 (step S56). If the NM 150 determines that the NM 150 does not complete the data writing to the NM first memory 152 (No in step S56), the process returns to step S51. On the other hand, if the NM 150 determines that the NM 150 has completed the data writing to the NM first memory 152, the NM 150 transmits the write completion notice to the CU 140 (step S57). Next, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153, and the NM 150 subtracts 1 from the count value (step S58). Thereafter, the process returns to step S51.
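The reservation handling of steps S52 to S58 can be sketched in the same style. The class `ReservingNodeModule` and its method names are hypothetical. Unlike a pure load check, the count value here is incremented when the reservation is granted, so a second reservation arriving before the first write request is already counted against the upper limit.

```python
from typing import List


class ReservingNodeModule:
    """Minimal model of the NM-side reservation count in the second embodiment."""

    def __init__(self, upper_limit: int) -> None:
        self.upper_limit = upper_limit
        self.count = 0                    # number of data write reservations
        self.pending: List[bytes] = []    # stands in for the NM second memory 153
        self.stored: List[bytes] = []     # stands in for the NM first memory 152

    def on_reservation(self) -> str:
        """Steps S52 to S55: answer a reservation packet P3."""
        if self.count >= self.upper_limit:
            return "reservation_unacceptable"
        self.count += 1  # the slot is taken as soon as the reservation is granted
        return "reservation_complete"

    def on_write_request(self, data: bytes) -> None:
        """Hold the write request that follows a granted reservation."""
        self.pending.append(data)

    def complete_one(self) -> str:
        """Steps S56 to S58: finish one write, notify, and subtract 1 from the count value."""
        data = self.pending.pop(0)
        self.stored.append(data)
        self.count -= 1
        return "write_complete"


if __name__ == "__main__":
    nm = ReservingNodeModule(upper_limit=1)
    print(nm.on_reservation())   # first CU
    print(nm.on_reservation())   # second CU, before any write request has arrived
```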
As described above, in the second embodiment, the CU 140 performs a write reservation of data with respect to the NM 150, and then generates a write request for writing the data into the NM first memory 152 of the NM 150. Specifically, the CU 140 generates a reservation packet P3, and the NM 150 determines whether or not the write reservation based on the reservation packet P3 is acceptable. The NM 150 generates a reservation completion notice, if the NM 150 determines that the write reservation is acceptable. The CU 140 generates a write request based on the reservation completion notice. The NM 150 generates a reservation unacceptable notice, if the NM 150 determines that the write reservation is unacceptable. The CU 140 re-generates a reservation packet based on the reservation unacceptable notice. Thereby, a writing performance of the storage system 100 may not be compromised.
In the second embodiment, the CU 140 may generate a reservation packet P3 for write reservation with respect to the NM 150, when the CU 140 verifies that the load of the NM 150 is less than the reference value. Thereby, the load of the NM 150 will not increase after the load is verified and before the write request is performed. Also, the number of write requests issued by the CUs 140 will not exceed the upper limit. Therefore, congestion will not occur in a communication path from the CU 140 to the NM 150, and the writing performance of the storage system 100 may not be compromised.
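For illustration only, the reservation handshake of the second embodiment may be sketched as follows. The sketch assumes that the NM 150 accepts a reservation while its count value is below an upper limit; the names `NodeModule`, `reserve`, `complete_write`, and `try_write`, and the retry bound, are hypothetical and do not appear in the embodiment.

```python
class NodeModule:
    """Hypothetical model of an NM that accepts a bounded number of reservations."""

    def __init__(self, upper_limit):
        self.upper_limit = upper_limit  # assumed cap on outstanding reservations
        self.count = 0                  # count value of accepted reservations

    def reserve(self):
        """Handle one reservation packet; True stands for an 'acceptable' notice."""
        if self.count < self.upper_limit:
            self.count += 1
            return True
        return False  # stands for an 'unacceptable' notice

    def complete_write(self):
        """When a write finishes, subtract 1 from the count value (step S58)."""
        if self.count > 0:
            self.count -= 1


def try_write(nm, max_attempts=3):
    """CU side: re-generate the reservation packet until it is accepted.

    Returns the attempt number on success, or None if every attempt was refused.
    The retry bound is an assumption made to keep the sketch finite.
    """
    for attempt in range(1, max_attempts + 1):
        if nm.reserve():
            return attempt  # the CU would now generate the write request
    return None
```

With an upper limit of 2, the third reservation is refused until one of the earlier writes completes and the count value is decremented.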
In the first embodiment, the CU 140 transmits the verification packet P1 to the NM 150. In the second embodiment, the CU 140 transmits the reservation packet P3 to the NM 150. In a third embodiment, the CU 140 does not transmit the verification packet P1 to the NM 150, but transmits the reservation packet P3 to the NM 150, and the NM 150 stores a reservation list for managing reservations of write requests in the NM second memory 153. The “reservation” in the third embodiment means a sequence of operations in which the CU 140 transmits a reservation packet P3 to the NM 150 and the NM 150 registers a reservation of a write request in the reservation list. The third embodiment is described below in detail.
If the CU 140-1 receives a write command for writing data from the client 200, the CU 140-1 transmits a reservation packet P3 to the NM 150 which is a write destination (step S70). The NM 150 updates the reservation list stored in the NM second memory 153 in accordance with the reservation packet P3 from the CU 140-1. Specifically, the NM 150 registers, in the reservation list, the reservation of the write request corresponding to the received reservation packet P3.
If the CU 140-2 receives a write command for writing data from the client 200, the CU 140-2 transmits a reservation packet P3 to the NM 150 which is a write destination (step S71). The NM 150 updates the reservation list stored in the NM second memory 153 in accordance with the reservation packet P3 from the CU 140-2. Specifically, the NM 150 registers, in the reservation list, the reservation of the write request corresponding to the received reservation packet P3.
The NM 150 selects the oldest reservation (reservation of the CU 140-1) in the reservation list in the NM second memory 153 (step S72). Thereafter, the NM 150 transmits a data request to the CU 140-1 which is a source of the selected reservation (step S73).
If the CU 140-1 receives the data request from the NM 150, the CU 140-1 generates a write request for instructing the NM 150 to write data. Thereafter, the CU 140-1 transmits the generated write request to the NM 150 via the communication network of the RCs 160 (step S74). The NM 150 stores the write request from the CU 140-1 in the NM second memory 153 which functions as a temporary memory. Also, the NM 150 writes the data into the NM first memory 152 which functions as a non-volatile memory, in accordance with the write request stored in the NM second memory 153.
If the NM 150 completes the data writing corresponding to the write request from the CU 140-1, the NM 150 transmits a write completion notice to the CU 140-1 (step S75). Thereafter, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150 removes the reservation of the CU 140-1 from the reservation list.
Next, the NM 150 selects the oldest reservation (reservation of the CU 140-2) in the reservation list in the NM second memory 153 (step S76). Thereafter, the NM 150 transmits a data request to the CU 140-2 which is a source of the selected reservation (step S77).
If the CU 140-2 receives the data request from the NM 150, the CU 140-2 generates a write request for instructing the NM 150 to write data. Thereafter, the CU 140-2 transmits the generated write request to the NM 150 via the communication network of the RCs 160 (step S78). The NM 150 stores the write request from the CU 140-2 in the NM second memory 153. Also, the NM 150 writes the data into the NM first memory 152, in accordance with the write request stored in the NM second memory 153.
If the NM 150 completes the data writing corresponding to the write request from the CU 140-2, the NM 150 transmits a write completion notice to the CU 140-2 (step S79). Thereafter, the NM 150 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150 removes the reservation of the CU 140-2 from the reservation list.
Next, the NM 150 determines whether or not data are being written into the NM first memory 152 (step S83). If the NM 150 determines that data are being written, the process proceeds to step S87.
If the NM 150 determines that data are not being written (No in step S83), the NM 150 determines whether or not any reservation of a write request exists in the reservation list (step S84). If the NM 150 determines that no reservation of a write request exists in the reservation list, the process proceeds to step S87.
If the NM 150 determines that a reservation of a write request exists in the reservation list, the NM 150 selects the oldest reservation in the reservation list (step S85). Then, the NM 150 transmits a data request to the CU 140 which is a source of the selected reservation (step S86).
Thereafter, the NM 150 determines whether or not the NM 150 has completed the data writing to the NM first memory 152 (step S87). If the NM 150 determines that the data writing to the NM first memory 152 has not been completed (No in step S87), the process returns to step S81. On the other hand, if the NM 150 determines that the data writing to the NM first memory 152 has been completed (Yes in step S87), the NM 150 transmits the write completion notice to the CU 140 (step S88).
Next, the NM 150 removes, from the NM second memory 153, the write request for which the data writing has been completed. Also, the NM 150 removes, from the reservation list, the reservation for which the data writing has been completed. Thereafter, the process returns to step S81.
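For illustration only, one pass of the NM-side loop of the third embodiment (steps S83 to S88) may be sketched as follows. The class `NodeModule`, its attributes, and in particular the `requested` state, which prevents a second data request from being sent before the write request arrives, are assumptions of the sketch and are not stated in the embodiment.

```python
from collections import deque


class NodeModule:
    """Hypothetical model of the NM 150; each step() call is one pass of the loop."""

    def __init__(self):
        self.reservation_list = deque()  # oldest reservation on the left
        self.requested = None    # CU that was sent a data request (assumed state)
        self.writing = None      # CU whose data is being written, if any
        self.write_done = False  # set to True when the simulated write finishes
        self.sent = []           # (message, CU) pairs sent by the NM

    def register(self, cu_id):
        """Register a reservation for a received reservation packet."""
        self.reservation_list.append(cu_id)

    def receive_write_request(self, cu_id):
        """The CU answered the data request; the NM starts writing its data."""
        self.requested = None
        self.writing = cu_id

    def step(self):
        idle = self.writing is None and self.requested is None
        if idle and self.reservation_list:                       # S83, S84
            self.requested = self.reservation_list[0]            # S85: oldest
            self.sent.append(("data_request", self.requested))   # S86
        if self.writing is not None and self.write_done:         # S87
            self.sent.append(("write_completion", self.writing))  # S88
            self.reservation_list.remove(self.writing)
            self.writing = None
            self.write_done = False
```

Because the reservation list is served from its oldest entry, a CU that reserved later receives its data request only after the earlier write has completed and its reservation has been removed.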
As described above, according to the third embodiment, the CU 140 performs a write reservation of the data with respect to the NM 150, and then generates the write request to the NM 150. Specifically, the CU 140 generates a reservation packet P3 for write reservation with respect to the NM 150. The NM 150 receives the reservation packet P3 from the CU 140. The NM 150 selects the oldest reservation based on the reservation packets P3 received from the CU 140. The NM 150 writes data associated with the oldest reservation, into the NM first memory 152 of the NM 150. The NM 150 has a reservation list for managing reservation of write requests. The NM 150 updates the reservation list in accordance with the reservation packet P3 which is transmitted from the CU 140. Thereby, the writing performance of the storage system 100 may not be compromised.
In the second embodiment, if a reservation exceeds a writing performance of the NM 150, the reservation is not accepted. However, in the third embodiment, because the NM 150 transmits a data request to the next CU 140 in accordance with the reservation list, more reservations can be accepted.
In the third embodiment, if the CU 140 receives the data request from the NM 150, the CU 140 transmits the write request to the NM 150. In contrast, in a fourth embodiment, if the CU 140 receives a right transfer notice from another CU 140, the CU 140 transmits a write request to the NM 150. The fourth embodiment is described below in detail.
As shown in
On the other hand, as shown in
As shown in
As shown in
As shown in
Thereafter, the NM 150-8 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150-8 transfers the reservation data of the CU 140-3 stored in the queue 2 to the queue 1.
As shown in
If the CU 140-3 receives the right transfer notice P7 from the CU 140-1, the CU 140-3 generates a write request W3 for instructing the NM 150-8 to write data. Thereafter, the CU 140-3 transmits the generated write request W3 to the NM 150-8 via the communication network of the RCs 160. The NM 150-8 stores the write request W3 received from the CU 140-3 in the NM second memory 153. Also, the NM 150-8 writes the data into the NM first memory 152, in accordance with the write request W3 stored in the NM second memory 153.
Thereafter, the NM 150-8 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150-8 removes the reservation data of the CU 140-3 from the queue 1.
If the CU 140-1 receives a write command for writing data from the client 200, the CU 140-1 transmits a reservation packet P3 to the NM 150-8 which is a write destination (step S90). If the NM 150-8 receives the reservation packet P3 from the CU 140-1, the NM 150-8 stores reservation data of the CU 140-1 in the queue 1.
If the CU 140-3 receives a write command for writing data from the client 200, the CU 140-3 transmits a reservation packet P4 to the NM 150-8 which is a write destination (step S91). If the NM 150-8 receives the reservation packet P4 from the CU 140-3, the NM 150-8 stores reservation data of the CU 140-3 in the queue 2.
Next, the NM 150-8 transmits a data request packet P5 to the CU 140-1 which corresponds to the reservation data stored in the queue 1 (step S92). If the CU 140-1 receives the data request packet P5 from the NM 150-8, the CU 140-1 generates a write request W2 for instructing the NM 150-8 to write data. Thereafter, the CU 140-1 transmits the generated write request W2 to the NM 150-8 via the communication network of the RCs 160 (step S93).
The NM 150-8 stores the write request W2 received from the CU 140-1 in the NM second memory 153. Also, the NM 150-8 writes the data into the NM first memory 152, in accordance with the write request W2 stored in the NM second memory 153.
If the NM 150-8 completes the data writing with respect to the write request W2 received from the CU 140-1, the NM 150-8 transmits a write completion notice P6 and identification information of the CU 140-3 to the CU 140-1 (step S94). Thereafter, the NM 150-8 removes, from the NM second memory 153, the write request W2 for which the data writing has been completed. Also, the NM 150-8 moves the reservation data of the CU 140-3 stored in the queue 2 to the queue 1.
If the CU 140-1 receives the write completion notice P6 and the identification information from the NM 150-8, the CU 140-1 transmits a right transfer notice P7 to the CU 140-3 corresponding to the received identification information (step S95).
If the CU 140-3 receives the right transfer notice P7 from the CU 140-1, the CU 140-3 generates a write request W3 for instructing the NM 150-8 to write data. Thereafter, the CU 140-3 transmits the generated write request W3 to the NM 150-8 via the communication network of the RCs 160 (step S96).
The NM 150-8 stores the write request W3 received from the CU 140-3, into the NM second memory 153. Also, the NM 150-8 writes the data in the NM first memory 152, in accordance with the write request W3 stored in the NM second memory 153. If the NM 150-8 completes the data writing with respect to the write request W3 from the CU 140-3, the NM 150-8 transmits a write completion notice to the CU 140-3 (step S97).
Thereafter, the NM 150-8 removes the write request of which data writing has been completed from the NM second memory 153. Also, the NM 150-8 removes the reservation data of the CU 140-3 from the queue 1.
As described above, according to the fourth embodiment, if an execution of the write request W2 is completed, the NM 150-8 transmits the write completion notice P6 and the identification information of the CU 140-3 to the CU 140-1. If the CU 140-1 receives the write completion notice P6 and the identification information from the NM 150-8, the CU 140-1 transmits the right transfer notice P7 to the CU 140-3 corresponding to the identification information. If the CU 140-3 receives the right transfer notice P7 from the CU 140-1, the CU 140-3 transmits the write request W3 to the NM 150-8. Thereby, the load of the NM 150-8 can be reduced, and the writing performance of the storage system 100 may not be compromised.
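For illustration only, the message flow of the fourth embodiment may be sketched as follows. The sketch assumes that the queue 1 and the queue 2 each hold the reservation data of a single CU; the names `NodeModule`, `reserve`, `complete_write`, and `run`, and the message labels, are hypothetical.

```python
class NodeModule:
    """Hypothetical model of the NM 150-8 with two single-entry queues."""

    def __init__(self):
        self.queue1 = None  # reservation data of the CU that may write
        self.queue2 = None  # reservation data of the CU that waits

    def reserve(self, cu_id):
        if self.queue1 is None:
            self.queue1 = cu_id
        elif self.queue2 is None:
            self.queue2 = cu_id
        else:
            raise RuntimeError("both queues are occupied (sketch limitation)")

    def complete_write(self):
        """Finish the current write and move the waiting reservation to queue 1.

        Returns (finished CU, identification of the next CU or None).
        """
        finished, next_cu = self.queue1, self.queue2
        self.queue1, self.queue2 = next_cu, None
        return finished, next_cu


def run(nm, cu_ids):
    """Return the (sender, receiver, message) log for the given reserving CUs."""
    log = []
    for cu in cu_ids:
        log.append((cu, "NM", "reservation"))
        nm.reserve(cu)
    if nm.queue1 is not None:
        log.append(("NM", nm.queue1, "data_request"))
    while nm.queue1 is not None:
        log.append((nm.queue1, "NM", "write_request"))
        finished, next_cu = nm.complete_write()
        log.append(("NM", finished, "write_completion"))
        if next_cu is not None:
            # The finished CU, not the NM, notifies the next CU.
            log.append((finished, next_cu, "right_transfer"))
    return log
```

In this sketch the NM issues a single data request regardless of how the second write is triggered, since the second CU is notified by the first CU rather than by the NM.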
In the first embodiment to the fourth embodiment, the CU 140 transmits the verification packet P1 or the reservation packet P3 to the NM 150. In contrast, in a fifth embodiment, the CU 140 transmits a congestion confirmation packet P8 to the NM 150. If the CU 140 receives a response to the congestion confirmation packet P8 from the NM 150, the CU 140 transmits a write request to the NM 150. The “congestion” in the fifth embodiment means a state in which a routing cannot be properly performed via the RC 160 because the PMU 180 is full of packets, and the NM 150 cannot properly transfer data (i.e., busy). The fifth embodiment is described below in detail.
If the congestion confirmation packet P8 is to be transmitted to the NM 150-13 through the shortest route, the congestion confirmation packet P8 is transferred to the NM 150-3, the NM 150-8, and the NM 150-13 in this order. However, for example, if the PMU 180 connected to the NM 150-8 is full of packets (i.e., in the PMU FULL state), no packet can pass through the communication path including the NM 150-8. Therefore, if the NM 150-8 receives the congestion confirmation packet P8 from the NM 150-3, the NM 150-8 adds information for identifying the NM 150-8, as congestion information, to the payload area PA of the congestion confirmation packet P8. Thereafter, the NM 150-8 returns the congestion confirmation packet P8 to the NM 150-3.
If the NM 150-3 receives the congestion confirmation packet P8 from the NM 150-8, the NM 150-3 refers to the congestion information of the congestion confirmation packet P8, and the NM 150-3 transmits the congestion confirmation packet P8 to a path which does not include the NM 150-8. For example, the NM 150-3 transmits the congestion confirmation packet P8 to the NM 150-4, the NM 150-4 transmits the congestion confirmation packet P8 to the NM 150-9, and the NM 150-9 transmits the congestion confirmation packet P8 to the NM 150-14. Thereafter, the NM 150-14 transmits the congestion confirmation packet P8 to the NM 150-13 which is a destination of the congestion confirmation packet P8.
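For illustration only, the detour described above may be simulated as follows. The sketch assumes that the NMs are numbered row by row from 1 on a grid that is 5 wide and 4 high, which is consistent with the NM numbers in the example but is not stated in this passage, and it uses a breadth-first search as a stand-in for the routing decision of each NM. The function names are hypothetical.

```python
from collections import deque


def neighbors(n, width, height):
    """Adjacent NM numbers on a width x height grid numbered from 1, row by row."""
    r, c = divmod(n - 1, width)
    out = []
    if c + 1 < width:
        out.append(n + 1)
    if r + 1 < height:
        out.append(n + width)
    if c > 0:
        out.append(n - 1)
    if r > 0:
        out.append(n - width)
    return out


def shortest_path(src, dst, avoid, width, height):
    """Breadth-first search for a shortest path that skips the NMs in `avoid`."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        here = queue.popleft()
        if here == dst:
            path = []
            while here is not None:
                path.append(here)
                here = parent[here]
            return path[::-1]
        for nxt in neighbors(here, width, height):
            if nxt not in parent and nxt not in avoid:
                parent[nxt] = here
                queue.append(nxt)
    return None


def send_probe(src, dst, pmu_full, width=5, height=4):
    """Forward a congestion confirmation packet hop by hop.

    Returns (route of NMs that accepted the packet, congestion information).
    """
    congestion_info = []  # NM numbers written into the payload area
    route = [src]
    while route[-1] != dst:
        path = shortest_path(route[-1], dst, set(congestion_info), width, height)
        if path is None:
            return None, congestion_info
        nxt = path[1]
        if nxt in pmu_full and nxt != dst:
            # The full NM tags itself and returns the packet to the sender.
            congestion_info.append(nxt)
        else:
            route.append(nxt)
    return route, congestion_info
```

Calling `send_probe(3, 13, {8})` under these assumptions yields the route 3, 4, 9, 14, 13 with the congestion information containing 8, matching the example above.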
If the NM 150-13 receives the congestion confirmation packet P8 from the NM 150-14, the NM 150-13 generates a response packet P9. The response packet P9 contains content shown in
The congestion confirmation packet P8 and the response packet P9 are smaller in data size than the write request W4. The NM 150 may limit the number of the write requests that can be stored in the NM second memory 153, in order to reserve an area for storing the congestion confirmation packet P8 and the response packet P9 in the storage area. Thereby, even if congestion occurs in the communication network of the RCs 160, the NM 150 can transmit the congestion confirmation packet P8 and the response packet P9 without delay.
As shown in
Thereafter, as shown in
Next, the CU 140 determines whether or not the CU 140 receives the response packet P9 from the NM 150 (step S102). If the CU 140 determines that the CU 140 receives the response packet P9 from the NM 150, the CU 140 generates the write request W4 for instructing the NM 150 to write data (step S103). At this time, the CU 140 extracts the congestion information described in the response packet P9, and the CU 140 describes the extracted congestion information in the payload area PA of the write request W4. The CU 140 transmits the generated write request W4 to the NM 150 (step S104), and the process returns to step S100.
If the NM 150 determines that the destination of the congestion confirmation packet P8 is the NM 150 itself, the NM 150 generates the response packet P9. At this time, the NM 150 describes, in the payload area PA of the response packet P9, the congestion information included in the congestion confirmation packet P8. The NM 150 transmits the generated response packet P9 to the CU 140 (step S112), and the process returns to step S110.
On the other hand, in step S111, if the NM 150 determines that the destination of the congestion confirmation packet P8 is not the NM 150 itself, the NM 150 determines whether or not the PMU 180 connected to the NM 150 is full of packets (whether or not the PMU 180 is in the PMU FULL state) (step S113).
If the NM 150 determines that the PMU 180 connected to the NM 150 is full of packets, the NM 150 adds information for identifying the NM 150 itself, as the congestion information, to the payload area PA of the congestion confirmation packet P8 (step S114). Thereafter, the NM 150 returns the congestion confirmation packet P8 to the adjacent NM 150 which transmitted the congestion confirmation packet P8 (step S115), and the process returns to step S110.
On the other hand, in step S113, if the NM 150 determines that the PMU 180 connected to the NM 150 is not full of packets, the NM 150 transmits the congestion confirmation packet P8 to an adjacent NM 150 (step S116). At this time, the NM 150 refers to the congestion information of the congestion confirmation packet P8, and the NM 150 transmits the congestion confirmation packet P8 along a path which does not include the NM 150 corresponding to the congestion information. If the NM 150 completes the transmission of the congestion confirmation packet P8, the process returns to step S110.
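For illustration only, the decision made by a single NM 150 in steps S111 to S116 may be sketched as follows. The type `ProbePacket`, the function `handle_probe`, and the returned action labels are hypothetical; the selection of the adjacent NM in step S116 is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ProbePacket:
    """Hypothetical stand-in for the congestion confirmation packet P8."""
    destination: int
    congestion_info: list = field(default_factory=list)  # payload area PA


def handle_probe(own_id, packet, pmu_full):
    """Return the action one NM takes for a congestion confirmation packet."""
    if packet.destination == own_id:            # S111: addressed to this NM
        return "respond"                        # S112: send response packet P9
    if pmu_full:                                # S113: PMU FULL state
        packet.congestion_info.append(own_id)   # S114: add own identifier
        return "return_to_sender"               # S115
    return "forward"                            # S116: avoid NMs in congestion_info
```

The destination check precedes the PMU check, so in this sketch the destination NM responds even when its own PMU is full, following the order of the steps described above.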
In the fifth embodiment, the congestion information is described in the congestion confirmation packet P8 in order to confirm the communication path along which congestion does not occur, but not limited thereto. For example, information for identifying the NM 150 which is not in the PMU FULL state may be described in the congestion confirmation packet P8 in order to confirm the communication path along which congestion does not occur.
As described above, according to the fifth embodiment, after the CU 140 confirms the communication path along which congestion does not occur, the CU 140 transmits the write request W4 to the NM 150 via the communication path along which congestion does not occur. Thereby, congestion in the communication network of the RCs 160 can be suppressed, and a writing performance of the storage system 100 may not be compromised.
In the first embodiment to the fifth embodiment, the CU 140 transmits the verification packet, the reservation packet, or the congestion confirmation packet to the NM 150 via the communication network of the RCs 160, but not limited thereto. For example, as shown in
In the first embodiment or the second embodiment, the CU 140 verifies the load of the NM 150 based on the response to the verification packet, but not limited thereto. For example, the NM 150 may periodically determine whether or not the load is equal to or more than the reference value. If the load is equal to or more than the reference value, the NM 150 may generate an overload notice which indicates that the load is equal to or more than the reference value, and the NM 150 may transmit the overload notice to at least one of the CUs 140. The CU 140 which receives the overload notice may refrain from transmitting a write request to the NM 150 which is the source of the overload notice. Also, the CU 140 which receives the overload notice from the NM 150 may transmit the overload notice to the other CUs 140. In this case, because it is not necessary for the CU 140 to transmit the verification packet to the NM 150, the load of the CU 140 can be reduced.
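For illustration only, the overload notice variant may be sketched as follows. The names `ConnectionUnit`, `on_overload_notice`, `may_write`, and `check_load` are hypothetical. How an overload notice is later cleared is not described above, so the sketch does not model it.

```python
class ConnectionUnit:
    """Hypothetical model of a CU 140 that remembers overloaded NMs."""

    def __init__(self, name):
        self.name = name
        self.overloaded = set()  # NMs from which an overload notice was received
        self.peers = []          # other CUs to which the notice is relayed

    def on_overload_notice(self, nm_id, relay=True):
        self.overloaded.add(nm_id)
        if relay:
            for peer in self.peers:
                # Relay once; peers do not relay again, to avoid loops.
                peer.on_overload_notice(nm_id, relay=False)

    def may_write(self, nm_id):
        """True if the CU has no overload notice recorded for this NM."""
        return nm_id not in self.overloaded


def check_load(nm_id, load, reference, cu):
    """Periodic NM-side check: notify one CU when the load reaches the reference."""
    if load >= reference:
        cu.on_overload_notice(nm_id)
        return True
    return False
```

In this sketch a CU decides whether to issue a write request from its local record alone, without exchanging a verification packet with the NM.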
In at least one embodiment described above, the storage system 100 includes a plurality of the NMs 150 and a plurality of the CUs 140. The plurality of the NMs 150 transmits data to the NM 150, which is a write destination, via the communication network of the RCs 160. The plurality of the CUs 140 verifies that a load of the NM 150, which is a write destination, is less than the reference value, or performs a write reservation of data with respect to the NM 150 which is a write destination. Thereafter, the plurality of the CUs 140 transmits a write request to the NM 150 which is a write destination. Thereby, the writing performance of the storage system 100 may not be compromised.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
This application is based upon and claims the benefit of priority from U.S. Provisional Patent Application No. 62/241,828, filed on Oct. 15, 2015, the entire contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
62241828 | Oct 2015 | US