The described subject matter relates to electronic computing, and more particularly to systems and methods for managing storage in electronic computing systems.
Effective collection, management, and control of information have become a central component of modern business processes. To this end, many businesses, both large and small, now implement computer-based information management systems.
Data management is an important component of computer-based information management systems. Many businesses now implement storage networks to manage data operations in computer-based information management systems. Storage networks have evolved in computing power and complexity to provide highly reliable, managed storage solutions that may be distributed across a wide geographic area.
Data redundancy is one aspect of reliability in storage networks. A single copy of data is vulnerable if the network element on which the data resides fails. If the vulnerable data or the network element on which it resides can be recovered, then the loss may be temporary. If neither the data nor the network element can be recovered, then the vulnerable data may be lost permanently.
Storage networks implement remote copy procedures to provide data redundancy. Remote copy procedures replicate data sets resident on a first storage site onto a second storage site, and sometimes onto a third storage site. Remote copy procedures have proven effective at enhancing the reliability of storage networks, but at a significant increase in the cost of implementing a storage network.
In an exemplary implementation a storage network is provided. The storage network comprises a first storage site comprising a first set of disk drives; a second storage site communicatively connected to the first storage site and comprising a storage medium; and a third storage site communicatively connected to the second storage site and comprising a second set of disk drives. The second storage site provides a data write spool service to the first storage site.
Described herein are exemplary storage network architectures and methods for implementing multiple site data replication. The methods described herein may be embodied as logic instructions on a computer-readable medium. When executed on a processor, the logic instructions cause a general purpose computing device to be programmed as a special-purpose machine that implements the described methods.
Exemplary Network Architecture
A plurality of logical disks (also called logical units or LUs) 112a, 112b may be allocated within storage pool 110. Each LU 112a, 112b comprises a contiguous range of logical addresses that can be addressed by host devices 120, 122, 124 and 128 by mapping requests from the connection protocol used by the host device to the uniquely identified LU 112. As used herein, the term “host” comprises one or more computing systems that utilize storage on their own behalf, or on behalf of systems coupled to the host. For example, a host may be a supercomputer processing large databases or a transaction processing server maintaining transaction records. Alternatively, a host may be a file server on a local area network (LAN) or wide area network (WAN) that provides storage services for an enterprise. A file server may comprise one or more disk controllers and/or RAID controllers configured to manage multiple disk drives. A host connects to a storage network via a communication connection such as, e.g., a Fibre Channel (FC) connection.
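By way of illustration only, the following sketch shows one way a host request's logical block address might be resolved to the uniquely identified LU whose contiguous address range contains it; the class, field, and identifier names are hypothetical and are not part of the described storage network.

# Hypothetical sketch: resolve a host request's logical block address to the
# LU whose contiguous range of logical addresses contains it.
from dataclasses import dataclass


@dataclass
class LogicalUnit:
    lu_id: str        # unique identifier presented to hosts
    start_lba: int    # first logical block address in the LU
    block_count: int  # number of contiguous blocks in the LU


def resolve_lu(lus, lba):
    """Return the LU whose address range contains the requested block."""
    for lu in lus:
        if lu.start_lba <= lba < lu.start_lba + lu.block_count:
            return lu
    raise LookupError(f"no LU maps logical block {lba}")


# Example: two LUs allocated within a shared storage pool.
pool = [LogicalUnit("LU-112a", 0, 1 << 20), LogicalUnit("LU-112b", 1 << 20, 1 << 20)]
print(resolve_lu(pool, 5000).lu_id)  # -> LU-112a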
A host such as server 128 may provide services to other computing or data processing systems or devices. For example, client computer 126 may access storage pool 110 via a host such as server 128. Server 128 may provide file services to client 126, and may provide other services such as transaction processing services, email services, etc. Hence, client device 126 may or may not directly use the storage consumed by host 128.
Devices such as wireless device 120, and computers 122, 124, which are also hosts, may logically couple directly to LUs 112a, 112b. Hosts 120-128 may couple to multiple LUs 112a, 112b, and LUs 112a, 112b may be shared among multiple hosts. Each of the devices shown in
Client computers 214a, 214b, 214c may access storage cells 210a, 210b, 210c through a host, such as servers 216, 220. Clients 214a, 214b, 214c may be connected to file server 216 directly, or via a network 218 such as a Local Area Network (LAN) or a Wide Area Network (WAN). The number of storage cells 210a, 210b, 210c that can be included in any storage network is limited primarily by the connectivity implemented in the communication network 212. A switching fabric comprising a single FC switch can interconnect 256 or more ports, providing a possibility of hundreds of storage cells 210a, 210b, 210c in a single storage network.
Hosts 216, 220 are typically implemented as server computers.
Computing device 330 further includes a hard disk drive 344 for reading from and writing to a hard disk (not shown), and may include a magnetic disk drive 346 for reading from and writing to a removable magnetic disk 348, and an optical disk drive 350 for reading from or writing to a removable optical disk 352 such as a CD ROM or other optical media. The hard disk drive 344, magnetic disk drive 346, and optical disk drive 350 are connected to the bus 336 by a SCSI interface 354 or some other appropriate interface. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for computing device 330. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 348 and a removable optical disk 352, other types of computer-readable media such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like, may also be used in the exemplary operating environment.
A number of program modules may be stored on the hard disk 344, magnetic disk 348, optical disk 352, ROM 338, or RAM 340, including an operating system 358, one or more application programs 360, other program modules 362, and program data 364. A user may enter commands and information into computing device 330 through input devices such as a keyboard 366 and a pointing device 368. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are connected to the processing unit 332 through an interface 370 that is coupled to the bus 336. A monitor 372 or other type of display device is also connected to the bus 336 via an interface, such as a video adapter 374.
Computing device 330 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 376. The remote computer 376 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computing device 330, although only a memory storage device 378 has been illustrated in
When used in a LAN networking environment, computing device 330 is connected to the local network 380 through a network interface or adapter 384. When used in a WAN networking environment, computing device 330 typically includes a modem 386 or other means for establishing communications over the wide area network 382, such as the Internet. The modem 386, which may be internal or external, is connected to the bus 336 via a serial port interface 356. In a networked environment, program modules depicted relative to the computing device 330, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Hosts 216, 220 may include host adapter hardware and software to enable a connection to communication network 212. The connection to communication network 212 may be through an optical coupling or more conventional conductive cabling depending on the bandwidth requirements. A host adapter may be implemented as a plug-in card on computing device 330. Hosts 216, 220 may implement any number of host adapters to provide as many connections to communication network 212 as the hardware and software support.
Generally, the data processors of computing device 330 are programmed by means of instructions stored at different times in the various computer-readable storage media of the computer. Programs and operating systems may be distributed, for example, on floppy disks, CD-ROMs, or electronically, and are installed or loaded into the secondary memory of a computer. At execution, the programs are loaded at least partially into the computer's primary electronic memory.
Each NSC 410a, 410b further includes a communication port 428a, 428b that enables a communication connection 438 between the NSCs 410a, 410b. The communication connection 438 may be implemented as a FC point-to-point connection, or pursuant to any other suitable communication protocol.
In an exemplary implementation, NSCs 410a, 410b further include a plurality of Fibre Channel Arbitrated Loop (FCAL) ports 420a-426a, 420b-426b that implement an FCAL communication connection with a plurality of storage devices, e.g., sets of disk drives 440, 442. While the illustrated embodiment implements FCAL connections with the sets of disk drives 440, 442, it will be understood that the communication connection with sets of disk drives 440, 442 may be implemented using other communication protocols. For example, rather than an FCAL configuration, a FC switching fabric may be used.
In operation, the storage capacity provided by the sets of disk drives 440, 442 may be added to the storage pool 110. When an application requires storage capacity, logic instructions on a host computer 128 establish a LU from storage capacity available on the sets of disk drives 440, 442 available in one or more storage sites. It will be appreciated that, because a LU is a logical unit, not a physical unit, the physical storage space that constitutes the LU may be distributed across multiple storage cells. Data for the application is stored on one or more LUs in the storage network. An application that needs to access the data queries a host computer, which retrieves the data from the LU and forwards the data to the application.
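By way of illustration only, the following sketch shows how the physical capacity backing a LU might be drawn from free space in several storage cells; the function and cell names are hypothetical.

# Hypothetical sketch: carve a LU's capacity out of free space spread across
# several storage cells, so the physical storage backing the LU is distributed.
def allocate_lu(cells, requested_gb):
    """Return a {cell_name: gb} layout whose total meets the request."""
    layout, remaining = {}, requested_gb
    for name, free_gb in cells.items():
        if remaining == 0:
            break
        take = min(free_gb, remaining)
        if take > 0:
            layout[name] = take
            remaining -= take
    if remaining:
        raise RuntimeError("insufficient capacity in the storage pool")
    return layout


# Example: a 500 GB LU distributed across two storage cells.
print(allocate_lu({"cell-210a": 300, "cell-210b": 400, "cell-210c": 0}, 500))
# -> {'cell-210a': 300, 'cell-210b': 200}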
A first communication connection 530 is provided between the first storage site 510 and the second storage site 514, and a second communication connection 532 is provided between the second storage site 514 and third storage site 518. Assuming the optional storage site 540 is implemented, a third communication connection 550 is provided between the second storage site 514 and the optional storage site 540, and a fourth communication connection 552 is provided between the optional storage site 540 and the third storage site 518. In an exemplary implementation the communication connections 530, 532, 550, 552 may be provided by a switching fabric such as a FC fabric, or a switching fabric that operates pursuant to another suitable communication protocol, e.g., SCSI, iSCSI, LAN, WAN, etc.
In an exemplary implementation, the first storage site 510 may be separated from the second storage site 514 by a distance on the order of 40 to 100 kilometers, while the second storage site may be separated from the third storage site 518 by a much greater distance, e.g., between 400 and 5000 kilometers. The optional storage site 540 may be co-located with the second storage site 514, or may be separated from the second storage site 514 by a distance of up to 100 kilometers. The particular distance between any of the storage sites is not critical.
In one exemplary implementation, second storage site 514 includes a network element that has communication, processing, and storage capabilities. The network element includes an input port configured to receive data from a first storage site in the storage network, a cache memory module configured to store the received data, and a processor configured to aggregate data stored in the cache memory and to transmit the data to a third storage site. In one exemplary implementation the network element may be embodied as a plug-in card like the NSC card described in connection with
In an alternate implementation, the network element may be embodied as a stand-alone storage appliance. In an alternate implementation, the cache memory 516 in the second storage site 514 and the cache memory 542 in optional storage site 540 may be implemented using a low-cost replication appliance such as, e.g., the SV-3000 model disk array commercially available from Hewlett Packard Corporation of Palo Alto, Calif., USA.
Exemplary Operations
In an exemplary implementation, the components and connections depicted in
At operation 614 data in the cache memory of the second storage site 514 is aggregated into write blocks of a desired size for transmission to the third storage site. Conceptually, the aggregation routine may be considered as having a producer component that writes data into the cache memory of the second storage site and a consumer component that retrieves data from the cache memory and forwards it to the third storage site. The write operations may be synchronous or asynchronous. The size of inbound and outbound write blocks may differ, and the size of any given write block may be selected as a function of the configuration of the network equipment and/or the transmission protocol in the communication link(s) between the second storage site 514 and the third storage site 518. In Fibre Channel implementations, the write block size may be selected as a multiple of 64 KB.
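By way of illustration only, the following sketch models the producer and consumer components of the aggregation routine, with the consumer packing spooled writes into an outbound block whose size is a multiple of 64 KB; the queue-based interface and all names are assumptions made for the example.

# Hypothetical sketch: a producer spools writes received from the first
# storage site into the second site's cache, and a consumer drains the spool
# into an outbound write block sized as a multiple of 64 KB.
import queue

WRITE_BLOCK_SIZE = 4 * 64 * 1024   # outbound block size: a multiple of 64 KB

write_spool = queue.Queue()         # cache memory acting as the write spool


def producer(incoming_write):
    """Spool a data write received from the first storage site."""
    write_spool.put(incoming_write)


def consumer():
    """Aggregate spooled writes into one block bound for the third site."""
    block = bytearray()
    while len(block) < WRITE_BLOCK_SIZE and not write_spool.empty():
        block.extend(write_spool.get())
    return bytes(block)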
In an exemplary implementation the write spool implements a first-in, first-out (FIFO) queue, in which data is written from the queue in the order in which it was received. In an alternate implementation data received from the first storage site 510 includes an indicator that identifies a logical group (e.g., a LU or a data consistency group) with which the data is associated and a sequence number indicating the position of the write operation in the logical group. In this embodiment the aggregation routine may implement a modified FIFO queue that selects data associated with the same logical group for inclusion in the write block.
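By way of illustration only, the following sketch shows the modified selection step, in which each spooled entry carries a logical-group indicator and a sequence number and entries from the same group are chosen in sequence order; the data layout is hypothetical.

# Hypothetical sketch: spooled writes tagged with a logical group (e.g., a LU
# or data consistency group) and a sequence number within that group.
from typing import NamedTuple


class SpooledWrite(NamedTuple):
    group_id: str    # logical group with which the write is associated
    seq_no: int      # position of the write operation in the logical group
    payload: bytes


def select_for_block(entries, group_id):
    """Pick the spooled writes for one logical group, ordered by sequence."""
    return sorted((e for e in entries if e.group_id == group_id),
                  key=lambda e: e.seq_no)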
At operation 616 the write block is transmitted to the third storage site 518. At operation 618 the network element waits to receive an acknowledgment signal from the third storage site 518 indicating that the write block transmitted in operation 616 was received by the third storage site 518. When the acknowledgment signal is received, the spooled data corresponding to the acknowledged write block may be marked for deletion, at operation 620. The marked data may be deleted from the write spool, or may be marked with an indicator that allows the memory space in which the data resides to be overwritten.
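By way of illustration only, operations 616 through 620 may be summarized as in the following sketch; the callables stand in for whatever transport and spool bookkeeping a particular implementation provides and are not part of the described method.

# Hypothetical sketch: transmit the write block, wait for the acknowledgment,
# then mark the spooled data so its space may be reclaimed.
def replicate_block(block, send_block, wait_for_ack, mark_deletable):
    send_block(block)          # operation 616: transmit to the third site
    if wait_for_ack():         # operation 618: block until acknowledgment
        mark_deletable(block)  # operation 620: data may be deleted or overwritten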
In an alternate implementation in a network architecture having an optional fourth storage site 540, the network element in the second storage site 514 implements a synchronous write of data received in operation 610 to the optional fourth storage site 540. The network element in storage site 540 provides a synchronous write spool service to the network element in storage site 514. However, in normal operation the network element in storage site 540 does not need to transmit its data to the third storage site 518. Rather, the network element in storage site 540 transmits its data to the third storage site only upon failure in operation of the second storage site 514.
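By way of illustration only, the following sketch shows the synchronous mirroring to the optional storage site 540 and the failure-triggered forwarding to the third storage site 518; the function names are hypothetical.

# Hypothetical sketch: each spooled write is mirrored synchronously to the
# optional storage site 540; site 540 forwards its copy to the third storage
# site 518 only upon failure of the second storage site 514.
def spool_write(data, write_local, mirror_to_site_540):
    write_local(data)           # spool in the second storage site's cache
    mirror_to_site_540(data)    # synchronous write to the optional site


def on_site_514_failure(drain_site_540_spool, transmit_to_site_518):
    # In normal operation site 540 does not transmit; only upon failure of 514.
    for block in drain_site_540_spool():
        transmit_to_site_518(block)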
The network architecture depicted in
In addition to the specific embodiments explicitly set forth herein, other aspects and embodiments of the present invention will be apparent to those skilled in the art from consideration of the specification disclosed herein. It is intended that the specification and illustrated embodiments be considered as examples only, with a true scope and spirit of the invention being indicated by the following claims.