Traditionally, data management provided to end-consumer applications involves a variety of software layers. These software layers are normally split between storage subsystems, servers, and client computers (sometimes, the client computers and the servers may be embodied in a single computer system).
In a Storage Area Network (SAN) architecture, the division is typically set forth as described in
What is commonly referred to as Network Attached File Systems (NAS), as shown in
According to the present invention, a server is embedded directly into a storage subsystem. When moving between the storage subsystem domain and the server domain, data copying is minimized. Data management functionality written for traditional servers may be implemented within a stand-alone storage subsystem, generally without software changes to the ported subsystems. The hardware executing the storage subsystem and server subsystem are implemented in a way that provides reduced or negligible latency, compared to traditional architectures, when communicating between the storage subsystem and the server subsystem. In one aspect, a plurality of clustered controllers are used. In this aspect, traditional load-balancing software can be used to provide scalability of server functions. One end-result is a storage system that provides a wide range of data management functionality, that supports a heterogeneous collection of clients, that can be quickly customized for specific applications, that easily leverages existing third party software, and that provides optimal performance.
According to an aspect of the invention, a method is provided for embedding functionality normally present in a server computer system into a storage system. The method typically includes providing a storage system having a first processor and a second processor coupled to the first processor by an interconnect medium, wherein processes for controlling the storage system execute on the first processor, porting an operating system normally found on a server system to the second processor, and modifying the operating system to allow for low latency communications between the first and second processors.
According to another aspect of the invention, a storage system is provided that typically includes a first processor configured to control storage functionality, a second processor, an interconnect medium communicably coupling the first and second processors, an operating system ported to the second processor, wherein said operating system is normally found on a server system, and wherein the operating system is modified to allow low latency communication between the first and second processors.
According to yet another aspect of the invention, a method is provided for optimizing communication performance between server and storage system functionality in a storage system. The method typically includes providing a storage system having a first processor and a second processor coupled to the first processor by an interconnect medium, porting an operating system normally found on a server system to the second processor, modifying the operating system to allow for low latency communications between the first and second processors, and porting one or more file system and data management applications normally resident on a server system to the second processor.
According to still another aspect of the invention, a method is provided for implementing clustered embedded server functionality in a storage system controlled by a plurality of storage controllers. The method typically includes providing a plurality of storage controllers, each storage controller having a first processor and a second processor communicably coupled to the first processor by a first interconnect medium, wherein for each storage controller, an operating system normally found on a server system is ported to the second processor, and wherein said operating system is modified to allow low latency communications between the first and second processors. The method also typically includes providing a second interconnect medium between each of said plurality of storage controllers. The second interconnect medium may handle all inter-processor communications. A third interconnect medium is provided in some aspects, wherein inter-processor communications between the first processors occur over one of the second and third mediums and inter-processor communications between the second processors occur over the other one of the second and third mediums.
According to another aspect of the invention, a storage system is provided that implements clustered embedded server functionality using a plurality of storage controllers. The system typically includes a plurality of storage controllers, each storage controller having a first processor and a second processor communicably coupled to the first processor by a first interconnect medium, wherein for each storage controller, processes for controlling the storage system execute on the first processor, an operating system normally found on a server system is ported to the second processor, wherein said operating system is modified to allow low latency communications between the first and second processors, and one or more file system and data management applications normally resident on a server system are ported to the second processor. The system also typically includes a second interconnect medium between each of said plurality of storage controllers, wherein said second interconnect medium handles inter-processor communications between the controller cards. A third interconnect medium is provided in some aspects, wherein inter-processor communications between the first processors occur over one of the second and third mediums and inter-processor communications between the second processors occur over the other one of the second and third mediums.
Other features and advantages of the present invention will be realized by reference to the remaining portions of the specification, including the drawings and claims. Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
According to one embodiment, all, or a substantial portion, of the data management functionality is moved within the storage subsystem. In order to maximize the utilization of existing software, including third party software, and to minimize porting effort, in one aspect, the data management functionality is implemented as two separate software towers running on two separate microprocessors. While any high speed communication between the processors could be used, a preferred implementation involves hardware having two (or more) microprocessors that are used to house a storage software tower and a server software tower, with each microprocessor having direct access to a common memory. An example of a server tower embedded in a storage system according to one embodiment is shown in
It will be apparent to one skilled in the art that multi-processor chip implementations may be used to accomplish a similar architecture. It will also be apparent to one skilled in the art that processor virtualization software can be used to emulate two separate processors executing on a single ‘real’ processor. It will also be apparent that the storage software tower can run as a task of the server tower.
In a preferred implementation, connectors 430 are used to connect the I/O portions of the hardware. This allows alternate I/O modules to be used to provide alternate host protocol connections such as InfiniBand®, e.g. as shown in
According to one embodiment, as shown in
According to one embodiment, the common memory, e.g., memory 420 and 422 of
According to one aspect, an integer field representing priority is included and the link is scanned multiple times looking for decreasing priorities of requests.
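By way of illustration only, the following C sketch shows one possible layout for such a request block in common memory, together with a repeated scan of the linked list that services requests in decreasing priority order. The structure fields, offsets, and routine names are assumptions made for this sketch and are not drawn from the embodiments above.

```c
#include <stdint.h>

/* Hypothetical request block placed in the common memory shared by the
 * storage processor and the server processor.  Offsets rather than raw
 * pointers are used so both processors can follow the list regardless of
 * where each maps the common region in its own address space. */
struct request_block {
    uint32_t next_offset;   /* offset of the next request; 0 terminates the list */
    uint32_t priority;      /* larger value = more urgent */
    uint32_t opcode;        /* e.g. read, write, control */
    uint32_t status;        /* 0 = pending; filled in by the target processor */
    uint64_t buffer_offset; /* data buffer location within common memory */
    uint32_t length;        /* transfer length in bytes */
};

/* Scan the list repeatedly, each pass servicing only the requests at the
 * highest remaining priority, so urgent work is never starved by earlier
 * but less important entries. */
void service_requests(uint8_t *common_mem, uint32_t head_offset,
                      void (*service)(struct request_block *))
{
    for (;;) {
        uint32_t best = 0;
        int found = 0;

        /* First pass: find the highest priority still outstanding. */
        for (uint32_t off = head_offset; off != 0;) {
            struct request_block *rb =
                (struct request_block *)(common_mem + off);
            if (rb->status == 0) {
                if (!found || rb->priority > best)
                    best = rb->priority;
                found = 1;
            }
            off = rb->next_offset;
        }
        if (!found)
            break;

        /* Second pass: service every pending request at that priority. */
        for (uint32_t off = head_offset; off != 0;) {
            struct request_block *rb =
                (struct request_block *)(common_mem + off);
            if (rb->status == 0 && rb->priority == best)
                service(rb);
            off = rb->next_offset;
        }
    }
}
```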
In one aspect, data buffers are pre-allocated by the request target and can be used by the source processor to receive actual data. In this aspect, the processor initiating the request is responsible for copying the data blocks from its memory to the pre-allocated buffer on the receiving processor.
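The copy into a buffer pre-allocated by the request target might, for example, look like the following C sketch; the descriptor layout and function name are illustrative assumptions rather than part of the disclosed design.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a buffer that the request target has
 * pre-allocated in common memory for a specific request. */
struct prealloc_buffer {
    uint64_t offset;    /* location of the buffer within common memory */
    uint32_t capacity;  /* size reserved by the target processor */
};

/* On the processor that initiates the request: copy the payload from local
 * memory into the buffer the receiving processor set aside, then report the
 * actual length so the target knows how much is valid. */
int send_to_prealloc(uint8_t *common_mem,
                     const struct prealloc_buffer *buf,
                     const void *local_data, uint32_t len,
                     uint32_t *out_len)
{
    if (len > buf->capacity)
        return -1;                      /* would overrun the reserved space */

    memcpy(common_mem + buf->offset, local_data, len);
    *out_len = len;                     /* target reads this from the request */
    return 0;
}
```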
According to another aspect, the actual data copying is deferred until deemed more convenient, thus minimizing the latency associated with individual transactions. This is preferably done without modifications outside the device driver layer of the Linux operating system; e.g., during a write operation, by "nailing" the I/O page to be written and using the Linux page image for the I/O operations in the storage system. The page can be replicated as a background function on the Storage System processor (the processor implementing storage system control functionality). When the copy is complete and in use by the Storage System, the Server Device Driver is notified that the page is now "clean" and can be "un-nailed."
In one aspect, all of the above is implemented on the Server processor (the processor implementing server functionality) using a special device driver.
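A minimal sketch of such a device driver write path is shown below, assuming hypothetical nail/un-nail helpers; these are illustrative stubs, not actual Linux kernel interfaces, and a real driver would replace them with the corresponding page-pinning and notification mechanisms.

```c
#include <errno.h>

/* Opaque handle for one page of server memory involved in a write. */
struct io_page { int nailed; int clean; };

/* Pin the page so it cannot be paged out or modified while the Storage
 * System processor is still using the server-side image of it. */
static int  nail_page(struct io_page *pg)   { pg->nailed = 1; return 0; }
static void unnail_page(struct io_page *pg) { pg->nailed = 0; }

/* Hand the nailed page to the Storage System processor by reference; the
 * replication into storage-owned memory happens there as a background task. */
static int queue_write_by_reference(struct io_page *pg, unsigned long lba)
{
    (void)lba;
    return pg->nailed ? 0 : -EINVAL;
}

/* Block until the Storage System reports that its copy is complete and the
 * page is therefore "clean" again. */
static int wait_for_clean_notification(struct io_page *pg)
{
    pg->clean = 1;
    return 0;
}

/* Write path of the hypothetical Server-side device driver: no data is
 * copied at request time; the copy is deferred to the storage processor. */
int deferred_copy_write(struct io_page *pg, unsigned long lba)
{
    int err = nail_page(pg);
    if (err)
        return err;

    err = queue_write_by_reference(pg, lba);
    if (err) {
        unnail_page(pg);
        return err;
    }

    err = wait_for_clean_notification(pg);
    unnail_page(pg);        /* safe: the storage side now has its own copy */
    return err;
}
```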
According to one aspect, the virtual memory management modules of both the Server operating system and the storage system work cooperatively in using common I/O buffers, thus advantageously avoiding unnecessary copies and minimizing the redundancy of space usage.
In order to prevent defective software from making unauthorized writes, the memory management units (MMUs) from both processors are used to protect memory not currently assigned to the respective processor.
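As a user-space analogue of this MMU-based guard, the following sketch uses the standard mprotect() call to drop and restore write permission on a memory slice as ownership moves between processors; the region size and ownership protocol are assumptions for illustration, and a real controller would manipulate the page tables of each processor directly.

```c
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

#define REGION_SIZE (1 << 20)   /* one hypothetical 1 MiB common-memory slice */

/* A buffer not currently assigned to this processor is mapped read-only, so
 * defective software that tries to write it faults immediately instead of
 * silently corrupting memory owned by the other processor. */
int main(void)
{
    void *slice = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slice == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* Ownership handed to the other processor: drop write permission. */
    if (mprotect(slice, REGION_SIZE, PROT_READ) != 0) {
        perror("mprotect");
        return EXIT_FAILURE;
    }

    /* Any write through this mapping now raises a fault. */

    /* Ownership returned to this processor: restore write permission. */
    if (mprotect(slice, REGION_SIZE, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        return EXIT_FAILURE;
    }

    munmap(slice, REGION_SIZE);
    return EXIT_SUCCESS;
}
```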
In one embodiment, multiple storage system controller nodes are clustered together. The concept of clustering controllers was introduced in U.S. Pat. No. 6,148,414. Additional refinements of clustered controllers were introduced in US Application No. 2002/0188655. U.S. Pat. No. 6,148,414 and US Application No. 2002/0188655, each of which is hereby incorporated by reference in its entirety, teach how to make multiple storage system towers work cooperatively, how to load-balance their workloads, and how to make their respective caches coherent even though they are implemented on separate physical memories. One advantageous result of implementing aspects of the present invention in multiple storage system controllers is that multiple Storage System Towers can export a given virtual volume of storage to multiple embedded servers. The performance scales as additional Storage System towers are added. Clustered file systems are now common, wherein multiple file system modules running on multiple servers can export to their hosts a common file system image. An example of a clustered file system is the Red Hat Global File System (http://www.redhat.com/software/rha/gfs/). If the file system 726 (
In one aspect, the file system allocates its buffer space using the common buffer allocation routines described above. The buffers are the largest storage consumer of a file system. Allocating them from the common pool 810, rather than the Server Tower specific pool 840, optimizes the usage of controller memory and makes the overall system more flexible.
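One hypothetical allocation interface reflecting this choice is sketched below in C; the pool identifiers echo reference numbers 810 and 840, the stub allocator simply uses malloc(), and all routine names are assumptions rather than part of the disclosed implementation, in which buffers would be carved out of the shared controller memory described above.

```c
#include <stdlib.h>

/* Hypothetical pool identifiers matching the common pool 810 and the Server
 * Tower specific pool 840 referred to in the figures. */
enum pool_id { POOL_COMMON = 810, POOL_SERVER_PRIVATE = 840 };

struct buffer {
    void        *data;
    size_t       size;
    enum pool_id owner;
};

/* Stub allocator: a real implementation would carve the block out of the
 * shared controller memory region associated with the given pool rather
 * than calling malloc(). */
static struct buffer *pool_alloc(enum pool_id pool, size_t size)
{
    struct buffer *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = malloc(size);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size  = size;
    b->owner = pool;
    return b;
}

static void pool_free(struct buffer *b)
{
    if (b) {
        free(b->data);
        free(b);
    }
}

/* File system buffer cache blocks come from the common pool, not the Server
 * Tower private pool, so controller memory is used where it is needed most. */
struct buffer *fs_alloc_cache_block(size_t block_size)
{
    return pool_alloc(POOL_COMMON, block_size);
}

void fs_free_cache_block(struct buffer *b)
{
    pool_free(b);
}
```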
Porting common software that balances application software execution load between multiple servers, such as LSF from Platform Computing, onto the server tower 724 allows single instance applications to benefit from the scalability of the overall platform. The load-balancing layer 724 moves applications between controllers to balance the execution performance of controllers and allow additional controllers to be added to a live system to increase performance.
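A greatly simplified sketch of the placement decision such a load-balancing layer might make is shown below; it is not LSF's actual interface, and the structures, load metric, and function name are illustrative assumptions only.

```c
#include <stddef.h>

/* Simplified view of the load-balancing decision: pick the controller whose
 * Server Tower currently reports the lowest load, then migrate the
 * application there using whatever mechanism the ported package provides. */
struct controller {
    int    id;
    double load;    /* recent utilization reported by that Server Tower */
};

/* Return the index of the least-loaded controller, or -1 if none exist. */
int pick_target_controller(const struct controller *nodes, size_t count)
{
    if (count == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (nodes[i].load < nodes[best].load)
            best = i;
    }
    return (int)best;
}
```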
While the invention has been described by way of example and in terms of the specific embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements, in addition to those discussed above, as would be apparent to those skilled in the art. For example, although two processors were discussed, the present invention is applicable to implementing more than two processors for sharing server and/or storage system control functionality on any given node. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
This application claims the benefit of U.S. provisional application No. 60/493,964 which is hereby incorporated by reference in its entirety. This application also claims priority as a Continuation-in-part of U.S. non-provisional application Ser. No. 10/046,070 which is hereby incorporated by reference in its entirety.
Provisional applications:

| Number | Date | Country |
|---|---|---|
| 60493964 | Aug 2003 | US |
| 60261140 | Jan 2001 | US |
Parent and child applications:

| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 10046070 | Jan 2002 | US |
| Child | 10913008 | Aug 2004 | US |