The present invention relates to a storage system and a mapping method, and is suitably applied to, for example, Software-Defined Storage (SDS).
SDS is a software-based storage system in which a general purpose server executes processing similar to that of a storage controller in a storage system. By combining SDS with server virtualization technology, virtualizing a plurality of servers and storage systems, and consolidating them on a general purpose server, operational efficiency can be improved and a server system can be built at low cost.
In SDS, each storage device must be recognized regardless of the hardware configuration of the storage system, so a method of associating a physical label with a logical name for each storage device (hereinafter simply referred to as a device) is important. A program used in the SDS controls a device by accessing the device file corresponding to its logical name.
In general, the logical name is assigned using a hardware address (a Media Access Control (MAC) address, etc.) as the physical label, which accommodates the case where a device is added.
However, when the device is replaced, correspondence between the physical label and the logical name is changed, and the existing device, the replaced device, or the added device may not be correctly recognized in some cases. The program used in the SDS may access an unintended device file and may not be able to control the device as intended.
Therefore, a user of a computer system may need to manually make the physical label correspond to the logical name.
In order to solve such a problem, for example, PTL 1 discloses an invention that queries a hardware configuration and allocates a logical name to a device based on a bus address and a hardware address (MAC address, etc.).
However, the invention disclosed in PTL 1 mainly assumes that the replaced or added device is a network apparatus. It therefore does not consider a case where another device is connected dependently (downstream) of the device, does not address allocation of logical names to devices other than network apparatuses, and may not be applicable to them.
Specifically, a plurality of apparatuses may be connected to one device through Peripheral Component Interconnect Express (PCIe) P2P (PCI Express to PCI Express) bridge connections, so that a plurality of paths (buses) lead to the device. In this case, bus numbers shift, the intended logical name cannot be allocated to the intended device, and the invention of PTL 1 cannot be applied.
Therefore, when a device is added or replaced and only the related technique is applied, an unintended logical name may be allocated to an unintended device. As a result, the intended device cannot be accessed, data may be lost, and the reliability of the system is low.
The invention is made in view of the above circumstances, and an object of the invention is to propose a storage system and a mapping method that can build an information processing system having high reliability.
In order to solve such a problem, the invention provides a storage system, which is configured to separately create a device file corresponding to each device in the system and to access, based on a bus number of a bus of a corresponding device connection destination stored in the device file, the device corresponding to the device file, includes: a bus tree information creation unit configured to create bus tree information indicating a connection relationship between each bus and each device in the system; a connection relationship detection unit configured to separately detect, based on the latest bus tree information created by the bus tree information creation unit, a bus number of a bus of each device connection destination; and a mapping unit configured to change the bus number of the bus of the corresponding device connection destination respectively described in each device file or to re-create each device file, based on a detection result of the connection relationship detection unit.
Further, the invention provides a mapping method, which is a method of mapping between a device and a device file, and is executed in a storage system configured to separately create the device file corresponding to each device in the system and to access, based on a bus number of a bus of a corresponding device connection destination stored in the device file, the device corresponding to the device file, includes: a first step of creating a latest bus tree information indicating a connection relationship between each bus and each device in the system; a second step of separately detecting, based on the latest created bus tree information, a bus number of a bus of each device connection destination; and a third step of changing the bus number of the bus of the corresponding device connection destination respectively described in each device file or re-creating each device file, based on a detection result.
Further, the invention provides a storage system, which includes a storage controller configured to separately create a device file corresponding to each device in the system and to access, based on a bus number of a bus of a corresponding device connection destination stored in the device file, the device corresponding to the device file, and includes a disk enclosure connected to the storage controller. The storage controller includes a memory in which a program and each device file are stored, and a processor configured to control operation of the entire storage controller based on the program stored in the memory. The disk enclosure includes a network switch having an upstream bridge and one or a plurality of downstream bridges, the upstream bridge being connected to the storage controller, and one or a plurality of devices including a memory device respectively connected to the different downstream bridge in the network switch via the bus. The processor of the storage controller is configured to create a latest bus tree information indicating a connection relationship between each bus and each device in the system based on the program stored in the memory, to separately detect for each device, based on the latest created bus tree information, a correspondence relationship among an identifier of the upstream bridge to which the device is connected in the network switch, a device number of the downstream bridge to which the device is connected in the network switch, and a device name of the device, to separately detect a bus number of a bus of each device connection destination based on the detected correspondence relationship and the latest bus tree information, and to change the bus number of the bus of the corresponding device connection destination respectively described in each device file or to re-create each device file, based on a detection result.
According to the storage system and the mapping method of the invention, even when a configuration of the storage system is changed due to replacement or addition of the device, the desired device can be reliably accessed.
According to the invention, it is possible to implement a storage system and a mapping method that can build an information processing system having high reliability.
Hereinafter, an embodiment of the invention is described in detail with reference to the drawings.
The information processing system 1 according to this embodiment includes a host computer 2 and a storage system 3 connected to each other via a network 4.
The host computer 2 is a computer device including information resources such as a Central Processing Unit (CPU) and a memory, and is, for example, an open-system server or a mainframe computer. The host computer 2 transmits a write command or a read command to the storage system 3 via the network 4 in response to a user operation or a request from an installed program.
The storage system 3 includes a storage controller 5 and one or a plurality of storage devices 6 connected (added) to the storage controller 5. In the following, a case where a disk enclosure is used as the storage device 6 is described, and the storage device 6 is referred to as the disk enclosure 6.
The storage controller 5 is a general purpose server device on which the software necessary for providing the host computer 2 with SDS functions is installed, and includes a microprocessor 10, a memory 11, a front-end interface 12, and a storage device (not shown).
The microprocessor 10 is hardware that controls the overall operation of the storage controller 5, and includes one or a plurality of processor cores 10A and a root complex 13 described below.
The memory 11 includes, for example, a semiconductor memory such as a Synchronous Dynamic Random Access Memory (SDRAM), and is used to store necessary programs (including an Operating System (OS)) and data. The programs stored in the memory 11 are executed by the processor cores 10A of the microprocessor 10, whereby the storage controller 5 executes various kinds of processing as the SDS. For ease of understanding, however, the following description treats such programs as being executed by the microprocessor 10.
The front-end interface 12 is an interface with respect to the host computer 2, and performs protocol control during communication with the host computer 2 via the network 4.
The disk enclosure 6 is a memory device where a PCIe switch 20 and one or a plurality of drives 21 are mounted on one blade substrate or accommodated in one housing. The PCIe switch 20 is a switch conforming to a PCIe standard, and includes one upstream port 20A and a plurality of downstream ports 20B. The upstream port 20A is connected to the storage controller 5 via a cable, and the downstream port 20B is connected to a port 21A of any one of the drives 21 in the disk enclosure 6.
The drive 21 is a memory device or memory medium that stores data given by the host computer 2. The drive 21 may be any of various memory devices or media, such as a NAND flash memory, a hard disk device, a Solid State Drive (SSD), a Magnetoresistive Random Access Memory (MRAM), a phase-change memory, a Resistance RAM (ReRAM), a Ferroelectric RAM (FeRAM), a magnetic disk, or an optical disk. Each drive 21 is given a logical name (hereinafter referred to as a device name) that is unique to the drive 21 and set by a user.
A management terminal 14 is also connected to the microprocessor 10 of the storage controller 5. The management terminal 14 is, for example, a notebook personal computer, and is used when an administrator performs various settings and maintenance on the storage controller 5.
The root complex 13 includes a plurality of PCI Express to PCI Express bridges (P2P bridges) 31 connected via a PCIe bus 30. The disk enclosure 6 or a Network Interface Card (NIC) is connected to these P2P bridges 31 via PCIe buses 32 as necessary. The P2P bridge 31 may be a virtual bridge (virtual P2P bridge).
A device number (“dev#0” to “dev#3” in
Meanwhile, the PCIe switch 20 of the disk enclosure 6 includes an upstream P2P bridge 33 and a plurality of downstream P2P bridges 35 connected to the upstream P2P bridge 33 via a PCIe bus 34.
The upstream P2P bridge 33 is connected to any one of the P2P bridges 31 in the root complex 13 of the microprocessor 10 via the PCIe bus 32, and the downstream P2P bridge 35 is connected to the drive 21 via a PCIe bus 36. Unique device numbers (“dev#0”, “dev#0” to “dev#7” in
In the information processing system 1 having such a configuration, the microprocessor 10 assigns a bus number to each of the PCIe buses 30, 32, 34, and 36 in the storage system 3 through a function of the BIOS. In practice, when the storage system 3 (more precisely, the storage controller 5) is started or restarted, the microprocessor 10 collects information about all the PCIe buses 30, 32, 34, and 36 in the storage system 3 and all the devices connected to them from those devices. The microprocessor 10 then generates bus tree information 37 from the collected information.
Here, the bus tree information 37 is information having a tree structure that indicates the connection relationship between each PCIe bus and each device present in the storage system 3, and is created by the microprocessor 10 through an OS function after the storage controller 5 is started or restarted.
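As a rough illustration only, the bus tree information 37 can be pictured as nested nodes, each recording the bus it sits on, its device number on that bus and, for a bridge, the bus behind it. The Python sketch below is a minimal model under that assumption: the `Node` class, the `walk` and `endpoint_buses` helpers, and the small topology (one root port, one PCIe switch, two drives) are all hypothetical and are not the OS's actual data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical bus tree node: a P2P bridge or an endpoint device."""
    label: str                           # e.g. "root port", "drv1"
    bus: int                             # bus number of the PCIe bus the node sits on
    dev: int                             # device number of the node on that bus
    secondary_bus: Optional[int] = None  # bus behind a bridge; None for endpoints
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def endpoint_buses(root: Node) -> Dict[str, int]:
    """Map each endpoint label to the bus number of its connection destination."""
    return {n.label: n.bus for _, n in walk(root) if n.secondary_bus is None}


# Root port on Bus#0 -> PCIe switch (upstream + two downstream bridges) -> two drives.
tree = Node("root port", bus=0, dev=0, secondary_bus=1, children=[
    Node("upstream bridge", bus=1, dev=0, secondary_bus=2, children=[
        Node("downstream bridge", bus=2, dev=0, secondary_bus=3,
             children=[Node("drv1", bus=3, dev=0)]),
        Node("downstream bridge", bus=2, dev=1, secondary_bus=4,
             children=[Node("drv2", bus=4, dev=0)]),
    ]),
])

for depth, node in walk(tree):
    print("  " * depth + f"Bus#{node.bus} dev#{node.dev}: {node.label}")
print(endpoint_buses(tree))  # {'drv1': 3, 'drv2': 4}
```

Nesting mirrors the physical hierarchy, so the path from a drive back to its upstream P2P bridge can be found by walking up the tree.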
For example, in the case of
In an example of
Further, in the example of
The user space 40 is an address space in which user programs are stored. In this embodiment, a device file creation program 50, a device management program 51, a storage service program 52, and a user program 53 are stored in the user space 40.
The device file creation program 50 is a program having a function of, after the storage controller 5 is started or restarted, detecting a bus number of the PCIe bus to which each device in the storage system 3 is respectively connected and notifying the device management program 51 of information indicating a connection relationship between these devices and the PCIe buses. The device file creation program 50 is described in detail below.
The device management program 51 is a program, such as Userspace Device management (udev) in the case of Linux (registered trademark), having a function of dynamically creating a device file 55 for each device in the storage system 3 and allocating the device file 55 to that device. In practice, based on the information given by the device file creation program 50 that indicates the connection relationship between the devices and the PCIe buses, the device management program 51 instructs the OS to create a device file 55 for each device and allocate it to the corresponding device. The kernel of the OS then creates the corresponding device file 55 in the kernel space 41 in response to this instruction.
The storage service program 52 is a program having a function of controlling reading/writing of data with respect to the drive 21 (
The user program 53 is a program not related to storage management, such as an application program that performs numerical calculations or one that manages a database.
Meanwhile, the kernel space 41 is a virtual memory area in which the kernel of an OS (not shown) installed on the storage controller 5 resides, and includes a device file system 54. The device file system 54 manages each device file 55 (“drv 1” to “drv 8”, . . . , “sda”, “sdb”, . . . ) that the kernel has created in association with each device in the storage system 3.
The device files 55 are files used to control the respective devices in the storage system 3. Each device file 55 stores necessary information, such as the type and performance of the corresponding device, together with the bus number of the PCIe bus to which the device is connected. In
In a case where a command is given from the command processing unit 52A to the device file 55, a device driver (not shown) corresponding to the device file 55 in the kernel of the OS is started. Then, the device driver reads out a bus number of a PCIe bus of a corresponding device connection destination from the device file 55, and executes necessary processing such as read processing or write processing with respect to a corresponding device connected to the above-described PCIe bus of the bus number.
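The minimal sketch below, which assumes a device file can be reduced to a device name plus a bus number, shows why that stored number matters: the driver step routes a request to whatever device is currently connected to the recorded bus. `DeviceFile`, `submit`, and the drive labels are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DeviceFile:
    """Hypothetical reduction of a device file 55 to the fields used here."""
    name: str
    bus: int  # bus number of the PCIe bus of the device connection destination


def submit(dev_file: DeviceFile, devices_by_bus: Dict[int, str], op: str) -> str:
    """Driver-side step: read the bus number from the device file and route the
    request to whatever device is currently connected to that bus."""
    target = devices_by_bus.get(dev_file.bus)
    if target is None:
        raise LookupError(f"no device on Bus#{dev_file.bus}")
    return f"{op} -> {target}"


drv1 = DeviceFile("drv1", bus=4)
print(submit(drv1, {4: "drive A", 5: "drive B"}, "read"))  # read -> drive A
# After renumbering, drive A moved to Bus#7 but the file still says Bus#4,
# so the same request now reaches a different device.
print(submit(drv1, {4: "drive C", 7: "drive A"}, "read"))  # read -> drive C
```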
In
As apparent from the comparison between
The bus number of the PCIe bus 34, which is disposed in the PCIe switch 20 of the disk enclosure 6 between the upstream P2P bridge 33 and the downstream P2P bridges 35, is likewise changed to “Bus#21”, and the bus numbers of the PCIe buses 36 connecting the downstream P2P bridges 35 to the drives 21 are changed to “Bus#22” through “Bus#29”, respectively.
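Why the numbers shift can be reproduced with a toy model of depth-first bus numbering, which is the order in which PCI enumeration commonly assigns secondary bus numbers. In this hypothetical sketch, each bridge takes the next free number for the bus behind it, so attaching a new enclosure behind an earlier root port pushes every bus behind later root ports to higher numbers. The helper names, topology, and numbers are illustrative and do not reproduce the figures of this embodiment.

```python
from typing import Any, Dict, List, Optional


def make_bridge(children: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """A P2P bridge; 'secondary' is filled in by enumeration."""
    return {"children": children or [], "secondary": None}


def make_enclosure(n_drives: int) -> Dict[str, Any]:
    """Upstream P2P bridge with one downstream P2P bridge per drive slot."""
    return make_bridge([make_bridge() for _ in range(n_drives)])


def assign_bus_numbers(root_ports: List[Dict[str, Any]]) -> None:
    """Depth-first: each bridge takes the next free number for the bus behind it."""
    next_bus = 0  # Bus#0 is the internal bus of the root complex

    def visit(bridge: Dict[str, Any]) -> None:
        nonlocal next_bus
        next_bus += 1
        bridge["secondary"] = next_bus
        for child in bridge["children"]:
            visit(child)

    for port in root_ports:
        visit(port)


def drive_buses(enclosure: Dict[str, Any]) -> List[int]:
    """Bus numbers of the drive connection destinations (downstream secondaries)."""
    return [down["secondary"] for down in enclosure["children"]]


enc_a = make_enclosure(2)
ports = [make_bridge(), make_bridge([enc_a])]  # enclosure A behind root port 1
assign_bus_numbers(ports)
print(drive_buses(enc_a))  # [4, 5]

enc_b = make_enclosure(2)
ports[0]["children"].append(enc_b)  # add enclosure B behind root port 0
assign_bus_numbers(ports)
print(drive_buses(enc_a))  # [7, 8]: same drives, new bus numbers
```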
Accordingly, in a case where the disk enclosure 6 is added to the storage controller 5 of
Accordingly, in a case where the bus number of the PCIe bus 36 of the connection destination of the drive 21 which is originally connected to the storage controller 5 is changed, there is a problem that the bus number of the PCIe bus 36 of the drive 21 connection destination described in the device file 55 (
Therefore, the information processing system according to this embodiment has a function (hereinafter referred to as the mapping function) of detecting the latest correspondence relationship (latest relationship between each device and each PCIe bus) between each device in the storage system 3 and the PCIe bus to which the device is connected, by using the latest bus tree information 37 (
In practice, the storage controller 5 creates the latest bus tree information 37 after being started or restarted. For each device in the storage system 3, the storage controller 5 then detects, by using the bus tree information 37, a correspondence relationship among an identifier (hereinafter, referred to as serial number of the upstream P2P bridge 33) unique to the upstream P2P bridge 33 (
Based on the latest bus tree information 37 and the detected correspondence relationship among the identifier, the device number and the device name, the storage controller 5 detects a correspondence relationship between the device name of each device and the bus number of the PCIe bus to which the device is connected after the storage controller 5 is started or restarted.
Thereafter, the storage controller 5 changes the bus number of the PCIe bus of the corresponding device connection destination respectively set in each device file 55 (
As a means for implementing the mapping function according to this embodiment as described above, as shown in
The device name registration unit 60 is a functional unit having a function of detecting a connection relationship (correspondence relationship among the device name of the device, the device number of the downstream P2P bridge 35 (
The bus number/device name correspondence identification unit 61 is a functional unit having a function of detecting, for each device present in the storage system 3, the correspondence relationship between the device name of the device and the PCIe bus to which the device is connected, based on the latest bus tree information 37 and the device name registration table 63, and registering and managing the detected relationship of each device in the bus number/device name correspondence identification table 64.
The setting unit 62 is a functional unit having a function of notifying the device management program 51 of the correspondence between the bus number and the device name of each device stored in the bus number/device name correspondence identification table 64 created by the bus number/device name correspondence identification unit 61. Based on the correspondence notified by the setting unit 62, the device management program 51 changes the bus number of the PCIe bus of the corresponding device connection destination set in each device file 55 to the latest bus number.
Meanwhile, the device name registration table 63 is a table used for managing the latest connection relationship between the device and the PCIe switch 20 (
The device name column 63C stores the name (device name) set by the user for each device present in the storage system 3. The downstream P2P bridge device number column 63B stores the device number of the downstream P2P bridge 35 (
Further, the upstream P2P bridge ID column 63A stores the serial number as the identifier (ID) of the upstream P2P bridge 33 (
Therefore, in the example of
The bus number/device name correspondence identification table 64 is a table used for managing the latest connection relationship between each device and the PCIe bus as described above, and includes a bus number column 64A and a device name column 64B.
The device name column 64B stores the device name of each device present in the storage system 3. The bus number column 64A stores a bus number given by the microprocessor 10 (
Therefore, in the example of
Initial values of the bus number/device name correspondence identification table 64 (
Next, the various kinds of processing executed in the storage controller 5 in relation to the mapping function are described. In the following, the processing entity is described as the device name registration unit 60, the bus number/device name correspondence identification unit 61, or the setting unit 62 of the device file creation program 50 (
When this device name registration processing is started, the device name registration unit 60 first initializes the device name registration table 63 (
Subsequently, the device name registration unit 60 selects one unprocessed entry (row) from the bus number/device name correspondence identification table 64 (S2), and obtains information (the bus number and the device name) of the entry (S3).
Next, the device name registration unit 60 refers to the latest bus tree information 37, identifies the downstream P2P bridge 35 in the PCIe switch 20 to which the device connected to the PCIe bus having the bus number obtained in step S3 is connected, and obtains the device number of that downstream P2P bridge 35 from the latest bus tree information 37 (S4).
The device name registration unit 60 refers to the bus tree information 37, specifies the upstream P2P bridge 33 in the PCIe switch 20 including the downstream P2P bridge 35 having the device number obtained in step S4, and obtains the identifier (ID) of the upstream P2P bridge 33 from the bus tree information 37 (S5).
Subsequently, the device name registration unit 60 registers the device name obtained in step S3, the device number obtained in step S4, and the ID obtained in step S5 that are in association with each other as one entry in the device name registration table 63 (
Next, the device name registration unit 60 determines whether the processing of steps S2 to S6 is executed and completed for all the entries (rows) in the bus number/device name correspondence identification table 64 (S7). When a negative result is obtained in this determination, the device name registration unit 60 returns to step S2, and thereafter repeats the processing of steps S2 to S7 until a positive result is obtained in step S7.
Then, when a positive result is obtained in step S7 after the processing of steps S2 to S6 has been completed for all the entries in the bus number/device name correspondence identification table 64, the device name registration unit 60 calls the bus number/device name correspondence identification unit 61 (S8), sorts the entries in the bus number/device name correspondence identification table 64 in descending order of the identifier (ID) value of the upstream P2P bridge 33, and then completes the device name registration processing.
Accordingly, for each drive 21, the correspondence relationship among the identifier of the upstream P2P bridge 33 to which the drive 21 is connected in the PCIe switch 20, the device number of the downstream P2P bridge 35 to which the drive 21 is connected in the PCIe switch 20, and the device name of the drive 21 is registered and managed in the device name registration table 63, so that bus number/device name correspondence processing to be described below with respect to
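The loop of steps S1 to S7 can be sketched as follows, assuming the bus tree information is flattened into a list of bridge records and that the bus numbers in the bus number/device name correspondence identification table 64 match that bus tree. `Bridge`, its fields, `_find`, and `register_device_names` are hypothetical names; the sorting mentioned for step S8 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Bridge:
    """Flattened bus tree entry for one P2P bridge (hypothetical layout)."""
    kind: str                     # "up": upstream P2P bridge 33, "down": downstream P2P bridge 35
    primary_bus: int              # bus the bridge sits on
    dev: int                      # device number of the bridge on its primary bus
    secondary_bus: int            # bus behind the bridge
    serial: Optional[str] = None  # identifier of an upstream bridge


def _find(tree: List[Bridge], pred: Callable[[Bridge], bool], what: str) -> Bridge:
    """Return the first bridge matching pred, or fail loudly."""
    for b in tree:
        if pred(b):
            return b
    raise LookupError(f"{what} not found in bus tree")


def register_device_names(table64: List[Tuple[int, str]],
                          tree: List[Bridge]) -> List[Tuple[Optional[str], int, str]]:
    """Build (upstream ID, downstream device number, device name) rows of table 63."""
    table63: List[Tuple[Optional[str], int, str]] = []  # S1: initialize
    for bus, name in table64:                 # S2/S3: take one (bus number, device name) entry
        # S4: downstream bridge whose secondary bus is the drive's bus
        down = _find(tree, lambda b: b.kind == "down" and b.secondary_bus == bus,
                     f"downstream bridge for Bus#{bus}")
        # S5: upstream bridge whose secondary bus carries that downstream bridge
        up = _find(tree, lambda b: b.kind == "up" and b.secondary_bus == down.primary_bus,
                   f"upstream bridge for Bus#{down.primary_bus}")
        table63.append((up.serial, down.dev, name))  # S6: register one entry
    return table63                                    # S7 done; S8 would call unit 61


tree = [Bridge("up", 1, 0, 2, serial="SN-A"),
        Bridge("down", 2, 0, 3), Bridge("down", 2, 1, 4)]
print(register_device_names([(3, "drv1"), (4, "drv2")], tree))
# [('SN-A', 0, 'drv1'), ('SN-A', 1, 'drv2')]
```

Neither column of table 63 depends on bus numbering, which is what lets the next step recover drives after the bus numbers change.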
Meanwhile,
Subsequently, the bus number/device name correspondence identification unit 61 obtains the latest bus tree information 37 (S11), and detects all the upstream P2P bridges 33 in the storage system 3 based on the obtained bus tree information 37 (S12).
Next, the bus number/device name correspondence identification unit 61 selects one upstream P2P bridge 33 unprocessed in steps S14 to S18 from the detected upstream P2P bridges 33 (S13), and obtains the identifier (ID) of the upstream P2P bridge 33 from the bus tree information 37 (S14).
Subsequently, among the entries in the device name registration table 63 (
Further, the bus number/device name correspondence identification unit 61 obtains a bus number corresponding to a device number stored in the downstream P2P bridge device number column 63B (
The bus number/device name correspondence identification unit 61 determines, among all the entries in the device name registration table 63, whether the processing of steps S16 and S17 is executed and completed for all the entries in which the IDs obtained in step S14 are stored in the upstream P2P bridge ID column 63A (S18). When a negative result is obtained in this determination, the bus number/device name correspondence identification unit 61 returns to step S15, and then sequentially switches the entry selected in step S15 to another applicable unprocessed entry and repeats the processing of steps S15 to S18.
When an affirmative result is eventually obtained in step S18 by executing and completing, among the entries in the device name registration table 63, the processing of steps S16 and S17 for all the entries in which the IDs obtained in step S14 are stored in the upstream P2P bridge ID column 63A, the bus number/device name correspondence identification unit 61 determines whether the processing of steps S14 to S18 is executed and completed for all the upstream P2P bridges 33 detected in step S12 (S19).
When a negative result is obtained in this determination, the bus number/device name correspondence identification unit 61 returns to step S13, and then sequentially switches the upstream P2P bridge 33 selected in step S13 to another applicable unprocessed upstream P2P bridge 33 and repeats the processing of steps S13 to S19.
When a positive result is obtained in step S19 eventually by completing the processing of steps S14 to S18 for all the upstream P2P bridges 33 detected in step S12, the bus number/device name correspondence identification unit 61 completes the bus number/device name correspondence processing after calling the setting unit 62 (
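Steps S10 to S19 can be sketched in the same hypothetical flattened form. Because the key (serial number of the upstream P2P bridge 33, device number of the downstream P2P bridge 35) does not depend on bus numbering, it can be looked up again in the latest bus tree to recover each drive's current bus number. Step S10 is assumed here to initialize table 64, and entries whose downstream bridge is no longer found are simply skipped; how the embodiment handles such cases is an assumption. `Bridge` and `identify_bus_numbers` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Bridge:
    """Flattened bus tree entry for one P2P bridge (hypothetical layout)."""
    kind: str                     # "up": upstream P2P bridge 33, "down": downstream P2P bridge 35
    primary_bus: int              # bus the bridge sits on
    dev: int                      # device number of the bridge on its primary bus
    secondary_bus: int            # bus behind the bridge
    serial: Optional[str] = None  # identifier of an upstream bridge


def identify_bus_numbers(table63: List[Tuple[str, int, str]],
                         tree: List[Bridge]) -> List[Tuple[int, str]]:
    """Rebuild (bus number, device name) rows of table 64 from the latest bus tree."""
    table64: List[Tuple[int, str]] = []                  # S10 (assumed): initialize
    for up in (b for b in tree if b.kind == "up"):       # S12/S13: each upstream bridge
        for up_id, dev, name in table63:                 # S15: rows registered for it
            if up_id != up.serial:                        # S14: match on the bridge ID
                continue
            # S16/S17: downstream bridge with that device number behind this upstream bridge
            down = next((b for b in tree
                         if b.kind == "down" and b.primary_bus == up.secondary_bus
                         and b.dev == dev), None)
            if down is None:  # slot not present in the latest tree (handling assumed)
                continue
            table64.append((down.secondary_bus, name))
    return table64                                       # S19 done; setting unit 62 is called next


# Latest tree after enclosure B (SN-B) was added ahead of enclosure A (SN-A).
tree = [Bridge("up", 1, 0, 2, serial="SN-B"),
        Bridge("down", 2, 0, 3), Bridge("down", 2, 1, 4),
        Bridge("up", 5, 0, 6, serial="SN-A"),
        Bridge("down", 6, 0, 7), Bridge("down", 6, 1, 8)]
table63 = [("SN-A", 0, "drv1"), ("SN-A", 1, "drv2")]
print(identify_bus_numbers(table63, tree))  # [(7, 'drv1'), (8, 'drv2')]
```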
Then, the setting unit 62 called by the bus number/device name correspondence identification unit 61 separately notifies the device management program 51 (
Thus, based on each notified correspondence relationship between the device name and the bus number, the device management program 51 updates the bus number of the PCIe bus of the corresponding device connection destination respectively stored in each device file 55 (
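The update performed by the device management program 51 can be pictured with the hypothetical sketch below, which either rewrites a stale bus number or re-creates a missing device file (re-creating each device file is the alternative this document also names). `DeviceFile` and `apply_mapping` are illustrative names, not the udev interface.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DeviceFile:
    """Hypothetical reduction of a device file 55 to the fields used here."""
    name: str
    bus: int  # bus number of the PCIe bus of the device connection destination


def apply_mapping(files: Dict[str, DeviceFile],
                  table64: List[Tuple[int, str]]) -> List[str]:
    """Rewrite stale bus numbers and re-create device files that do not exist.
    Returns the names of the device files that were changed or created."""
    changed: List[str] = []
    for bus, name in table64:
        current = files.get(name)
        if current is None:
            files[name] = DeviceFile(name, bus)          # re-create the device file
            changed.append(name)
        elif current.bus != bus:
            files[name] = replace(current, bus=bus)      # rewrite the stale bus number
            changed.append(name)
    return changed


files = {"drv1": DeviceFile("drv1", 4), "drv2": DeviceFile("drv2", 5)}
print(apply_mapping(files, [(7, "drv1"), (5, "drv2"), (9, "drv3")]))  # ['drv1', 'drv3']
print(files["drv1"].bus)  # 7
```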
As described above, in the information processing system 1 according to this embodiment, the storage controller 5 of the storage system 3 uses the latest bus tree information 37, created after the storage controller 5 is started or restarted, to detect the latest correspondence relationship (latest connection relationship) between each device in the storage system 3 and the PCIe bus to which the device is connected. Based on the detection result, the storage controller 5 changes the bus number of the PCIe bus of the corresponding device connection destination stored in each device file 55 to the bus number of the PCIe bus of the latest connection destination.
Therefore, according to the information processing system 1 of this embodiment, even when the configuration of the storage system 3 is changed due to replacement or addition of a device, the device driver (and ultimately the command processing unit 52A of the storage service program 52 or the host computer 2) can reliably access the desired device, so that an information processing system having high reliability can be built.
In the above-described embodiment, it is described that a serial number of the upstream P2P bridge 33 is applied as an identifier (ID) unique to the upstream P2P bridge 33, but the invention is not limited thereto, and a Universally Unique Identifier (UUID) can also be applied.
In the above-described embodiment, it is described that based on each correspondence relationship between a device name and a bus number notified by the setting unit 62, the device management program 51 updates a bus number of a PCIe bus of a corresponding device connection destination respectively stored in each device file 55 (
In the above-described embodiment, the bus tree information creation unit that creates the bus tree information 37 indicating the connection relationship between each PCIe bus and each device in the system is implemented by the microprocessor 10 and the OS; the connection relationship detection unit that separately detects, based on the latest bus tree information 37 created by the bus tree information creation unit, the bus number of the PCIe bus of each device connection destination is implemented by the microprocessor 10 and the device file creation program 50; and the mapping unit that changes, based on the detection result of the connection relationship detection unit, the bus number of the PCIe bus of the corresponding device connection destination described in each device file 55 is implemented by the microprocessor 10, the device file creation program 50, and the device management program 51. However, the invention is not limited thereto, and part or all of the bus tree information creation unit, the connection relationship detection unit, and the mapping unit may be implemented by dedicated hardware.
In the above-described embodiment, it is described that the invention is applied to the information processing system 1 having a general configuration as shown in
For example, as shown in
In such an information processing system, each command processing unit 52A of the storage service program 52 transmits a read/write request, in response to a read/write command from the host computer 2, to a corresponding logical volume 71. Then, the read/write request is transferred to the RAID control program 70, and is transferred under the control of the RAID control program 70 to the device file 55 of each device (the drive 21) that provides the memory area to the logical volume 71, and necessary read processing or write processing is executed by a corresponding device driver.
Further, as shown in
In this case, two PCIe switches 83 are disposed in one disk enclosure 82, and the drives 21 are connected to each PCIe switch 83. With such a configuration, if one of the two storage controllers 5X and 5Y fails, the other storage controller 5Y or 5X can handle the read commands and write commands from the host computer 2.
In the information processing system 80 having the configuration of
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2017/019039 | 5/22/2017 | WO | 00 |