The present invention relates to data storage generally and, more particularly, to a method and/or apparatus for implementing a smart hybrid storage based on intelligent data access classification.
In conventional storage arrays, data storage specifications are classified into three major categories: (i) mission-critical data, high performance or sensitive data, (ii) reliable data and (iii) reliable and sensitive data.
Mission-critical data, high performance or sensitive data is used in key business processes or customer applications. Such data typically has a very fast response time specification. The data is transactional data having a high input/output process (i.e., IOP) performance with optimal and/or moderate reliability.
Reliable data is classified as company confidential data. Reliable data does not have an instantaneous recovery criterion for the business to remain in operation. The redundancy of such confidential data is important, as the data should be available under all conditions.
Data that is both reliable and sensitive uses both a high IOP performance and a highly reliable storage technology. Conventional storage systems are challenged to effectively move data between the three categories of storage based on the dynamic input/output load specifications in a storage area network (i.e., SAN).
It would be desirable to implement a hybrid storage system that considers performance to cost impact to dynamically allocate high IOP drives efficiently based on user needs.
The present invention concerns a method for configuring resources in a storage array, comprising the steps of (a) determining if a data access is a first type or a second type, (b) if the data access is the first type, configuring the storage array as a reliable type configuration, and (c) if the data access is the second type, configuring the storage array as a secure type configuration.
The objects, features and advantages of the present invention include providing smart hybrid storage that may (i) be based on intelligent data access classification, (ii) drive group or volume group creation based on classified data access criteria of a user, (iii) use vendor unique bits in a control byte of a small computer system interface command descriptor block (e.g., SCSI CDB) for input/output classification and input/output routing, (iv) provide intelligent data access pattern learn logic to dynamically allocate a solid state device drive or a group of solid state device drives to one or more hard disk groups based on the input/output load, (v) use a control byte of a small computer system interface command descriptor block by the intelligent data access pattern learn logic to initialize a track of an input/output load increase for any particular category of drive groups and track the data flow pattern, and/or (vi) provide automatic de-allocation of drives if the input/output load or data demand has reduced for any particular disk drive groups.
These and other objects, features and advantages of the present invention will be apparent from the following detailed description and the appended claims and drawings in which:
Referring to
The circuit 102 may be implemented as a host. The host 102 may be implemented as one or more computers (or servers or processors) in a host/client configuration. The circuit 106 may be implemented as a number of storage devices (e.g., a drive array). The circuit 108 may be implemented as a controller (e.g., an array controller). In one example, the circuit 108 may be a redundant array of independent disks (e.g., RAID) controller. The circuit 108 may include a block (or module, or circuit) 109. The block 109 may be implemented as firmware (or software or program instructions or code) that may control the controller 108.
The host 102 may have an input/output 110 that may present a signal (e.g., REQ). A configuration file 130 may be sent via the signal REQ through the network 104 to an input/output 112 of the controller 108. The controller 108 may have an input/output 114 that may present a signal (e.g., CTR) to an input/output 116 of the storage array 106.
The array 106 may have a number of storage devices (e.g., drives or volumes) 120a-120n, a number of storage devices (e.g., drives or volumes) 122a-122n and a number of storage devices (e.g., drives or volumes) 124a-124n. In an example, each of the storage devices 120a-120n, 122a-122n, and 124a-124n may be implemented as a single drive, multiple drives, and/or one or more drive enclosures. The storage devices 120a-120n, 122a-122n and/or 124a-124n may be implemented as one or more hard disk drives (e.g., HDDs), one or more solid state devices (e.g., SSDs) or a combination of HDDs and SSDs.
The system 100 may implement a data access classification scheme to determine whether a particular data access should use high performance processing, high reliability storage and/or a mix of both. The system 100 may efficiently allocate data storage in the array 106 using the controller 108. A number of bytes (e.g., SCSI CDB bytes) may be modified to detect a data class and/or allocate high reliability storage (e.g., solid state device storage versus hard disk drive storage) on the fly (e.g., without rebooting the controller 108).
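As a minimal sketch of how a data class might be carried in a SCSI CDB, the following example writes a two-bit class code into the vendor specific bits (bits 7-6) of the CONTROL byte, which is the last byte of a fixed-length CDB. The `DataClass` codes and the helper names `tag_cdb` and `read_class` are hypothetical; the bit values actually used by the controller 108 are not given here.

```python
from enum import IntEnum


class DataClass(IntEnum):
    # Hypothetical two-bit codes; the actual encoding is an assumption.
    UNCLASSIFIED = 0b00
    SENSITIVE = 0b01
    RELIABLE = 0b10
    RELIABLE_AND_SENSITIVE = 0b11


VENDOR_SHIFT = 6
VENDOR_MASK = 0b11 << VENDOR_SHIFT  # bits 7-6 of the CONTROL byte


def tag_cdb(cdb: bytes, data_class: DataClass) -> bytes:
    """Return a copy of a fixed-length CDB with the class in the CONTROL byte."""
    if len(cdb) not in (6, 10, 12, 16):
        raise ValueError("only fixed-length 6/10/12/16-byte CDBs are handled")
    tagged = bytearray(cdb)
    # Clear the vendor specific bits, keep the other CONTROL bits (e.g., NACA).
    control = tagged[-1] & ~VENDOR_MASK & 0xFF
    tagged[-1] = control | (int(data_class) << VENDOR_SHIFT)
    return bytes(tagged)


def read_class(cdb: bytes) -> DataClass:
    """Recover the class code from the CONTROL byte of a fixed-length CDB."""
    return DataClass((cdb[-1] & VENDOR_MASK) >> VENDOR_SHIFT)


# READ(10): opcode 0x28, LBA 0x10, transfer length 8, CONTROL byte 0x00.
read10 = bytes([0x28, 0, 0, 0, 0, 0x10, 0, 0, 8, 0])
tagged = tag_cdb(read10, DataClass.SENSITIVE)
```

Because only the CONTROL byte changes, the opcode, logical block address and transfer length reach the target drives unmodified, which is why the tagging may be done on the fly.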
The system 100 may process data using high performance processing and/or high reliability storage by dynamically determining an active data block access and/or a pattern received from the host 102. The controller firmware 109 may implement an intelligent data pattern learn logic engine with smart data access classification. One or more of the solid state device drives (e.g., the drives 120a-120n) may be attached to the controller 108 to form volumes, groups or disks based on a number of implementation options. The system 100 may provide a hybrid storage system with a combination of hard disk drives 122a-122n and/or solid state drives 120a-120n to dynamically enhance the performance of the storage subsystem based on the input/output loads.
The system 100 may further provide an option to create and/or allocate storage based on storage criteria and/or data access classification (e.g., highly sensitive data versus highly reliable storage). Data that uses both reliable storage and high performance processing may be implemented dynamically by attaching one or more of the solid state drives 120a-120n to the array 106. An intelligent data access learning module may be implemented in the controller firmware 109 to monitor the data accesses and the active data blocks per unit time. The process of attaching and detaching the solid state drives 120a-120n may be based on the controller 108 (i) mapping the active data blocks accessed to the solid state drives 120a-120n and (ii) modifying the small computer system interface (e.g., SCSI) command descriptor block (e.g., CDB). The writes may be directed to the hard disk drives 122a-122n and the reads may be performed via the solid state drives 120a-120n. The drives 122a-122n and the drives 120a-120n may be asynchronously accessed.
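The read/write split described above may be sketched as follows. The `HybridRouter` class and its dictionaries are hypothetical in-memory stand-ins for the drives 122a-122n and 120a-120n, and the SSD copy is refreshed synchronously on a write purely to keep the example short, whereas the drives may be asynchronously accessed in practice.

```python
from dataclasses import dataclass, field


@dataclass
class HybridRouter:
    # Hypothetical stand-ins: block number -> data held by each drive type.
    hdd: dict = field(default_factory=dict)
    ssd: dict = field(default_factory=dict)
    active_blocks: set = field(default_factory=set)

    def attach(self, block):
        """Map an active block to the SSD tier and copy any existing data."""
        self.active_blocks.add(block)
        if block in self.hdd:
            self.ssd[block] = self.hdd[block]

    def detach(self, block):
        """Remove the block from the SSD tier; the HDD copy remains."""
        self.active_blocks.discard(block)
        self.ssd.pop(block, None)

    def write(self, block, data):
        """Writes are always directed to the hard disk drives."""
        self.hdd[block] = data
        if block in self.active_blocks:
            self.ssd[block] = data  # simplification: refreshed synchronously

    def read(self, block):
        """Reads are served from the SSD tier when the block is mapped."""
        if block in self.ssd:
            return self.ssd[block], "ssd"
        return self.hdd[block], "hdd"


router = HybridRouter()
router.write(7, b"payload")
first = router.read(7)    # served by the hard disk drives
router.attach(7)
second = router.read(7)   # served by the solid state drives
```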
The modes of operation of the system 100 and a flow may be described as follows. A user is generally provided an option to select a drive group or volume group based on data access classification such as (i) the data that uses reliable storage and high redundancy, (ii) the data that uses storage which may be sensitive and transactional (e.g., the storage may be implemented with fast drives and high input/output processes) and/or (iii) the data that uses high input/output processes and reliable storage with high redundancy. An administrator (or operator or technician) may create storage pools/volumes in the array 106 based on the data classification specifications of the user. The classifications during volume creation by a storage manager (or operator or technician) may be reliable storage or sensitive data storage.
Referring to
Referring to
Referring to
The circuit 402 may be implemented as an input/output (e.g., IO) network circuit. The circuit 404 may be implemented as an input/output processor circuit. The circuit 406 may be implemented as a data path virtualization circuit. The circuit 408 may be implemented as a virtual logical-unit-number (e.g., LUN) to logical-unit-number map manager circuit. The circuit 410 may be implemented as a controller firmware interface layer. The circuit 412 may be implemented as a router circuit. The circuit 414 may be implemented as a command circuit. The circuit 416 may be implemented as a volume creation manager circuit. The circuit 418 may be implemented as a disk drive group circuit.
The data path virtualization layer circuit 406 may receive SCSI input/output processes from the initiators (e.g., the host 102) and update the input/output processes with vendor unique bit information (to be described further in
Referring to
A data pattern learn logic engine (to be described in more detail in connection with
Referring to
Referring to
Referring to
Referring to
Referring to
The state 1002 may be implemented as a start state. The state 1004 may be implemented to allow an administrator (or operator or technician) to create storage based on a data classification. For example, the storage may be created based on sensitive data versus reliable data. Next, the decision state 1006 generally determines if the data is reliable or sensitive. If the data is sensitive, the method 1000 generally moves to the state 1008. The state 1008 may configure the storage as a RAID 50 or RAID 60 storage device. Next, the method 1000 may move to the state 1010. If the state 1006 determines that the data is intended to be reliable data, the method 1000 generally moves to the state 1012. In the state 1012, the method 1000 may configure the storage array as a RAID 51 or RAID 61 storage device and the method 1000 may move to the state 1010. The state 1010 may analyze a data pattern and generate a mapping table between the volume group and the active blocks. Next, the method 1000 may move to the decision state 1014. The decision state 1014 generally determines if an active block may benefit from a performance boost. If an active block may benefit from the performance boost and the data is sensitive data, the state 1016 generally attaches a solid state device to the RAID 50/RAID 60 storage. If the active block may benefit from the performance boost and the data is reliable data, the method 1000 may attach a solid state device to the RAID 51/RAID 61 storage in the state 1020. If the active block may not benefit from a performance boost, the method 1000 may move to the state 1018. In the state 1018, a data access module generally decides whether removal of one or more of the solid state devices may be appropriate based on a learn cycle. Next, the state 1022 frees up the solid state device identified in the state 1020. Next, the state 1024 ends the process.
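One pass through the method 1000 may be sketched as follows. The RAID levels per data class are taken from the states 1008 and 1012; the `boost_threshold` parameter and the function names are hypothetical, since the test used by the decision state 1014 to decide whether a block may benefit from a performance boost is not specified.

```python
RAID_OPTIONS = {
    "sensitive": ("RAID 50", "RAID 60"),  # state 1008
    "reliable": ("RAID 51", "RAID 61"),   # state 1012
}


def configure_storage(data_class):
    """Decision state 1006: pick the candidate RAID levels for a data class."""
    try:
        return RAID_OPTIONS[data_class]
    except KeyError:
        raise ValueError(f"unknown data class: {data_class!r}") from None


def learn_cycle(data_class, block_iops, attached, boost_threshold):
    """One pass through the states 1010-1022.

    block_iops maps an active block to its observed I/O rate (state 1010).
    attached is the set of blocks currently backed by a solid state device.
    boost_threshold is an assumed stand-in for the decision state 1014.
    Returns (raid_levels, blocks_to_back_with_ssd, blocks_to_free).
    """
    raid_levels = configure_storage(data_class)
    wants_boost = {b for b, iops in block_iops.items() if iops >= boost_threshold}
    freed = set(attached) - wants_boost  # states 1018 and 1022
    return raid_levels, wants_boost, freed


levels, boosted, freed = learn_cycle(
    "sensitive", {1: 900, 2: 50}, attached={2}, boost_threshold=500
)
```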
Implementation of a neural network may provide a possibility of learning. Given a specific task to solve and a class of functions, the learning may involve using a set of observations to find functions and/or relations that solve the task in an optimal sense. A machine learning method may involve a scientific discipline concerned with the design and development of techniques that may allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. Artificial neural networks may comprise mathematical models or computational models inspired by structural and/or functional aspects of biological neural networks. Cluster analysis (or clustering) may be the assignment of a set of observations into subsets (called clusters) so that observations in a same cluster may be similar in some sense. Clustering may be a technique (or method) of unsupervised learning and a common technique for statistical data analysis.
In some embodiments, the smart data access classification may be based on artificial intelligence. An artificial intelligence based smart data classification module generally performs the data pattern analysis based on an artificial neural network computation model. The artificial neural network computation model generally forms a cluster for data utilizing the sensitive/reliable storage over a learning time (e.g., Tlearn). The computation model may classify the volume group/disk/active blocks under the categories. Some artificial neural networks, such as a self-organizing map (e.g., SOM) network, may be used to cluster the data automatically. Thereafter, the high-performance data may be viewed as one of the clusters.
The data pattern analysis may be a three-dimensional computation where the learning is done based on the following criteria:
1) Analyzing the input/output data coming to a volume group in the storage subsystem behind the controller 108.
2) A next level of data pattern analysis may be performed based on the input/output transfers reaching the target physical drives and the blocks that are active during the input/output transfer.
3) A table may be built during the learning cycle with the volume group versus the drive versus the active blocks.
4) Based on the high activity blocks that may be available, clusters may be created for (i) high input/output processes for sensitive blocks and (ii) average input/output processes for reliable blocks using an unsupervised cluster analysis method.
5) The learning cycle may be dynamic and self-defined based on the patterns and a consistency of the patterns to derive a relationship between the active blocks and the input/output transfers.
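The criteria above may be sketched as a table keyed by volume group, physical drive and active block, followed by an unsupervised split into a high activity cluster and an average activity cluster. The example substitutes a plain one-dimensional two-means clustering for the self-organizing map mentioned earlier, purely to stay short and dependency-free; the function names and the sample identifiers are hypothetical.

```python
from collections import Counter


def build_table(io_log):
    """Count transfers per (volume_group, physical_drive, block) in a learn cycle."""
    return Counter(io_log)


def two_means(values, iterations=20):
    """Split numbers into a low and a high cluster; return the two centers."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: only one cluster exists
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def classify(table):
    """Label each table entry as 'high' or 'average' input/output activity."""
    if not table:
        return {}
    lo, hi = two_means(list(table.values()))
    return {
        key: "high" if abs(count - hi) < abs(count - lo) else "average"
        for key, count in table.items()
    }


# Illustrative log only; the identifiers and counts are made up.
io_log = (
    [("VG1", "PD1", 10)] * 50
    + [("VG1", "PD1", 11)] * 45
    + [("VG2", "PD2", 7)] * 3
    + [("VG2", "PD3", 8)] * 2
)
labels = classify(build_table(io_log))
```

Entries labeled "high" would be the candidates for the sensitive, high input/output cluster, and the remaining entries would fall into the average cluster for reliable blocks.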
An example of multiple (e.g., N) learning cycles per multiple (e.g., three) active volume groups is generally illustrated in Table I as follows:
An example of the multiple learning cycles per physical drive (e.g., PD) is generally illustrated in Table II as follows:
An example of the learning cycles per active blocks (e.g., B) is generally illustrated in Table III as follows:
As per the tables, the data classification module generally identifies the blocks utilizing the high input/output process storage. The data classification module may also decide among the blocks based on the active volume groups, the physical drives and the active blocks.
The system 100 may implement a user option to select multiple (e.g., three) different levels of data storage access. The different levels may include, but are not limited to, (i) sensitive data storage, (ii) reliable data storage and (iii) reliable and sensitive data storage. The system 100 may allocate and/or de-allocate a number of solid state drives 120a-120n to act as temporary cache layers to a disk drive group/volume group by a learn logic engine based on input/output load requirements. The system 100 may provide (i) easy and efficient storage planning based on data access criteria of the user, (ii) better reliability, (iii) dynamic performance boost and/or (iv) a cost versus performance advantage. Usage of hybrid drives with NAND flash memory integrated for disk caching may further boost the performance. The system 100 may be implemented for (i) web service and Internet service providers (e.g., ISPs), (ii) database applications, (iii) military applications, (iv) high performance computing applications and/or (v) image processing applications.
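The allocation and de-allocation of solid state drives as temporary cache layers may be sketched as a small pool manager. The `SsdPool` class, the two load thresholds and the drive group names are hypothetical; the gap between the thresholds is an assumed hysteresis so that a drive is not attached and released repeatedly when the load hovers near a single limit.

```python
class SsdPool:
    """Hand out spare solid state drives to drive groups under heavy load."""

    def __init__(self, ssds, attach_above, release_below):
        if release_below > attach_above:
            raise ValueError("release_below must not exceed attach_above")
        self.free = list(ssds)
        self.assigned = {}  # drive group -> solid state drive
        self.attach_above = attach_above
        self.release_below = release_below

    def update(self, loads):
        """Apply one learn cycle of per-group I/O loads; return the assignments."""
        # Release first so a freed drive can be reused in the same cycle.
        for group in list(self.assigned):
            if loads.get(group, 0) < self.release_below:
                self.free.append(self.assigned.pop(group))
        # Serve the most heavily loaded groups first.
        for group, load in sorted(loads.items(), key=lambda kv: -kv[1]):
            if load > self.attach_above and group not in self.assigned and self.free:
                self.assigned[group] = self.free.pop(0)
        return dict(self.assigned)


pool = SsdPool(["120a"], attach_above=1000, release_below=400)
cycle1 = pool.update({"group122": 1500, "group124": 1200})
cycle2 = pool.update({"group122": 300, "group124": 1200})
cycle3 = pool.update({"group124": 600})
```

In the third cycle the load of the second group lies between the two thresholds, so the drive stays attached rather than being released.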
The functions performed by the diagram of
The present invention may also be implemented by the preparation of ASICs (application specific integrated circuits), Platform ASICs, FPGAs (field programmable gate arrays), PLDs (programmable logic devices), CPLDs (complex programmable logic devices), sea-of-gates, RFICs (radio frequency integrated circuits), ASSPs (application specific standard products), one or more monolithic integrated circuits, one or more chips or die arranged as flip-chip modules and/or multi-chip modules or by interconnecting an appropriate network of conventional component circuits, as is described herein, modifications of which will be readily apparent to those skilled in the art(s).
The present invention thus may also include a computer product which may be a storage medium or media and/or a transmission medium or media including instructions which may be used to program a machine to perform one or more processes or methods in accordance with the present invention. Execution of instructions contained in the computer product by the machine, along with operations of surrounding circuitry, may transform input data into one or more files on the storage medium and/or one or more output signals representative of a physical object or substance, such as an audio and/or visual depiction. The storage medium may include, but is not limited to, any type of disk including floppy disk, hard drive, magnetic disk, optical disk, CD-ROM, DVD and magneto-optical disks and circuits such as ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable ROMs), EEPROMs (electrically erasable programmable ROMs), UVPROMs (ultra-violet erasable programmable ROMs), Flash memory, magnetic cards, optical cards, and/or any type of media suitable for storing electronic instructions.
The elements of the invention may form part or all of one or more devices, units, components, systems, machines and/or apparatuses. The devices may include, but are not limited to, servers, workstations, storage array controllers, storage systems, personal computers, laptop computers, notebook computers, palm computers, personal digital assistants, portable electronic devices, battery powered devices, set-top boxes, encoders, decoders, transcoders, compressors, decompressors, pre-processors, post-processors, transmitters, receivers, transceivers, cipher circuits, cellular telephones, digital cameras, positioning and/or navigation systems, medical equipment, heads-up displays, wireless devices, audio recording, storage and/or playback devices, video recording, storage and/or playback devices, game platforms, peripherals and/or multi-chip modules. Those skilled in the relevant art(s) would understand that the elements of the invention may be implemented in other types of devices to meet the criteria of a particular application.
While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention.