This application claims priority from Chinese Patent Application Number CN201610453751.8, filed on Jun. 21, 2016 at the State Intellectual Property Office, China, titled “DATA PROCESSING METHOD AND DEVICE,” the contents of which are herein incorporated by reference in their entirety.
Embodiments of the present invention generally relate to data processing, and more specifically, relate to a method and apparatus for data processing.
Currently, there is an increasing demand for data storage. Widely used storage systems include, for example, file systems, block storage, and object storage. Unlike file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks, object storage is a storage architecture that manages data as objects.
Object storage is suitable, for example, for storing unstructured data, and allows relatively inexpensive, scalable, and self-healing retention of massive amounts of data. Solutions for public cloud object storage services have already been proposed, as have solutions intended to provide private cloud object storage services. These known solutions share some common features: for example, they are based on the HTTP/HTTPS protocol, expose a simple read/write application programming interface (API) in the representational state transfer (REST) style, or are built on a specific API. When using existing object storage services, however, a user usually faces problems such as low efficiency and low security, which directly degrade the user experience.
Embodiments of the present disclosure provide a method and apparatus for data processing and a corresponding computer program product.
According to a first aspect of the present disclosure, a method of data processing is proposed. The method comprises: obtaining an intermediate identifier of data to be processed in an intermediate system; converting, based on an identifier mapping between the intermediate system and a remote system, the intermediate identifier into a first identifier in the remote system; and processing, in association with the remote system, the data at least partially based on the first identifier.
In some embodiments, obtaining the intermediate identifier comprises: receiving, from a client, a user request for operating the data at the remote system; and extracting, from the user request, the intermediate identifier of the data.
In some embodiments, processing, in association with the remote system, the data comprises: generating, based on the user request, a first request for the operating at the remote system, the first request including the first identifier; and transmitting the first request to the remote system.
In some embodiments, the operating includes reading the data, and processing, in association with the remote system, the data further comprises: receiving the data from the remote system; and transmitting the data to the client.
In some embodiments, the remote system is a first remote system, and the operating includes updating the data at the first remote system, in which processing, in association with the remote system, the data further comprises: converting, based on an identifier mapping between the intermediate system and a second remote system, the intermediate identifier of the data into a second identifier of the data in the second remote system; generating a second request for updating the data at the second remote system, the second request including the second identifier; and transmitting the second request to the second remote system.
In some embodiments, the updating comprises at least one of: creating, deleting, and modifying.
In some embodiments, generating the first request comprises: generating the first request using a grammar different from that of the user request.
In some embodiments, at least one of the user request and the first request includes a key associated with the data.
In some embodiments, the remote system is a first remote system, and processing, in association with the remote system, the data comprises: converting, based on an identifier mapping between the intermediate system and a third remote system, the intermediate identifier into a third identifier of the data in the third remote system, the third remote system being different from the first remote system; obtaining the data from the third remote system using the third identifier; and storing the data into the first remote system using the first identifier.
In some embodiments, processing, in association with the remote system, the data comprises: deleting the data from the third remote system in response to at least one of: determining that the data has been completely stored in the first remote system, and all unprocessed requests for the data having been processed.
According to a second aspect of the present disclosure, an electronic device is proposed. The electronic device comprises: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing machine-executable instructions which, when executed by the at least one processing unit, cause the at least one processing unit to be configured to: obtain an intermediate identifier of data to be processed in an intermediate system; convert, based on an identifier mapping between the intermediate system and a remote system, the intermediate identifier into a first identifier in the remote system; and process, in association with the remote system, the data at least partially based on the first identifier.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the present disclosure, nor is it intended to limit the scope of the present disclosure.
Through the following detailed description of example embodiments of the present disclosure with reference to the accompanying drawings, the above and other objectives, features, and advantages of the present disclosure will become more apparent. In the example embodiments of the present disclosure, like reference numerals usually indicate similar elements.
The preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the accompanying drawings show preferred embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided to make the present disclosure more thorough and complete and to convey the scope of the present disclosure to those skilled in the art.
As used herein, the term “includes” and its variants are to be read as open-ended terms that mean “includes, but is not limited to.” The term “or” is to be read as “and/or” unless the context clearly indicates otherwise. The term “based on” is to be read as “based at least in part on.” The term “one example embodiment” and “one embodiment” are to be read as “at least one example embodiment.” The term “another embodiment” is to be read as “at least another embodiment.” The terms “first”, “second” and the like may refer to different or same objects. Other explicit and implicit meanings may also be included hereinafter.
As illustrated in
Additionally, in some cases, the client 110 expects the data it stores on the remote systems 130-1, . . . , 130-N to be able to flow between different remote systems as needed. However, because different remote systems have different configurations, the user usually has to configure each remote system separately for the data to flow between them, which makes such data flow difficult to achieve.
In addition, before the user decides whether to store the data on a given remote system, a method is needed for measuring the different remote systems in a plurality of aspects such as cost, performance, and SLA (Service Level Agreement). For example, an enterprise user may be highly demanding on the performance and SLA of the remote system but less demanding on cost control. Through such measurement, the enterprise user may make an optimal selection for a specific storage need of, for example, an enterprise application. However, in the existing storage system 100, the user cannot measure the remote systems 130-1, . . . , 130-N in these aspects, so an optimal selection cannot be made.
Therefore, when using an existing storage system, the user faces the risk of being locked into a specific storage service, which makes it difficult both to move data between different remote systems and to measure the remote systems in order to make an optimal selection. Consequently, the efficiency and security of the existing storage system cannot be guaranteed, which directly degrades the user experience.
In order to solve the above and other potential problems and deficiencies, embodiments of the present disclosure provide a data processing solution.
Particularly, in the discussion below, an example in which a data object is the object to be operated on will mainly be described. However, it should be understood that this is only an example and is not intended to limit the scope of the present disclosure in any manner. In other embodiments, the data may be stored by any appropriate technique, whether currently known or developed in the future.
Similar to the storage system 100 as shown in
The intermediate system 220 may enable the client 110 to transparently operate on the data on the remote systems 130-1, . . . , 130-N. Specifically, the intermediate system 220 may provide a set of interfaces compatible with the remote systems 130-1, . . . , 130-N, such that, to the client 110, the intermediate system 220 behaves like the remote systems 130-1, . . . , 130-N. In addition, the intermediate system 220 may generate a universal intermediate identifier for the data (for example, for each data object), the intermediate identifier being independent of the remote systems 130. Accordingly, the intermediate system 220 may maintain a mapping between the intermediate identifier of the data and the remote identifiers of the data in the remote systems 130-1, . . . , 130-N; this mapping is referred to as “an identifier mapping.” In some embodiments, the intermediate system 220 may also store metadata of the data. The metadata may be used for identifying the data and may include the intermediate identifier, the remote identifiers, and other information describing the data. Moreover, the intermediate system 220 may measure the remote systems 130-1, . . . , 130-N in a plurality of aspects, such as cost, performance, and SLA, to help the user make an optimal selection.
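The identifier mapping described above can be pictured as a two-way lookup table. The following Python sketch is a minimal, in-memory illustration under stated assumptions: the class name `IdentifierMapping`, the use of random UUIDs as intermediate identifiers, and remote-system labels such as `"130-1"` are all hypothetical choices made for illustration, not details taken from the disclosure.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class IdentifierMapping:
    """Hypothetical in-memory identifier mapping (cf. mapping table 300).

    forward: intermediate id -> {remote system label -> remote id}
    reverse: (remote system label, remote id) -> intermediate id
    """
    forward: dict = field(default_factory=dict)
    reverse: dict = field(default_factory=dict)

    def new_intermediate_id(self) -> str:
        # Assumption: a random UUID, so the identifier depends on no remote system.
        iid = str(uuid.uuid4())
        self.forward[iid] = {}
        return iid

    def bind(self, iid: str, system: str, remote_id: str) -> None:
        # Record that the data known as `iid` is stored as `remote_id` in `system`.
        self.forward.setdefault(iid, {})[system] = remote_id
        self.reverse[(system, remote_id)] = iid

    def to_remote(self, iid: str, system: str) -> str:
        # Intermediate identifier -> identifier in the given remote system.
        return self.forward[iid][system]

    def to_intermediate(self, system: str, remote_id: str) -> str:
        # Remote identifier -> intermediate identifier (reverse direction).
        return self.reverse[(system, remote_id)]

    def systems_for(self, iid: str) -> list:
        # Remote systems that currently hold a copy of the data.
        return list(self.forward.get(iid, {}))


# Example: one data object with copies in two remote systems.
mapping = IdentifierMapping()
iid = mapping.new_intermediate_id()
mapping.bind(iid, "130-1", "bucket-a/report.pdf")
mapping.bind(iid, "130-2", "container-b/000123")
```

Keeping both directions lets the same table serve client-initiated requests (intermediate to remote) and responses or system-initiated requests (remote to intermediate).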
Hereinafter, several example operations/functions of the intermediate system 220 will be described with reference to
As illustrated in
In response to the request, the intermediate system 220 converts (420) the intermediate identifier of the data included in the user request in the intermediate system 220 into the identifier of the data in the target remote system (the first remote system 130-1 in this example), referred to as “a first identifier.” In some embodiments, the intermediate identifier is included in the user request. In such embodiments, the intermediate system 220 may extract the intermediate identifier from the user request, and convert the intermediate identifier into the first identifier based on the identifier mapping (for example, the mapping table 300 as shown in
In some embodiments, in order to read the data stored in the first remote system 130-1, the user request may also include a key needed for accessing the first remote system 130-1. In such embodiments, the intermediate system 220 may also extract the key from the user request.
Next, the intermediate system 220 transmits (430) a request for reading the data to the target, the first remote system 130-1; this request is referred to as “a first request.” The first request includes at least the first identifier. Additionally, as described above, in some embodiments, the intermediate system 220 may extract a key needed for accessing the data from the user request. In such embodiments, the extracted key may also be included in the first request. Additionally, in some embodiments, the key needed for accessing the data may be stored in the intermediate system 220, and the intermediate system 220 may automatically include the key in the first request.
In particular, in some embodiments, the intermediate system 220 may convert the format/grammar of the request so as to adapt to the characteristics and/or requirements of the destination remote system. For example, the intermediate system 220 may generate the first request based on the requirements of the first remote system 130-1. The first request may have a grammar and/or format different from that of the original user request, but the same meaning. In this way, the differences between the remote systems 130 are handled by the intermediate system 220 and are transparent to the client 110, which simplifies operations at the client.
After receiving the first request, the first remote system 130-1 returns the data to be read to the intermediate system 220. Accordingly, the intermediate system 220 receives (440) the data from the first remote system 130-1. In some embodiments, the received data may include its first identifier in the first remote system 130-1 for error checking, log processing, or the like. This is, however, optional: the received data may omit the first identifier, or may include other information serving a similar purpose. Then, the intermediate system 220 provides (460) the data to the client 110.
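To make the request conversion concrete, the sketch below translates one client-side request into two different remote grammars while keeping its meaning. Everything here is an assumption for illustration: the `UserRequest` fields, the two grammars (an HTTP-style verb/path form and an RPC-style action form), the translator table, and the fallback to a key stored in the intermediate system are hypothetical and do not correspond to any real storage service's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserRequest:
    # Hypothetical client-side request grammar understood by the intermediate system.
    op: str                      # "read", "create", "modify" or "delete"
    intermediate_id: str
    key: Optional[str] = None
    body: bytes = b""


def to_path_grammar(op, remote_id, key, body):
    # Hypothetical remote grammar A: HTTP-style verb and path, key in a header.
    verb = {"read": "GET", "create": "PUT", "modify": "PUT", "delete": "DELETE"}[op]
    headers = {"Authorization": key} if key else {}
    return {"method": verb, "path": "/" + remote_id, "headers": headers, "body": body}


def to_action_grammar(op, remote_id, key, body):
    # Hypothetical remote grammar B: RPC-style action name with parameters.
    action = {"read": "fetch", "create": "store", "modify": "store", "delete": "remove"}[op]
    params = {"object": remote_id, "credential": key}
    if body:
        params["payload"] = body
    return {"action": action, "params": params}


# Which (hypothetical) grammar each remote system expects.
TRANSLATORS = {"130-1": to_path_grammar, "130-2": to_action_grammar}


def generate_request(req, target, mapping, stored_keys):
    """Build a request for `target` that means the same as `req`, in the target's grammar."""
    remote_id = mapping[req.intermediate_id][target]   # convert the identifier
    key = req.key or stored_keys.get(target)           # key from the request, else a stored one
    return TRANSLATORS[target](req.op, remote_id, key, req.body)
```

Adding support for another remote system would, under this design, only require registering one more translator function; the client-side grammar stays unchanged.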
As shown, in those embodiments where the received data include the first identifier, the intermediate system 220, for example, may convert (450) the first identifier of the data back to the intermediate identifier based on the identifier mapping 310. In such embodiments, when transmitting (460) the data to the client 110, the intermediate system 220 may include the intermediate identifier in the data. In this way, the client 110 may confirm that the obtained data are the requested data. Of course, this is not necessary. In other embodiments, the action 450 may be omitted.
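The read path of process 400 can be summarized as a single function. This is a minimal sketch under assumptions: `FakeRemote` is a hypothetical in-memory stand-in for a remote system, requests and responses are plain dictionaries with made-up field names, and the numbered comments map each step to the actions 410 to 460 described above.

```python
class FakeRemote:
    """Hypothetical stand-in for a remote object store that checks an access key."""

    def __init__(self, key):
        self.key = key
        self.objects = {}

    def handle(self, request):
        if request.get("key") != self.key:
            raise PermissionError("access key rejected")
        if request["op"] != "read":
            raise ValueError("this stub only supports reads")
        rid = request["id"]
        # Echo the remote identifier back so the caller can check it (optional per the text).
        return {"id": rid, "data": self.objects[rid]}


def read_through_intermediate(user_request, target, forward, reverse, remotes, stored_keys):
    # 410/420: extract the intermediate identifier and convert it to the first identifier.
    iid = user_request["intermediate_id"]
    first_id = forward[iid][target]
    # 430: send the first request; the key comes from the user request or is stored locally.
    key = user_request.get("key") or stored_keys.get(target)
    response = remotes[target].handle({"op": "read", "id": first_id, "key": key})
    # 440/450: optionally map the returned first identifier back to the intermediate one.
    returned_iid = reverse[(target, response["id"])] if "id" in response else iid
    # 460: return the data to the client, tagged with the intermediate identifier.
    return {"intermediate_id": returned_iid, "data": response["data"]}
```

Because the client only ever sees the intermediate identifier, it can check that the returned data is the data it requested without knowing anything about the first identifier.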
The process 400 of reading the data by the client 110 from the first remote system 130-1 through the intermediate system 220 is described above. In some embodiments, when a plurality of duplicates of the data are stored in a plurality of remote systems, the intermediate system 220 may select any one of the duplicates, because all of the duplicates hold the same value. In some embodiments, the intermediate system 220 may measure the plurality of remote systems in a plurality of aspects so as to select an optimal remote system. In one example, when the intermediate system 220 detects that one of the plurality of remote systems is unavailable, the intermediate system 220 may forward the user request from the client 110 to another available remote system, so as to improve the availability of the storage system. In another example, the intermediate system 220 may measure the network latency of the plurality of remote systems and forward the user request from the client 110 to the remote system with the lowest latency, so as to improve the performance of the storage system.
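The two selection examples above (skip unavailable systems, prefer the lowest latency) combine naturally into one selection rule. The sketch below assumes a hypothetical availability callback and a hypothetical table of measured latencies in milliseconds; it is one possible policy, not the only one the text allows.

```python
def choose_remote(candidates, is_available, latency_ms):
    """Pick the remote system to read a duplicate from.

    candidates:   remote systems holding a duplicate of the data
    is_available: callable returning False for systems detected as unavailable
    latency_ms:   mapping of measured network latency per system (hypothetical metric)
    """
    available = [s for s in candidates if is_available(s)]
    if not available:
        raise RuntimeError("no remote system holding the data is available")
    # All duplicates hold the same value, so any available one is correct;
    # preferring the lowest measured latency is one possible policy.
    return min(available, key=lambda s: latency_ms[s])
```

Replacing the `key` function with a weighted score over cost, performance, and SLA measurements would be one way to extend this toward the multi-aspect measurement mentioned earlier.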
In response to the request, the intermediate system 220 converts (520) the intermediate identifier of the data included in the user request in the intermediate system 220 into an identifier of the data in the target remote system (the first remote system 130-1 in this example), referred to as “a first identifier.” In some embodiments, the intermediate identifier is included in the user request. In such embodiments, the intermediate system 220 may extract the intermediate identifier from the user request, and convert the intermediate identifier into the first identifier based on the identifier mapping (for example, the mapping table 300 as shown in
In some embodiments, in order to update the data stored in the first remote system 130-1, the user request may also include a key needed for accessing the first remote system 130-1. In such embodiments, the intermediate system 220 may also extract the key from the user request.
Next, the intermediate system 220 transmits (530), to the first remote system 130-1 as the target, a request for updating the data; this request is referred to as “a first request.” The first request includes at least the first identifier. In addition, as mentioned above, in some embodiments, the intermediate system 220 may extract a key needed for accessing the data from the user request. In such embodiments, the extracted key may be included in the first request. Additionally, in some embodiments, the key needed for accessing the data may be stored in the intermediate system 220, and the intermediate system 220 may automatically include the key in the first request.
Further, in response to the request, the intermediate system 220 converts (540) the intermediate identifier of the data included in the user request in the intermediate system 220 into an identifier of the data in the target remote system (the second remote system 130-2 in this example), referred to as “a second identifier.” In some embodiments, the intermediate identifier is included in the user request. In such embodiments, the intermediate system 220 may extract the intermediate identifier from the user request, and convert the intermediate identifier into the second identifier based on the identifier mapping (for example, the mapping table 300 as shown in
In some embodiments, in order to update the data stored in the second remote system 130-2, the user request may also include the key needed for accessing the second remote system 130-2. In such embodiments, the intermediate system 220 may also extract the key from the user request.
Next, the intermediate system 220 transmits, to the second remote system 130-2 as the target, a request for updating the data; this request is referred to as “a second request.” The second request includes at least the second identifier. In addition, as mentioned above, in some embodiments, the intermediate system 220 may extract a key needed for accessing the data from the user request. In such embodiments, the extracted key may also be included in the second request. In addition, in some embodiments, the key needed for accessing the data may be stored in the intermediate system 220, and the intermediate system 220 may automatically include the key in the second request.
In particular, in some embodiments, the intermediate system 220 may convert the format/grammar of the request so as to adapt to the characteristics and/or requirements of the target remote system. For example, the intermediate system 220 may generate a first request based on the requirements of the first remote system 130-1 and a second request based on the requirements of the second remote system 130-2. The first request and the second request may have a grammar and/or format different from that of the original user request, but the same meaning. In this way, the differences between the remote systems 130 are handled by the intermediate system 220 and are transparent to the client 110, which simplifies operations at the client.
The process 500 for updating, by the client 110 through the intermediate system 220, the data at the first remote system 130-1 and the second remote system 130-2 is described above. The updating may comprise one of creating, deleting, and modifying. In one example, when the client 110 creates the data through the intermediate system 220, the client 110 defines the data in the intermediate system 220 and configures a plurality of remote systems for the data. If the intermediate system 220 receives from the client 110 a user request for creating the data and identifies that the data is defined to have duplicates in a plurality of remote systems, the intermediate system 220 transmits a respective request to each of the plurality of remote systems so as to create the data on all of them. In another example, if the intermediate system 220 receives from the client 110 a user request for modifying or deleting the data and identifies that the data is stored in a plurality of remote systems, the intermediate system 220 transmits a respective request to each of the plurality of remote systems so as to modify or delete the data on all of them. Managing the duplicates of the data in the plurality of remote systems through the intermediate system 220 has the advantage that the user does not need to configure each remote system separately, which reduces the user workload and improves efficiency. It may also improve the security of the storage system: for example, if the duplicate of the data in one remote system is damaged, the duplicates in the other remote systems remain available.
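The fan-out behavior of process 500 can be sketched as one loop over the identifier mapping. In this minimal illustration, plain dictionaries stand in for the remote systems, writing into a dictionary stands in for transmitting a per-system request, and the remote-identifier naming rule in the hypothetical `create` helper is invented purely for the example.

```python
def update_all(op, iid, mapping, stores, data=None):
    """Apply one update to every remote system that holds a duplicate of the data."""
    if op not in ("create", "modify", "delete"):
        raise ValueError(f"unsupported update: {op}")
    for system, remote_id in mapping[iid].items():
        store = stores[system]          # stand-in for sending a per-system request
        if op == "delete":
            store.pop(remote_id, None)
        else:
            store[remote_id] = data


def create(iid, systems, mapping, stores, data):
    # The client has defined the data and chosen its remote systems; the remote
    # identifiers below follow a made-up naming rule for illustration only.
    mapping[iid] = {s: f"{s}/obj-{iid}" for s in systems}
    update_all("create", iid, mapping, stores, data)
```

The client issues a single request per update; the per-system requests, and the knowledge of which systems hold duplicates, stay inside the intermediate system.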
In response to the request, the intermediate system 220 converts (620) the intermediate identifier of the data included in the user request in the intermediate system 220 into an identifier of the data in the target remote system (the first remote system 130-1 in this example), referred to as “a first identifier.” Additionally, the intermediate system 220 converts the intermediate identifier of the data included in the user request in the intermediate system 220 into an identifier of the data in the source remote system (the third remote system 130-3 in this example), referred to as “a third identifier.” In some embodiments, the intermediate identifier is included in the user request. In such embodiments, the intermediate system 220 may extract the intermediate identifier from the user request and convert the intermediate identifier into the first identifier and the third identifier based on the identifier mapping (for example, the mapping table 300 as shown in
In some embodiments, in order to migrate the data from the third remote system 130-3 to the first remote system 130-1, the user request may also comprise a key needed for accessing the first remote system 130-1 and the third remote system 130-3. In such embodiments, the intermediate system 220 may also extract the key from the user request. In addition, in some embodiments, the key needed for accessing the data may be stored in the intermediate system 220.
Next, the intermediate system 220 obtains (630) the data from the third remote system 130-3 using the third identifier, and stores (640) the data returned from the third remote system 130-3 into the first remote system 130-1 using the first identifier.
The process 600, in which the intermediate system 220 receives from the client 110 a user request for migrating the data, is described above. During the migration, the intermediate system 220 appears to the remote systems 130 as an ordinary client, and the intermediate system 220 can still obtain the data from the remote systems. Further, after determining that the migrated data has been completely stored in the first remote system 130-1, the intermediate system 220 may transmit a new request for the data to the first remote system 130-1. Furthermore, after determining that the existing requests for the migrated data have been completely processed in the third remote system 130-3, the intermediate system 220 may transmit to the third remote system 130-3 a request for deleting the data from the third remote system 130-3.
By using the intermediate system 220, the client 110 may migrate the data between different remote systems as required, which improves the availability and performance of data access, improves the security of data protection, and eliminates the risk of locking the client 110 into a specific remote system. As a result, users have full control over their data.
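The migration in process 600, including the safeguard before deleting the source copy, can be sketched as follows. Assumptions: the hypothetical `Remote` dataclass keeps stored objects and a count of unprocessed requests per object in memory, and the "completely stored" check is trivially true here, whereas against a real remote system it would be a confirmation from that system. This sketch waits for both conditions before deleting; the text also allows deletion to be triggered by at least one of them.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    # Hypothetical remote system: stored objects plus unprocessed-request counts per object.
    objects: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)


def migrate(iid, mapping, remotes, source, target):
    """Copy data from `source` to `target`; delete the source copy only when it is safe."""
    third_id = mapping[iid][source]              # identifier in the source (third) system
    first_id = mapping[iid][target]              # identifier in the target (first) system
    data = remotes[source].objects[third_id]     # 630: obtain using the third identifier
    remotes[target].objects[first_id] = data     # 640: store using the first identifier
    # With a real remote system this would be a completion confirmation.
    stored = remotes[target].objects.get(first_id) == data
    drained = remotes[source].pending.get(third_id, 0) == 0
    if stored and drained:
        del remotes[source].objects[third_id]    # delete from the third remote system
        del mapping[iid][source]                 # the data no longer lives there
        return True
    return False
```

If the source still has outstanding requests, the function returns `False` and leaves the source copy in place, so a caller could retry the deletion step later.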
In addition to the implementation shown in
In some other implementations, the intermediate system 220 may also receive a system request for migrating the data between the remote systems from one or more remote systems 130. For example, the intermediate system 220 may receive, from the remote system 130-3, a system request for migrating the data from the remote system 130-3 to the remote system 130-1. The intermediate system 220 may extract, from the system request, a third identifier of the data in the remote system 130-3. The intermediate system 220 may convert the third identifier into the intermediate identifier in the intermediate system 220 based on the identifier mapping, and further convert the intermediate identifier into the first identifier in the remote system 130-1. The intermediate system 220 obtains the data from the remote system 130-3 using the third identifier, and stores the data into the remote system 130-1 using the first identifier. Using the intermediate system 220 not only benefits the client, but also simplifies interoperations between the remote systems and improves flexibility of the entire storage system.
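A system-initiated migration differs from the client-initiated one mainly in the direction of the first lookup: the third identifier is mapped back to the intermediate identifier, and only then forward to the first identifier. The sketch below assumes hypothetical request fields (`from`, `to`, `remote_id`) and uses dictionaries as stand-ins for the forward and reverse identifier mappings and for the remote systems.

```python
def handle_system_migration(request, forward, reverse, stores):
    """Handle a migration request issued by a remote system rather than by the client.

    request: hypothetical fields {"from": ..., "to": ..., "remote_id": ...}
    """
    source, target = request["from"], request["to"]
    third_id = request["remote_id"]              # identifier in the requesting system
    iid = reverse[(source, third_id)]            # third identifier -> intermediate identifier
    first_id = forward[iid][target]              # intermediate identifier -> first identifier
    stores[target][first_id] = stores[source][third_id]   # obtain, then store
    return iid, first_id
```

Routing through the intermediate identifier means the two remote systems never need to know each other's identifier schemes.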
Next, at 720, the intermediate identifier is converted into a first identifier in the remote system based on the identifier mapping between the intermediate system and the remote system. The identifier mapping, for example, may be implemented by the mapping table 300 shown in
At 730, the data is processed, in association with the remote system, at least partially based on the first identifier. In some embodiments, processing, in association with the remote system, the data comprises: generating, based on the user request, a first request for the operating at the remote system, the first request including the first identifier; and transmitting the first request to the remote system. In some embodiments, generating the first request comprises: generating the first request using a grammar different from that of the user request. In some embodiments, at least one of the user request and the first request includes a key associated with the data.
In some embodiments, the operating comprises reading the data. In such embodiments, at 730, the data may be received from the remote system and transmitted to the client.
For example, in some embodiments, the remote system is a first remote system, and the operating comprises updating the data at the first remote system. The updating may comprise, for example, at least one of: creating, deleting, and modifying. In such embodiments, at 730, the intermediate identifier of the data may be converted, based on an identifier mapping between the intermediate system and a second remote system, into a second identifier of the data in the second remote system; a second request for updating the data at the second remote system, including the second identifier, may be generated; and the second request may be transmitted to the second remote system.
In some embodiments, the remote system is a first remote system. In such embodiments, at 730, the intermediate identifier may be converted, based on an identifier mapping between the intermediate system and a third remote system different from the first remote system, into a third identifier of the data in the third remote system; the data may be obtained from the third remote system using the third identifier; and the data may be stored into the first remote system using the first identifier.
In some embodiments, at 730, the data may be deleted from the third remote system in response to at least one of: determining that the data has been completely stored in the first remote system, and all unprocessed requests for the data having been processed.
The identifier obtaining unit 810 is configured to extract an intermediate identifier of data to be processed from a user request received from a client.
The identifier mapping unit 820 is configured to convert, based on an identifier mapping between the intermediate system and a remote system, the intermediate identifier into a first identifier in the remote system. The data processing unit 830 is configured to process, in association with the remote system, the data at least partially based on the first identifier.
In some embodiments, the identifier obtaining unit 810 is configured to receive, from the client, a user request for operating the data at the remote system; and extract the intermediate identifier of the data from the user request.
In some embodiments, the data processing unit 830 is configured to generate, based on the user request, a first request for the operating at the remote system, where the first request includes the first identifier; and transmit the first request to the remote system. For example, in some embodiments, the data processing unit 830 is configured to generate the first request using a grammar different from that of the user request. Alternatively or additionally, in some embodiments, at least one of the user request and the first request includes a key associated with the data.
In some embodiments, the operating includes reading the data. In such embodiments, the data processing unit 830 is configured to receive the data from the remote system; and transmit the data to the client.
In some embodiments, the remote system is a first remote system, and the operating includes updating the data at the first remote system. In such embodiments, the data processing unit 830 is configured to: convert, based on an identifier mapping between the intermediate system and a second remote system, the intermediate identifier of the data into a second identifier of the data at the second remote system; generate a second request for updating the data at the second remote system, where the second request includes the second identifier; and transmit the second request to the second remote system.
In some embodiments, the remote system is a first remote system. In such embodiments, the data processing unit 830 is configured to convert, based on an identifier mapping between the intermediate system and a third remote system, the intermediate identifier into a third identifier of the data in the third remote system, where the third remote system is different from the first remote system; obtain the data from the third remote system using the third identifier; and store the data into the first remote system using the first identifier.
In some embodiments, the data processing unit 830 is also configured to delete the data from the third remote system in response to at least one of: determining that the data have been completely stored in the first remote system, and all unprocessed requests for the data having been processed.
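The division of the apparatus 800 into three units can be mirrored directly in code. The following is a minimal sketch, not a definitive implementation: the class names echo units 810, 820, and 830 but are otherwise hypothetical, the request field `system` naming the target remote system is an assumption, and dictionaries stand in for the remote systems.

```python
class IdentifierObtainingUnit:
    """Unit 810: extracts the intermediate identifier from a user request."""

    def obtain(self, user_request):
        return user_request["intermediate_id"]


class IdentifierMappingUnit:
    """Unit 820: converts an intermediate identifier into a remote identifier."""

    def __init__(self, table):
        self.table = table          # intermediate id -> {remote system -> remote id}

    def convert(self, iid, system):
        return self.table[iid][system]


class DataProcessingUnit:
    """Unit 830: operates on the data in a remote system (dicts stand in for remotes)."""

    def __init__(self, remotes):
        self.remotes = remotes

    def process(self, op, system, remote_id, data=None):
        store = self.remotes[system]
        if op == "read":
            return store[remote_id]
        if op in ("create", "modify"):
            store[remote_id] = data
            return None
        if op == "delete":
            store.pop(remote_id, None)
            return None
        raise ValueError(f"unsupported operation: {op}")


class Apparatus800:
    """Wires the three units together in the order described for apparatus 800."""

    def __init__(self, table, remotes):
        self.obtaining = IdentifierObtainingUnit()
        self.mapping = IdentifierMappingUnit(table)
        self.processing = DataProcessingUnit(remotes)

    def handle(self, user_request):
        iid = self.obtaining.obtain(user_request)
        system = user_request["system"]      # hypothetical field naming the target
        first_id = self.mapping.convert(iid, system)
        return self.processing.process(user_request["op"], system, first_id, user_request.get("data"))
```

Keeping the units separate allows, for example, the mapping unit to be backed by a persistent table while the processing unit is swapped for real remote-system clients.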
The units included in the apparatus 800 may be implemented in various ways, including software, hardware, firmware, or any combination thereof. In one embodiment, one or more units may be implemented using software and/or firmware, for example, machine-executable instructions stored on a storage medium. In addition to the machine-executable instructions or as an alternative thereto, part or all of the units in the apparatus 800 may be at least partially implemented by one or more hardware logic components. By way of example, and not limitation, example types of hardware logic components that can be used include field-programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
A plurality of components in the device 900 are connected to the I/O interface 950, comprising: an input unit 960, such as a keyboard, a mouse, or the like; an output unit 970, such as various types of displays, loudspeakers, or the like; a storage unit 980, such as a magnetic disk, an optical disk, or the like; and a communication unit 990, such as a network card, a modem, a radio communication transceiver, or the like. The communication unit 990 allows the device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various kinds of telecommunication networks.
Various processes and processing described above, for example, the processes/methods 400, 500, 600, and 700, may be executed by the processing unit 910. For example, in some embodiments, the processes/methods 400, 500, 600, and 700 may be implemented as computer software programs that are tangibly embodied in a machine-readable medium, for example, the storage unit 980. In some embodiments, part or all of the computer programs may be loaded and/or installed onto the device 900 via the ROM 920 and/or the communication unit 990. When the computer program is loaded onto the RAM 930 and executed by the CPU 910, one or more acts of the methods 400, 500, 600, and 700 described above may be carried out. Alternatively, in other embodiments, the CPU 910 may be configured to implement the processes/methods described above in any other appropriate manner.
Through the teachings offered by the above description and relevant drawings, many modifications and other implementations of the present disclosure given herein will be appreciated by those skilled in the art. Therefore, it is understood that the embodiments of the present disclosure are not limited to the specific implementations disclosed herein, and the modifications and other implementations are intended to be included within the scope of the present disclosure. Additionally, although the above description and relevant drawings describe the example implementations in the context of some example combinations of components and/or functions, it should be noted that different combinations of components and/or functions may be provided by alternative implementations without departing from the scope of the present disclosure. In this regard, for example, combinations of components and/or functions other than those explicitly described above are also anticipated to fall within the scope of the present disclosure. Although specific terms are used herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Number | Date | Country | Kind |
---|---|---|---|
CN201610453751.8 | Jun 2016 | CN | national |