The present disclosure is related to software as a service (SaaS), and more particularly, to a data protection method and associated apparatus such as a host server system.
SaaS technologies may provide software over the Internet, and SaaS may be regarded as a software delivery model in which software is licensed on a subscription basis and is centrally hosted. SaaS has become a common delivery model for many business applications, such as office collaboration tools like Google G Suite™. Although SaaS vendors generally appear to be secure, data loss may still occur because of human mistakes, programmatic errors, or malicious activity. As a result, there is a need to back up SaaS data in a way that facilitates a fast return to operational readiness.
One of the objectives of the present disclosure is to provide a data protection method and associated apparatus such as a host server system, in order to solve the related art problems.
According to at least one embodiment of the present disclosure, a data protection method is provided, where the data protection method may include: running a data protection application on a host server system, the data protection application being configured to protect a data set stored in a tenant server system, in which the host server system and the tenant server system are administered by different entities; receiving a plurality of versions of the data set from the tenant server system; and issuing at least one version request to get at least one specific version of the data set from the tenant server system, in which the at least one specific version and the plurality of versions of the data set form a sequential version order of the data set.
According to at least one embodiment of the present disclosure, a host server system is provided, where the host server system may include a network interface circuit, a storage device interface circuit, and a processing circuit that is coupled to the network interface circuit and the storage device interface circuit. The network interface circuit may be arranged to couple the host server system to at least one network, and the storage device interface circuit may be arranged to install at least one storage device for storing information. In addition, the processing circuit may be arranged to control operations of the host server system, for example, the operations may include: running a data protection application on the host server system, the data protection application being configured to protect a data set stored in a tenant server system, in which the host server system and the tenant server system are administered by different entities; receiving a plurality of versions of the data set from the tenant server system; and issuing at least one version request to get at least one specific version of the data set from the tenant server system, in which the at least one specific version and the plurality of versions of the data set form a sequential version order of the data set.
One of the advantages of the present disclosure is that the present disclosure can properly back up all versions of target SaaS data. In comparison with the related art, the present disclosure can achieve the goal of continuous data protection of a SaaS system without side effects or in a way that is less likely to introduce side effects.
These and other objectives of the present disclosure will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Embodiments of the present disclosure provide a data protection method and associated apparatus such as a host server system, for protecting user data on a tenant server system, and more particularly, performing continuous software as a service (SaaS) backup. The term “SaaS” may refer to a software distribution model in which a third-party provider hosts applications and makes them available to customers over the Internet. The user data to be protected, such as that of the applications hosted by the third-party provider, may be regarded as SaaS data. SaaS has some advantages, such as agility and staffing. Regarding agility, a SaaS vendor may provide various kinds of support that on-premises management (e.g. associated maintenance, etc.) cannot, and SaaS vendors can adapt rapidly to changes in users' needs. Regarding staffing, SaaS applications may reduce the need for on-premises management, such as updates, patches, and maintenance. Although SaaS vendors generally appear to be secure, data loss may still occur because of human mistakes. The present disclosure can back up the SaaS data in a way that facilitates a fast return to operational readiness (e.g. to meet a Recovery Time Objective (RTO)). For example, the host server system may include at least one network storage server (e.g. one or more network storage servers, such as one or more network attached storage (NAS) servers), and may obtain and store a series of continuous versions of the SaaS data into the network storage server. The series of continuous versions may include more versions than normal versions, and more particularly, may include some hidden versions of the SaaS data that are typically not accessible (e.g. not viewable) on a SaaS user interface (UI). As a result, the present disclosure can properly protect the SaaS data, since no version is lost.
For example, under control of the processing circuit 110, the host server system 10 may provide at least one network-based UI to allow controlling the host server system 10 to have a continuous SaaS backup configuration regarding a set of cloud drives, in which the set of cloud drives are provided by at least one SaaS vendor, and the set of cloud drives are accessible through a set of SaaS accounts, respectively. Based on the continuous SaaS backup configuration, the processing circuit 110 may control the host server system 10 to monitor the set of cloud drives through the set of SaaS accounts, respectively, and to perform backup on the set of cloud drives to store versions of each file of each cloud drive of the set of cloud drives into the host server system 10, without omitting any change in the cloud drive. Under control of the processing circuit 110, the host server system 10 may monitor all events related to changes in the cloud drive, and the events may include at least one change event regarding file change of the file. For example, the events may include a delete event regarding file deletion, a user defined event regarding the cloud drive, etc. In addition, while controlling the host server system 10 to have the continuous SaaS backup configuration, the processing circuit 110 may obtain identification information associated with the set of SaaS accounts through the aforementioned at least one network-based UI, so that the host server system 10 is authorized for at least one portion of the set of SaaS accounts.
According to some embodiments, the aforementioned at least one SaaS vendor may include a plurality of SaaS vendors, such as a first SaaS vendor and a second SaaS vendor. The set of cloud drives may include multiple first cloud drives provided by the first SaaS vendor, and include multiple second cloud drives provided by the second SaaS vendor. In addition, the set of SaaS accounts may include multiple first SaaS accounts and multiple second SaaS accounts, in which the first cloud drives are accessible through the first SaaS accounts, respectively, and the second cloud drives are accessible through the second SaaS accounts, respectively.
According to this embodiment, the processing circuit 110 is capable of running the data protection application on the host server system 10, and the data protection application is configured to protect a data set stored in the tenant server system 5, such as at least one portion (e.g. a portion or all) of the SaaS data, in which the host server system 10 and the tenant server system 5 are administered by different entities. Examples of the data set may include, but are not limited to: public cloud documents, mail data, calendar data, etc. Under control of the processing circuit 110, the host server system 10 may receive a plurality of versions of the data set from the tenant server system 5, and may issue at least one version request to get at least one specific version of the data set from the tenant server system 5, in which the aforementioned at least one specific version and the plurality of versions of the data set form a sequential version order of the data set. As a result, the host server system 10 receives each version of these versions of the data set (e.g. the aforementioned at least one specific version and the plurality of versions), to protect the data contents of the above-mentioned each version. For example, the plurality of versions of the data set may represent the whole of the SaaS data, but the present disclosure is not limited thereto. In addition, a command for issuing the aforementioned at least one version request may vary, for example, depending on an application programming interface (API) as suggested by the SaaS vendor. Additionally, regarding the sequential version order of the data set, the aforementioned at least one specific version and the plurality of versions of the data set constitute continuous versions of the data set (e.g. the SaaS data regarding file, image, mail, etc.), such as the versions having continuous version numbers. 
For example, the plurality of versions of the data set forms a non-sequential version order of the data set, and after obtaining the aforementioned at least one specific version, the host server system 10 owns the continuous versions of the data set. As the host server system 10 obtains the continuous versions of the data set and can recover the SaaS data according to the continuous versions when needed, no data loss should occur. According to some embodiments, the host server system 10 may be administered by a first entity (such as a home user or an enterprise who owns the host server system 10, or an administrator of the host server system 10), and the tenant server system 5 may be administered by another entity that is typically different from the first entity (such as SaaS providers, in which the first entity may subscribe to or freely use the services of the SaaS providers).
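The step of turning a non-sequential set of received versions into a continuous one can be illustrated with a minimal sketch. It assumes that version numbers are integers that increase by one per change; the function name `missing_versions` and the sample numbers are hypothetical and are not part of the disclosure.

```python
def missing_versions(received):
    """Return the version numbers absent from an otherwise contiguous range."""
    if not received:
        return []
    have = set(received)
    return [v for v in range(min(have), max(have) + 1) if v not in have]


# Versions already received from the tenant server system (non-sequential).
received = [1, 2, 5, 6]

# Versions the host would name in its version request(s).
to_request = missing_versions(received)

# Once those specific versions arrive, the host holds a sequential order.
complete = sorted(set(received) | set(to_request))
```

In this sketch the host would request versions 3 and 4, after which `complete` covers versions 1 through 6 without gaps. How the request is actually issued depends on the API of the SaaS vendor.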
In addition, the public cloud handler 320B may include some program sub-modules such as some handlers for handling associated operations regarding the public cloud(s) (e.g. the one or more tenant server systems {5} of the one or more third-party providers). For example, the public cloud handler 320B may include an account handler, a drive handler, a mail handler, a calendar handler, and a contact handler. The account handler may include an authentication (or “Auth” in
Additionally, there may be multiple groups of service handlers corresponding to multiple services, respectively. For example, the drive handler may include a file change handler, a file backup handler, and a file restore handler, arranged to handle file change detection operations, file backup operations, and file restoring operations, respectively. The mail handler may include a mail change handler, a mail backup handler, and a mail restore handler, arranged to handle mail change detection operations, mail backup operations, and mail restoring operations, respectively. The calendar handler may include a calendar change handler, a calendar backup handler, and a calendar restore handler, arranged to handle calendar change detection operations, calendar backup operations, and calendar restoring operations, respectively. The contact handler may include a contacts change handler, a contacts backup handler, and a contacts restore handler, arranged to handle contacts change detection operations, contacts backup operations, and contacts restoring operations, respectively. Furthermore, the version manager 330 may include some program sub-modules for handling version issues, such as a path based versioning unit, a single instance handler, and a dedupe handler, arranged to perform path-based versioning operations (e.g. maintaining contents and metadata of different versions of a file, in which the metadata including the directory structure of different versions of a file), single instance maintaining operations (e.g. maintaining single instance such as the same data set commonly owned or shared by multiple users), and dedupe operations, respectively.
According to some embodiments, in the architecture shown in
Specifically, the identification information may include a plurality of identifiers, such as an access identifier and a data set identifier. In order to establish a connection between the host server system 450 and the tenant server system 460, the tenant server system 460 needs to authorize the host server system 450 to access the data set (such as files) stored in the tenant server system 460. The authorization operation is implemented according to the identification information.
After receiving the backup request message 481, the host server system 460 may forward a setting page to the tenant server system (message flow 482). For example, a setting page shown on the browser that is originally used to manage the data protection application may be forwarded to a setting page of the tenant server system 470 to enter the authorization information, such as account number and password. The tenant server system 470 may verify the authorization information obtained from the client device 450. After verifying the authorization information, the tenant server system 470 may transmit the access identifier (message flow 483) to the host server system 460 in order to allow the host server system 460 to access the data set stored in the tenant server system 470. In an example, the access identifier may be an access token, and the host server system 460 may exchange messages with the tenant server system 470 according to the access token.
After the host server system 460 has permission to access the data set stored in the tenant server system 470, the host server system 460 may begin to back up a plurality of versions of the data set, and the host server system 460 may further need to request a specific version of the data set, so as to form a sequential version order of the data set in the host server system 460 (message flow 484). In an embodiment, when initiating backup of the versions of the data set, the tenant server system 470 may transmit at least one data set identifier to the host server system 460 (message flow 485). The host server system 460 may download the data set from the tenant server system 470 according to the data set identifier. For example, when the host server system 460 requests to download a specific version of the data set, the tenant server system 470 may transmit a plurality of data set identifiers to the host server system 460. The plurality of data set identifiers are associated with the specific version of the data set. That is, the plurality of data set identifiers are utilized to download the specific version of the data set. Each one of the data set identifiers may be used to request (or download) a portion of the specific version of the data set. For example, if there are three data set identifiers used for requesting the specific version of the data set, then each data set identifier can only download one third of the specific version of the data set. After the download of the specific version of the data set is complete, the plurality of data set identifiers may be discarded. By using a plurality of data set identifiers, the efficiency of backing up the data set from the tenant server system 470 can be improved.
For example, the plurality of portions of the data set can be transmitted from the tenant server system 470 to the host server system 460 simultaneously according to the plurality of the data set identifiers, and if one of the portions of the data set fails to download to the host server system 460, only the portion unsuccessfully downloaded needs to be downloaded again, instead of downloading the whole data set. In an embodiment, the data set identifier may include a page token, but the present disclosure is not limited thereto.
Regarding the JobBased task framework, some implementation details may be described as follows. In the beginning, the data protection application 410 may start working. Taking the user data on the public cloud as an example of the SaaS data to be protected (e.g. the data set), assume that a task for backup of the user data on the public cloud has been established. For example, there may be three user accounts on the public cloud whose user data should be protected by the host server system 10, and there may be three backup jobs corresponding to the three user accounts. Therefore, the job manager may notify the job workers of the three backup jobs. The job worker manager may manage job workers. For example, the job manager may create a job worker to process a backup job, or the job manager may end the job worker when the backup job is complete. In some embodiments, the job workers may detect any SaaS data change (e.g. a change of SaaS data, such as a change of a file) for each of these user accounts. In an ideal case, each job worker corresponds to one user account, but the present disclosure is not limited thereto. For example, it is also workable that a job worker corresponds to multiple users. Although version loss may occur, the host server system 10 is capable of restoring the lost version, and therefore is reliable. In comparison with this, the related art lacks a reliable architecture to do so. In addition, when detecting a SaaS data change such as a file change, the job worker may generate an event, and this event may be transmitted to the event manager. The event workers may pull events from the event manager, so the event workers are aware of the existence of the latest version of the data set (such as that of the files of the three user accounts). When detecting the existence of the latest version, the event workers may download the latest version. The authentication (or “Auth” in
After the latest version is downloaded, the file change handler may detect whether the version numbers are continuous or not. When detecting that the version numbers are not continuous, the file change handler may determine that version loss has occurred and generate other event(s) to the event manager, and the event manager may take charge of issuing a command to get the lost version(s). In addition, the path based versioning unit such as the versioning framework 432 may manage version architecture of files (e.g. the architecture of the directories for storing the files), the single instance handler such as the single instance framework 434 may maintain single-instancing among a plurality of versions of a data set, and the data dedupe handler such as the block level dedupe framework 436 may store only changed blocks to avoid block duplication regarding the data set.
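The interplay between job workers, the event manager, event workers, and the file change handler can be sketched as a small in-memory queue. All class, function, and event names below (`EventManager`, `run_event_worker`, `"change"`, `"get_lost"`) are hypothetical; the sketch also assumes that version numbers start at one and increase by one, and it models the tenant server as a plain dictionary.

```python
from collections import deque


class EventManager:
    """Minimal FIFO of events: job workers push, event workers pull."""

    def __init__(self):
        self._queue = deque()

    def push(self, event):
        self._queue.append(event)

    def pull(self):
        return self._queue.popleft() if self._queue else None


def run_event_worker(events, tenant_versions, backed_up):
    """Drain the queue, downloading versions and queueing requests for lost ones.

    tenant_versions: {file_id: {version: content}}, a stand-in for the tenant side.
    backed_up: {file_id: {version: content}}, the host-side store (mutated in place).
    """
    while (event := events.pull()) is not None:
        kind, file_id, version = event
        store = backed_up.setdefault(file_id, {})
        if version in store:
            continue  # already backed up, nothing to do
        last = max(store, default=0)  # highest version held before this download
        store[version] = tenant_versions[file_id][version]
        if kind == "change":
            # File change handler: non-continuous numbers mean version loss,
            # so emit one follow-up event per lost version.
            for lost in range(last + 1, version):
                events.push(("get_lost", file_id, lost))


events = EventManager()
tenant = {"report.doc": {1: b"v1", 2: b"v2", 3: b"v3", 4: b"v4"}}
backup = {"report.doc": {1: b"v1"}}

# A job worker noticed only the most recent change (version 4).
events.push(("change", "report.doc", 4))
run_event_worker(events, tenant, backup)
```

After the worker drains the queue, the host-side store holds versions 1 through 4 of `report.doc`, even though only the change to version 4 was observed directly.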
The single instance handler and the data dedupe handler can save the storage space of the host server system 10. More specifically, in a scenario of file collaboration through the Internet, different user accounts may edit the same file on the tenant server system, and produce many versions of the file under every user account. The plurality of versions of the file may have many duplicated parts, both under the same user account and across different user accounts. The present application can de-duplicate data across different user accounts.
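A block-level store that keeps each distinct block once, regardless of which user account or version it came from, can be sketched as below. The fixed block size, the use of SHA-256 as the block digest, and the names `BlockStore`, `put`, and `get` are assumptions made for illustration only; the disclosure does not specify them.

```python
import hashlib


class BlockStore:
    """Content-addressed block store: identical blocks are kept only once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}    # digest -> block bytes
        self.versions = {}  # (account, path, version) -> ordered list of digests

    def put(self, account, path, version, data):
        """Store one version; return how many blocks were actually new."""
        digests = []
        new_blocks = 0
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            digests.append(digest)
        self.versions[(account, path, version)] = digests
        return new_blocks

    def get(self, account, path, version):
        """Rebuild a version from its recorded block digests."""
        key = (account, path, version)
        return b"".join(self.blocks[d] for d in self.versions[key])


store = BlockStore(block_size=4)
new_a1 = store.put("userA", "/doc", 1, b"AAAABBBBCCCC")  # three new blocks
new_b1 = store.put("userB", "/doc", 1, b"AAAABBBBDDDD")  # only DDDD is new
new_a2 = store.put("userA", "/doc", 2, b"AAAAXXXXCCCC")  # only XXXX is new
```

Three versions totalling nine blocks occupy five stored blocks here, and the saving applies across user accounts as well as across versions of the same file.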
When the host server system 710 first receives the file A associated with the user account A, the host server system 710 may perform a full download of the entire data set of the file A. In an embodiment, the file A may be divided into a plurality of data blocks 711. When the first version of file A is transmitted to the host server system 710, the host server system 710 receives each and every data block 711 of the file A (the full download). Furthermore, the host server system 710 may receive a characteristic value of the first version of the file A, and may also record the source file path of the file A. In an embodiment, the characteristic value may be a hash value (such as the hash value “abc” in
In addition to the data set of the user account A, the present disclosure can also download the data set associated with the user account B. For example, in an embodiment of
In an embodiment, the characteristic value of the first version of the file B is transmitted to the host server system 710 instead of transmitting the entire data set of the first version of the file B. If the host server system 710 already stores the same characteristic value as that of the first version of the file B, then there is no need to transmit other portions of the first version of the file B, so as to save bandwidth. In an embodiment, the host server system 710 may issue an HTTPS command to download or receive the file B.
Referring to
According to some embodiments, the processing circuit 110 may store the plurality of versions of the data set and the at least one specific version of the data set to the host server system 10 (e.g. the one or more storage devices therein). Specifically, under control of the processing circuit 110, the host server system 10 may monitor a change event of a directory including a predetermined version of the data set. All the data changed in the directory is monitored. For example, once the predetermined version of the data set has changed, the change event will be detected by the processing circuit 110. The processing circuit 110 receives a latest version of the data set after the change event is detected, in which the latest version of the data set may be revised from the predetermined version of the data set, and the predetermined version may represent a version having a latest version number at a certain time point. For example, when the SaaS data such as a file is changed, the version number is increased, more particularly, with the increment of one. The version number of the latest version is typically greater than the version number of the predetermined version, and the version number difference between the latest version and the predetermined version (e.g. the difference between the version number of the latest version and the version number of the predetermined version) is greater than or equal to one. For example, when the version number difference is equal to three, some intermediate versions of the data set are lost. As the latest version is the latest in comparison with the predetermined version, the latest version may be changed or revised from the predetermined version. When the version number difference is equal to one, the latest version is changed or revised from the predetermined version directly. 
When the version number difference is greater than one, the latest version is changed or revised from the predetermined version indirectly, for example, through the revision of the intermediate version(s). If there is any lost version, for example an intermediate version, the host server system 10 will get the intermediate version from the tenant server system 5. As a result, the host server system 10 can receive each version of these versions of the data set (e.g. the aforementioned at least one specific version and the plurality of versions) to protect the data contents of the above-mentioned each version, and more particularly, stores the data contents of these versions of the data set into the one or more storage devices of the host server system 10, to prevent data loss of any of these versions.
According to this embodiment, under control of the processing circuit 110 running the data protection application, the host server system 10 may monitor whether there is any change of the SaaS data (e.g. the files on the SaaS application). When detecting any change of the SaaS data (e.g. the files on the SaaS application), the host server system 10 may get the new version(s) of the data set (e.g. the latest version, and the intermediate version(s) if any exist) from the tenant server system 5. As a result of holding the new version(s), the host server system 10 can prevent data loss. For example, the host server system 10 may issue a content request to get the latest version of the data set. Regarding the associated advantages of issuing the content request to get the latest version of the data set, as the file getting operation is triggered by the data protection application, the host server system 10 has the privilege to control the whole backup in an active manner. As a result, it is safer for the host server system 10 (e.g. the host server system 10 can back up only selected user accounts, and will not receive any unexpected file such as a file that belongs to user accounts that are not in the backup list), the host server system 10 has the opportunity to adjust the backup, and the host server system 10 can save bandwidth (e.g. prevent unnecessary file transmission). In addition, the host server system 10 may parse the latest version of the data set to determine whether the latest version of the data set and the predetermined version of the data set are received in sequence. As a result, the host server system 10 may determine whether the latest version and the predetermined version are continuous versions of the data set, and more particularly, determine whether there is any intermediate version between the latest version and the predetermined version.
When the latest version and the predetermined version are not continuous versions of the data set, the host server system 10 may get all of the intermediate version(s) between the latest version and the predetermined version from the tenant server system 5.
According to some embodiments, the host server system 10 may receive identification information including a plurality of identifiers from the tenant server system 5, and the identification information may be associated with the data set, in which the issuing of the aforementioned at least one version request may include sending the data set identifiers to the tenant server system 5. In an embodiment, the data protection application may provide multiple binding methods for binding the SaaS applications such as that of the public cloud with the host server system 10 through one or more setting pages of the data protection application according to the access identifiers. According to an embodiment, in the beginning when establishing a backup task, the host server system 10 (e.g. the authentication handler) may import the access identifier provided by the public cloud provider from outside of the host server system 10. For example, the data protection application may guide the user with some hint messages, to have the user log in to a certain site of the public cloud provider with the public cloud account and password, and establish the backup task for the user with a task name. The data protection application may transmit the access identifier to the public cloud, which may verify the access identifier after authentication is completed. Afterward, the data protection application may have permission to download the SaaS data of the public cloud. According to another embodiment, when the user is interacting with the data protection application for establishing the backup task, as guided by the data protection application, the user may be forwarded to an account authentication page of the service provider, in which when the account and password are correct, the service provider may give the access identifier. In the embodiments of the present application, the user's account and password will not leak to the host server system 10.
As a result, the user's account and password can be protected.
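The separation between credentials and the access identifier can be illustrated with a toy model. Everything here is a hypothetical stand-in: `TenantServer`, `HostServer`, their methods, and the account data are invented for the sketch, and a real provider would typically expose this flow through its own authorization endpoints rather than direct method calls.

```python
import secrets


class TenantServer:
    """Hypothetical stand-in for the service provider's authentication page."""

    def __init__(self, accounts):
        self._accounts = accounts  # account -> password
        self._tokens = {}          # access identifier -> account

    def authorize(self, account, password):
        """Called from the provider's own page; returns an access identifier."""
        if self._accounts.get(account) != password:
            raise PermissionError("account or password is incorrect")
        token = secrets.token_hex(16)
        self._tokens[token] = account
        return token

    def list_files(self, token):
        """Serve data only to callers presenting a valid access identifier."""
        account = self._tokens.get(token)
        if account is None:
            raise PermissionError("unknown access identifier")
        return [f"{account}/file-a"]


class HostServer:
    """Holds only access identifiers, never accounts' passwords."""

    def __init__(self):
        self.tokens = {}  # backup task name -> access identifier

    def bind(self, task_name, token):
        self.tokens[task_name] = token

    def backup(self, task_name, tenant):
        return tenant.list_files(self.tokens[task_name])


tenant = TenantServer({"alice": "s3cret"})
host = HostServer()

# The user enters the password on the provider's page, not on the host.
access_identifier = tenant.authorize("alice", "s3cret")
host.bind("alice-drive-task", access_identifier)
files = host.backup("alice-drive-task", tenant)
```

The host object stores nothing but the task name and the access identifier, which is the property the passage relies on when it says the account and password do not leak to the host server system.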
According to some embodiments, under control of the processing circuit 110 running the data protection application (more particularly, the single instance handler such as the single instance framework 434), the host server system 10 may record a plurality of mapping relationships between keys and values into a hash table, in which the keys of the hash table include at least hashes of files in the tenant server system 5, and the values of the hash table include paths of the files in the tenant server system 5. For example, the keys of the hash table include size plus hash information of the files in the tenant server system 5, in which the size plus hash information of the files includes combinations of sizes of the files and the hashes of the files, respectively, but the present disclosure is not limited thereto. In some embodiments, when obtaining the latest version of the data set (e.g. a file), based on the hash table, the host server system 10 (e.g. the single instance handler running thereon) may check whether any of the versions of the data set in the host server system 10 has the same characteristic information (e.g. the same size and the same hash value) as that of the latest version, to generate a first checking result, in which the first checking result indicates whether the latest backup version has the same characteristic information (e.g. the same size and/or the same hash value) in the host server system. According to the first checking result, the host server system 10 may determine whether to skip downloading the latest version. When the first checking result indicates that the latest backup version has the same characteristic information as that of the latest version, the host server system 10 may skip downloading the latest version; otherwise, the host server system 10 may download the latest version. For example, the aforementioned same characteristic information may include the same size and the same hash value.
For another example, the aforementioned same characteristic information may include the same hash value. In some embodiments, when it is determined according to the first checking result to skip downloading the latest version, the host server system 10 may create pointing information regarding the latest backup version.
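The hash-table check that decides whether to skip a download can be sketched as follows. The class name `SingleInstanceIndex`, its methods, and the sample paths are hypothetical; the sketch uses the size plus hash combination as the key and a plain dictionary entry as the pointing information, which is one possible reading of the passage rather than the definitive design.

```python
class SingleInstanceIndex:
    """Maps (size, hash) keys to the path of the copy already backed up."""

    def __init__(self):
        self.table = {}     # (size, hash) -> path of the stored copy
        self.pointers = {}  # path of a skipped version -> path of the stored copy

    def backup(self, path, size, file_hash, download):
        """Download only if no stored version shares the characteristic information.

        Returns True when a download happened, False when it was skipped.
        """
        key = (size, file_hash)
        existing = self.table.get(key)
        if existing is not None:
            # Same size and same hash: skip the transfer and record a pointer.
            self.pointers[path] = existing
            return False
        download(path)
        self.table[key] = path
        return True


downloaded = []  # records which paths were actually transferred
index = SingleInstanceIndex()

first = index.backup("/userA/fileA@v1", 1024, "abc", downloaded.append)
second = index.backup("/userB/fileB@v1", 1024, "abc", downloaded.append)
third = index.backup("/userB/fileB@v2", 2048, "def", downloaded.append)
```

In this run the second call is skipped because its size and hash match the first file, and the index records that `/userB/fileB@v1` points at the copy stored for `/userA/fileA@v1`.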
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
This application claims the benefit of U.S. provisional application No. 62/510,236, which was filed on May 23, 2017, and is included herein by reference.
Number | Date | Country
---|---|---
62510236 | May 2017 | US