1. Field of the Invention
This invention relates to a system for saving a file and a control method therefor.
2. Description of the Related Art
Among systems that save a file in multiple copies, a technique has been proposed for saving a file requested to be saved to a file server with a low load. Japanese Patent Laid-Open No. 2000-207370 discloses a technique in which each file server regularly reports its own load information to another server and, when a file is to be saved, the server with the low load is identified by referring to the load information and the file is saved to that server.
Here, an information processing system that receives a file together with multiple savings, that is, the number of copies of the file to be saved, is assumed. This information processing system is, for example, a system that receives a file and multiple savings from a client and saves the received file to as many file servers as the instructed multiple savings to improve the availability of the server.
When the file servers for saving the file according to the multiple savings are determined according to the technique disclosed by Japanese Patent Laid-Open No. 2000-207370, the following problems may occur. Specifically, in Japanese Patent Laid-Open No. 2000-207370, each file server reports its loaded state to a management server, and the file is saved to the server with a low load. If, for example, there are many files to be saved, the load of each file server changes rapidly, so the reporting intervals must be shortened. However, if the reporting intervals are too short or the number of the file servers is large, the load may be concentrated on the management server.
This invention provides an information processing apparatus for efficiently performing file save processing while avoiding, as far as possible, concentrating a load on a network or a single apparatus.
According to a system of the present invention, a system comprising an external device and a plurality of information processing apparatuses, wherein the external device comprises: a distributing unit configured to distribute a file saving request to the plurality of information processing apparatuses in request units and perform the file saving request to a single information processing apparatus, wherein the single information processing apparatus comprises: a saving unit configured to save a file of the request to the single information processing apparatus; and an instructing unit configured to repeat processing for instructing the information processing apparatus that should subsequently save the file among the plurality of information processing apparatuses to save the file based on multiple savings.
Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).
First, as an example of a product that replicates files, a description will be given of the function for creating a replication in the Distributed File System (®) (hereinafter, referred to as “DFS”) from Microsoft (hereinafter, referred to as a “DFS replication function”). When a user uses the DFS replication function, the user registers a plurality of servers beforehand in an Active Directory (®) (hereinafter, referred to as “AD”). Then, the DFS replication function is used to manually create a replication rule, such as “replicate a file saved in a server A to a server B”, and the rule is shared between all of the servers via an AD server to enable replicating the file. However, if one apparatus fails, the DFS replication function does not dynamically switch the replication address, so the number of replications of the file is reduced. As a result, the file cannot be saved with the instructed multiple savings, which reduces the availability of the server. To solve the above problem, it is necessary not to fix a replication rule and to dynamically select a saving address server of the file from among the operating servers.
However, if the saving address server is dynamically selected, the selection algorithm may cause a bottleneck and reduce the performance of the server. The following, for example, can be considered as a selection algorithm for the saving address server:
(1) a method for selecting a server with less used capacity from among the operating file servers; and
(2) a method for selecting a server at random from among the operating file servers.
In method (1), the used capacity can be distributed equally across the servers, so the file servers are used efficiently. On the other hand, it is necessary to manage the size of every file and its saving address with a database or the like and to calculate, every time a file is saved, how much data is saved on which file server. Also, if the file servers are scaled out, files are saved intensively to the newly added server and its disk I/O is performed intensively, since the server immediately after the scale-out holds fewer files than the other servers.
In method (2), access to the database is generated only in acquiring the operating servers, so the load on the database is lower than in method (1). However, if the network load, the CPU, and the memory are taken into account, the load can be concentrated on one server at a given instant, although the access is equally distributed to each server in the long run. Accordingly, since a bottleneck is generated whichever of the methods (1) and (2) is selected, the performance of the system can be reduced. Hereinafter, a description will be given of a configuration that resolves this problem.
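The two selection methods can be contrasted with a minimal Python sketch. The `FileServer` class, its fields, the host names, and the capacity figures are hypothetical and serve only as illustration; the sketch is not the method of the embodiments described later.

```python
import random
from dataclasses import dataclass


@dataclass
class FileServer:
    host: str            # hypothetical host name
    used_bytes: int      # used capacity, which method (1) must track on every save
    active: bool = True  # whether the server is operating


def select_least_used(servers, count):
    """Method (1): pick the operating servers with the least used capacity."""
    operating = [s for s in servers if s.active]
    return sorted(operating, key=lambda s: s.used_bytes)[:count]


def select_random(servers, count, rng=random):
    """Method (2): pick operating servers uniformly at random."""
    operating = [s for s in servers if s.active]
    return rng.sample(operating, min(count, len(operating)))


servers = [
    FileServer("fs-a", 900),
    FileServer("fs-b", 100),                 # e.g. a freshly scaled-out server
    FileServer("fs-c", 800),
    FileServer("fs-d", 700, active=False),   # not operating, never selected
]

least_used = [s.host for s in select_least_used(servers, 2)]
random_pick = [s.host for s in select_random(servers, 2, random.Random(0))]
```

With method (1) the nearly empty server `fs-b` is always chosen first, which mirrors the concentration after a scale-out noted above, while method (2) needs only the list of operating servers.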
Techniques such as cloud computing systems and SaaS (Software as a Service) have been proposed as forms in which each process is performed on the server computer side. In cloud computing, it is possible to simultaneously process requests from many clients by utilizing numerous computing resources and by conducting distributed execution of data conversion and data processing. To fully utilize this feature of cloud computing, the present specification considers a method for implementing a series of processes on the server by connecting finely defined tasks and processing the tasks simultaneously in parallel to scalably process a large number of jobs.
Here, the “task” refers to processing content comprising a job or a process on the software to implement the processing content in the present specification. In this case, a temporary file that should be processed by a task at the head and a temporary file generated as a result of the processing in each task are considered to create the replication to a plurality of file servers to assure the availability of the system in a job processor.
A job management service server that controls a job comprising one or more task(s), information related to the job, a job execution order, and the like are contemplated. Each task can start up a plurality of respective instances. In addition, each instance asynchronously acquires the job from the job management service server, and performs, for example, image processing such as black dot removal, or a process of storing data to a shared folder. A file management service server group manages binary data to be processed by each task. Each task acquires data to be processed from the file management service server group as needed and saves a processing result. The data input to the file management service server group as a result of the task processing is called “data resulting from task processing” in the present specification. Also, the data is information included in the file. In the present specification, an application for inputting the job to the job management service server is called a “service application”.
The service application inputs the job to the job management service server, while the data to be processed is input to the file management service server group. The data input to the file management service server group at the same time as the job input is called “initial data” in the present specification. Also, the data is information included in the file.
The initial data and the data resulting from the task processing are saved to the plurality of file servers to retrieve the data after being saved. Thereby, even if a failure or the like is generated in the server, the initial data and the data resulting from the task processing can be retrieved to improve the availability of the system by restarting the processing based on the initial data.
However, when a file is saved to the plurality of file servers, one or more of the database, the network, the CPU, and the disk I/O can become a bottleneck depending on the saving address selection algorithm. As a result, the performance of the temporary file management service server group can be reduced. In the examples described below, a description will be given of a method for selecting the saving address of the file without reducing the performance of the system.
The configuration as shown in
The networks 110 to 112 are referred to as “communication networks” implemented, for example, by any of a LAN, WAN, telephone circuitry, dedicated digital circuitry, ATM or frame relay circuitry, cable television circuitry, data broadcasting wireless circuitry of the Internet and the like, or a combination thereof. The LAN stands for “Local Area Network”. The WAN stands for a “Wide Area Network”. The ATM stands for an “Asynchronous Transfer Mode”.
The networks 110 to 112 may be communication networks implemented by the combination of the LAN to the data broadcasting wireless circuitry as described above. Specifically, the networks 110 to 112 can transmit/receive the data. In this example, because the information processing system of the present embodiment is the cloud system, the networks 110 and 112 are the Internet, and the network 111 is a network within a corporation or a network of a service provider.
The scan server 101, the flow server 102, and the task servers 103 and 104 are executed on the server computer by a virtual server, and these service server groups provide a cloud service to the user. Also, the cloud service server 108 is publicly available on the Internet, and the cloud service server 108 is also executed on the server computer.
Hereinafter, each function of the server in the present specification may be realized by a single server or a single virtual server, or by a plurality of servers or a plurality of virtual servers. Alternatively, the plurality of servers may be executed as virtual servers on a single server.
The client terminal 106 comprises, for example, a desktop personal computer, a notebook personal computer, a mobile personal computer, a PDA (personal digital assistant), or the like. However, the client terminal 106 may also be a mobile phone incorporating a program execution environment. The client terminal 106 incorporates an environment in which a program such as a Web browser (an internet browser, a WWW browser, a browser provided for World Wide Web use) is executed.
The CPU 202 controls the entire apparatus. The CPU 202 executes an application program, an OS, and the like stored in the HDD 205, and performs control so that the information, files, and the like required for executing the program are temporarily stored in the RAM 203. The OS stands for an “Operating System”. The ROM 204 is a storing unit configured to store various types of data such as a basic I/O program. The RAM 203 is a temporary storing unit configured to function as a main memory, a work area, or the like of the CPU 202. The HDD 205 is one of the external storing units, and is configured to function as a large-capacity memory and store application programs such as Web browsers, service group programs, the OS, related programs, and the like.
The display 206 is a displaying unit configured to display a command and the like input from the keyboard 207. The interface 208 is an external device I/F, and connects a printer, USB equipment, and peripheral equipment. The keyboard 207 is an instruction inputting unit. A system bus 201 conducts the flow of the data within the apparatus. The components from the CPU 202 to the interface 208 are connected to the system bus 201. The NIC 209 exchanges data with the external device via the interface 208 and the networks 110 to 112. Note that the configuration of the apparatus as shown in
Next, a description will be given of the scan server 101, the flow server group 102, and the task servers 103 and 104 that provide the cloud service.
The Web application 501 provides an application program that provides a scan function. A ticket creation unit 511 realizes a series of functions with which the user creates a scan ticket. The scan ticket records the settings used when a manuscript is scanned with the image forming apparatus 107, a definition of a subsequent processing flow, a parameter for a task performed in each processing flow, and the like.
An external I/F 514 communicates with a scan software unit (not shown) that operates on the image forming apparatus 107. The scan software unit accesses the functions of a ticket list unit 512 and a scan receiving unit 513 via the external I/F 514. The ticket list unit 512 generates a ticket list based on ticket information saved in a ticket management unit 515 and returns the generated list to the image forming apparatus 107 in accordance with the request from the image forming apparatus 107.
The file saving library 502 is a library used when saving data to the flow server group 102. A detailed description thereof will be given below. The scan receiving unit 513 receives the scan ticket and the image data from the image forming apparatus 107. Then, the scan receiving unit 513 transmits the received scan ticket and the image data to a file saving unit 521.
Next, a description will be given of the flow until an input of the scan job as illustrated in
In S705, the scan software unit of the image forming apparatus 107 requests the ticket list from the ticket list unit 512 via the external I/F 514. The image forming apparatus 107 may be an apparatus with both scan and print functions, or may be a dedicated scan apparatus with only the scan function. The ticket list unit 512 generates a list of the scan tickets by using the ticket management unit 515 and returns the generated ticket list to the scan software unit as a response. On receiving the response, the image forming apparatus 107 displays the acquired ticket list on the user interface.
In S707, the user selects any of the tickets displayed on the user interface of the image forming apparatus 107 and places a sheet of paper on the scan device with which the image forming apparatus 107 is equipped to carry out the scan. Thereby, the scan software unit transmits the scanned image data and the scan ticket to the scan receiving unit 513 via the external I/F 514 (S708).
In S714, the scan server 101 transmits the received image data to the flow server group 102 and requests saving the data. In this processing, the file saving unit 521 inputs the file information including the multiple savings to the flow server group 102, together with the image data. The file information will be described below. Thereby, the file management service server group 803 of the flow server group 102 receives the file (the image data in the present embodiment) and the file information related to the file.
After receiving the image data correctly, the flow server group 102 responds to the scan server 101 with an ID (a file group ID) uniquely representing the image data in S715. Then, in S716, the scan receiving unit 513 transmits the file group ID, the scan ticket, a tenant ID, and the multiple savings as the job information to the flow server group 102. In this processing, the tenant ID is the ID of the tenant to which the user who inputs the job belongs and is unique to the tenant. The above processing describes the system configuration of the scan server 101 and the flow until the input of the scan job.
As shown in
The file management service server group 1203 manages saving of data present at the time of the job input and data resulting from the respective task processing. More specifically, the file management service server group 1203 stores a file in response to a request from the scan server 101 or the task servers (103, 104), and manages a path to the storage destination of the file. If the task server requests file acquisition, the file management service server group 1203 returns binary data of the saved file to the task server. Also, if the task server or the job management service server group 1202 requests the deletion of the file, the file management service server group 1203 deletes the saved file. By using the functions of the temporary file management service server group 1203, the scan server 101 and the task server can perform file saving, acquisition, and deletion irrespective of the path to the file storage destination or the status of the file server.
Next, a description will be given of the file management service server group 1203.
Note that the file management service servers A1401 to X1403 may be implemented as virtual servers on one or more server computer(s). If the servers are implemented as the virtual servers on the one server computer, the network 1410 is implemented by a system bus on the server computer.
Next, referring to
The file management server managing DB unit 1531 manages information about the file management service servers A1401 to X1403, which are storage destinations of the file. Also, the file management server managing DB unit 1531 receives a request from a saving address server priority determining unit 1522 and accesses a DB common to each server to acquire information about the file management service servers that are in the start-up state.
The path management DB unit 1532 manages information about a temporary file saved in the data storing area unit 1541 of the file management service servers A1401 to X1403 as an entry managed by the file management service server group 803. The temporary file refers to a file of the initial data saved from the scan server 101 and the result of the task processing saved from the task servers 103 and 104.
A path 1614 refers to a full path of the storage destination of the temporary file corresponding to each entry and is used by the Web application unit 1501 to access the entity via the back-end unit 1502. A host name 1615 refers to a host name of the file management service server for the storage destination of the temporary file corresponding to each entry. A creation date 1616 refers to a time when the storage of the temporary file to the data storing area unit 1541 is completed. An expiration date 1617 refers to an expiration date of the temporary file, and the temporary file corresponding to the entry is deleted once the expiration date has passed. A tenant ID 1618 refers to a tenant ID of a tenant to which the user saving the temporary file belongs.
Next, a description will be given of each function of the Web application unit 1501. The Web application unit 1501 comprises a file saving unit 1511 and a file acquisition unit 1512. The file saving unit 1511 implements a function for multiplexing a file with the instructed multiple savings and saving the file to the data storing area unit 1541 depending on the request from the scan server 101 or the task servers 103 and 104. The request from the scan server or the task servers 103 and 104 comprises information related to the saved file, such as a task ID 1612 and NO1613, the expiration date 1617, and the tenant ID 1618, which are managed as the entry of the path management DB unit 1532. As a whole, the above information is called “file information” in the present specification.
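The entry of the path management DB unit 1532 and the file information can be pictured as two small record types. The following Python sketch is only an illustration: the class names, the field types, the `make_entry` helper, and the sample values are assumptions, while the comments map each field to the reference numerals used above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileInfo:
    """File information sent with a saving request (types are assumed)."""
    task_id: str               # task ID 1612
    no: int                    # No. 1613
    expiration_date: datetime  # expiration date 1617
    tenant_id: str             # tenant ID 1618


@dataclass(frozen=True)
class PathEntry:
    """One entry per saved copy of a temporary file."""
    info: FileInfo
    path: str                  # path 1614: full path of the storage destination
    host_name: str             # host name 1615
    creation_date: datetime    # creation date 1616: when storage completed

    def is_expired(self, now: datetime) -> bool:
        # The temporary file is deleted once its expiration date has passed.
        return now > self.info.expiration_date


def make_entry(info: FileInfo, host_name: str, path: str, now: datetime) -> PathEntry:
    """Hypothetical helper that builds the entry when a write completes."""
    return PathEntry(info=info, path=path, host_name=host_name, creation_date=now)


info = FileInfo("task-1", 1, datetime(2024, 1, 2), "tenant-a")
entry = make_entry(info, "fs-a", "/data/task-1/1", datetime(2024, 1, 1))
```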
Next, a description will be given of each function of the back-end unit 1502. The back-end unit 1502 comprises a file save processing unit 1521, a file acquisition processing unit 1523, and a saving address server priority determining unit 1522. The file save processing unit 1521 receives a file saving request from the scan server 101 or the task servers 103 and 104 via the file saving unit 1511. On receiving the request, the file save processing unit 1521 requests, from the saving address server priority determining unit 1522, the priority of the file management service servers to which the data storing area unit 1541 set as the file saving address belongs. In the present specification, the file management service server to which the data storing area unit 1541 set as the saving address of the file belongs is called a “file saving address server”.
Next, the file save processing unit 1521 selects as many file saving address servers as the instructed multiple savings, starting from the highest priority, and writes the file to the data storing area unit 1541 of each selected server. Then, the file save processing unit 1521 adds an entry for each written file to the path management DB unit 1532. Finally, the file save processing unit 1521 responds to the scan server 101 or the task servers 103 and 104, which are request sources, with a notification of the normal save via the file saving unit 1511.
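The save flow just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `save_file`, the in-memory `storage` dictionary standing in for the data storing area of each server, the `path_entries` list standing in for the path management DB, and the path format are all hypothetical.

```python
def save_file(file_name, data, priority, multiplicity, storage, path_entries):
    """Save `data` to the `multiplicity` highest-priority servers.

    `priority` is a list of host names ordered from priority 1 downward.
    `storage` maps host name -> {file name: bytes}; `path_entries` collects
    one record per written copy. Both are in-memory stand-ins (assumptions).
    """
    targets = priority[:multiplicity]
    for host in targets:
        # Write the file to the data storing area of this server.
        storage.setdefault(host, {})[file_name] = data
        # Record where the copy was stored (hypothetical path format).
        path_entries.append({"host_name": host,
                             "path": f"//{host}/data/{file_name}"})
    # Report a normal save only if every requested copy was written.
    return len(targets) == multiplicity


storage, path_entries = {}, []
ok = save_file("scan-001.tif", b"image bytes",
               ["D611", "E612", "F613", "G614"], 2, storage, path_entries)
```

Error handling is deliberately left out here; a failed write is a separate concern from the selection by priority.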
Referring to
In S631 of
A priority 911 is used in determining the file saving address server by the file save processing unit 1521. A host name 912 is a host name of the file saving address server corresponding to the priority 911. An active flag 913 is a true/false value indicating whether or not the connection to the file management service server that exists at the host name 912 can be performed: if the connection can be performed, the value is set as “True”, and if the connection cannot be performed, the value is set as “False”. In this example, the priority 911 is assigned, from the higher priority (smaller priority 911) of the file saving address server, by using the ID 1601 in the order “own server, servers larger than the own server (ascending order), servers smaller than the own server (descending order)”. The ID 1601 is used in determining the priority here, but any key may be used as long as all of the file management service servers are included.
After determining the priority, the file save processing unit 1521 on the file management service server D611 performs file save processing S633 to the file management service server D611 of the priority 1. Next, the file save processing unit 1521 performs file save processing to a file management service server E612 of the priority 2 (S634). In the file save processing, the file management service server D611 receiving the saving request instructs that the file be saved to the file management service server E612. Accordingly, if the multiplicity is three, the file management service server D611 instructs that the file be saved to the file management service servers E612 and F613. Thereby, the processing in which the file management service server D611 saves the file and then instructs the file management service server that should subsequently save the file to save it is repeated as many times as the multiple savings. Note that, considering a communication error and the like, the present embodiment may be configured to wait for a notification of file saving completion from the file management service server E612 before transmitting the saving instruction to the file management service server F613.
The priority information 601 and the priority information 651 each comprise all of the file management service servers D611 to G614, and their priorities are shifted by one relative to each other. A description of the priority will be omitted for the case in which the file saving request is distributed to the file management service server F613 or G614 by the SLB 621. The term “ring-shaped” in “determine the priority to be ring-shaped” denotes the circle of the file management service servers E612→F613→G614→D611→E612 . . . Once the file management service server of priority “1” is determined, the remaining priorities are automatically determined in the order of this circle.
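The ring-shaped priority can be illustrated with a short Python sketch. It follows the circle D611→E612→F613→G614→D611 described above; the function name `ring_priority`, the use of sorted identifiers as the ring order, and the assumption that requests are distributed evenly to every server are choices made for this illustration only.

```python
from collections import Counter


def ring_priority(server_ids, own_id):
    """Return all servers in ring order, starting with the own server.

    Index 0 corresponds to priority 1, index 1 to priority 2, and so on.
    """
    ring = sorted(server_ids)
    start = ring.index(own_id)
    return ring[start:] + ring[:start]


servers = ["D611", "E612", "F613", "G614"]
multiplicity = 3

# Assume one saving request is distributed to each server in turn and count
# how many copies every server ends up storing.
load = Counter()
for receiver in servers:
    load.update(ring_priority(servers, receiver)[:multiplicity])
```

Because each server's list is the previous one shifted by one position, every server stores the same number of copies in this example, and no lookup beyond the list of servers is needed to decide the targets.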
If the priority is shifted by two or more, or if the order is initially random, the load can be concentrated on one file saving address server depending on the distribution address of the file saving request 631. Therefore, the priority is preferably shifted one by one.
Referring back to
A first effect of the first embodiment is that the reduction of the performance due to the occurrence of a bottleneck can be prevented even if the file saving is required from the scan server 101 or the task servers 103 and 104.
A description will be given of a second effect due to the first embodiment referring to
When the file management service server L825 has recovered from the failure, the processing transits from
In the first embodiment, a case is supposed in which a large number of file saving requests are executed from the scan server 101 or the task servers 103 and 104 to the file saving unit 1511. At this time, a large number of the priority acquisition requests are generated to the saving address server priority determining unit 1522. Specifically, access is concentrated on the file management server managing DB for managing a startup state and a non-startup state in each apparatus. Therefore, the load on the database is increased, and the performance of the server is reduced.
The second embodiment differs from the first embodiment in the method by which the saving address server priority determining unit 1522 determines the priority when the file save processing unit 1521 receives the file saving request from the scan server 101 or the task servers 103 and 104 via the file saving unit 1511. In the second embodiment, a description will be given of this method for determining the priority of the file saving address server.
Next, referring to the flow chart as shown in
If the priority information is determined not to be within the expiration date in S1210, then, similar to the first embodiment, in S1211, the saving address server priority determining unit 1522 acquires the file saving address servers for which the active flag 1603 is “True” from the file management server managing DB unit 1531. Next, in S1212, the saving address server priority determining unit 1522 sets the priority to be ring-shaped from the file saving address server list acquired in S1211.
Then, in S1213, the saving address server priority determining unit 1522 updates the priority information held in the memory by the priority information holding unit 352 to the priority information set in S1212, and the expiration date of the expiration date holding unit 351 is extended in S1214. For example, the expiration date is updated to the time 1 minute after the current time and the like. After S1214 or if the processing is determined to be within the expiration date in S1210, the file save processing unit 1521 returns as-is the priority information 901 held on the memory in the priority information holding unit 352.
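The expiration-based reuse of the priority information can be sketched as a small cache. The following Python code is a minimal sketch under stated assumptions: the class name `PriorityCache`, the `fetch_active_servers` callback standing in for the database query, the injectable `clock`, and the requirement that the own server appears in the fetched list are all hypothetical; the 60-second lifetime follows the 1-minute example above.

```python
import time


class PriorityCache:
    """Holds ring-shaped priority information together with its expiration."""

    def __init__(self, fetch_active_servers, own_id,
                 ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_active_servers  # stand-in for the DB access
        self._own_id = own_id
        self._ttl = ttl_seconds
        self._clock = clock
        self._priority = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # S1210: is the held priority information within the expiration date?
        if self._priority is None or now >= self._expires_at:
            ring = sorted(self._fetch())                  # S1211: active servers
            start = ring.index(self._own_id)              # own server must be listed
            self._priority = ring[start:] + ring[:start]  # S1212, S1213
            self._expires_at = now + self._ttl            # S1214: extend
        return list(self._priority)                       # returned as held


calls = []


def fetch_from_db():
    calls.append(1)  # count how often the database is consulted
    return ["D611", "E612", "F613", "G614"]


now = [0.0]
cache = PriorityCache(fetch_from_db, "E612", ttl_seconds=60.0,
                      clock=lambda: now[0])
first = cache.get()
now[0] = 30.0
second = cache.get()   # still within the expiration date, no DB access
now[0] = 61.0
third = cache.get()    # expired, so the list is fetched again
```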
Next, a description will be given of processing in a case that an error occurs on the file saving operation from the file save processing unit 1521 to the file saving address server and the processing fails, referring to the flow chart of
Also, the case in which the error occurs denotes a case in which the file save processing unit 1521 cannot save the file to the file saving address server due to the occurrence of the failure in, for example, the file saving address server. If the file save processing unit 1521 cannot save the file to an information processing apparatus other than the one that includes the file save processing unit 1521, the information processing apparatus that cannot save the file is set so as to be in a non-startup state, and the information processing apparatus that is set so as to be in the non-startup state is excluded from the arrangement order. More specifically, when an error is generated during the file saving, the file save processing unit 1521 determines that a failure has been generated in the file saving address server to which the file is intended to be saved. Then, in S1311, the file save processing unit 1521 alters the master information of the server corresponding to the file saving address server in which the failure is generated to “False”. Furthermore, the file save processing unit 1521 alters the temporary information of the server corresponding to the file saving address server in which the failure is generated to “False” via the saving address server priority determining unit 1522 in S1312.
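The handling of a failed save can be sketched as follows. In this Python illustration the two dictionaries stand in for the master information and the temporary information, and `write` stands in for the actual file transfer; these names, the use of `OSError` to signal a failed save, and the choice to continue with the next server in the order so that the multiple savings is still met are assumptions of the sketch.

```python
def save_replicas(priority, multiplicity, write, master_active, temp_active):
    """Save to servers in priority order, excluding any server that fails."""
    saved = []
    for host in priority:
        if len(saved) == multiplicity:
            break
        if not temp_active.get(host, False):
            continue                      # already excluded from the order
        try:
            write(host)
        except OSError:
            master_active[host] = False   # S1311: master information
            temp_active[host] = False     # S1312: temporary information
            continue
        saved.append(host)
    return saved


def flaky_write(host):
    # Hypothetical transfer in which one server has failed.
    if host == "E612":
        raise OSError("connection refused")


priority = ["D611", "E612", "F613", "G614"]
master_active = {h: True for h in priority}
temp_active = dict(master_active)
saved = save_replicas(priority, 2, flaky_write, master_active, temp_active)
```

A later call with the same flags skips the failed server without attempting the write again.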
As described above, even if the file saving request to the file saving unit 1511 is concentrated, the saving address server priority determining unit 1522 does not always refer to the file management server managing DB unit 1531 if it is within the expiration date. Accordingly, an effect acquired by the second embodiment is that the request can be executed without the bottleneck of the file management server managing DB unit 1531.
In the second embodiment, a case is supposed in which only a connection between a certain file management service server and another file management service server is not possible. For example, the case comprises a case as shown in
If the connection-disabled information 1011 is determined to be beyond the expiration date in S1102, the file save processing unit 1521 clears the connection-disabled information 1011 of all the file management service servers in S1103. The reason for the clearing is that only the master information of the file management service server that has been determined to be “unconnected” from all servers within the certain period (expiration date) is set to be “False”.
After S1103, or if the connection-disabled information 1011 is determined to be within the expiration date in S1102, 01 is added to the connection-disabled information 1011 of the file management service server B1402 for which the ID 1601 is 02 in S1104. This processing corresponds to a failure, due to the occurrence of a communication error, of the file saving from the file save processing unit 1521 of the file management service server A1401 to the data storing area unit 1541 of the file management service server B1402.
Next, in S1105, the file save processing unit 1521 determines whether the connection-disabled information 1011 of the file management service server B1402 for which the ID 1601 is 02 comprises all of the operating file management service servers. In S1105, if the file save processing unit 1521 determines that the connection-disabled information 1011 does not comprise all of the operating file management service servers, the processing is stopped. If the file save processing unit 1521 determines that the connection-disabled information 1011 comprises all of the operating file management service servers in S1105 and is not a communication error in S1101, the processing proceeds to S1106. In S1106, the file save processing unit 1521 alters the master information of the file management service server B1402 for which the ID 1601 is 02 to “False”.
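Steps S1102 to S1106 can be sketched in Python for a failure that has already been classified as a communication error. The class name `ConnectionTracker`, the injectable clock, the renewal of the expiration date after clearing, and the reading of “all of the operating file management service servers” as all operating servers other than the target itself are assumptions of this sketch.

```python
class ConnectionTracker:
    """Tracks which servers reported that they could not reach a target."""

    def __init__(self, server_ids, window_seconds, clock):
        self.master_active = {sid: True for sid in server_ids}
        self.disabled_by = {sid: set() for sid in server_ids}
        self._window = window_seconds
        self._clock = clock
        self._expires_at = clock() + window_seconds

    def report_failure(self, reporter, target):
        now = self._clock()
        if now >= self._expires_at:                      # S1102: expired?
            for reporters in self.disabled_by.values():  # S1103: clear all
                reporters.clear()
            self._expires_at = now + self._window        # assumed renewal
        self.disabled_by[target].add(reporter)           # S1104
        others = {sid for sid, up in self.master_active.items()
                  if up and sid != target}
        if others and others <= self.disabled_by[target]:  # S1105
            self.master_active[target] = False             # S1106
        return self.master_active[target]


now = [0.0]
tracker = ConnectionTracker(["01", "02", "03"], 60.0, lambda: now[0])
after_first = tracker.report_failure("01", "02")    # only 01 cannot connect
after_second = tracker.report_failure("03", "02")   # now every other server

now = [0.0]
expiring = ConnectionTracker(["01", "02", "03"], 60.0, lambda: now[0])
expiring.report_failure("01", "02")
now[0] = 120.0                                      # the window has passed
after_expiry = expiring.report_failure("03", "02")  # earlier report cleared
```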
In the third embodiment, even when only the connection from a certain file management service server to another file management service server cannot be executed, the master information can be prevented from being altered to “False” if another file management service server is operating.
In the third embodiment, irrespective of the operation of the file management service server, the master information can be prevented from being altered to “False”. However, for example, if the connection to the file management service server B1402 cannot be executed because the firewall setting for the file management service server A1401 is wrong due to an operation mistake, the operator is not notified of the operation mistake.
Therefore, in the fourth embodiment, the character string held in the connection-disabled information 1011 joins pairs of the form “ID 1601”: “number of times for the generation of the error, except for writing”, one pair for each file management service server, as comma-separated values such as {01:10,02:5,03:4, . . . ,N:7}. Then, if the number of times for the generation of the writing error to one or more file management service servers exceeds a threshold within the expiration date of the connection-disabled information 1011, the file save processing unit 1521 outputs an error log.
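As an illustration of the string format described above, the following Python sketch converts between a string such as {01:10,02:5,03:4} and a dictionary of per-server counts. The function names `parse_counts` and `format_counts` are hypothetical, and the exact syntax (braces, colon, comma, no whitespace) is assumed from the example in the text rather than specified by the embodiment.

```python
def parse_counts(text):
    """Parse a string like '{01:10,02:5}' into {'01': 10, '02': 5}."""
    body = text.strip().strip("{}")
    counts = {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string such as '{}'
        server_id, _, number = item.partition(":")
        counts[server_id.strip()] = int(number)
    return counts


def format_counts(counts):
    """Format {'01': 10, '02': 5} back into '{01:10,02:5}'."""
    return "{" + ",".join(f"{sid}:{n}" for sid, n in counts.items()) + "}"
```

Server IDs are kept as strings so that leading zeros such as 01 survive a round trip through the two functions.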
In S1504, the file save processing unit 1521 updates the connection-disabled information 1011 of the file management service server B1402, in which the ID 1601 is 02, as described below. The file save processing unit 1521 increments the number of times for the generation of the error, except for the writing, recorded for saving the file from the file management service server A1401, in which the ID 1601 is 01. For example, the connection-disabled information 1011 is incremented from {01:1} to {01:2}. Next, in S1507, the file save processing unit 1521 determines whether there is a server for which the number of times for the generation of the error, except for the writing, during writing from a certain file management service server to another file management service server, is greater than or equal to a threshold. If such a server is present in S1507, the file save processing unit 1521 outputs the error log in S1508.
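The steps S1504, S1507, and S1508 can be sketched in Python as follows. This is only an illustrative sketch: the function `record_error`, the logger name, and the threshold value of 5 are hypothetical, and the connection-disabled information 1011 is assumed to be held as a dictionary that maps the source server ID to its error count.

```python
import logging

logger = logging.getLogger("file_save")  # hypothetical logger name

ERROR_THRESHOLD = 5  # hypothetical value; the embodiment does not fix one


def record_error(counts, source_id, threshold=ERROR_THRESHOLD):
    """Increment the count for source_id and log if any count reaches threshold.

    Returns the list of server IDs whose count is at or above the threshold.
    """
    # S1504: increment the count for the server that attempted the save,
    # e.g. {"01": 1} becomes {"01": 2}.
    counts[source_id] = counts.get(source_id, 0) + 1

    # S1507: look for servers whose count is greater than or equal to the threshold.
    offenders = [sid for sid, n in counts.items() if n >= threshold]

    # S1508: output the error log if any such server is present.
    if offenders:
        logger.error("error count reached threshold %d for server(s): %s",
                     threshold, ", ".join(offenders))
    return offenders
```

An operator who monitors the log can then notice a persistent failure from one specific server, such as one caused by a firewall setting mistake, even though the master information itself is left unchanged.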
According to the fourth embodiment, the error log is output even if the connection to the file management service server B1402 cannot be executed because the setting for the firewall of the file management service server A1401 is wrong due to, for example, an operation mistake. Accordingly, notification about the operation mistake can be provided to the user by monitoring the error log.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2014-014822, filed Jan. 29, 2014, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2014-014822 | Jan 2014 | JP | national |