This non-provisional application claims priority under 35 U.S.C. § 119(a) on Patent Application No(s). 108144322 filed in Republic of China on Dec. 4, 2019, the entire contents of which are hereby incorporated by reference.
This disclosure relates to a host system supporting fault tolerance.
A virtual machine in a host completely backs up the states of its peripheral inputs/outputs and the state of its memory to a backup host without interruption, so that a backup virtual machine identical to the virtual machine is formed in the backup host, thereby achieving fault tolerance of the virtual machine. When the virtual machine wants to send data packets to a client device, in order to keep the states of the backup virtual machine consistent with external states, the virtual machine monitoring layer in the host temporarily stores the data packets to be transmitted and does not transmit them to the client device until the states of the peripheral inputs/outputs of the virtual machine and the state of the memory are completely backed up to the backup host. After the client device receives the data packets from the host, a client application of the client device returns a confirmation packet to the host.
However, when the fault tolerance mechanism of the host is activated, the round trip time of the data packets will be increased sharply. The added round trip time is the sum of time for executing the running state of the fault tolerance mechanism, the snapshot state of the fault tolerance mechanism, the transfer state of the fault tolerance mechanism, and the flush output state of the fault tolerance mechanism. According to the current transmission control protocol (TCP) related to the network congestion control, when the round trip time of data packets becomes longer, the network transmission rate will be greatly reduced. It can be seen that although the host can achieve the purpose of state backup after the fault tolerance mechanism is activated, it actually causes a problem of decreasing the network transmission rate.
In view of the above, there is a need for an improved system supporting fault tolerance that not only achieves the purpose of state backup but also mitigates the decrease in the network transmission rate.
According to one or more embodiments of this disclosure, a control method of a system supporting fault tolerance is provided, wherein the system comprises a first host and a second host, the first host and the second host are configured to connect to a client device via an internet, and the first host stores a virtual machine and a transmission control protocol agent. The control method comprises: via the first host, executing the transmission control protocol agent to receive a data stream from the client device; via the transmission control protocol agent, transmitting an acknowledgement packet to the client device in response to the data stream from the client device; via the transmission control protocol agent, determining whether a fault tolerance mechanism of the virtual machine is activated; via the transmission control protocol agent, determining whether the virtual machine operates in a running state when the transmission control protocol agent determines that the fault tolerance mechanism of the virtual machine is activated; via the transmission control protocol agent, temporarily storing the data stream when the transmission control protocol agent determines that the virtual machine is not in the running state; and via the transmission control protocol agent, transmitting the data stream to the virtual machine when the transmission control protocol agent determines that the virtual machine operates in the running state.
According to one or more embodiments of this disclosure, another control method of a system supporting fault tolerance is provided, wherein the system comprises a first host and a second host, the first host and the second host are configured to connect to a client device via an internet, and the first host stores a virtual machine and a transmission control protocol agent. The control method comprises: via the first host, executing the transmission control protocol agent to receive a data stream from the virtual machine; via the transmission control protocol agent, transmitting an acknowledgement packet to the virtual machine in response to the data stream from the virtual machine; via the transmission control protocol agent, determining whether a fault tolerance mechanism of the virtual machine is activated; via the transmission control protocol agent, determining whether states of the virtual machine are completely backed up to the second host when the transmission control protocol agent determines that the fault tolerance mechanism of the virtual machine is activated; via the transmission control protocol agent, temporarily storing the data stream when the transmission control protocol agent determines that the states of the virtual machine are not completely backed up to the second host; and via the transmission control protocol agent, transmitting the data stream to the client device when the transmission control protocol agent determines that the states of the virtual machine are completely backed up to the second host.
According to one or more embodiments of this disclosure, a system supporting fault tolerance is provided. The system comprises a first host and a second host. The first host stores a virtual machine and a transmission control protocol agent and is configured to connect to a client device via an internet. The second host connects to the first host via the internet, and the first host is at least configured to execute the transmission control protocol agent to receive a data stream from the client device and to transmit an acknowledgement packet to the client device in response to the data stream from the client device.
The present disclosure will become more fully understood from the detailed description given herein below and the accompanying drawings which are given by way of illustration only and thus are not limitative of the present disclosure and wherein:
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In other instances, well-known structures and devices are schematically shown in order to simplify the drawings.
The first host 100 comprises a circuit board 10, a central processing unit 11 (CPU), and a memory 12. The memory 12 could be a non-transitory memory, a volatile memory or a non-volatile memory. The circuit board 10 is, for example, a main board. The central processing unit 11 and the memory 12 are disposed on the circuit board 10 and are electrically connected to each other. The memory 12 stores a virtual machine 13, a virtual machine monitoring program 14, and a transmission control protocol agent 15 (TCP agent). The central processing unit 11 executes the virtual machine 13, the virtual machine monitoring program 14 and the transmission control protocol agent 15. States of the virtual machine 13 comprise states of peripheral input/output of the virtual machine 13 and a state of a memory of the virtual machine 13. The virtual machine monitoring program 14 receives an external instruction. When the external instruction is used to activate a fault tolerance mechanism of the virtual machine 13, the virtual machine monitoring program 14 drives the virtual machine 13 to activate the fault tolerance mechanism. When the fault tolerance mechanism of the virtual machine 13 is activated, the virtual machine 13 performs a migration. The migration means that the states of the virtual machine 13 are transferred to the second host 200, so that a backup virtual machine 20 is generated in the second host 200, and states of the backup virtual machine 20 are completely consistent with the states of the virtual machine 13. In other embodiments, the virtual machine 13 and the transmission control protocol agent 15 may be located at different hosts and communicate with each other through a local area network. When the client device C wants to send a data stream to the virtual machine 13 of the first host 100 (that is, on an incoming path), the transmission control protocol agent 15 receives the data stream from the client device C.
After receiving the data stream from the client device C, the transmission control protocol agent 15 transmits an acknowledgement packet to the client device C in response to the data stream from the client device C. With regard to a prior system supporting fault tolerance, the acknowledgement packet is transmitted to the client device via the virtual machine. Therefore, a time point for transmitting the acknowledgement packet via the system supporting fault tolerance according to the present disclosure is significantly earlier than a time point for transmitting the acknowledgement packet by the prior system supporting fault tolerance.
In addition, the transmission control protocol agent 15 of the first host 100 is further used to determine whether the fault tolerance mechanism of the virtual machine 13 is activated and whether the states of the virtual machine 13 are completely backed up to the second host 200. The fault tolerance mechanism of the virtual machine 13 sequentially comprises a running state, a snapshot state, a transfer state, and a flush output state. More specifically, the running state means a period during which the virtual machine 13 of the first host 100 continues to operate, the snapshot state means a period during which the states of the virtual machine 13 are backed up, the transfer state means a period during which a backup of the states of the virtual machine 13 is transferred to the second host 200, and the flush output state means a period in which the states of the virtual machine 13 have been completely transferred to the second host 200. In this embodiment, the fault tolerance mechanism of the virtual machine 13 is implemented by multi-threading. Therefore, for the virtual machine 13, the running state and the snapshot state alternate continuously, and the transfer state and the flush output state are executed in the background.
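The sequence of states described above can be illustrated with a minimal Python sketch. The names `FTState` and `next_state` are hypothetical and do not appear in the disclosure. The sketch models only the sequential order of the four states, not the multi-threaded embodiment in which the transfer state and the flush output state run in the background.

```python
from enum import Enum


class FTState(Enum):
    """Hypothetical labels for the four states, in the order given in the text."""
    RUNNING = "running"
    SNAPSHOT = "snapshot"
    TRANSFER = "transfer"
    FLUSH_OUTPUT = "flush output"


def next_state(state: FTState) -> FTState:
    """Return the state that follows `state`, wrapping back to RUNNING."""
    order = list(FTState)  # Enum members iterate in definition order
    return order[(order.index(state) + 1) % len(order)]


# Walk through one full period of the mechanism.
state = FTState.RUNNING
for _ in range(len(FTState)):
    print(state.value)
    state = next_state(state)
```

After the loop the variable `state` is back at the running state, which corresponds to the start of the next period.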
A step S102 is adding a first identification stamp to the first data stream via the transmission control protocol agent 15, and the first identification stamp indicates a time point when the transmission control protocol agent 15 receives the first data stream. A step S103 is transmitting a first acknowledgement packet to the client device C via the transmission control protocol agent 15 in response to the first data stream from the client device C after the transmission control protocol agent 15 completely receives the first data stream from the client device C, wherein the first acknowledgement packet is read by a client application of the client device C. A step S104 is determining whether the fault tolerance mechanism of the virtual machine 13 is activated via the transmission control protocol agent 15. When the transmission control protocol agent 15 determines that the fault tolerance mechanism of the virtual machine 13 is activated, then a step S105 is performed. The step S105 is determining whether the virtual machine 13 operates in the running state via the transmission control protocol agent 15. When the transmission control protocol agent 15 determines that the fault tolerance mechanism of the virtual machine 13 is not activated, then a step S106 is performed. The step S106 is transmitting the first data stream to the virtual machine 13 via the transmission control protocol agent 15. After the virtual machine 13 completely receives the first data stream from the transmission control protocol agent 15, the virtual machine 13 sends a second acknowledgement packet to the transmission control protocol agent 15.
When the transmission control protocol agent 15 determines that the virtual machine 13 does not operate in the running state, then a step S107 is performed. The step S107 is temporarily storing the first data stream via the transmission control protocol agent 15, and then the step S105 is performed again. When the transmission control protocol agent 15 determines that the virtual machine 13 operates in the running state, then a step S108 is performed. The step S108 is transmitting the first data stream to the virtual machine 13 via the transmission control protocol agent 15.
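The incoming-path steps S102 to S108 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the disclosure: the class `IncomingPathAgent`, its callback parameters (`send_ack`, `deliver_to_vm`, `ft_activated`, `vm_running`) and the `retry` method are hypothetical names, and the identification stamp is assumed to be a monotonic timestamp.

```python
from collections import deque
import time


class IncomingPathAgent:
    """Hypothetical stand-in for the TCP agent on the incoming path."""

    def __init__(self, send_ack, deliver_to_vm, ft_activated, vm_running):
        self.send_ack = send_ack            # sends the first acknowledgement packet
        self.deliver_to_vm = deliver_to_vm  # forwards a stamped stream to the VM
        self.ft_activated = ft_activated    # callable: is fault tolerance on?
        self.vm_running = vm_running        # callable: is the VM in the running state?
        self.pending = deque()              # streams held while the VM is not running

    def on_client_stream(self, stream):
        stamped = (time.monotonic(), stream)  # S102: add identification stamp
        self.send_ack(stream)                 # S103: acknowledge the client at once
        if not self.ft_activated():           # S104
            self.deliver_to_vm(stamped)       # S106: forward directly
            return
        self.pending.append(stamped)          # S107: hold until S105 succeeds
        self.retry()

    def retry(self):
        # S105: re-check the running state; S108: forward held streams in order.
        while self.pending and self.vm_running():
            self.deliver_to_vm(self.pending.popleft())


# Usage: the client is acknowledged even though the VM is not yet running.
acks, delivered = [], []
vm = {"running": False}
agent = IncomingPathAgent(acks.append, delivered.append,
                          lambda: True, lambda: vm["running"])
agent.on_client_stream(b"request")
vm["running"] = True
agent.retry()
```

The point of the sketch is that the acknowledgement in S103 does not wait for the virtual machine, which is what shortens the round trip time seen by the client device.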
A time difference between a first time point and a second time point is the round trip time of the data stream, wherein the client device C receives the first acknowledgement packet from the transmission control protocol agent 15 at the first time point, and the client device C starts transmitting the first data stream to the first host 100 at the second time point. Under the network congestion control of the transmission control protocol, the shorter the round-trip time is, the faster the network transmission speed is.
Since the transmission control protocol agent 15 processes multiple network packets every fixed period, the state of the fault tolerance mechanism of the virtual machine 13 that is read by the transmission control protocol agent 15 is the state recorded in the latest inter-process communication packet when the data process schedule of the transmission control protocol agent 15 comprises processing the inter-process communication packet. Assume that the state of the fault tolerance mechanism recorded in the latest inter-process communication packet and the state recorded in at least one earlier inter-process communication packet are both flush output states. In that case, the transmission control protocol agent 15 delays transmitting the data stream to the client device C if it does not process each of the inter-process communication packets in real time. To solve this problem, the transmission control protocol agent 15 should be designed to process each of the inter-process communication packets in real time.
When the transmission control protocol agent 15 determines that the states of the virtual machine 13 are not completely backed up to the second host 200, then a step S307 is performed. The step S307 is temporarily storing the second data stream via the transmission control protocol agent 15. When the transmission control protocol agent 15 determines that the states of the virtual machine 13 are completely backed up to the second host 200, then a step S308 is performed. The step S308 is transmitting the second data stream to the client device C via the transmission control protocol agent 15, and then a step S309 is performed. The step S309 is transmitting a third acknowledgement packet to the transmission control protocol agent 15 via the client device C in response to the second data stream from the transmission control protocol agent 15 after the client device C completely receives the second data stream from the transmission control protocol agent 15, and then a step S310 is performed. The step S310 is releasing the second data stream via the transmission control protocol agent 15 after the transmission control protocol agent 15 reads the third acknowledgement packet from the client device C.
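Steps S307 to S310 of the outgoing path can be sketched in Python as follows. This is a hedged illustration only: `OutgoingPathAgent`, its callbacks (`send_to_client`, `backup_complete`), and the `stream_id` used to match the third acknowledgement packet to a stream are hypothetical and are not taken from the disclosure. The branch in which the fault tolerance mechanism is not activated is omitted.

```python
from collections import deque


class OutgoingPathAgent:
    """Hypothetical stand-in for the TCP agent on the outgoing path."""

    def __init__(self, send_to_client, backup_complete):
        self.send_to_client = send_to_client    # transmits a stream to the client
        self.backup_complete = backup_complete  # callable: states fully backed up?
        self.held = deque()                     # S307: streams awaiting backup
        self.in_flight = {}                     # sent, awaiting the third ACK

    def on_vm_stream(self, stream_id, stream):
        self.held.append((stream_id, stream))   # S307: store temporarily
        self.flush()

    def flush(self):
        # S308: once the backup is complete, send held streams in order.
        while self.held and self.backup_complete():
            stream_id, stream = self.held.popleft()
            self.in_flight[stream_id] = stream  # keep a copy until acknowledged
            self.send_to_client(stream_id, stream)

    def on_client_ack(self, stream_id):
        # S309 has happened on the client side; S310: release the stored copy.
        self.in_flight.pop(stream_id, None)


# Usage: the stream is held until the backup completes, then released on ACK.
sent = []
backup = {"done": False}
agent = OutgoingPathAgent(lambda sid, s: sent.append((sid, s)),
                          lambda: backup["done"])
agent.on_vm_stream(1, b"response")  # held: backup not complete yet
backup["done"] = True
agent.flush()                       # S308: now transmitted to the client
agent.on_client_ack(1)              # S310: copy released
```

Keeping the copy in `in_flight` until the acknowledgement arrives is one possible reading of "releasing the second data stream" in S310; other buffer designs would satisfy the same steps.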
The communication between the transmission control protocol agent 15 and the virtual machine 13 usually goes through a local area network or stays within the same host, whereas the communication between the transmission control protocol agent 15 and the client device C usually goes through an internet. Therefore, a first data transmission speed between the transmission control protocol agent 15 and the virtual machine 13 is usually much higher than a second data transmission speed between the transmission control protocol agent 15 and the client device C. With respect to a data path between the virtual machine 13 and the client device C, when too many data packets accumulate in the transmission control protocol agent 15 without being processed, the memory resources may be exhausted and data packets may be lost. In order to solve the above problem, a system supporting fault tolerance according to a third embodiment of the present disclosure is provided.
In addition to the fault tolerance mechanism and the inter-process communication packet monitoring program 16, the control method of the system supporting fault tolerance further comprises the data transmission speed monitoring program 17.
The first data transmission speed between the transmission control protocol agent 15 and the virtual machine 13 may be reduced via the transmission control protocol window algorithm. In one embodiment, when the remaining memory resources of the first host 100 are greater than or equal to a preset percentage lower limit, the transmission control protocol agent 15 does not take any data packets away from the second windows. The transmission control protocol agent 15 does not take the data packets away from the second windows until the remaining memory resources of the first host 100 are less than the percentage lower limit. In another embodiment, the transmission control protocol agent 15 does not take the data packets away from the second windows until the second windows of the transmission control protocol agent 15 are filled with the data packets.
When the system supporting fault tolerance has multiple virtual machines and the periods of the fault tolerance mechanisms of the virtual machines are not completely the same, the amount of data processed by each of the virtual machines must be further controlled. The data flow (bits per second) for each of the virtual machines is defined by (an amount of data transmitted from the system supporting fault tolerance to the client device)/(the number of the virtual machines). The amount of data processed by each of the virtual machines during one period of the fault tolerance mechanism is defined by (data flow) * (a period of the fault tolerance mechanism). In other embodiments, a priority of the virtual machine is determined by an importance degree of data processed by the virtual machine, and a priority scheduling algorithm assigns a specific data flow to the virtual machine which has the highest priority. In another embodiment, a guaranteed minimum transmission algorithm assigns a bandwidth lower limit to each of the virtual machines. When the bandwidth lower limit is “X” megabits per second, a formula of a minimum amount of data transmitted by the virtual machine is defined by
wherein “n” is a transmission time, and “t” is a total amount of data transmitted by the virtual machine.
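The two equal-share formulas stated above (data flow per virtual machine, and amount of data per period of the fault tolerance mechanism) can be worked through in a short Python sketch. The function names and the example figures are assumptions chosen for illustration. The sketch does not cover the priority scheduling or guaranteed minimum transmission embodiments.

```python
def per_vm_flow_bps(total_bps: float, vm_count: int) -> float:
    """Data flow per VM: (data sent to the client per second) / (number of VMs)."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return total_bps / vm_count


def data_per_period_bits(flow_bps: float, period_s: float) -> float:
    """Data handled by one VM in one period: (data flow) * (period)."""
    return flow_bps * period_s


# Illustrative figures only: 1 Gbit/s shared by 4 VMs, with a 0.5 s period.
flow = per_vm_flow_bps(1_000_000_000, 4)   # 250,000,000 bits per second per VM
budget = data_per_period_bits(flow, 0.5)   # 125,000,000 bits per period
print(flow, budget)
```

Because the periods of the virtual machines may differ, each virtual machine would be given its own `period_s` when its per-period amount is computed.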
In view of the above description, the time needed for the virtual machine or the client application to receive the acknowledgement packet can be greatly reduced because the transmission control protocol agent is responsible for transmitting the acknowledgement packet and temporarily storing the data packets. As a result, the round-trip time for transmitting the data packets can be significantly reduced. In contrast, when the fault tolerance mechanism of a prior virtual machine is activated, the virtual machine can receive the acknowledgement packet only after the running state, the snapshot state, the transfer state, and the flush output state are finished. The time needed for the virtual machine to receive the acknowledgement packet increases sharply because the time needed for executing the running state, the snapshot state, the transfer state, and the flush output state is added to the round-trip time. Under the network congestion control of the transmission control protocol, the shorter the round-trip time is, the faster the network transmission speed is. Therefore, the network transmission speed of the system supporting fault tolerance according to the present disclosure is higher than that of a prior system supporting fault tolerance. When the network transmission speed is faster, the time needed for transmitting data can be reduced.
Number | Date | Country | Kind |
---|---|---|---|
108144322 | Dec 2019 | TW | national |