This disclosure defines a new weighted load balancing method on data access nodes, ensuring the ability to scale the system's volume horizontally and to balance the load when adding or removing data access nodes or changing nodes' hardware configuration.
To ensure that a large data access system can handle heavy load, software developers have to solve data partitioning and load balancing problems. Furthermore, it is important to rebalance the data when adding or removing data access nodes or changing a node's hardware configuration, without interrupting the system.
Currently, many database management systems implement different approaches to data partitioning, for example: MySQL, MongoDB, Aerospike, etc. However, these systems do not differentiate the weights of data access nodes. The data access nodes must have similar hardware configurations in order to handle the same amount of traffic. Additionally, a change to the hardware configuration of a single data access node can unbalance the system.
The weighted load balancing method described in this disclosure ensures the ability to horizontally scale the data access system's volume, linearly increasing the system's throughput when new data access nodes are added. The method also supports customizing the weight of each data access node, allowing nodes to have different hardware configurations.
The goal of this weighted load balancing method is to address the core problem of a large data access system, namely load balancing and rebalancing the data when adding or removing data access nodes or changing nodes' hardware configuration, while minimizing the amount of data to be shifted between nodes.
In order to achieve this goal, the weighted load balancing method on data access nodes includes the following steps:
Step 1: update the routing table; when adding or removing nodes or changing nodes' weights, move virtual nodes from nodes whose number of virtual nodes decreases to nodes whose number of virtual nodes increases; for example, consider a system which has n virtual nodes and m nodes whose identifiers and weights are, respectively, ID1, ID2, ID3, . . . , IDm and W1, W2, W3, . . . , Wm.
When adding a node with identifier IDm+1, the proportion of data that needs to be shifted is Wm+1 ÷ Σ(k=1..m+1) Wk.
When removing a node with identifier IDi, where i∈[1, m], the proportion of data that needs to be shifted is Wi ÷ Σ(k=1..m) Wk.
When replacing the nodes' weights with new values Q1, Q2, Q3, . . . , Qm, the proportion of data that needs to be shifted is (Σ(k=1..m) |Qk÷Q − Wk÷W|) ÷ 2.
Here, m is the total number of nodes before adding or removing; Wm+1 is the weight of the added node; Wi is the weight of the removed node; Q is the new total weight; W is the old total weight; Qk is the new weight of node IDk; Wk is the old weight of node IDk.
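The three proportions can be checked numerically. The following is a minimal sketch, not part of the disclosed method itself; the function names are hypothetical and the weights are passed as plain lists.

```python
from typing import Sequence


def fraction_moved_on_add(weights: Sequence[float], new_weight: float) -> float:
    """Wm+1 / sum(k=1..m+1) Wk: share of data that moves to the added node."""
    return new_weight / (sum(weights) + new_weight)


def fraction_moved_on_remove(weights: Sequence[float], i: int) -> float:
    """Wi / sum(k=1..m) Wk: share of data leaving the removed node (i is 0-based here)."""
    return weights[i] / sum(weights)


def fraction_moved_on_reweight(old: Sequence[float], new: Sequence[float]) -> float:
    """(sum over k of |Qk/Q - Wk/W|) / 2: share of data that changes owner."""
    w_total, q_total = sum(old), sum(new)
    return sum(abs(q / q_total - w / w_total) for w, q in zip(old, new)) / 2
```

For example, with weights 1, 1 and 2, adding a node of weight 4 shifts 4 ÷ 8 = 0.5 of the data. Treating the added node as an existing node whose weight changes from 0 to 4 gives the same value from the weight-change formula.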
Step 2: store the old routing table (before the update) in array A1 and the new routing table (after the update in Step 1) in array A2.
Step 3: block access to records that need to be moved to another node:
For read access with key K, apply hash function F(x) to calculate value I=F(K); read the record corresponding to key K from the node having identifier A1[I], then return the result.
For write access with key K, apply hash function F(x) to calculate value I=F(K); if A1[I]=A2[I], the record corresponding to key K is not being moved, so write the record to the node having identifier A1[I] and return a success code; if A1[I]≠A2[I], the record corresponding to key K is being moved, so deny the write access and return an error code.
Step 4: copy records from the old node to the new node: for each key K, apply hash function F(x) to calculate value I=F(K); if A1[I]≠A2[I], copy the record corresponding to key K from the node having identifier A1[I] to the node having identifier A2[I]; if A1[I]=A2[I], the record corresponding to key K does not need to be moved.
Step 5: after all records that need to be moved have been copied, all read/write access is performed using the new routing table A2: for each read/write access with key K, apply hash function F(x) to calculate value I=F(K); the record corresponding to key K is accessed on the node having identifier A2[I].
Step 6: clean up duplicated records: for each key K, apply hash function F(x) to calculate value I=F(K); if A1[I]≠A2[I], remove the record corresponding to key K from the node having identifier A1[I]; if A1[I]=A2[I], the record corresponding to key K is not duplicated.
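Steps 2 to 6 can be illustrated with a small in-memory model. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, each node is modelled as a Python dict, and CRC32 modulo n stands in for the hash function F, which the disclosure does not specify.

```python
import zlib
from typing import Dict, List, Optional


class Migration:
    """In-memory sketch of Steps 2-6; each node is modelled as a plain dict."""

    def __init__(self, nodes: Dict[str, Dict[str, str]], a1: List[str], a2: List[str]):
        self.nodes = nodes          # node identifier -> records held by that node
        self.a1, self.a2 = a1, a2   # Step 2: old and new routing tables
        self.switched = False       # becomes True at Step 5

    def _slot(self, key: str) -> int:
        # Hash function F(K); CRC32 modulo n is an assumption of this sketch.
        return zlib.crc32(key.encode("utf-8")) % len(self.a1)

    def read(self, key: str) -> Optional[str]:
        # Reads use A1 during the migration and A2 after the switch.
        table = self.a2 if self.switched else self.a1
        return self.nodes[table[self._slot(key)]].get(key)

    def write(self, key: str, value: str) -> bool:
        i = self._slot(key)
        if not self.switched and self.a1[i] != self.a2[i]:
            return False            # Step 3: record is being moved, deny the write
        table = self.a2 if self.switched else self.a1
        self.nodes[table[i]][key] = value
        return True

    def copy_records(self) -> None:
        # Step 4: copy every record whose slot changes owner from A1[I] to A2[I].
        for node_id in set(self.a1):
            for key, value in list(self.nodes[node_id].items()):
                i = self._slot(key)
                if self.a1[i] == node_id and self.a1[i] != self.a2[i]:
                    self.nodes[self.a2[i]][key] = value

    def switch(self) -> None:
        self.switched = True        # Step 5: all access now uses A2

    def clean(self) -> None:
        # Step 6: drop the stale copy left on the old node.
        for node_id in set(self.a1):
            for key in list(self.nodes[node_id]):
                i = self._slot(key)
                if self.a1[i] == node_id and self.a1[i] != self.a2[i]:
                    del self.nodes[node_id][key]
```

A caller would run `copy_records()`, `switch()` and `clean()` in that order. Writes to moving records are refused only between construction and `switch()`, while reads remain available throughout.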
The weighted load balancing method on data access nodes is composed of two methods: a data partitioning method, and a load balancing method used when adding or removing data access nodes or changing nodes' weights.
Some terms used in the following detailed description are defined as follows:
The data partitioning method is used to determine the data access node that holds the record corresponding to a specific key. While data is being rebalanced after adding or removing nodes or changing nodes' weights, data is accessed using the load balancing method described below.
The data partitioning method uses a routing table, which is an array A of n items; the value of each item is the identifier of a node. An item with value X is considered a virtual node of the node with identifier X. The number of virtual nodes of a node is determined as:
C = n × W1 ÷ W2
When accessing a record corresponding to a specific key K, perform hash function F(x) to calculate value I=F(K)∈[0, n−1].
The value of item A[I] is the identifier of the node which has the record corresponding to key K.
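The routing-table lookup can be sketched as follows. This is a minimal illustration rather than the disclosed implementation. It assumes that W1 is the node's own weight and W2 is the total weight of all nodes, that fractional counts are settled by largest-remainder rounding, and that CRC32 modulo n serves as the hash function F. All function names are hypothetical.

```python
import zlib
from typing import Dict, List


def virtual_node_counts(n: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split n virtual nodes across nodes in proportion to weight (C = n * W1 / W2)."""
    total = sum(weights.values())
    exact = {node: n * w / total for node, w in weights.items()}
    counts = {node: int(x) for node, x in exact.items()}  # round down first
    # Hand the leftover slots to the largest fractional parts so counts sum to n.
    leftovers = sorted(exact, key=lambda node: exact[node] - counts[node], reverse=True)
    for node in leftovers[: n - sum(counts.values())]:
        counts[node] += 1
    return counts


def build_routing_table(n: int, weights: Dict[str, float]) -> List[str]:
    """Array A of n items; each item holds the identifier of the owning node."""
    table: List[str] = []
    for node, count in virtual_node_counts(n, weights).items():
        table.extend([node] * count)
    return table


def slot(key: str, n: int) -> int:
    """Hash function F: maps a key to an index I in [0, n-1] (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % n


def lookup(table: List[str], key: str) -> str:
    """Return A[F(K)], the identifier of the node holding the record for key K."""
    return table[slot(key, len(table))]
```

With n = 8 and weights 1, 1 and 2, the three nodes receive 2, 2 and 4 virtual nodes respectively, so the heaviest node is expected to serve about half of the keys.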
Refer to
Step 1: update the routing table (move virtual nodes from a node to other nodes) when adding, removing nodes or changing nodes' weight.
Consider a system which consists of n virtual nodes and m nodes whose identifiers and weights are, respectively, ID1, ID2, ID3, . . . , IDm and W1, W2, W3, . . . , Wm.
Refer to
For each node having identifier IDi with i∈[1, m], move virtual nodes from node IDi to node IDm+1. The number of virtual nodes to be moved is:
Refer to
For each node having identifier IDj with j∈[1, m] and j≠i, move virtual nodes from node IDi to node IDj. The number of virtual nodes to be moved is:
Refer to
For each node having identifier IDk with k∈[1, m], the number of virtual nodes to be moved is:
C = n × (Qk÷Q − Wk÷W)
A positive C means that node IDk gains virtual nodes, and a negative C means that it gives them up. Move virtual nodes from the nodes whose number of virtual nodes decreases to the nodes whose number of virtual nodes increases.
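The routing-table update of Step 1 can be sketched as a function that reassigns only the surplus virtual nodes. The sketch below is an illustration under assumptions: the name `rebalance` and the `target` argument (the desired number of virtual nodes per node after the change, for example n × Qk ÷ Q rounded to integers) are hypothetical, and the choice of which particular slots move is arbitrary.

```python
from collections import Counter
from typing import Dict, List


def rebalance(old_table: List[str], target: Dict[str, int]) -> List[str]:
    """Return a new routing table with target[node] slots per node, changing as few slots as possible.

    target must sum to len(old_table); nodes absent from target are treated as removed.
    """
    if sum(target.values()) != len(old_table):
        raise ValueError("target counts must sum to the number of virtual nodes")
    current = Counter(old_table)
    # Surplus: how many slots each shrinking (or removed) node must give up.
    surplus = {node: current[node] - target.get(node, 0) for node in current}
    # Receivers: one entry per slot a growing (or newly added) node must gain.
    receivers = [node for node, want in target.items()
                 for _ in range(max(0, want - current[node]))]
    new_table = list(old_table)
    for i, owner in enumerate(old_table):
        if surplus[owner] > 0:
            new_table[i] = receivers.pop()
            surplus[owner] -= 1
    return new_table
```

Slots owned by nodes that keep or increase their count are never touched, so the number of changed slots equals the total decrease. For two nodes of weight 1 with 4 virtual nodes each, adding a node of weight 2 changes 4 of the 8 slots, which matches the proportion Wm+1 ÷ Σ(k=1..m+1) Wk = 0.5.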
Step 2: store the old routing table (before the update) in array A1 and the new routing table (after the update in Step 1) in array A2.
Step 3: block access to records that need to be moved to another node:
Refer to
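For read access with key K: apply hash function F(x) to calculate value I=F(K); read the record corresponding to key K from the node having identifier A1[I], then return the result.
For write access with key K: apply hash function F(x) to calculate value I=F(K); if A1[I]=A2[I], the record corresponding to key K is not being moved, so write the record to the node having identifier A1[I] and return a success code; if A1[I]≠A2[I], the record corresponding to key K is being moved, so deny the write access and return an error code.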
Step 4: copy records from old node to new node:
Refer to
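For each key K: apply hash function F(x) to calculate value I=F(K); if A1[I]≠A2[I], copy the record corresponding to key K from the node having identifier A1[I] to the node having identifier A2[I]; if A1[I]=A2[I], the record corresponding to key K does not need to be moved.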
Step 5: after copying all records that need to be moved, all read/write access is performed using new routing table A2.
Refer to
Step 6: clean duplicated records:
Refer to
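For each key K: apply hash function F(x) to calculate value I=F(K); if A1[I]≠A2[I], remove the record corresponding to key K from the node having identifier A1[I]; if A1[I]=A2[I], the record corresponding to key K is not duplicated.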
Benefits of the invention include:
The data partitioning method ensures the ability to horizontally scale the data access system; the load handling capacity increases linearly with the number of data access nodes.
The load balancing method used when adding or removing data access nodes or changing nodes' weights minimizes the amount of data to be shifted, minimizes service interruption and ensures data integrity during the migration.
The method differentiates the weights of data access nodes, allowing data access nodes to have different hardware configurations.
Number | Date | Country | Kind |
---|---|---|---|
1-2019-06742 | Nov 2019 | VN | national |