With the rise of virtual machines, the amount of information needed by bridges and other components in the communications (e.g., Ethernet) network about the other components in the network is increasing. In order to manage this information, many of the network components utilize data stores or databases. These databases are continually evolving during network use, with individual records changing and the overall database expanding in size.
When records in a database change, those changes are transmitted to each of the network components so that the network components can update their databases to reflect these changes. However, there is no guarantee that all of the information arrives intact at each of the network components. That is, retransmissions, flow control protocols, waiting in queues, and other communication glitches may result in imperfect transmission of the database updates. Over time, the databases at one or more of the network components may “walk out of synch.”
Accordingly, the entire database may be retransmitted at various intervals. However, the databases may still fall out of synchronization between retransmissions. In addition, transmitting large databases to a large number of network components can unacceptably degrade network performance by “dominating the wire” during transmission.
a-c are examples of data structures which may be used for fast synchronization failure detection in distributed databases.
a-d are ladder diagrams illustrating digest protocols.
a-b are state diagrams illustrating fast synchronization failure detection in distributed databases.
The nodes 120 and 130 may include at least some processing capability, such as a processor and computer-readable storage storing computer-readable program code that the processor executes to facilitate communications in the network 100 and to manage at least one database, such as a local database 121, 131 and a remote database 122, 132. The nodes 120 and 130 may also provide services to other computing or data processing systems or devices in the network 100, such as transaction processing services.
The nodes 120 and 130 may be provided on the network 100 via a communication connection. As used herein, the term “node” refers to devices used in packet-switched computer networks, such as an Ethernet network. However, the systems and methods described herein may be implemented in other Layer 2 (L2) networks and are not limited to use in Ethernet networks.
As used herein, a “bridge node” or “bridge” is a device that connects two networks that may use the same or a different Data Link Layer protocol (e.g., Layer 2 of the OSI Model). Bridges may also be used to connect two different network types, such as Ethernet and Token Ring networks.
A network bridge connects multiple network segments at the data link layer. A bridge node includes ports that connect two or more otherwise separate LANs. The bridge receives packets on one port and retransmits those packets on another port. The bridge node does not retransmit a packet until a complete packet has been received, thus enabling station nodes on either side of the bridge node to transmit packets simultaneously.
The bridge node manages network traffic. That is, the bridge node analyzes incoming data packets before forwarding them to another segment of the network. For example, the bridge node reads the destination address from every packet passing through it and determines whether the packet should be forwarded, based on information included in the local and/or remote databases (e.g., databases 121, 122 if node 120 is a bridge node). Thus, the bridge node does not retransmit a packet if the destination address is on the same side of the bridge node as the station node sending the packet. The bridge node builds these databases by locating network devices (e.g., node 130) and recording each device address.
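The forwarding decision just described can be sketched as follows. This is a minimal illustration, not the patented implementation; the class and method names, and the flood sentinel for unknown destinations, are assumptions added for clarity.

```python
FLOOD = "flood"  # hypothetical sentinel: retransmit on all other ports

class BridgeNode:
    def __init__(self):
        # Maps a learned device address to the port it was seen on
        # (a simplified stand-in for the local/remote databases).
        self.mac_table = {}

    def forward_decision(self, src_addr, dst_addr, in_port):
        """Return the port to retransmit on, None to filter the packet."""
        # Build the database: record the sender's address against its port.
        self.mac_table[src_addr] = in_port
        out_port = self.mac_table.get(dst_addr)
        if out_port is None:
            return FLOOD      # destination not yet located
        if out_port == in_port:
            # Destination is on the same side as the sender: do not retransmit.
            return None
        return out_port
```

A bridge using this table first floods packets for unknown destinations, then filters or forwards based on what it has learned.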
In an embodiment, the database entries are formatted using Type Length Value (TLV) encoding. TLV is a data structure that enables the addition of new parameters to a Short Message Peer to Peer (SMPP) Protocol Data Unit (PDU); TLV parameters are included in the SMPP protocol (versions 3.4 and later). The TLVs specified herein include a two-octet header with five bits of type and eleven bits of length and, in this example, are specific to the embodiments described herein. The TLVs can be added as a byte stream in a standard SMPP PDU. A PDU is a packet of data passed across a network. A Service Data Unit (SDU) is a set of data transmitted to a peer service; it is the data that a given layer passes to the layer below. The PDU specifies the data that will be sent to the peer protocol layer at the receiving end. The PDU at one layer, ‘n’, is the SDU of the layer below, ‘n-1’. In effect, the SDU is the payload of a PDU.
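A sketch of the two-octet header described above, with five bits of type and eleven bits of length, might look like the following. Placing the type field in the high-order bits is an assumption; the source does not specify bit placement.

```python
import struct

def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack a TLV with a two-octet header: 5-bit type, 11-bit length."""
    assert 0 <= tlv_type < 32      # type must fit in 5 bits
    assert len(value) < 2048       # length must fit in 11 bits
    header = (tlv_type << 11) | len(value)   # type in high bits (assumed)
    return struct.pack("!H", header) + value  # network byte order

def decode_tlv(data: bytes):
    """Unpack the header and return (type, value)."""
    (header,) = struct.unpack("!H", data[:2])
    tlv_type = header >> 11
    length = header & 0x7FF
    return tlv_type, data[2:2 + length]
```

For example, `encode_tlv(5, b"abc")` yields the header `0x2803` followed by the value bytes, and `decode_tlv` recovers the original pair.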
During operation, the Upper Layer Protocol (ULP) delivers the TLVs to the shared feature database at the node 120. Each node has a private database and enters TLVs through a TLV service interface rather than by direct access to the shared feature database. When a new TLV is received from the local ULP, a database agent 140 at the node 120 checks whether the TLV is new, that is, whether the TLV changes any information within the database. The TLV may reference an existing TLV but carry information changed from that existing TLV. If the TLV is new, a TLV digest 150a-b is calculated and a transmit flag is set; the calculation adds the new TLV digest to the database digest. If the TLV is an update (a change to an already existing entry), the old TLV digest is first subtracted from the database digest, and the new TLV digest is then added.
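The agent's handling of new versus updated TLVs can be sketched as below. With an XOR-combined digest (described later in this document), both "adding" and "subtracting" a per-TLV digest are XOR operations; the class name, key scheme, and hash truncation are assumptions for illustration.

```python
import hashlib

def tlv_hash(tlv: bytes) -> int:
    # Per-TLV digest: SHA-256 truncated to 128 bits (truncation choice assumed).
    return int.from_bytes(hashlib.sha256(tlv).digest()[:16], "big")

class DatabaseAgent:
    def __init__(self):
        self.db = {}           # key -> TLV bytes (the local database)
        self.digest = 0        # XOR of all per-TLV hashes (database digest)
        self.transmit = set()  # keys flagged for the next outgoing PDU

    def receive_from_ulp(self, key, tlv: bytes):
        old = self.db.get(key)
        if old == tlv:
            return  # no information changed: nothing to do
        if old is not None:
            # Update: "subtract" the old TLV digest from the database digest.
            self.digest ^= tlv_hash(old)
        # New or updated: "add" the new TLV digest.
        self.digest ^= tlv_hash(tlv)
        self.db[key] = tlv
        self.transmit.add(key)  # set the transmit flag
```

An update therefore costs one hash and two XORs, while a new TLV costs one hash and one XOR.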
Periodically, the database agent 140 collects all the new or changed TLVs 155a-d from the local database, packs these TLVs 155a-d into as many PDUs 160a-b as needed, and delivers the PDUs 160a-b one at a time as the SDU (e.g., SDU 170 is shown being broadcast).
When the node 130 receives a PDU 160a-b, the database agent 145 at the node 130 checks and acknowledges (ACK) receipt of each PDU 160a-b. The database agent 145 then extracts the TLVs 155a-d and compares the received TLVs 155a-d with the TLVs of the remote database 132 at the node 130. If the database agent 145 finds new or changed TLVs 155a-d, the digest is updated. The database agent 145 also receives and processes digest checks and voids.
Accordingly, only the updated TLVs are transmitted “over the wire,” rather than the entire database 121. This relaxes constraints on database size, speed, and reliability, and is particularly advantageous in distributed networks where the entire updated database would otherwise have to be transmitted to each of the other nodes in the network.
So that only the updated TLVs need to be transmitted, each database record on a local node (e.g., node 120) is assigned a key locally, and the key is distributed to all remote nodes (e.g., node 130) in the network 100. The key may be a flat 16 bit (or other suitable length) integer, enabling the database to contain up to 64K TLVs (or other corresponding number, depending on the key length). The range of the key may be configured with the same value on both the node 120 and the node 130. Unlike Link Layer Discovery Protocol (LLDP), the key may be dynamically assigned and then shared between the local and remote databases. The ULPs manipulating database elements use the primary key for all TLV operations. Available keys are assigned to the ULPs and may remain in the possession of a ULP until that ULP releases the key.
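Dynamic key assignment might be sketched as a simple pool of 16-bit keys handed to ULPs on request and held until released. The pool structure and API names are assumptions, not taken from the source.

```python
class KeyPool:
    """Assigns keys from a flat 16-bit space (up to 64K TLVs)."""

    def __init__(self, max_keys=0x10000):
        self.max_keys = max_keys
        self.next_key = 0
        self.released = []   # keys returned by ULPs, available for reuse

    def assign(self) -> int:
        # Prefer reusing a released key before growing the allocated range.
        if self.released:
            return self.released.pop()
        if self.next_key >= self.max_keys:
            raise RuntimeError("key space exhausted")
        key = self.next_key
        self.next_key += 1
        return key

    def release(self, key: int):
        # The ULP holds the key until it explicitly releases it.
        self.released.append(key)
```

Because both nodes configure the same key range, a key assigned locally identifies the same record in the remote database.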
Before continuing, it is noted that the network components and operations just described are shown for purposes of illustration and are not intended to be limiting. In addition, other functional components may also be provided and are not limited to those shown and described herein.
a-c are examples of data structures which may be used for fast synchronization failure detection in distributed databases. The data structures shown are in TLV format, consistent with the example described above.
b shows an example of an organization-specific TLV 220, which includes a 3 octet organization identifier and a unique identifier subtype.
c shows examples of ULP control TLVs. Shown in this example are: LostSync TLV 230, Sync TLV 231, Digest TLV 232, Void TLV 233, and End TLV 234. It is noted that the database digest is shown in Digest TLV 232 in field 240. The digest is a summary of the entire database (which may be as large as many megabytes or more) compressed to 16 octets in this example.
It is noted that both the local and remote databases are keyed with an index with a maximum value negotiated between the station node and the bridge node. For example, index values between 0 and 127 are reserved for TLVs, while the rest of the available index values are dynamically assigned to ULPs.
For each TLV, the database also has five local variables. These are the Valid Boolean, Stale/Void Boolean, Touched Boolean, Changed Boolean, and the TLV hash. A single digest variable exists for each database. Every database TLV is keyed with an index. This index is known to the ULP and used by the ULP for access to the TLV. The Boolean arrays are not visible to the ULP.
The Valid Boolean array indicates the presence or absence of a valid TLV on the index.
The Stale/Void Boolean array is set to True for all valid TLVs in the remote database whenever the database has lost sync, and the Stale variable is set to False whenever the TLV is updated. For the local database, the variable is set to True whenever a TLV is voided from the database.
The Touched Boolean array is set to False every time the database TLV lease time expires, and set to True whenever the TLV is updated. The ULP is responsible for updating TLVs.
The Changed Boolean array is set to True to indicate the TLV was updated with a change in content, and set to False if the TLV has not changed since the last time the TLV was received (remote database) or transmitted (local database).
The TLV hash array holds the per-TLV digest calculation (e.g., SHA-256 truncated to the 128 least significant bits for the current TLV).
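The per-index state described above might be represented as follows. The field names follow the text; the container layout and class names are assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class TlvState:
    valid: bool = False     # a valid TLV is present at this index
    stale: bool = False     # Stale/Void: lost sync (remote) or voided (local)
    touched: bool = False   # cleared on lease expiry, set when TLV is updated
    changed: bool = False   # content changed since last transmit/receive
    tlv_hash: int = 0       # truncated hash of the current TLV

class Database:
    """Keyed by an index up to a negotiated maximum."""

    def __init__(self, max_index: int):
        # The Boolean arrays are internal state, not visible to the ULP.
        self.entries = [TlvState() for _ in range(max_index)]
        self.digest = 0      # the single digest variable for this database
```

The ULP accesses a TLV only through its index; the per-entry Booleans and hashes are maintained internally by the database agent.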
In an example, a high quality digest may be based on a cryptographic hash function, such as but not limited to, SHA-256, MD5, or other suitable algorithm. Also in an example, the records are hashed as TLVs to generate individual feature TLV hashes for each of the TLVs. The feature TLV hashes are then XOR'ed to generate a 128 bit truncated database digest 300.
The hash 300 includes a hash of all TLV fields. The digest 300 may be generated in hardware and/or program code (e.g., firmware or software). The digest 300 is order independent, supports incremental updates, and supports any size database. The digest 300 also enables incremental calculations. Each TLV hash may be generated as updates to the TLV arrive. Deleting a TLV requires only a single XOR. Adding a TLV requires hashing a single TLV and a single XOR. Updating a TLV requires hashing a single TLV and two XORs. Again, it is noted that while TLVs are used in the examples shown herein, other record formats may also be used.
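The digest properties listed above can be checked with a short sketch: hashing each TLV, XOR-ing the hashes, and verifying order independence and the incremental-update costs. The choice of which 128 bits of the SHA-256 output to keep is an assumption here.

```python
import hashlib
from functools import reduce

def tlv_hash(tlv: bytes) -> int:
    # Feature TLV hash: SHA-256 truncated to 128 bits (truncation assumed).
    return int.from_bytes(hashlib.sha256(tlv).digest()[:16], "big")

def database_digest(tlvs) -> int:
    # XOR of the individual feature TLV hashes; XOR is commutative and
    # associative, so the result is order independent.
    return reduce(lambda d, t: d ^ tlv_hash(t), tlvs, 0)
```

Because XOR is its own inverse, deleting a TLV is a single XOR of its stored hash into the digest, and updating is two XORs (remove the old hash, add the new one), with no need to rescan the database.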
a-d are ladder diagrams 400, 410, 420, and 430, respectively, illustrating digest protocols. It is noted that, for purposes of illustration, only one station node and one bridge node are shown in the ladder diagrams.
a-b are state diagrams illustrating fast synchronization failure detection in distributed databases.
Before continuing, it is noted that the example dialogs shown and described herein are provided for purposes of illustration and are not intended to be limiting.
In operation 610, a digest of a database stored at a sending node in a network is received by a receiving node. The digest may be broadcast by the sending node to N number of nodes in the network, including the receiving node. In operation 620, a digest of a database stored at a receiving node in the network is generated.
It is noted that each node in the network may include a local feature database and a remote feature database. The remote database may include N number of elements corresponding to N number of nodes in the network. The digest of the database stored at the sending node is a digest of the local feature database, and the digest of the database generated at the receiving node is a digest of the remote feature database.
In an embodiment, the sending node and the receiving node may be a station node or a bridge node. The databases may include a plurality of Type Length Value (TLV) fields, each TLV corresponding to a feature.
In operation 630, the generated digest is compared at the receiving node to the received digest. In operation 640, a lost synchronization signal is issued by the receiving node when the comparison indicates a change in the database stored at the sending node.
The operations shown and described herein are provided to illustrate exemplary implementations of fast synchronization failure detection in distributed databases. It is noted that the operations are not limited to the ordering shown. Still other operations may also be implemented.
For example, the operations may also include issuing an update to the database stored at the receiving node only in response to receiving a lost synchronization signal from the receiving node. The operations may also include generating the digest by hashing each field of the database, and then XOR-ing all of the hashes. The operations may also include removing a field from the database at the receiving node by sending a VOID from the sending node.
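Operations 610-640 can be sketched end to end: generate a digest over the receiving node's database by hashing each field and XOR-ing the hashes, compare it to the digest received from the sending node, and issue a lost synchronization signal on mismatch. The function and signal names are illustrative (the signal is named after the LostSync TLV described earlier).

```python
import hashlib
from functools import reduce

def tlv_hash(tlv: bytes) -> int:
    # SHA-256 truncated to 128 bits (truncation choice assumed).
    return int.from_bytes(hashlib.sha256(tlv).digest()[:16], "big")

def generate_digest(tlvs) -> int:
    # Operation 620: hash each field of the database, then XOR the hashes.
    return reduce(lambda d, t: d ^ tlv_hash(t), tlvs, 0)

def check_sync(received_digest: int, remote_tlvs) -> str:
    # Operations 630-640: compare the generated digest with the received
    # digest; a mismatch indicates a change at the sending node.
    if generate_digest(remote_tlvs) != received_digest:
        return "LostSync"
    return "InSync"
```

The sending node then needs to transmit updates only when it receives the lost synchronization signal, rather than retransmitting the database on a fixed schedule.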
It is noted that the exemplary embodiments shown and described are provided for purposes of illustration and are not intended to be limiting. Still other embodiments are also contemplated for fast synchronization failure detection in distributed databases.