1. Field of the Invention
The present invention relates to replication of data. In particular, the present invention relates to replication of data using memory to memory transfers.
2. Description of the Prior Art
Replication of data across database servers is a common safeguard for protecting data. Typically, when reading or writing data, a request to perform a data operation is sent from a client to a database. The database receives the request and processes the request. Processing the request in prior art systems may include the database management (DBMS) system taking control of the data access detecting a request on the data, process the request by searching for the data and performing an operation on the data, generating a response, and transmitting the response. With large amounts of data requests, the DBMS handling of data replication related requests can cause latency issues.
Latency in memory access operations can cause database performance to suffer. To ensure that data is available and up to date as quickly as possible, any reduction in latency is highly desirable. What is needed is an improved method of replicating databases in which latency is reduced.
The present technology may provide database replication with low latency using one-sided remote direct memory access. A client may communicate with a DMBS spread across more than one server. The database may include one or more collections of data, known as tables. Each table may be composed of one or more memory data blocks of storage. Memory blocks are either in use storing data, or free for later use. In some DBMSs an in-use block is known as a database row.
Each in-use block may be uniquely identified by a descriptor known as a key. Each table may have an index which may be used to find specific data blocks quickly based on their keys. The index structure may also indicate what data blocks are used and unused. To read the data from a table associated with a certain key, the index structure is accessed to find the specific block containing the data referenced by the key.
After the location is determined, the data is retrieved by reading from the block. After the data is retried, the index must be checked again to see if another client stored a new set of data associated with the key in a different block and updated the index to point to the new block.
An embodiment may perform a method for replicating data. A memory location may be allocated in a first database. A remote direct memory access command may be sent from a client to a first database and a second database to write data to the memory location. An index structure for each of the first database and second database may be updated with information regarding the data.
An embodiment may include a system for displaying data. The system may include a processor, a memory, and one or more modules stored in memory. The one or more modules may be executed by the processor to allocate a memory location in a first database, send a remote direct memory access command from a client to a first database and a second database to write data to the memory location, update an index structure for each of the first database and second database with information regarding the data.
The present technology may provide database replication with low latency using one-sided remote direct memory access. A client may communicate with a DMBS spread across more than one server. The database may include one or more collections of data, known as tables. Each table may be composed of one or more memory data blocks of storage. Memory blocks are either in use storing data, or free for later use. In some DBMSs an in-use block is known as a database row.
Each in-use block may be uniquely identified by a descriptor known as a key. Each table may have an index which may be used to find specific data blocks quickly based on their keys. The index structure may also indicate what data blocks are used and unused. To read the data from a table associated with a certain key, the index structure is accessed to find the specific block containing the data referenced by the key.
After the location is determined, the data is retrieved by reading from the block. After the data is retried, the index must be checked again to see if another client stored a new set of data associated with the key in a different block and updated the index to point to the new block. FI
G. 1 is a system for replicating data. The system of
In some embodiments, database 110 may communicate with the servers by remote direct memory access (RDMA). RDMA is a form direct memory access from the memory of one computer to that of another without involving either computer's operating system. This process of access permits high-throughput, low latency networking.
The system of
Network 120 may communicate with clients 110, server 130 and server 140. Network 120 may be comprised of any combination of a private network, a public network, a local area network, a wide area network, the Internet, an intranet, a Wi-Fi network, a cellular network, or some other network.
Server 130 and 140 may each include one or more servers for storing data. The data may be structured data or unstructured data, and be replicated over the two databases. The memory of each of server 130-140 may be accessible by RDMA module 115 and/or database 110 via RDMA commands.
Data table 230 may include an index structure for storing information about data blocks within database 210. In embodiments, the index structure of data table 230 may include pointers to data block locations in memory currently in use. If a particular data block is not being used, the index structure of data table 230 will not include a pointer. In some embodiments, the index structure for data table 230 may be a bit map.
The DBMS may have a management process to coordinate security checks and aiding setting up initial access to the table, such as to data block 220. This helps maintain serialization in writing data from multiple sources.
A writer and reader may exist outside of the DBMS container. Each table in the DBMS can only have one writing client at a time, and may have any number of threads or other reading clients at a time.
An unused data block may be marked as a used data block in the index structure of the data table at step 320. To mark a data block in the index structure, database 110 may send an RDMA command to a database write process to update the index structure for a particular data block. The data block that is marked used will be the data block that is being written to by database 110.
Data is written to the memory block of a first server using an RDMA command at step 330. Database 110 may send an RDMA command to the write process to write data to the memory block. By using the RDMA command, database 110 does not involve any processes of the server being written to. Rather, the data is written directly from the memory of database 110 to the memory of the particular data base server. The server has no control over any portion of the process.
Data in the memory block of the second server may be written using RDMA commands at step 340. By writing the data in a memory block of a second server, the data is replicated for durability. The index structure of the tables at each database server is updated at step 350. The update may include adding a pointer to the memory block at which data was just written.
After receiving the data, the index structure may be accessed again and a determination is made as to whether there is a change in the index structure pointer associated with the memory block read at step 440. If any change occurred between the time when the index structure was first accessed and the time that data was retrieved, the data received by database 110 may not be the most up-to-date data. Therefore, if a change is detected, the method of
The components shown in
Mass storage device 530, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 510. Mass storage device 530 can store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 520.
Portable storage device 540 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or Digital video disc, to input and output data and code to and from the computer system 500 of
Input devices 560 provide a portion of a user interface. Input devices 560 may include an alpha-numeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a track ball, stylus, or cursor direction keys. Additionally, the system 500 as shown in
Display system 570 may include a liquid crystal display (LCD) or other suitable display device. Display system 570 receives textual and graphical information, and processes the information for output to the display device.
Peripherals 580 may include any type of computer support device to add additional functionality to the computer system. For example, peripheral device(s) 580 may include a modem or a router.
The components contained in the computer system 500 of
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.