The present invention relates to optimizing power usage and/or a measure of system performance (e.g., throughput) while maintaining data coherency, and more specifically, to managing the operating load of components in a clustered system having multi-threaded processing capability.
In a clustered application such as a database management system with a shared data architecture, the individual nodes of the database must send messages to each other to maintain shared data structures in a coherent state. This messaging introduces latencies and creates wait queues which, if not managed well, may degrade overall system throughput, waste processing cycles of the nodes, and increase power consumption. Systems that use predetermined values for timed waits, polling, and processor yields may suffer degraded throughput if operated under a load profile for which the configuration does not apply. Production systems having dynamic load profiles may yield poor or even negative throughput results when using such a predetermined, hard-coded configuration.
Operating systems provide facilities for applications to determine a load profile from within software using an application programming interface (API). A query or function call to standard APIs may be resource intensive, and sometimes involves system calls that perform computation to arrive at a returned value. Some queries or function calls to standard APIs may involve burdensome averaging over long periods of time, and may be counterproductive for optimization purposes by causing further performance degradation.
Computing systems provide power management facilities that may allow aspects of the system, including a processing unit or processor, to be throttled to optimize power consumption. Throttling may require the hardware to operate within a power or thermal envelope, whereby the system may adjust its processing characteristics and performance to operate within the prescribed envelope. Computing systems are capable of disabling portions of their processors, or reducing the effective speed of a processor or portions thereof, when the system is essentially idle.
According to one exemplary embodiment of the present invention, a method is provided for dynamically selecting active polling or timed waits by a server in a clustered database, the server comprising a processor and a run queue having at least a first runnable thread that occupies the processor and requires a message response, by determining a load ratio of the processor as a ratio of an instantaneous run queue occupancy to a number of cores of the processor, determining whether power management is enabled on the processor, determining an instantaneous state of the processor, wherein the instantaneous state is determined based on the load ratio of the processor and whether power management is enabled on the processor, and executing a state process, wherein the state process corresponds to the determined instantaneous state, wherein the first runnable thread occupies the processor and requires a message response.
According to another exemplary embodiment of the present invention, a server is provided for dynamically selecting active polling or timed waits, the server comprising a processor, the processor having a plurality of hardware threads, a network interface, a memory in communication with the network interface and the processor, the memory comprising a run queue, wherein the run queue has a first runnable thread that occupies the processor and requires a message response, the memory being operable to direct the processor to: determine a load ratio of the processor, the load ratio being calculated as a ratio of an instantaneous run queue occupancy to a number of cores of the processor, determine whether power management is enabled for the processor, determine an instantaneous state of the processor, and execute a state process, wherein the state process corresponds to the determined instantaneous state.
According to another exemplary embodiment of the present invention, a computer program product is provided for dynamically selecting active polling or timed waits by a server in a clustered database, the computer program product comprising a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to instruct a database management system to: determine a load ratio of a processor, wherein the processor is occupied by a first runnable thread that requires a message response, and wherein the load ratio is calculated as a ratio of an instantaneous run queue occupancy to a number of cores of the processor; determine a power management state of the processor; determine an instantaneous state of the processor; and execute a state process, wherein the state process corresponds to the determined instantaneous state.
These and other features, aspects and advantages of the present invention will become better understood with reference to the following drawings, description and claims.
The following detailed description is of the best currently contemplated modes of carrying out exemplary embodiments of the invention. The description is not to be taken in a limiting sense, as the scope of the invention is defined by the appended claims.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server or as part of the monitor code. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Broadly, embodiments of the present invention provide a method, apparatus, and computer program product for dynamically selecting active polling or timed waits by a server in a clustered database including, for example, determining an instantaneous run queue occupancy, determining a number of cores of a processor, determining a load ratio of the processor by calculating a ratio of the instantaneous run queue occupancy to the number of cores, determining whether power management is enabled on the processor, determining an instantaneous state of the processor, and executing a state process, wherein the state process corresponds to the determined instantaneous state.
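As a non-limiting illustration, the selection logic described above may be sketched as follows. The state names, the threshold value, and the function signatures are illustrative assumptions for the sketch, not part of the claimed method:

```python
# Illustrative sketch of dynamically selecting active polling or timed
# waits from the load ratio and power management state. The states
# "active_poll" and "timed_wait", and the threshold of 1.0, are
# assumptions for illustration only.

def load_ratio(run_queue_occupancy, num_cores):
    """Load ratio: instantaneous run queue occupancy over core count."""
    return run_queue_occupancy / num_cores

def select_state(run_queue_occupancy, num_cores,
                 power_management_enabled, threshold=1.0):
    """Pick a wait strategy from the instantaneous processor state."""
    ratio = load_ratio(run_queue_occupancy, num_cores)
    if ratio < threshold and not power_management_enabled:
        # Lightly loaded and not power-throttled: active polling keeps
        # message-response latency low at the cost of processor cycles.
        return "active_poll"
    # Heavily loaded or power-managed: a timed wait yields the
    # processor to other runnable threads or to power savings.
    return "timed_wait"
```

For example, `select_state(4, 8, False)` returns `"active_poll"` (load ratio 0.5, no power management), while `select_state(16, 8, False)` returns `"timed_wait"` (load ratio 2.0).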
Embodiments of the present invention may be implemented in systems that include a distributed application or a clustered solution such as in a database management system, for example.
A network interface 224 may provide communication between server 200 and, for example, a network 118. Network interface 224 may include a network interface card that may utilize Ethernet transport as well as emerging messaging protocols and transport mechanisms or communications links including Infiniband, for example. An input/output (I/O) device 226 may interface with a user, with computer readable media, or with external devices (e.g., peripherals) including, for example, a keyboard, a mouse, a touchpad, a track point, a trackball, a joystick, a keypad, a stylus, a floppy disk drive, an optical disk drive, or a removable storage device. I/O device 226 may be capable of receiving and reading non-transitory storage media. Server 200 may have a memory 228, which may represent random access memory devices comprising, for example, the main memory storage of server 200 as well as supplemental levels of memory (e.g., cache memories, nonvolatile memories, read-only memories, programmable or flash memories, or backup memories). Memory 228 may include memory storage physically located in server 200 including, for example, cache memory in processors 208, storage used as virtual memory, magnetic storage, optical storage, solid state storage, or removable storage.
Server 200 may have an operating system (OS) 230 loaded into memory 228 that may provide a basis for which a user or an application may interact with aspects of server 200. OS 230 may have an application programming interface (API) 232 that may facilitate an interaction between an application and OS 230 or other aspects of server 200. A database management system (DBMS) 234 may reside in memory 228 and may utilize API 232 to interact with aspects of server 200. DBMS 234 may have a plurality of subsystems including, for example, a data definition subsystem, data manipulation subsystem, application generation subsystem, and data administration subsystem. DBMS 234 may maintain a data dictionary, file structure and integrity, information, an application interface, a transaction interface, backup management, recovery management, query optimization, concurrency control, and change management services. DBMS 234 may process logical requests, translate logical requests into physical equivalents, and access physical data and respective data dictionaries. DBMS 234 may manage a database instance that may require communication with other database instances when operating in a clustered or distributed environment to maintain data coherency. Maintaining data coherency may require passing messages among the database instances, which may require transmitting messages and receiving messages. Communication among servers 108, for example server 200, in a clustered system may include remote direct memory access (RDMA), which may be used by servers 108 to directly communicate with a memory 228 of another server. RDMA communications may involve sending a message from a first server to a second server, and receiving a message response, by the first server, from the second server.
According to certain application configurations (e.g., a clustered or distributed computing configuration), the message and the message response may be related or may have dependencies therebetween (e.g., applications 116 operated by servers 108 may be synchronous), and therefore, a waiting period may be required before server 200 may continue processing a process or runnable thread. RDMA messaging requests may require a low latency to be computationally efficient, and thus, excessive waiting may be costly or detrimental to performance or power consumption.
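The two waiting strategies contrasted above may be sketched as follows. The function name, parameters, and intervals are illustrative assumptions; `response_ready` stands in for any mechanism (e.g., a flag set on RDMA completion) that indicates the message response has arrived:

```python
import time

def wait_for_response(response_ready, strategy,
                      poll_interval=0.001, timeout=1.0):
    """Wait for a message response using the chosen strategy.

    `response_ready` is a zero-argument callable returning True once
    the response has arrived. All names here are illustrative, not a
    claimed implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response_ready():
            return True
        if strategy == "active_poll":
            continue               # spin: minimal latency, burns cycles
        time.sleep(poll_interval)  # timed wait: yields the processor
    return False
```

Active polling re-checks the condition continuously for the lowest latency, while the timed wait sleeps between checks, trading latency for freed processor cycles and potential power savings.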
A poll manager 236 may be configured to manage an interaction between processes (e.g., aspects of applications including DBMS 234) and regions or segments of memory 228, which may include, for example, message queues. Poll manager 236 may include scheduling semantics provided by operating system 230 (e.g., API 232), or any form of polling provided by DBMS 234 or the underlying server 200 architecture. A run queue 238 may logically manage any number of instructions or sets of instructions (hereinafter referred to as runnable threads) in memory 228 that may be waiting to be processed by threads 216. Run queue 238 may organize a plurality of runnable processes or instructions (also referred to herein below as runnable threads) in a logical array that may have an occupancy measured as a length, size, or index that may indicate a number of runnable threads waiting to be processed. Run queue 238 may organize a list of software threads that may be in a ready state waiting for a hardware thread to become available. The length of run queue 238 may be a meaningful measure of a load on server 200. Run queue 238 may also be empty, having a zero length or size, for example. A scheduler 240 may determine which process from run queue 238 to execute next. According to some embodiments of the present invention, each core of processors 208 may have an associated run queue 238.
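A minimal model of run queue 238 and its occupancy measure may be sketched as follows; the class and method names are illustrative assumptions, and a FIFO discipline is assumed purely for the sketch:

```python
from collections import deque

class RunQueue:
    """Illustrative model of run queue 238: a FIFO of runnable threads
    whose length serves as the instantaneous load measure."""

    def __init__(self):
        self._queue = deque()

    def make_runnable(self, thread_id):
        """Enqueue a thread that is ready and waiting for a hardware
        thread to become available."""
        self._queue.append(thread_id)

    def schedule_next(self):
        """Scheduler 240 picks the next runnable thread (FIFO here)."""
        return self._queue.popleft() if self._queue else None

    @property
    def occupancy(self):
        """Instantaneous run queue occupancy; zero when empty."""
        return len(self._queue)
```

The `occupancy` property corresponds to the length, size, or index described above, and dividing it by the number of processor cores yields the load ratio.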
It should be appreciated that system 100, server 200, and server 300 are intended to be exemplary and not intended to imply or assert any limitation with regard to the environment in which exemplary embodiments of the present invention may be implemented.
In some exemplary embodiments, runnable threads may be specifically allocated to an individual processor or set of processors of processors 208, which may have a respective poll manager 236, run queue 238, and scheduler 240 for implementing aspects of exemplary embodiments of the present invention.
According to another exemplary embodiment of the present invention, a load register 218 may be used to track a run queue 238 occupancy or depth. Load register 218 may be read and modified by scheduler 240 in exemplary embodiments of the present invention. Scheduler 240 may increment load register 218 when a process becomes runnable (i.e., becomes a runnable thread), and decrement load register 218 when a runnable thread is scheduled on processors 208, thereby reducing the cost of determining the instantaneous run queue occupancy.
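The load register behavior described above may be sketched as a simple counter that the scheduler keeps synchronized with the run queue; the class and method names are illustrative assumptions:

```python
class LoadRegister:
    """Illustrative sketch of load register 218: a counter updated by
    the scheduler so that reading the instantaneous run queue
    occupancy costs a single register read rather than a traversal."""

    def __init__(self):
        self._count = 0

    def thread_became_runnable(self):
        self._count += 1       # scheduler increments on enqueue

    def thread_scheduled(self):
        if self._count > 0:
            self._count -= 1   # scheduler decrements on dispatch

    def read(self):
        """Cheap read of the current run queue occupancy."""
        return self._count
```

Because the register is maintained incrementally at scheduling points, the load-ratio computation avoids the resource-intensive API queries described in the background.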
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.