The present invention provides for an electrical computer or digital data processing system or corresponding data processing method including apparatus or steps for exchanging data or messages between two executing programs or processes, independent of the hardware used in the communication. More particularly, the present invention comprises apparatus or steps for exchanging data to support a parallel programming model in a distributed processing environment.
Traditionally, software has been designed for serial processing of data. In serial processing, a single computer having a single processor executes one software instruction at a time. Sometimes, though, large computational or data-driven problems may be made more tractable by breaking the problem down into smaller tasks that can be processed in parallel using multiple computing resources. Parallel processing resources typically include single computers with multiple processors, any number of networked computers, or any combination of both. In general, processes that comprise a parallel application need to communicate with each other, i.e. exchange data with each other. Accordingly, a parallel processing system must provide some mechanism for inter-process communication.
Message passing is one popular programming model that supports inter-process communication. The Message Passing Interface (MPI) has become the de facto standard for message passing. MPI is the first standardized, vendor-independent specification for message passing libraries.
MPI uses objects called “communicators” and “groups” to define which processes may communicate with each other. A group is an ordered set of processes. A communicator is a group of processes that may communicate with each other. The differences between a communicator and a group are subtle. From a programmer's perspective, groups and communicators are virtually indistinguishable.
MPI libraries generally support at least two common message passing patterns. The first often is described as a “scatter/gather” pattern, and the second is a “collaborative” or “peer-to-peer” pattern.
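By way of illustration only, the shape of a scatter/gather exchange can be sketched with standard Java concurrency utilities. The sketch below is not MPI code; it merely shows a root task scattering sub-tasks to workers and then gathering their partial results, and every name in it is illustrative.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Illustration of the scatter/gather pattern: a root "process" scatters
    // sub-tasks to workers, then gathers and combines the partial results.
    public class ScatterGatherExample {
        public static void main(String[] args) throws Exception {
            int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
            int workers = 4;
            ExecutorService pool = Executors.newFixedThreadPool(workers);

            // Scatter: split the data into one chunk per worker.
            List<Future<Integer>> partials = new ArrayList<>();
            int chunk = data.length / workers;
            for (int w = 0; w < workers; w++) {
                final int start = w * chunk;
                final int end = (w == workers - 1) ? data.length : start + chunk;
                partials.add(pool.submit(() -> {
                    int sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }

            // Gather: the root collects and combines the partial sums.
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
            pool.shutdown();
            System.out.println("Total = " + total); // prints Total = 36
        }
    }

In the collaborative or peer-to-peer pattern, by contrast, any process may exchange data directly with any other process rather than through a single root task.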
Thus, message passing technology provides a programming model for the inter-process communication necessary in most parallel processing applications. But conventional message passing technology, such as MPI, also requires a complete library infrastructure dedicated exclusively to routing the inter-process communications of those applications.
Many application serving environments, however, do provide an infrastructure for routing data to targeted computing resources. A popular example of such an application serving environment is the WEBSPHERE Application Server marketed by International Business Machines Corp. In such an application serving environment, client requests are routed to various computing resources for the purpose of balancing resource workloads and ensuring resource availability. Contemporary application servers, though, do not support the inter-process communication required for parallel processing applications.
Accordingly, the state of the art could be advanced if message passing systems could leverage the existing routing infrastructure of an application serving environment to enable inter-process communications using shared resources.
The invention is a useful improvement to a process, machine, and manufacture for communicating data between two programs or processes executing in parallel, independent of the hardware used in the communication.
In alternate embodiments, the invention is a message-passing process for routing communications between a transmitting parallel process and a receiving parallel process executing in an application server environment, or a machine or computer-readable memory having the message-passing process programmed therein, the message-passing process comprising: linking a context key to an addressable computing resource in the application server environment; linking the receiving parallel process to the context key; receiving a communication from the transmitting parallel process, wherein the communication transmits the context key; and routing the communication to the addressable computing resource linked to the context key.
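For purposes of illustration only, these steps might be sketched in Java as follows. The class names, the Resource interface, and the idea of delivering a payload are assumptions introduced for clarity; they are not drawn from any existing product.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the claimed routing steps: link a context key to
    // a resource, link a receiving process to that key, and route an incoming
    // communication that carries the key to the linked resource.
    public class ContextKeyRouter {

        /** A communication carries the context key of its intended receiver. */
        public static final class Communication {
            final String contextKey;
            final Object payload;
            Communication(String contextKey, Object payload) {
                this.contextKey = contextKey;
                this.payload = payload;
            }
        }

        /** An addressable computing resource, e.g. a message cache for one receiver. */
        public interface Resource {
            void deliver(Object payload);
        }

        private final Map<String, Resource> keyToResource = new ConcurrentHashMap<>();
        private final Map<String, String> keyToReceiver = new ConcurrentHashMap<>();

        // Link a context key to an addressable computing resource.
        public void linkResource(String contextKey, Resource resource) {
            keyToResource.put(contextKey, resource);
        }

        // Link a receiving parallel process (by identifier) to a context key.
        public void linkReceiver(String contextKey, String receiverId) {
            keyToReceiver.put(contextKey, receiverId);
        }

        // Route a received communication to the resource linked to its context key.
        public void route(Communication c) {
            Resource target = keyToResource.get(c.contextKey);
            if (target == null) {
                throw new IllegalStateException("No resource linked to " + c.contextKey);
            }
            target.deliver(c.payload);
        }
    }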
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will be understood best by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
The principles of the present invention are applicable to a variety of computer hardware and software configurations. The term “computer hardware” or “hardware,” as used herein, refers to any machine or apparatus that is capable of accepting, performing logic operations on, storing, or displaying data, and includes without limitation processors and memory; the term “computer software” or “software,” as used herein, refers to any set of instructions operable to cause computer hardware to perform an operation. A “computer,” as that term is used herein, includes without limitation any useful combination of hardware and software, and a “computer program” or “program” includes without limitation any software operable to cause computer hardware to accept, perform logic operations on, store, or display data. A computer program may be, and often is, composed of a plurality of smaller programming units, including without limitation subroutines, modules, functions, methods, and procedures. Thus, the functions of the present invention may be distributed among a plurality of computers and computer programs. The invention is described best, though, as a single computer program that configures and enables one or more general-purpose computers to implement the novel aspects of the invention. For illustrative purposes, the inventive computer program will be referred to as the “context key manager” program. Preferably, the context key manager is a component of an application serving environment, which is described in more detail below.
The Application Serving Environment
In a two-tier computer system, a server tier stores and manages data, while a client tier provides a user interface to the data in the server tier, as illustrated in the accompanying drawings.
Probably the most prolific example of a tiered, client/server architecture is the World Wide Web (“the web”). Originally, the web comprised only two tiers—web servers and web clients (more commonly known as web “browsers”).
Although the two-tier architecture has enjoyed much success over the years, sophisticated multi-tier client/server systems slowly have displaced this traditional model. In a multi-tier system, one or more middleware tiers are inserted between the client tier and the server tier.
A middleware component that implements business logic is referred to commonly as an “application server.” More generally, though, an application server is any program that is capable of responding to a request from a client application. An exemplary application server is a JAVA Virtual Machine (JVM), from Sun Microsystems, Inc. As used herein, an “application serving” environment is any multi-tier computer system having at least one application server.
Clearly, there is some functional overlap between clients, web servers, application servers, and database servers, with each component exhibiting unique advantages. In particular, ubiquitous web browsers such as MOZILLA, NETSCAPE, and INTERNET EXPLORER provide inexpensive (if not free), cross-platform user interfaces that comply (usually) with standard formats (e.g. HTML) and protocols (e.g. HTTP). Similarly, web servers generally offer a cross-platform, standard-compliant means of communicating with the browsers; application servers provide cross-platform access to customized business logic; and database servers provide cross-platform access to enterprise data. Today, an enterprise information system (EIS) generally integrates each of these components, thus capturing the best of all worlds and providing an architecture for implementing distributed, cross-platform enterprise applications.
Application server 605 supports asynchronous messaging. In an embodiment of the invention wherein application server 605 is a JVM, the messaging infrastructure is based on the JAVA Message Service (JMS). The JMS functions of the default message service in application server 605 are served by one or more messaging engines that run within application server 605.
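For illustration, a process running in such an environment could send a message through the default messaging provider using the standard JMS API, roughly as sketched below. The JNDI names shown are assumptions, not defaults of any particular application server.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    // Minimal JMS 1.x send, assuming an administrator has bound a connection
    // factory and queue under the (illustrative) JNDI names shown below.
    public class JmsSendExample {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            ConnectionFactory factory =
                    (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) jndi.lookup("jms/WorkQueue");

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage("hello from a parallel process");
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }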
In an EIS, a “node” is a logical grouping of servers. A node usually corresponds to a logical or physical computer system having a distinct network address. Nodes cannot span multiple computers.
A “node group” is a logical grouping of nodes. A node may belong to more than one node group. Each node within a node group needs to have similar software, available resources, and configuration to enable servers on those nodes to serve the same applications.
A “cluster” also is a logical grouping of servers. Each server in a cluster is referred to as a cluster “member.” A cluster may contain nodes or individual application servers. Each member may reside on a different host, but all members of a given cluster must belong to the same node group. Thus, a node group defines a boundary for cluster organization.
Likewise, a “cell” also is a logical group of one or more nodes. A cell is a configuration concept—a way for administrators to logically associate nodes with one another. Administrators define a cell according to the specific criteria of a given enterprise. A cell may have any number of clusters, or no clusters.
A cell must have at least one “core group” of clusters, though. By default, a cell has a single core group, referred to here as the “default core group.” If a cluster is included in a core group, all members of that cluster also are members of the core group. Individual application servers that are not members of a cluster also may be defined as members of a core group.
Core groups (within or across cells) communicate with each other using the “core group bridge service.” The core group bridge service uses access point groups to connect the core groups. A core group access point is a collection of server, node, and transport channel chain combinations that communicate for the core group. Each core group has one or more defined core group access points. The default core group has one default core group access point. The node, server, and transport channel chain combinations that are in a core group access point are called “bridge interfaces.” A host having a bridge interface is referred to as a “core group bridge server.” The transport channel chain defines the set of channels that are used to communicate with other core group bridge servers. Each transport channel chain has a configured port that the core group bridge server uses to listen for messages from other core group bridge servers. Each core group access point must have at least one core group bridge server. The core group bridge server provides the bridge interface for each core group access point.
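The containment relationships among these terms can be summarized in a purely illustrative data model, sketched below; the class and field names are assumptions made for clarity and are not part of any product's configuration schema.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative data model of the topology terms defined above.
    class Server    { String name; }
    class Node      { String networkAddress; List<Server> servers = new ArrayList<>(); } // one network address
    class NodeGroup { List<Node> nodes = new ArrayList<>(); }                            // similarly configured nodes
    class Cluster   { NodeGroup boundary; List<Server> members = new ArrayList<>(); }    // members share a node group
    class CoreGroup { List<Cluster> clusters = new ArrayList<>();                        // clusters join as whole units
                      List<Server> standaloneServers = new ArrayList<>(); }
    class Cell      { List<Node> nodes = new ArrayList<>();
                      List<CoreGroup> coreGroups = new ArrayList<>(); }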
Workload Management
Workload management is a familiar concept in an application server environment. Workload management optimizes the distribution of client requests to application servers. A workload management (WLM) router program distributes each incoming request to the application server that can process it most effectively.
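As a deliberately simplified sketch of the idea, a router might select servers in round-robin fashion, as shown below; a production workload management router also weighs server load and availability, which this example omits.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Simplified WLM router sketch: cycle through the configured servers.
    public class RoundRobinRouter {
        private final List<String> serverEndpoints;
        private final AtomicInteger next = new AtomicInteger();

        public RoundRobinRouter(List<String> serverEndpoints) {
            this.serverEndpoints = serverEndpoints;
        }

        /** Pick the application server that should receive the next request. */
        public String selectServer() {
            int index = Math.floorMod(next.getAndIncrement(), serverEndpoints.size());
            return serverEndpoints.get(index);
        }
    }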
High Availability
Workload management also can provide a “high availability” (HA) environment. In an HA environment, an HA manager program provides failover services when an application server is not available, improving application availability. An HA manager instance runs on every application server in an HA environment, managing HA groups of cells and clusters. As already described, a cell can be divided into more than one core group. An HA group cannot extend beyond the boundaries of a core group. Each HA manager instance establishes network connectivity with all other HA manager instances in the same core group, using the core group bridge service. The HA manager transport channel provides mechanisms that allow an HA manager instance to detect when other members of the core group start, stop, or fail.
Within a core group, HA manager instances are elected to coordinate HA activities. An instance that is elected is known as a core group “coordinator.” The coordinator is highly available itself, such that if a process that is serving as a coordinator stops or fails, another instance is elected to assume the coordinator role without loss of continuity. The coordinator is notified as core group processes start, stop, or fail, and knows which processes are available at any given time. The coordinator uses this information to ensure that the highly available components it manages keep functioning.
An HA manager also provides a messaging mechanism (commonly referred to as the “bulletin board”) that enables processes to exchange information about their current state. Each process sends or posts information related to its current state to the bulletin board, and can register to be notified when the states of other processes change. The WLM router uses the bulletin board to build and maintain routing table information. Routing tables built and maintained using the bulletin board are highly available.
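The bulletin-board idea can be sketched, again purely for illustration, as a shared map of posted states with registered listeners; the class below is an assumption made for clarity and is not the HA manager's actual interface.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.BiConsumer;

    // Illustrative bulletin board: processes post their state under a subject,
    // and registered listeners are notified of every change.
    public class BulletinBoard {
        private final Map<String, String> stateBySubject = new ConcurrentHashMap<>();
        private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

        /** A process posts (or updates) its current state. */
        public void post(String subject, String state) {
            stateBySubject.put(subject, state);
            for (BiConsumer<String, String> listener : listeners) {
                listener.accept(subject, state);
            }
        }

        /** Register to be notified whenever any posted state changes. */
        public void subscribe(BiConsumer<String, String> listener) {
            listeners.add(listener);
        }
    }

A WLM router, for example, could subscribe to such a board and rebuild its routing table whenever a posted state changes.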
An HA group is created dynamically when an application calls the HA manager to join a group. The calling application must provide the name of the HA group. If the named HA group does not exist, the HA manager creates one.
Every HA group has a unique name. Because any application can create a high availability group, it is the HA group name that ties a given cell or cluster to a particular HA group.
An HA manager keeps track of the state of each member of an HA group. An HA group member may be idle, active, or disabled. Typically, an HA group member is either idle or active. A member that is idle is not assigned any work, but is available as a backup if a member that is active fails. A member that is active is designated as the member to handle the HA group's workload.
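A minimal sketch of these member states and of a simple failover rule (promote an idle backup when the active member fails) follows; the class and method names are hypothetical.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of HA group member states and a simple failover rule.
    public class HaGroup {
        enum MemberState { IDLE, ACTIVE, DISABLED }

        private final Map<String, MemberState> members = new LinkedHashMap<>();

        public void join(String memberName) {
            members.put(memberName, MemberState.IDLE);
        }

        public void activate(String memberName) {
            members.put(memberName, MemberState.ACTIVE);
        }

        /** Called when the HA manager detects that a member has failed. */
        public void onFailure(String failedMember) {
            members.put(failedMember, MemberState.DISABLED);
            // Promote an idle backup so the group's workload keeps being served.
            for (Map.Entry<String, MemberState> entry : members.entrySet()) {
                if (entry.getValue() == MemberState.IDLE) {
                    entry.setValue(MemberState.ACTIVE);
                    return;
                }
            }
        }
    }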
Partition Facilities
A “partition” is another useful concept in a high-availability application server environment. A partition is a uniquely addressable endpoint within a cluster. A partition is not a server, though. A partition does have a life cycle, and is managed by an HA manager. A partition is created dynamically at startup during a server's initialization, and is then available for client applications to use as a target endpoint when in an active state. To make a partition active, the HA manager moves it from an idle state to an active state through a management transition.
A partition may be activated on any cluster member. The HA manager guarantees there is a single instance of an active partition in the cluster at a given time. The HA manager also may move a partition from one cluster member to another. When the HA manager moves a partition, the partition changes states on each cluster member. For example, the partition can be deactivated on the original cluster member, and it can be activated on the new target cluster member.
Optionally, a partition can be associated with a “partition alias.” A partition alias provides more flexible context-based routing for a partition.
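The partition life cycle and alias described above can be sketched as follows; the class, its methods, and the routing key derived from the alias are illustrative assumptions, not an actual partition facility API.

    // Illustrative partition life cycle: created idle, active on exactly one
    // cluster member at a time, and movable by deactivating it on one member
    // and activating it on another. An optional alias may serve as its routing key.
    public class Partition {
        enum State { IDLE, ACTIVE }

        private final String name;
        private String alias;              // optional partition alias for routing
        private String activeOnMember;     // at most one active instance at a time
        private State state = State.IDLE;

        public Partition(String name) { this.name = name; }

        public void setAlias(String alias) { this.alias = alias; }

        public void activateOn(String clusterMember) {
            this.activeOnMember = clusterMember;
            this.state = State.ACTIVE;
        }

        public void moveTo(String newClusterMember) {
            this.state = State.IDLE;          // deactivate on the original member
            activateOn(newClusterMember);     // activate on the target member
        }

        public String routeKey() {
            return (alias != null) ? alias : name;
        }
    }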
A “partition facility” supports the concept of partitioning for enterprise beans, web traffic, and database access. It is both a programming framework and a system management infrastructure. The primary advantage of partitioning is to specifically control resources during cluster member activities. A partition facility can route requests to a specific application server that has exclusive access to some computing resource, such as a dedicated server process or a database server that handles a specific data set. The endpoint receiving the work is still highly available. Consequently, a partitioning facility offers functionality to route work to a particular cluster member.
Relationship between High Availability & Partitioning
A single partition is actually an HA group. For example, when an application creates a partition, it is created on each member of an HA group.
Thus, given the above description of a preferred embodiment of an EIS, it should be clear that an HA manager manages highly available groups of application servers and partitions. As cluster members are stopped, started, or fail, the HA manager monitors the current state and adjusts the state as required. The core group, group coordinator, and policy functions enable the key functions that an HA manager provides.
Routing
The Context Key Manager
The context key manager of the present invention leverages the environment described above—namely, a highly available application server environment with workload management and partitioning facilities—to enable message passing between parallel processes. Before the context key manager is called, though, an application is parallelized, either through automated means or by program design, to identify the number of parallel application members (PAMs) appropriate for the application. Each PAM represents an addressable computing resource capable of participating in a distributed (i.e. parallel) computation.
Thus, the context key manager sets up an HA group of PAMs, creates a context key for each PAM, links each PAM to its context key, and links each context key to a message cache that holds communications destined for that PAM.
More than one application may be running in the same HA group, so context keys must be unique for each PAM. One useful approach is to use a standard hierarchical name convention for context keys. For instance, a name convention may be “/Parallel/ApplicationName/AppInstanceID/PAM-ID.” Flexibility is the most significant advantage of using context keys over direct references to message caches. The infrastructure of the application serving environment is free to associate a context key with whatever routing information is necessary to ensure availability and differentiation. One alternative to the hierarchical naming convention would be to link a context key to a network port, and use the context key itself to identify the message cache.
Additionally, new relationships may be formed among PAMs by creating new HA groups. In this additional embodiment, an exemplary naming convention would be “/Parallel/ApplicationName/AppInstanceID/GroupName/PAM-ID.” Additional HA groups would allow PAMs to participate logically in various configurations based on the original group. Each additional group would logically have unique message caches related to the default group message caches.
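A small helper illustrating how context keys following these two naming conventions might be composed is sketched below; the class and method names are hypothetical, and the invention requires only that each PAM's context key be unique.

    // Builds context keys following the hierarchical name conventions above.
    public class ContextKeys {

        /** "/Parallel/ApplicationName/AppInstanceID/PAM-ID" */
        public static String forPam(String applicationName, String appInstanceId, int pamId) {
            return "/Parallel/" + applicationName + "/" + appInstanceId + "/" + pamId;
        }

        /** "/Parallel/ApplicationName/AppInstanceID/GroupName/PAM-ID" */
        public static String forPamInGroup(String applicationName, String appInstanceId,
                                           String groupName, int pamId) {
            return "/Parallel/" + applicationName + "/" + appInstanceId + "/" + groupName + "/" + pamId;
        }
    }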
Alternatively, the context key manager sets up a partition for each PAM instead of an HA group of PAMs. If a partition is used, then the context key manager creates a context key for each partition, links each partition to a context key, and links each context key to a PAM. A partition may represent a message cache. The partition router handles communications between PAMs.
In an alternative embodiment, a PAM can initiate a broadcast message destined for all PAMs in a specified group.
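Such a broadcast could be sketched, for illustration only, by routing the same payload to every context key that falls under the specified group's prefix; the MessageCache interface and routing map below are assumptions carried over from the earlier sketches.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative broadcast: deliver the payload to every PAM whose context
    // key falls under the specified group prefix.
    public class BroadcastExample {

        interface MessageCache { void deliver(Object payload); }

        private final Map<String, MessageCache> cacheByContextKey = new ConcurrentHashMap<>();

        public void broadcast(String groupPrefix, Object payload) {
            for (Map.Entry<String, MessageCache> entry : cacheByContextKey.entrySet()) {
                if (entry.getKey().startsWith(groupPrefix)) {
                    entry.getValue().deliver(payload);
                }
            }
        }
    }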
A preferred form of the invention has been shown in the drawings and described above, but variations in the preferred form will be apparent to those skilled in the art. The preceding description is for illustration purposes only, and the invention should not be construed as limited to the specific form shown and described. The scope of the invention should be limited only by the language of the following claims.