Operation-partitioned off-loading of operations in a distributed environment


Selected server operations that affect objects in a distributed computing system can be off-loaded from servers at which the objects are stored to other servers without the requirement of vertical partitioning of the affected objects and without off-loading entire affected objects. A client environment process that requests an operation on an object is notified of a task server to which selected off-load operations should be sent. The client preferably stores the task server identifier and thereafter sends such operation requests directly to the identified task server. The object metadata information can be stored in the client environment, if desired. The object metadata at the owning repository server is maintained and is updated when the requested operation affects it. A single task server can perform off-loaded functions from several other repository servers at the same node and at other nodes, and in that way reduce the workload of many servers. The functions that can be off-loaded include named pipe functions and byte range file lock operations.


1. Field of the Invention

This invention relates generally to distributed computer processing systems and, more particularly, to management of server operations in distributed processing systems.

2. Description of the Related Art

A distributed computer processing system comprises two or more nodes connected through a communication link called a network. A processing system can be placed at each node of the network and can include one or more computer machines having a central processing unit (CPU). It is desirable to permit a computer machine at one node of the network to gain access to data files located at a remote node of the network. The term “client” is generally used to refer to a processing system that might desire access to a data file and the term “server” is generally used to refer to the processing system at a node where a desired data file is located. Often, distributed computer processing systems include dedicated servers that have no function other than to satisfy data file access requests from clients at the same node or at different nodes of the system.

A data file is a named set of bytes or records that are stored and processed as a unit by a process. A process comprises a set of program instructions that can be stored in addressable program memory storage of a computer machine and loaded into CPU registers that cause the instructions to be executed. A process whose instructions are being executed is said to be running or to be current. A data file that is being accessed (and therefore is potentially being modified) by a process is said to be open. The data file is otherwise said to be closed. Each node of a distributed computer processing system includes one or more operating system processes that provide an operating environment in which program execution, data transfer, and interprocess communication can take place.

Generally, the available computer machine memory is insufficient to provide an actual memory location at which every program instruction and desired data file record can be stored. Multiple processes and the data files they utilize can share the addressable memory available to a computer machine by the concept of virtual storage, which exists relative to an addressable address space of the computer machine. Virtual storage defines an address space of a computer machine memory to comprise fictitious (or “virtual”) memory locations at which program instructions and data files can be stored. Each virtual location of the address space is temporarily mapped onto an actual physical computer memory location that is used only while a process is running or a data file is open and actively using that portion of memory. When a process is not running or a data file is not open, it is stored in an auxiliary storage device, such as a disk drive.

Thus, the virtual storage address space is not limited by the actual number of memory locations of computer machine memory. Rather, virtual storage is limited only by the addressing scheme of a computer machine and the amount of auxiliary storage available. As a result, a distributed computer processing system can include a vast number of processes being executed in an essentially simultaneous fashion. Such concurrent processes can request data file access from servers at a very high rate.

To facilitate communication between the various processes and network users, the distributed computer processing system typically provides an operating environment that includes pipes. Pipes are data structures that are used by processes to provide a means of storing data on a first-in-first-out (FIFO) basis so the data can be shared among the processes of an operating system. That is, a portion of a running user process (also called an application program) creates output data that it writes to a pipe and another portion of the same, or a different, user process reads the data from the pipe. Pipes permit processes to read and write data to and from some shared media, such as a common server memory, and permit such data to be shared with other processes. The operating system that supports such pipes typically includes read-write synchronization features to provide orderly read and write activity between processes. For example, a process might wait for a pipe write operation to occur before reading from the pipe.

Many operating systems for distributed system applications support both named pipes and unnamed pipes. Unnamed pipes typically are implemented through storage queues or memory buffers to support local, tightly coupled communications within a processing system at a single network node. Named pipes typically are implemented as defined data objects in that they comprise object names with which data can be associated. The object names provide a reference for processes and therefore named pipes can support more flexibly coupled communications with more distant, remote network recipients.

Two or more processes communicate with named pipes by agreeing on a pipe name, defining a pipe by that name, and eventually opening a pipe having that defined name. As each process carries out such opens, as well as subsequent pipe reads and writes, the pipe operations are coordinated by the pipe server such that the pipe operations are synchronized between the participating processes. This synchronization through a named pipe data object, and passing of pipe data through the mutually opened named pipe data object, allows effective inter-process communications. This synchronized communication is entirely based on the selection of a name known by the participating application processes and the definition of a pipe by that name in a common server repository at a network node.

It is neither necessary nor desirable that the pipe server permanently store the pipe data as it would file data. Pipe data is transitory and typically is simply stored in server memory only while the named pipe is open. When all instances of one particular named pipe are closed, the associated data is discarded. That is, memory for holding the pipe data and its status can be freed. This is different from normal file data, which is retained in permanent storage (such as occurs when data is written to a direct access storage device (DASD), including disk drives and the like).
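The transitory lifecycle described above can be made concrete with a small sketch: the pipe buffer is allocated in memory on the first open, delivers data in FIFO order, and is freed (not persisted) when the last open instance closes. This is a minimal illustration under stated assumptions, not the patented implementation; the class `PipeServer` and its method names are hypothetical, and a real pipe server would also block readers until data arrives, whereas this sketch simply returns `None` when no data is queued.

```python
from collections import deque


class PipeServer:
    """Hypothetical in-memory named pipe server (illustrative sketch only)."""

    def __init__(self):
        self._buffers = {}      # pipe name -> FIFO queue of unread messages
        self._open_counts = {}  # pipe name -> number of currently open instances

    def _require_open(self, name):
        if self._open_counts.get(name, 0) == 0:
            raise ValueError(f"pipe {name!r} is not open")

    def popen(self, name):
        # The first open allocates the in-memory buffer; later opens share it.
        self._buffers.setdefault(name, deque())
        self._open_counts[name] = self._open_counts.get(name, 0) + 1

    def pwrite(self, name, data):
        self._require_open(name)
        self._buffers[name].append(data)

    def pread(self, name):
        # Return the oldest unread message (FIFO order), or None if none is queued.
        self._require_open(name)
        buf = self._buffers[name]
        return buf.popleft() if buf else None

    def pclose(self, name):
        self._require_open(name)
        self._open_counts[name] -= 1
        if self._open_counts[name] == 0:
            # Last instance closed: the data is transitory, so free it instead of storing it.
            del self._open_counts[name]
            del self._buffers[name]

    def is_allocated(self, name):
        return name in self._buffers


server = PipeServer()
server.popen("jobs")          # writer's instance
server.popen("jobs")          # reader's instance
server.pwrite("jobs", b"first")
server.pwrite("jobs", b"second")
print(server.pread("jobs"))   # b'first'
server.pclose("jobs")
server.pclose("jobs")
print(server.is_allocated("jobs"))  # False
```

Keeping an open count per pipe name is what lets the server decide when "all instances" are closed; nothing in this sketch ever writes pipe data to permanent storage.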

Despite the difference between data files and named pipes relative to the permanent storage of data, they do have in common the concepts of:

1. being named objects, which are objects that are defined in a server where processes can share them and are objects that require permanent storage of information about the object definition (called metadata) in the server repository; and

2. allowing a set of functional operations, including open, write, read, and close operations, by processes that share these objects.

Named pipe operations typically involve transfer of relatively small amounts of data that are necessary to support data file read and write activities. For example, a pipe might be used to communicate the name and storage address location of a data file to be used by a process. Even though each pipe read or write operation moves only a small amount of data, it consumes almost as much in the way of system resources as the larger data transfer operations of the server. Thus, pipe operations can interfere with primary data file transfer and handling operations of a server and can thereby adversely affect server efficiency.

System performance can be improved by storing information about objects, such as data files, within a local cache of a processing system at a client node, or at least in a local cache of a network node relatively close to the client node. The cached data can include not just information about data objects but also the data objects themselves. Such a scheme is called local data object caching.

Local data object caching can be very effective in minimizing server communications from the server at which a data object is stored, called the owning server. Local caching, however, still can eventually require processing of cached data objects by the owning server when the scope of a reference to a cached object exceeds the bounds of the cached location. Thus, server processing is often not reduced through local data object caching. In fact, server processing can be increased when the requirement of managing what is being locally cached is considered. In this way, the local cache can provide the benefit of reducing end user reference time but does not necessarily reduce server resource loads imposed from caching operations.

System performance also can be improved by moving some tasks from the owning server to another server, thereby preventing excessive loading of resources at a single server. For example, one server might be assigned to perform all operations relating to a particular group of files and another server might be assigned to perform all operations relating to another group of files. Such a scheme could be referred to as “vertical partitioning” because the responsibility for operations on a list of files is divided among servers. Generally, vertical partitioning is used to store data objects in a repository that is available to all users in a distributed system, but in a relatively optimal storage device location (such as a disk drive location) relative to the most common expected users of the data objects.

A type of operational partitioning, or operational off-loading, occurs when one or more particular, self-contained server tasks are delegated from an originating server to another server or process to minimize the operating load on the originating server. Input/output spooling, for example, is a form of server operational off-loading in which a printing task, which includes parallel execution of several elements of a complex algorithm, is given to a process that implements printing output data without further end-user interaction or involvement.

More particularly, input/output spooling creates a process that receives data records to be transferred from an originating process to another. For example, if a data file is to be printed at a network printer, an output spooling process at a network node processing machine receives the data file from an originating server and independently completes the operating steps necessary to ensure that the data file is printed. It is not necessary for named pipe processes to be created at the originating server to handle the printing operation. Rather, in accordance with output spooling, pipe processes will be automatically created at the implementing server. The originating server is free to execute other tasks after it has sent the data file to the server owning the output spooling process. Spooling is an example of a method of operational off-loading that is made possible by the independent nature of the function that is performed. There is no dependence of the spooling operation on the server repository object definitions.

Vertical partitioning entails off-loading object definitional information (that is, metadata) as well as off-loading the responsibility for permanently storing information such as file data. In contrast, operational partitioning entails off-loading only the execution of specific operations (without moving object definitions to another server). There remains a class of operations and objects where off-loading is required, but:

1. there is dependence on object definition information by the object operations (unlike spooling operations), and

2. it is not practical to permanently off-load the object definition (metadata) along with the currently requested operations on that object.

The reason for the latter impracticality is that such object definitions generally have a defined hierarchical affinity with other objects in the repository. A named pipe, for example, is typically created (defined) in a directory along with other application objects that are not necessarily pipes. It is not practical for a server to unilaterally off-load such an object definition to another server without consulting the owning client, and it would not be practical to off-load such a pipe object without also off-loading other objects in the same hierarchical directory with which the pipe object has been associated by the owning client.

Off-loading of byte range locking operations to another server is yet another example of operational off-loading where there is a dependence on object definitions and where it is not practical to apply a vertical partitioning method. The reasons for the latter impracticality are the same as for named pipes.

From the discussion above, it should be apparent that there is a need for a distributed computer processing system that permits server task off-loading independently of vertical partitioning of data repositories, thereby reducing server task loading for a greater number of data object types. The present invention satisfies this need.


In accordance with the invention, a set of resource-consuming operations on objects in a distributed computer processing system is off-loaded from the server at which the objects are stored, called the repository server, to a secondary server, called a task server, without relocating the affected objects. That is, the operations are off-loaded in the sense that they conventionally would be performed by the repository server but instead are performed by the designated task server. The off-loading of the operations occurs dynamically as operations are requested and does not affect where or how the objects are defined or are permanently stored. Thus, the repository server, also called the owning server, does not relinquish ownership of the affected object. The off-loaded operations are generated by request from end-user or application processes executing in client environments of the computer processing system, but the off-loading is transparent to the end-user or source application that originated the request for the operation. Only an administrator process of the repository server is aware of the resource re-balancing.

In one aspect of the invention, for application in a distributed computer processing system, a client process generates a requested operation on an object, such as an operation on a file stored in a direct access storage device (DASD) of a server or a pipe operation of a named pipe in the server. The operation request is received at a client environment router, which determines if the operation is one of an off-load operation set, comprising operations that ordinarily would be performed at the owning repository server but which will instead be performed at a task server of the system. At initial connection of the client environment to the repository server, or at each request, the client environment is notified of an identifier that identifies the task server at which the off-load operation set will be performed. The client preferably is notified at initial connect and stores the task server identifier, thereafter directing all off-loaded requests to the identified task server. The off-loaded operations are then performed at the identified task server, and the object information in the repository server affected by the requested operation is updated, where necessary, through server-to-server operations.
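The routing decision described above can be sketched as a small client-side component that remembers, per repository server, the task server identifier received at initial connect and sends members of the off-load operation set directly there. This is a minimal sketch, assuming the off-load set is the four pipe operations and that server identifiers are plain strings; `ClientRouter`, `OFFLOAD_OPS`, `on_connect`, and `route` are hypothetical names, not an interface defined by the source.

```python
from dataclasses import dataclass, field

# Hypothetical off-load operation set: the pipe operations of the named pipe example.
OFFLOAD_OPS = frozenset({"popen", "pread", "pwrite", "pclose"})


@dataclass
class ClientRouter:
    """Sketch of a client environment router (all names are illustrative)."""

    # repository server id -> task server id, learned at initial connect
    task_servers: dict = field(default_factory=dict)

    def on_connect(self, repository_server, task_server=None):
        # Store the task server identifier reported by the repository server, if any.
        if task_server is not None:
            self.task_servers[repository_server] = task_server
        else:
            self.task_servers.pop(repository_server, None)

    def route(self, operation, repository_server):
        """Return the server that should receive this operation request."""
        if operation in OFFLOAD_OPS:
            # Off-loadable: send it straight to the task server when one is designated.
            return self.task_servers.get(repository_server, repository_server)
        # Everything else (look-up, file read/write, and so on) stays with the owner.
        return repository_server


router = ClientRouter()
router.on_connect("repo-1", task_server="pipe-1")
router.on_connect("repo-2")                # this repository server does not off-load
print(router.route("pwrite", "repo-1"))    # pipe-1
print(router.route("lookup", "repo-1"))    # repo-1
print(router.route("pread", "repo-2"))     # repo-2
```

Because the identifier is stored once per repository server, later off-loaded requests need no extra round trip to the owning server, which is the point of notifying the client at initial connect.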

If the invention is implemented in conjunction with named pipe operations, for example, a designated pipe server handles the opening, reading, writing, and closing of pipes in response to pipe operations requested from a client environment. In the case of other off-loaded file operations, such as advisory byte range locking, the requesting client opens the data file using the repository server, but uses a byte range lock server to which lock operations are off-loaded to handle all lock and unlock requests.

In yet another aspect of the invention, the primary pipe functions of more than one server in the computer processing system are performed by a single designated task (pipe) server. That is, a single pipe server at a node of the network performs off-loaded pipe functions from several other repository servers at the same node and from servers at other nodes, and in that way reduces the workload of many servers at many nodes.

In this way, operations on a data object can be off-loaded without vertical partitioning of that object, which also promotes off-loading of functions other than named pipe operations. For example, byte range file lock, unlock, and query operations on a data file can be off-loaded to a byte file server different from the server that manages file read and write operations on the same data file. A client that requests byte range locking opens a data file as described above, so that the metadata cache contains information that identifies which server can be consulted locally for the file information and so that the client receives an open token for the file. The requesting client sends the open token with a byte range lock request to permit handling of the request. When a byte range lock operation is concluded, the requesting client closes the data file, and the owning repository server notifies the byte range lock manager (task) server, which purges all associated locks for the file.
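The byte range lock off-load described above can be sketched as a task server that honors lock, unlock, and query requests only when they carry a known open token, and that purges locks when the repository server reports the close. This is a minimal sketch under several assumptions not stated in the text: the repository server registers each open token with the lock server, ranges are half-open `[start, end)`, and "all associated locks" is read as the locks held under the closing client's token. `ByteRangeLockServer`, `register_open`, and `purge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    token: str  # open token of the client that holds the lock
    start: int  # first byte of the range (inclusive)
    end: int    # end of the range (exclusive)


class ByteRangeLockServer:
    """Hypothetical task server that handles only byte range lock requests."""

    def __init__(self):
        self._file_of = {}  # open token -> file identifier
        self._locks = {}    # file identifier -> list of Lock

    def register_open(self, token, file_id):
        # Assumption: the repository server registers each open token it issues.
        self._file_of[token] = file_id
        self._locks.setdefault(file_id, [])

    def _file(self, token):
        if token not in self._file_of:
            raise PermissionError("unknown or closed open token")
        return self._file_of[token]

    def lock(self, token, start, end):
        """Grant [start, end) unless another open holds an overlapping lock."""
        file_id = self._file(token)
        for held in self._locks[file_id]:
            if held.token != token and held.start < end and start < held.end:
                return False
        self._locks[file_id].append(Lock(token, start, end))
        return True

    def unlock(self, token, start, end):
        file_id = self._file(token)
        target = Lock(token, start, end)
        if target in self._locks[file_id]:
            self._locks[file_id].remove(target)
            return True
        return False

    def query(self, token, start, end):
        """List the locks on the same file that overlap [start, end)."""
        file_id = self._file(token)
        return [h for h in self._locks[file_id] if h.start < end and start < h.end]

    def purge(self, token):
        # Called when the repository server reports that this open was closed.
        file_id = self._file_of.pop(token, None)
        if file_id is not None:
            self._locks[file_id] = [h for h in self._locks[file_id] if h.token != token]


locks = ByteRangeLockServer()
locks.register_open("tok-A", "/u/data.db")
locks.register_open("tok-B", "/u/data.db")
print(locks.lock("tok-A", 0, 100))    # True
print(locks.lock("tok-B", 50, 150))   # False: overlaps the range held under tok-A
locks.purge("tok-A")                  # client A closed the file
print(locks.lock("tok-B", 50, 150))   # True
```

The lock server never needs the file's data or its full definition, only the mapping from open token to file, which is why these operations can move to another server without moving the object.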

Other features and advantages of the present invention should be apparent from the following description of the preferred embodiments, which illustrate, by way of example, the principles of the invention.


FIG. 1 shows a functional block diagram representation of a distributed data processing system in which remote clients communicate with a server over a network. Computer processes and operating modules are represented by blocks within each of the respective clients and server. In the first client, for example, multiple processes within the client environment request access to data objects. The data objects can comprise data files or pipe objects, or other system objects, that are stored in a repository comprising one or more permanent storage or direct access storage devices (DASD) that are associated with a repository server. The repository server is the repository for a set of objects such as files and named pipes. The information that defines and describes these objects is called metadata and is stored on DASDs that are owned and maintained by the repository server for those objects, i.e., objects are said to be owned by a particular repository server where they are stored on DASD that is also owned by (or connected to) that same repository server. It should be understood that the DASD can be any recordable storage device, such as a disk drive, tape drive, or other device that can record data in non-volatile memory storage.

Some operations are off-loadable to a separate task server for the purpose of reducing the load on the repository server. This off-loading is optional and depends on input from the repository server administrator process or equivalent server loading algorithms in a repository server. When a repository server does not off-load operations to a task server, but instead performs those operations itself in the same systems environment, the repository server and the task server can be thought of as a single server because they share a common server operational environment. Primary service operations include those operations that have a direct dependency on repository objects. Task server operations, on the other hand, involve off-loadable operations that do not have extensive dependencies on the repository server object repository.

In the preferred embodiment, a look-up operation is performed by a repository server when a client requests an operation on an object such as a data file or a named pipe. Accordingly, the repository server provides a response to that look-up operation that includes (1) object metadata to be stored in the metadata cache in the requesting client environment and (2) optionally, a task server identification. The object metadata provides information about the object on which the operations will be performed, while the task server identification specifies the server node to which the operations have been off-loaded, if they were off-loaded.

Data operations are off-loaded from a repository server in the sense that the operations are performed by a task server different from the repository server that conventionally would perform the operations. The off-loaded data operations can include, for example, named pipe operations such as pipe open, pipe read, pipe write, and pipe close. With respect to the preferred embodiment, these pipe operations will be referred to as repository server pipe operations. An administrator process at the repository server determines if operations are to be off-loaded to a task server, determines which server is to be that task server, and informs a requesting client that operations are to be forwarded to a designated task server node location, where they will be performed.

FIG. 2 illustrates a generalized distributed computer processing network constructed in accordance with the invention. The network includes three systems that are labelled System A, System B, and System C, respectively. It should be understood that the three systems represent interconnected networks that operate in accordance with operating environments such as that provided, for example, by the “VM/ESA” operating system product of International Business Machines Corporation (IBM Corporation). Each of the three systems illustrated in FIG. 2 could provide a different operating environment.

Each of the three systems includes one or more servers, each of which potentially comprises a primary file server and/or a task server. A task server that handles a particular set of off-loadable operations, such as pipe operations, is also called a pipe server; a pipe server is thus a particular type of task server. When such off-loadable operations are not actually off-loaded to another server, the same server performs both primary services and task services. FIG. 2 illustrates this with a server that acts as both a repository server and a pipe server. However, when off-loading occurs, the pipe server for a repository server is relocated to another server environment. FIG. 2 also illustrates this case, with a server that has off-loaded its pipe server to another server environment.



Any server environment has the potential to perform either repository services or off-loadable task services (such as pipe services). These server environments are illustrated in FIG. 2 as Byte File Servers (BFSs), whether or not off-loading has occurred.

Thus, System A includes a byte file server designated BFS A, which comprises both a repository server that performs most conventional data operations expected of conventional file servers and a pipe server that performs named pipe operations. Similarly, the third system, System C, includes a byte file server (designated BFS C) that comprises a repository server and a pipe server.


System B includes four byte file servers, designated B, B, B, and B, respectively. The first System B server includes a repository server and a pipe server. It should be noted that the other two byte file servers, B and B, do not include their own pipe servers. Rather, they make use of the fourth byte file server, designated B, as their dedicated pipe server. It also should be noted that a client (Client A) associated with the first system, System A, has both its primary and its pipe operations performed by BFS B, the first System B byte file server. This is indicated by the line connecting Client A to that server. Thus, it should be clear that the present invention contemplates byte file servers that can perform general repository server functions for associated client machines in common with conventional file servers, can perform off-loaded operations as a task server for servers within the same operating environment, and can perform any operations for client machines in like operating environments and in different operating environments.

FIG. 2 shows that other distributions of data operations can be accommodated by the byte file servers of the present invention. For example, the presence of a connecting line from Client B to a System B server and from Client B to the second System B server indicates that Client B uses BFS B as its pipe file server and uses BFS B as its primary file server.

FIG. 2 also illustrates that a client such as Client B can utilize multiple repository servers, each of which may or may not have off-loaded its pipe operations to a separate pipe server. Client B, for example, goes to server BFS B or BFS B for primary services, depending on which repository server owns the object that is associated with its current operation. In this same example, Client B goes to the same server, BFS B, for pipe operations as for primary service operations. However, for pipe objects stored in the repository managed by the repository server BFS B, it must direct its pipe operations to the server BFS B, where pipe operations have been off-loaded from BFS B.

FIG. 2 also illustrates that a single server, such as BFS B, can be the pipe server for more than one repository server, as illustrated by repository servers BFS B and BFS B.

FIG. 2 illustrates other configurations of repository server/task server operation sharing that can be accommodated by the byte file servers constructed in accordance with the present invention. For example, the connecting lines leading from the client machine called Client B indicate that this machine uses BFS B as its repository server and uses BFS B as its pipe server. A similar arrangement holds for the client machine called Client B. The client machine called Client B uses a BFS of System B and BFS C of System C as its repository servers. For pipe operations on pipes that are owned by Repository Server B, Client B goes to the pipe server BFS B in System B, while for pipe operations on pipes that are owned by Repository Server C, Client B goes to BFS C in System C, the same server that owns the pipe object. Similarly, the client machine called Client C in System C uses BFS B of System B and the repository server of BFS C in System C as its repository servers. For pipe operations on pipes that are owned by Repository Server B, Client C goes to the pipe server BFS B in System B, while for pipe operations on pipes that are owned by Repository Server C, Client C goes to BFS C in System C, the same server that owns the pipe object. Thus, the present invention contemplates sharing byte file servers across operating system boundaries.

Thus, if a byte file server is lightly used for its repository server functions, such as opening data files for operations by client processes, then it can be used as a pipe server for its own (local) repository server and as a pipe server for other byte file servers as well. More particularly, any byte file server is functionally capable of performing the repository (primary) server functions associated with managing and storing objects and data, the pipe server functions, or both sets of functions. The loading and balancing of these functions is administered as needed by a server administrator person or process of the respective byte file servers. In this way, a lightly loaded byte file server can be designated as a dedicated pipe server for multiple machines, taking on the pipe service load of other byte file servers as well as the pipe services defined in its own repository.
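One possible administrator policy for the arrangement described above is sketched below: pick the least loaded byte file server as the pipe server, off-load pipe service to it only from heavily loaded servers, and let it keep serving the pipes defined in its own repository. The source leaves the load-balancing method to the implementer, so this policy, the load figures, the threshold values `light` and `busy`, and the function name `assign_pipe_servers` are all hypothetical.

```python
def assign_pipe_servers(loads, light=0.3, busy=0.7):
    """Hypothetical administrator policy for placing pipe service.

    loads: byte file server name -> repository load as a fraction (0.0 to 1.0).
    Returns a mapping from each server to the server that will run its pipe operations.
    """
    # Candidate task server: the least loaded server (ties broken by name).
    target = min(loads, key=lambda name: (loads[name], name))
    if loads[target] > light:
        # No server is lightly loaded, so every server keeps its own pipe operations.
        return {name: name for name in loads}
    # Busy servers off-load to the target; the target also serves its own repository.
    return {name: (target if loads[name] > busy else name) for name in loads}


loads = {"bfs-1": 0.9, "bfs-2": 0.8, "bfs-3": 0.5, "bfs-4": 0.1}
print(assign_pipe_servers(loads))
# {'bfs-1': 'bfs-4', 'bfs-2': 'bfs-4', 'bfs-3': 'bfs-3', 'bfs-4': 'bfs-4'}
```

The resulting mapping is many-to-one, so a single lightly loaded server ends up as the pipe server for several repository servers, much like the dedicated pipe server in System B.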

FIG. 1 illustrates a server administrator process in the byte file server of the system. Those skilled in the art will appreciate that each of the byte file servers is a computer system having a central processing unit (CPU) and operating memory, which typically comprises some type of random access memory (RAM). Those skilled in the art also will appreciate that the repository servers and task servers constructed in accordance with the present invention comprise operating system processes that are implemented as machine instructions loaded into server operating memory and executed by the CPU of a subject byte file server to cause the desired machine actions to take place. In a similar fashion, the server administrator process typically will be provided by execution of a set of instructions that cause the byte file server to take desired machine actions. Those skilled in the art will understand without further explanation how to implement the server administrator process so as to perform load balancing between repository servers and task servers in accordance with the particular programming environment in which the invention is implemented.

The byte file server CPU can comprise a variety of well-known processors by manufacturers such as Intel Corporation, Texas Instruments, Inc., and Motorola, Inc. The programming language in which the process instructions are provided can comprise any one of several alternatives, depending on the application, operating environment, and choice of processor manufacturer.

A byte file server can use its own associated pipe server for named pipe operations. This is illustrated in FIG. 3, which shows a client that generates data operation requests comprising pipe operations that are sent to its byte file server, which includes both a repository server and a pipe server. As indicated in FIG. 3, the pipe operations include pipe open (abbreviated popen), pipe read and pipe write (abbreviated pread and pwrite, respectively), and pipe close (abbreviated pclose) operations. This arrangement is the default configuration administered by a byte file server of the preferred embodiment.

The preferred embodiment incorporates a distributed processing computer system such as provided by the “VM” product from IBM Corporation. Accordingly, when a client requests access to a data object comprising a named pipe or data file at a server, the client must first make a connection with the server and then the server must perform a look-up function to resolve the data object path name and determine if that data object is a pipe. If the pipe functions of the server are being performed by a task server, then the repository server will return notifying information to the client at the time of initial connection to the repository server. The server look-up function performed by the byte file server returns the look-up information, commonly referred to as metadata, to the requesting client machine. Should the decision to off-load operations to the task server change after the time of connection, this change is reflected by the metadata returned as a response to a look-up operation to the repository server. At the client machine, the information is stored in metadata cache memory. The information includes an indication of the system node (the pipe server) to which the pipe operations should be directed, as well as other associated operations that will be known to those skilled in the art without further explanation.

Thus, a client is informed of off-loaded operations for a server when that client first connects to that server, and is advised of later changes in the off-load status through cached metadata. Thereafter, the client need not send requests for pipe operations to that server. Rather, a client router process can determine that all pipe requests for that server are to be sent instead to that server's task server.
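The routing decision described above can be sketched in Python as follows. This is a minimal sketch, assuming a hypothetical `ClientRouter` class whose cache maps a repository server identifier to the pipe task server named in the connect-time notification or in look-up metadata; the class, method, and operation names are illustrative and are not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Operation names are illustrative; the text off-loads the pipe operation set.
PIPE_OPS = frozenset({"popen", "pread", "pwrite", "pclose"})


@dataclass
class ClientRouter:
    # Hypothetical metadata cache: repository server id -> pipe task server id.
    pipe_task_server: Dict[str, str] = field(default_factory=dict)

    def on_metadata(self, repo_server: str, pipe_server: Optional[str]) -> None:
        """Record the task server named at connect time or in look-up metadata."""
        if pipe_server is None or pipe_server == repo_server:
            # Not off-loaded (or no longer off-loaded): forget any cached entry.
            self.pipe_task_server.pop(repo_server, None)
        else:
            self.pipe_task_server[repo_server] = pipe_server

    def route(self, repo_server: str, op: str) -> str:
        """Return the server that should receive `op` for an object on `repo_server`."""
        if op in PIPE_OPS:
            return self.pipe_task_server.get(repo_server, repo_server)
        return repo_server  # look-ups and all other operations stay with the repository


router = ClientRouter()
router.on_metadata("RS1", "PS7")
print(router.route("RS1", "pread"))   # PS7
print(router.route("RS1", "lookup"))  # RS1
```

A client that has received no off-load notification keeps sending pipe requests to the repository server, which corresponds to the default FIG. 3 configuration.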

Referring to FIG. 4, which illustrates the case where pipe operations have been off-loaded to a separate pipe server, the repository server handles look-up operations but the client router directs pipe operations to a different byte file server. The client selects a repository server of a byte file server based on the path name used in the pipe operation. Generally, the object path name indicates which server is the repository manager for the object to which the operation applies. Either when connecting to the server or when receiving the metadata response from the pipe object look-up operation to the server, the client router determines that any subsequent pipe operations are to be routed to the pipe server of the byte file server that has been designated by the repository server as the task server for its pipe operations. It should be understood that the first byte file server can optionally include a pipe server and the second byte file server can optionally include a repository server. With respect to the FIG. 4 configuration, it also should be noted that it is not necessary to send the pipe open operations to the repository server that is associated with a pipe object. Thus, only the look-up functions are indicated as being performed in the repository server, while the remaining pipe operations are all performed in the second byte file server, which acts as the pipe server for the repository server.
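Because the object path name generally identifies the repository server, the client's selection step can be pictured as a longest-prefix match over a table of path prefixes. The sketch below assumes such a table (`mounts`, hypothetical); the description does not specify how path names map to servers. It matches only whole path components, so that a `/home` entry does not claim `/homework`.

```python
from typing import Dict


def select_repository(path: str, mounts: Dict[str, str]) -> str:
    """Return the server id whose path prefix is the longest match for `path`.

    `mounts` is a hypothetical table mapping path prefixes to repository servers.
    """
    best_server, best_len = None, -1
    for prefix, server in mounts.items():
        p = prefix.rstrip("/")  # "/" becomes "", which matches every absolute path
        # Match whole path components only: "/home" covers "/home/x", not "/homework".
        if (path == p or path.startswith(p + "/")) and len(p) > best_len:
            best_server, best_len = server, len(p)
    if best_server is None:
        raise LookupError(f"no repository server for {path!r}")
    return best_server


mounts = {"/": "RS0", "/home": "RS1", "/home/pipes": "RS3"}
print(select_repository("/home/pipes/p1", mounts))  # RS3
```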



The present invention also contemplates accommodating data object access control provisions. FIG. 5 shows a configuration in which a pipe open operation requires client authorization verification by the repository server. This is accomplished by having the pipe server temporarily act as a client. In FIG. 5, a client machine uses a first byte file server as a repository server and uses a different byte file server as a pipe server. The box labelled “client” in the pipe server indicates that the pipe server is acting as a client, that is, the pipe server makes a service request called a pipe access request (abbreviated “paccess”) to the repository server. The processing steps executed in the course of performing the access control are indicated by the numbered connecting lines between the elements of FIG. 5. The initial step is for the repository server to receive a look-up request from the client machine (indicated by the numeral “1”) and for the client machine to initiate a pipe open operation (indicated by “popen” and the numeral “2”). The numeral “3” indicates the request for access, which the pipe server implements by the pipe access request from the server administrator process described above.

In the preferred embodiment, the system is implemented in accordance with the well-known POSIX standard and the pipe server is defined as a “super user” in the POSIX standard. Those skilled in the art will understand that such a designation gives the pipe server the capability to connect with the repository server and perform privileged operations, such as gaining access to data objects. The pipe open (popen) processing in the pipe server passes authorization identification information associated with the connection from the client machine as parameters in the pipe access request (indicated by the numeral “4”). The repository server uses this information to do an authorization check on the pipe open originator, the client machine. The check operation is indicated with the numeral “5”.

The results of the authorization check are returned to the pipe server and indicate authorization to proceed with the pipe open. The completed pipe open is what is referred to as an “open instance”. After the pipe open is completed, an open token is returned to the client machine. Thereafter, only the open token is validated, so no further authorization checking need be done. Those skilled in the art will understand that the type of authorization scheme described herein presumes that the system can pass authorization information to the repository server when the pipe server, acting as a client, connects to the repository server. The authorization information cannot be influenced by the connecting “client”. The authorization information is established by a client machine that is designated a super user, that is, an administrator for the data repository.
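The check-once-then-trust-the-token pattern can be sketched as follows. This is a minimal Python sketch under stated assumptions: the `RepositoryServer` and `PipeServer` classes, the ACL dictionary, and the `checks` counter are hypothetical stand-ins. For brevity the client's user identity is passed as a parameter, whereas the text requires it to come from connection information that the connecting client cannot influence.

```python
import secrets
from typing import Dict, Set, Tuple


class RepositoryServer:
    """Hypothetical repository holding, per pipe, the (user, mode) pairs allowed."""

    def __init__(self, acl: Dict[str, Set[Tuple[str, str]]], super_users: Set[str]):
        self.acl = acl
        self.super_users = super_users
        self.checks = 0  # counts authorization round trips

    def paccess(self, caller: str, user: str, pipe: str, mode: str) -> bool:
        self.checks += 1
        if caller not in self.super_users:  # only a privileged server may ask
            return False
        return (user, mode) in self.acl.get(pipe, set())


class PipeServer:
    def __init__(self, server_id: str, repo: RepositoryServer):
        self.server_id = server_id
        self.repo = repo
        self.open_tokens: Dict[str, Tuple[str, str]] = {}  # token -> (user, pipe)

    def popen(self, user: str, pipe: str, mode: str) -> str:
        # Acting as a client of the repository server: one authorization check.
        # (Simplification: `user` would really come from connection information.)
        if not self.repo.paccess(self.server_id, user, pipe, mode):
            raise PermissionError(f"{user} may not open {pipe} for {mode}")
        token = secrets.token_hex(8)
        self.open_tokens[token] = (user, pipe)
        return token

    def pread(self, token: str) -> str:
        # Later operations validate only the open token; no repository round trip.
        # Returns the pipe name as a stand-in for actual data transfer.
        if token not in self.open_tokens:
            raise PermissionError("invalid open token")
        return self.open_tokens[token][1]
```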

FIG. 6 illustrates a pipe close operation (pclose) from a pipe server to a separate repository server, where the pipe close operation is necessary to update time stamps that are part of the POSIX operating standard. Those skilled in the art will appreciate that time stamps for pipe objects must be kept in the repository server's object repository. The pipe server operation for this pipe function is illustrated in FIG. 6 and is abbreviated as “putime”. Thus, the operating steps carried out in FIG. 6 begin with a pipe close request from a client machine to a pipe server, indicated by the numeral “1”, which is received at the pipe server, indicated by the numeral “2”. As before, the pipe server acts as a client machine relative to the repository server and generates an appropriate update time request to the repository server, as indicated by the numeral “3”.

A putime request may also be invoked periodically to update the relevant time stamps and statistics, causing corresponding updates to the objects in the repository owned by the repository server.

The preferred embodiment also implements a “get attribute” pipe operation from a requesting client machine to the repository server.

FIG. 7 illustrates the configuration wherein the pipe server of a client machine is separate from the repository server, the data object is a pipe, and the repository server initiates a request to obtain the latest time stamp and pipe size from the pipe server. The request is indicated and abbreviated in FIG. 7 as “pstat” (pipe stat). Such an operation again requires “super user” authority under the POSIX standard and requires the requesting repository server to act as a client machine by requesting a pipe status service from another server. As indicated in FIG. 7, the initial operating step is a “get attribute” request from the client machine, indicated by the numeral “1”, which is received at the repository server (indicated by the numeral “2”) and which results in the repository server acting as a client to the pipe server. Thus, the repository server generates a pstat request to the pipe server, indicated by the numeral “3”. The pipe server receives the request, updates the time stamps and other statistics for the pipe data object, and returns information that the repository server uses to update its metadata on DASD to reflect the new time stamps and other statistics for the pipe object.

The operation of the byte file servers described above will be better understood with reference to pseudo code that describes the operating steps followed by the byte file servers for various data operations. For example, the operating steps performed by a server in a pipe open operation are illustrated in the pseudo code listed in Table 1 below. That is, pipe servers execute program logic described by the pseudo code below in processing a pipe open request. The pseudo code is explained in greater detail in the text following the code listing:



Begin (155) Popen logic in server for pipe open procedure
 *Function—open a named pipe is selected
 *Input -
  + Repository server file server identification
   (file server id)
  + Pipe object identifier returned from look-up done
   by the repository server
  + Mode—Read or write
  + Wait Option—wait for complementary open
 *Output -
  Open Token
 *Logic -
 If (116) Repository Server = Pipe Server
  Validate that client has token (has done a look-up)
  Validate permission for the current operation
  Read metadata from DASD (record for the pipe object)
 Else (125) Repository Server ≠ Pipe Server
  Send PACCESS request to the Repository Server; passing
   Pipe Object Identifier and name information
   Mode (Read or Write)
   Note—Information for identifying the originating
   server that is needed for permission checking
   at the Repository Server is obtained securely
   from the control program when the connection
   is completed.
 Endif (112) Repository Server = Pipe Server
 Get storage for OTB (Open (pipe) Token Block)
  Holds open pipe information, represents an open instance for
  a pipe, and includes a buffer that is used for this open and
  for subsequent pipe operation responses that are associated
  with this pipe open instance.
 Initialize the OTB with:
  Open Type (read or write indicator)
  Originating Client Identifying Information
  Originating Client connection path (for current
  Repository Server ID
  Current Response Control Information
  Pipe token and other identification information.
 Establish access to OTB via a hash table
 Generate an Open Token.
 If (142) No previous OTBs exist for this pipe object
  Pipe token = 0
 Else (145) previous OTB(s) exist for this pipe object
  Get Pipe token from one stored in one of the
   existing OTB(s) for this pipe.
 Endif (140) No previous OTBs exist for this pipe object
 Call Pipe Processor to process the pipe open (initialize
  the pipe), passing
  operation = OPEN
  Pipe Token (0 only for first open); value set by Pipe Processor.
  Open Token (OTB pointer)
  Mode (read/write) and wait option from popen parms
  Pipe Processor returns a pipe token.
 Store pipe token in the OTB.
 Note—Pipe Processor above takes care of notifying any
  other clients whose opens may be waiting for this open.
 Return the open token to the originating client.
End (100) popen
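The core of Table 1 (OTB allocation, the hash table keyed by open token, and sharing of one pipe token among all open instances of the same pipe) can be sketched in Python. This is a minimal sketch under assumptions: `OTB`, `PipeProcessor`, `PipeServerOpen`, and the `paccess` callable are hypothetical stand-ins; the look-up-token validation and DASD metadata read of the local branch are reduced to a comment; and the Pipe Processor stub simply issues one integer token per pipe object.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OTB:
    """Open (pipe) Token Block: the state of one open instance of a named pipe."""
    open_token: int
    pipe_id: str
    mode: str            # "read" or "write"
    client_id: str
    repo_server_id: str
    pipe_token: int = 0  # 0 until the Pipe Processor assigns one
    read_done: bool = False
    write_done: bool = False


class PipeProcessor:
    """Stub: issues one pipe token per pipe object and reuses it for later opens."""

    def __init__(self) -> None:
        self._next = itertools.count(1)

    def open(self, pipe_token: int) -> int:
        return pipe_token if pipe_token else next(self._next)


class PipeServerOpen:
    def __init__(self, server_id: str, paccess: Callable[[str, str, str], bool]) -> None:
        self.server_id = server_id
        self.paccess = paccess              # stands in for the PACCESS round trip
        self.processor = PipeProcessor()
        self.otbs: Dict[int, OTB] = {}      # hash table: open token -> OTB
        self.by_pipe: Dict[str, List[OTB]] = {}
        self._tokens = itertools.count(1)

    def popen(self, repo_server_id: str, pipe_id: str, mode: str, client_id: str) -> int:
        if repo_server_id != self.server_id:
            # Off-loaded case: ask the repository server to authorize the open.
            if not self.paccess(pipe_id, mode, client_id):
                raise PermissionError(f"popen of {pipe_id} refused")
        # Local case: look-up token validation and DASD metadata read omitted here.
        otb = OTB(next(self._tokens), pipe_id, mode, client_id, repo_server_id)
        self.otbs[otb.open_token] = otb
        existing = self.by_pipe.setdefault(pipe_id, [])
        prior = existing[0].pipe_token if existing else 0  # reuse token if already open
        otb.pipe_token = self.processor.open(prior)
        existing.append(otb)
        return otb.open_token
```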

The pseudo code of Table 1 specifies the operating sequence for performing a pipe open operation. These are the operating steps that would be performed by one of the pipe servers illustrated above in FIG. 6 after receipt of a pipe open request. The process input parameters are listed at the start of the listing. The repository server identification among those inputs is obtained from a look-up operation, or from the return information provided by the repository server when the originating client first connects to that server. The wait option input parameter permits client pipe open synchronization with other pipe open operations for the same pipe, as will be understood by those skilled in the art. The output is an open token, which is passed with subsequent read, write, and close operations to the same pipe server. That is, the open token permits a requesting client machine to continue with operations on the same open pipe instance.

In the pseudo code of Table 1, the steps of the byte file server pipe open logic are laid out following the input and output parameters. Initially, the pipe server determines if it is also the repository server for the pipe (the If (116) test). If the server is acting as its own pipe server, then there is no need to execute a PACCESS command, and the pipe must be opened and maintained by this repository server itself. This is the case when the pipe operations have not been off-loaded to a separate pipe server, so that the repository server by default performs its own pipe server operations. Accordingly, the operating steps of that first branch are followed, in which pipe request validation steps are performed and the pipe metadata is read from DASD where the pipe object resides. If the repository server is not also acting as its pipe server (the Else (125) branch), then this processing is being carried out because this server is a pipe server to another byte file server. Therefore, this server must gain access to the named pipe by first validating access permission to the requested pipe, and so it sends a PACCESS request to the repository server. These steps comprise making a connection to the repository server if a connection to that server does not already exist, and passing the parameters needed to check the authorization of the originating client to execute the pipe open. A positive response to this pipe access operation indicates that the pipe open operation is permitted to continue in the pipe server.

Independent of the result of this conditional logic, an open token block (OTB) must be allocated for the open pipe information corresponding to the current named pipe open instance. This operation is specified by the “Get storage for OTB” step of the pseudo code. Initializing the OTB includes obtaining information from the open request that indicates the mode of the open (read or write), the originating client to whom responses will be sent, the connection path, the repository server name, control information, and a pipe token and other information, as specified in the “Initialize the OTB with” step. Establishing access using a hash table is a conventional operation that should be known to those skilled in the art without further explanation. In the “Generate an Open Token” step, an open token is created for return to the originating client. This token will be used as input by subsequent read/write and close operations for this same pipe open instance, where the receiving server will use the open token to find the appropriate OTB for the open instance.

Pipe tokens are used to identify a pipe object for which any open instance exists in the server; a pipe token identifies the pipe object for the purposes of all current opens of that pipe object. Detailed pipe data operations are managed by a Pipe Processor, which is a component of a pipe server. The Pipe Processor moves data in and out of a pipe as a result of pipe writes and reads. It manages synchronization of pipe waits where wait options are used. It also handles other details of pipe operations that are beyond the scope of this description of the preferred embodiment. The Pipe Processor is invoked for each pipe operation (pipe open, pipe write, pipe read, and pipe close) so that it can manage the pipe data and synchronization as required by the pipe object opens. In response to a pipe open invocation, the Pipe Processor returns a pipe token that identifies the current pipe object for all current pipe open instances.

The server next checks for the existence of a previous open token block for the pipe object (the If (142) test). If such a block exists, indicating that the named pipe has already been opened, the existing pipe token is retrieved from a previous OTB; otherwise the pipe token is set to 0. The remaining pipe open functions are then performed by invoking the Pipe Processor component of the pipe server, and the pipe token it returns is stored in the open token block (OTB) of the pipe server. In the preferred embodiment, a wait option is accommodated in which the Pipe Processor notifies clients waiting on a pipe of open processing for that pipe. Pipe open processing concludes by returning the open token to the originating client.



The operating steps of the pipe read operation described above are illustrated by the pseudo code listed in Table 2 below.







Begin (149) Pread logic in the task server for pipe read procedure
 *Function—read from a named pipe
 *Input -
  + Open Token
  + Byte Length of data requested for the pipe read
  + Wait/NoWait Option (may be different than specified
   with the open)
 *Output -
  + Effective Length (May be < Requested, including
   0, is set when there is an error condition).
  + Pipe data (that was read)
 *Logic -
 Validate the input open token
 Use open token to find the OTB.
 If (118) pread length > allowed by Base Buffer
  Allocate additional buffers for the pipe read
  Format pointer list in Base Buffer area for passing
   buffer pointers to the Pipe Processor
 Else (121)
  Set up pointer list in Base Buffer area for pointing
   only to the Base Buffer itself.
 Endif (114) pread length > allowed by Base Buffer.
 Call Pipe Processor to process the pipe read, passing
  operation = READ.
  Pipe Token (from OTB)
  Open Token
  Requested pipe read data length.
  Mode and wait option from pread parameters.
  Address of the pointer list to the buffers for the
   output of the read.
  Address of the field where Pipe Processor is to post
   the effective length of the read upon completion.
  Response will indicate effective length of the read
   which can be less than that requested. Effective
   length is initially stored in the OTB.
 If (138) wait required
  Response is handled by the Pipe Processor when a pwrite
   or pclose satisfies the condition of the pread wait.
 Else (148) read is satisfied immediately.
  If (142) Effective Length > 0
   Set ReadDone indicator in OTB so that we will know to
    update time stamps at close.
  Endif (139) Effective Length > 0
  Note: Pipe Processor will respond to waiting writers as
   satisfied by this successful read. Such writers may
   be waiting for a read to empty out the pipe sufficient
   for additional write operations to complete.
  Respond to the read request.
 Endif (135) wait required
End (100) Pread
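The buffer set-up and the immediate-read path of Table 2 can be sketched in Python as follows. This sketch is illustrative only: `BASE_BUFFER_SIZE`, the sizing of the additional buffers, and the rule that a read waits only when the pipe is empty are assumptions, since the description leaves those details to the Pipe Processor.

```python
from typing import List, Optional, Tuple

BASE_BUFFER_SIZE = 4096  # assumed size of the base buffer held in the OTB


def plan_read_buffers(length: int, base: int = BASE_BUFFER_SIZE) -> List[int]:
    """Sizes of the buffers named in the pointer list for a pread of `length` bytes."""
    sizes = [base]  # the pointer list always includes the base buffer
    remaining = length - base
    while remaining > 0:  # only when the request exceeds the base buffer
        chunk = min(base, remaining)  # hypothetical: extras sized like the base buffer
        sizes.append(chunk)
        remaining -= chunk
    return sizes


def pread(pipe: bytearray, length: int, wait: bool) -> Optional[Tuple[int, bytes, bool]]:
    """Read up to `length` bytes from `pipe`.

    Returns (effective_length, data, read_done), or None when the read must wait;
    a waiting read is answered later, when a pwrite or pclose satisfies it.
    """
    if not pipe and wait:
        return None
    data = bytes(pipe[:length])
    del pipe[:length]  # consumed bytes leave the pipe
    return len(data), data, len(data) > 0  # read_done drives the close-time stamp
```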

As indicated in the input section of the pseudo code, the read processing requires input of the open token (for authorization control), the data length, and a wait option accommodated by the preferred embodiment. The output is the effective length and the pipe data. After validating the open token and finding the OTB, steps are taken to ensure adequate read buffer space for the pipe (the If (118) block). The actual pipe read operation is performed by the above-mentioned Pipe Processor, which is called with the parameters listed in the pseudo code, and the wait option is handled by the If (138) block that follows the call.





The operating steps for the pipe write operation described above are illustrated in the pseudo code listed in Table 3 below.







Begin (132) Pwrite logic in the task server for pipe write procedure
 *Function—write to a named pipe
 *Input -
  + Open token
  + There is a wait option for writes whereby a write
    may wait for the pipe to empty to make room
    for the write to complete. The pipe empties
    via pipe read completions.
  + Length of pipe data
  + Pipe data itself
 *Output -
 *Logic -
 Validate input open token.
 Use Open token to set up addressability to OTB.
 Set up the input pipe write data buffer (list)
  to call the Pipe Processor.
 Call Pipe Processor to do the write operation, passing
  Operation = WRITE
  Pipe token
  Open token
  Wait indicator (input—indicates willingness to wait
   for completion).
  Pointer list for pipe write data.
  Wait result (indicates that request must wait for later
 Pipe Processor takes care of responses to waiting pipe
  reads or closes that are to take place as a result
  of a successful write to the same pipe.
 Set WriteDone indicator in OTB that write completed so
  that timestamp will be updated by close.
 Send response to the requestor.
End (100) Pwrite

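The input and output parameters should be apparent from the listing. In the processing logic, validation and initialization take place in the first three steps: the open token is validated, addressability to the OTB is established, and the pipe write data buffer list is set up. The Pipe Processor component of the pipe server is then called to perform the write, and it takes care of responses to any pipe reads or closes that were waiting on this write. The write completion is recorded by setting the WriteDone indicator in the OTB, and the response is sent to the requesting client machine in the final step.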
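The interaction between a bounded pipe, the write wait option, and readers waiting on the pipe can be sketched as a toy Pipe Processor in Python. The `BoundedPipe` class, its eight-byte default capacity, the `pwrite` wrapper, and the dictionary standing in for the OTB are hypothetical; a “wait” status here only reports that the write would have to wait, whereas the described Pipe Processor would hold the request and answer it later.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class BoundedPipe:
    """Toy Pipe Processor state for one pipe (capacity is an assumption)."""
    capacity: int = 8
    data: bytearray = field(default_factory=bytearray)
    waiting_readers: Deque[Tuple[str, int]] = field(default_factory=deque)

    def write(self, payload: bytes, wait: bool) -> Tuple[str, List[Tuple[str, bytes]]]:
        """Return (status, responses posted to readers satisfied by this write)."""
        if len(self.data) + len(payload) > self.capacity:
            # No room: wait for reads to empty the pipe, or fail at once.
            return ("wait" if wait else "full"), []
        self.data.extend(payload)
        posted = []
        while self.waiting_readers and self.data:
            reader, want = self.waiting_readers.popleft()
            chunk = bytes(self.data[:want])
            del self.data[:want]
            posted.append((reader, chunk))
        return "done", posted


def pwrite(otb: Dict[str, bool], pipe: BoundedPipe, payload: bytes, wait: bool):
    status, posted = pipe.write(payload, wait)
    if status == "done":
        otb["write_done"] = True  # tells pclose to update the write time stamp
    return status, posted
```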



The operating steps for the pipe close operation described above are illustrated in the pseudo code listed in Table 4 below.







Begin (130) Pclose logic in the task server for pipe close procedure.
 *Function—close a named pipe
 *Input -
  + Open Token
 *Output -
 *Logic -
 Validate the Open Token.
 Use Open Token to address OTB.
 Call Pipe Processor to do the close, passing
  Operation = CLOSE
  Pipe Token
  Open Token
 The Pipe Processor will take care of responses to waiting
  pipe reads or writes that are completed by this close.
 If (117) ReadDone indicator is set ON
    Generate time stamps for changes affected by pipe reads.
 Endif (115) ReadDone indicator is set ON.
 If (120) WriteDone indicator is set ON
    Generate time stamps for changes affected by pipe writes.
 Endif (118) WriteDone indicator is set ON.
 If (126) this Pipe Server is not the Repository Server
   Send PUTIME request to the Repository Server, passing
    the new time stamps for updating metadata in the
    Repository Server and the pipe object identifying
 Else (128) Pipe Server and Repository Server are same
   Update timestamps via local metadata update (DASD write).
 Endif (121) Pipe Server and Repository Server are separate.
 Free the OTB.
 Send normal Pclose response.
End (100) Pclose

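The programming logic for the pipe close operation should be self-explanatory: again there are validation steps, the actual close operations are performed by the Pipe Processor component of the pipe server, and the time stamps recorded by the ReadDone and WriteDone indicators are propagated to the repository server, either by a PUTIME request or by a local metadata update.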
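The time stamp handling at the end of Table 4 can be sketched in Python as shown below. This is a sketch under assumptions: the OTB is reduced to a hypothetical dictionary, `send_putime` is a callable standing in for the PUTIME request, the in-memory `local_metadata` dictionary stands in for the DASD metadata record, and mapping read activity to an access time and write activity to a modification time follows the usual POSIX convention rather than anything stated in the listing.

```python
import time
from typing import Callable, Dict, Optional


def pclose(otb: Dict, pipe_server_id: str,
           send_putime: Callable[[str, int, int], None],
           local_metadata: Dict[str, Dict[str, int]],
           now: Optional[int] = None) -> str:
    """Close one open instance and propagate time stamps (0 means "no change").

    `otb` is a hypothetical dict with keys pipe_id, repo_server_id,
    read_done, and write_done.
    """
    stamp = int(time.time()) if now is None else now
    read_ts = stamp if otb.get("read_done") else 0
    write_ts = stamp if otb.get("write_done") else 0
    if otb["repo_server_id"] != pipe_server_id:
        # Separate repository server: the pipe server acts as a client (PUTIME).
        send_putime(otb["pipe_id"], read_ts, write_ts)
    else:
        # Same server: update the metadata record directly (stands in for a DASD write).
        record = local_metadata.setdefault(otb["pipe_id"], {"atime": 0, "mtime": 0})
        if read_ts:
            record["atime"] = read_ts
        if write_ts:
            record["mtime"] = write_ts
    return "pclose ok"
```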

The operating steps for the pipe access operation described above are illustrated in the pseudo code listed in Table 5 below.







Begin (126) Paccess logic in the repository server for pipe access procedure.
 *Function—Paccess for a named pipe.
 *This function is processed by the Repository Server where
  the Repository Server is not also the Pipe Server.
 *Input -
  + User/group identifiers for validating permission for
   doing the pipe operation
   comes from connect information to the pipe
  + Pipe Identifying information (path and pipe name)
  + Mode—read or write—comes from input parm to
    the popen request.
  + Client Identifying information
 *Output -
  + Return information indicating pass or failure.
 *Logic -
 Check that originating server has special
  privileges for doing this function.
 Validate pipe identifier (that it is defined)
  and read (DASD) metadata record for the object.
 Verify that User/group information passed to this operation
  is one that has permission for the named pipe according
  to metadata records.
 Return information indicating pass or fail of the above
End (100) paccess

The programming logic of Table 5 describes an inter-server operation for validating access to a named pipe. The operation comprises verification of the originating server's privileges, of the pipe identifying information, and of the user and group permissions recorded in the pipe's metadata. Such steps will be dependent on the particular operating system configuration in the system of implementation and are listed here for exemplary purposes. Thus, the details of such operations are conventional and will be known to those skilled in the art.
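Since the permission check itself is left to the operating system configuration, one conventional possibility is the POSIX owner/group/other permission-bit test. The Python sketch below assumes that reading; the `PipeMetadata` record, the `catalog` dictionary standing in for the DASD metadata, and the boolean privilege flag for the originating server are hypothetical.

```python
import stat
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class PipeMetadata:
    uid: int
    gid: int
    mode: int  # POSIX permission bits, for example 0o640


READ_BITS = (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH)
WRITE_BITS = (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH)


def paccess(caller_privileged: bool, catalog: Dict[str, PipeMetadata],
            pipe_id: str, uid: int, gids: Set[int], want: str) -> bool:
    """Pass/fail answer for a pipe server asking on behalf of a client."""
    if not caller_privileged:          # originating server lacks special privileges
        return False
    meta = catalog.get(pipe_id)        # stands in for reading the DASD metadata record
    if meta is None:                   # pipe identifier is not defined
        return False
    bits = READ_BITS if want == "read" else WRITE_BITS
    # POSIX uses exactly one class: owner if uid matches, else group, else other.
    if uid == meta.uid:
        return bool(meta.mode & bits[0])
    if meta.gid in gids:
        return bool(meta.mode & bits[1])
    return bool(meta.mode & bits[2])
```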

The operating steps for the pipe “utime” (Putime) operation described above are illustrated in the pseudo code listed in Table 6 below.







Begin (118) Putime logic in the repository server for pipe update procedure.
 *Function—update timestamps in repository server, originated
  by Pclose in a separate Pipe Server (see Pclose).
 *Input -
  + read time stamp (if 0, no change)
  + write time stamp (if 0, no change)
  + Pipe identifying information
 *Output -
 *Logic -
 Validate that requestor has special privilege to do this
 Validate the pipe object identifying information and
  fetch the metadata record for the object.
 If (117) validations pass
   Update time stamps given on input and write metadata
    record (to DASD) with the changes.
 Endif (114) validations pass.
End (100) putime

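Table 6 describes a timestamp update procedure that comprises validation steps, namely a check of the requestor's privilege and a validation of the pipe object identifying information, followed by the timestamp update itself, in which the nonzero input time stamps are written to the pipe's metadata record on DASD.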
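The repository-side update of Table 6, including the convention that a zero time stamp means “no change”, can be sketched in Python. The dictionary `catalog` stands in for the metadata records on DASD and the boolean privilege flag for the special-privilege check; both are assumptions made for illustration.

```python
from typing import Dict


def putime(requestor_privileged: bool, catalog: Dict[str, Dict[str, int]],
           pipe_id: str, read_ts: int, write_ts: int) -> bool:
    """Apply time stamps sent by a separate pipe server; 0 means "no change"."""
    if not requestor_privileged or pipe_id not in catalog:
        return False  # validation failed; metadata left untouched
    record = catalog[pipe_id]
    if read_ts:
        record["read_ts"] = read_ts
    if write_ts:
        record["write_ts"] = write_ts
    # A real server would now write the updated record back to DASD.
    return True
```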





The operating steps for the pipe stat (Pstat) operation described above are illustrated in the pseudo code listed in Table 7 below.







Begin (142) Pstat logic in the pipe server for request timestamp procedure.
 *Function—request timestamp for a pipe (called from
  Repository Server as a result of a get attribute request
  for a pipe object—
  sent to the Pipe Server if Pipe server not same as
  the Repository Server). Repository Server updates timestamps
  in catalogs as a result.
 *Input -
  + Pipe Path and Object Name Information
  + Note—originating Repository Server id is needed but is
    not passed because it is available via connection
 *Output -
  + Response
  + -read time stamp (0 indicates no change or pipe not
    currently open for read).
  + -write time stamp (0 indicates no change or pipe not
    currently open for write).
  + -pipe size—current number of bytes in the pipe
 *Logic -
 Check that originating server has special privileges for
  doing this function.
 If (139) OTBs exist for the pipe (found through hash table)
  If (138) caller's server id is set in one (or
   more) of the OTBs.
   If (129) ReadDone indicator in the OTB is ON
    Get current time and set it in read time stamp of the
     response area.
    Reset ReadDone indicator
   Endif (125) ReadDone indicator in the OTB is ON.
   If (134) WriteDone indicator in the OTB is ON.
    Get current time and set it in write time stamp in the
     response area.
    Reset WriteDone indicator
   Endif (130) WriteDone indicator in the OTB is ON.
   Call the Pipe Processor to get the
    number of bytes currently in the pipe (pipe size).
   Set result in the Response.
  Endif (123) caller's file server id is Set in one (or
 Else (141) no OTB exists for the pipe.
  Return 0 values in time stamps.
 Endif (122) OTBs exist for the pipe.
End (100) Pstat

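Whereas Table 6 lists the program logic in the repository server for performing a timestamp update, Table 7 above lists the program logic in the pipe server for responding to a request from the repository server for the latest time stamps and pipe size. In the program logic, the privilege check is performed first, and the timestamp logic then begins with a search of the hash table for OTBs for the pipe. If an OTB exists for the calling repository server, then time stamps are generated for any pipe reads and writes recorded by the ReadDone and WriteDone indicators, and the current pipe size is obtained from the Pipe Processor. If no OTB exists, no new time stamp can be provided, and zero values are returned in the time stamp fields.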
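The pipe-server side of Table 7 can be sketched in Python as follows. The list of OTB dictionaries, the `privileged_ids` set, and passing the pipe size in as an argument (rather than calling the Pipe Processor) are simplifying assumptions; the sketch mirrors the listing in resetting the ReadDone and WriteDone indicators once they have been reported.

```python
import time
from typing import Dict, List, Optional, Set


def pstat(caller_id: str, privileged_ids: Set[str], otbs_for_pipe: List[Dict],
          pipe_size: int, now: Optional[int] = None) -> Dict[str, int]:
    """Answer a repository server's request for latest time stamps and pipe size."""
    if caller_id not in privileged_ids:
        raise PermissionError(f"{caller_id} lacks special privileges")
    response = {"read_ts": 0, "write_ts": 0, "pipe_size": 0}
    mine = [o for o in otbs_for_pipe if o["repo_server_id"] == caller_id]
    if not mine:
        return response  # no open instance for this caller: zero time stamps
    stamp = int(time.time()) if now is None else now
    for otb in mine:
        if otb["read_done"]:
            response["read_ts"] = stamp
            otb["read_done"] = False   # reset, as in the listing
        if otb["write_done"]:
            response["write_ts"] = stamp
            otb["write_done"] = False
    response["pipe_size"] = pipe_size  # would come from the Pipe Processor
    return response
```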





The distributed processing system described above provides a client with access to pipe data at a server without vertical partitioning of the affected pipe objects and without off-loading the affected pipe objects themselves. The server off-loads the pipe operations that are used on pipe objects, specifically the operation set of pipe open, pipe read, pipe write, and pipe close. All other pipe operations are performed by the repository server. In this way, the distributed computer processing system supports named pipes by permitting operational (task) partitioning of the server independently of vertical partitioning of data repositories, and thereby reduces server task loading.

Another example of operational off-loading contemplated by the byte file server constructed in accordance with the invention is off-loading of byte range lock operations. That is, byte range lock operations otherwise performed by a repository server can instead be performed by a byte range lock type of task server, in a manner similar to that in which pipe operations otherwise performed by a repository server are performed by the pipe server type of task server described above.





FIG. 8 shows a first server S1 and a second server S2, as well as a first client C1 and a second client C2. A direct access storage device (DASD) is connected to the first server S1. In an exemplary operation, the first client sends a look-up operation (labelled with numeral “1”) to the first server S1, where the path name passed with the look-up operation resolves to a data file in the data repository of the first server S1 (labelled with numeral “2”). The look-up operation is similar to that referred to above with respect to named pipe operations. The look-up operation resolves the path name for the file and returns internal identifiers for the path elements, such as directories, and the file object itself. This return information is designated with numeral “3” in FIG. 8 and is retrieved from metadata stored on the DASD.

An open file operation directed to the first server S1 is indicated by the numeral “4” and opens the data file. This open is accomplished in conjunction with identifying information returned from the look-up operation. The open operation also retrieves metadata from the DASD and stores it in a new open control block for the data object File A in the first server S1 (numeral “5”). By examining load-balancing information, the first server S1 determines that byte range locking for File A can be performed by another byte file server. This can be determined, for example, by the server administrator of the first server. The first server then returns an open token to the first client C1 (indicated by numeral “6”) and at the same time returns an indicator that byte range locking operations for File A have been off-loaded and that those operations should be directed to the second server S2, which is designated as the task server for byte range locking. The precise structure of the indicator and of the task server name returned can take many forms and can be fashioned by those skilled in the art without further explanation. The first client C1 continues with a file read operation, indicated by numeral “7”, passing the same open token, which causes the first server S1 to retrieve the requested data file from the DASD (indicated by numeral “8”) and to return it to the first client C1 (indicated by numeral “9”).

When the first client C1 needs to lock File A, it sends a lock operation (indicated by the numeral “10”) to the second server S2 instead of the first server S1. Note that S1 is the repository server and repository for the associated file, and the second server is the task server for byte range locking. For the first lock of File A, the second server S2 goes to the first server S1 with a validate operation (indicated by number “11”). The validate operation indicates that File A has been opened by C1 and that permission has been obtained for the lock operation. This validation is a server-to-server operation that is appropriate only for the first such lock operation on File A by C1. The validation is recorded in the second server S2 so that it need not be repeated for other lock and unlock operations of the same scope. The second server S2 records the actual lock using a byte range lock manager (BRLM), indicated by the numeral “12”, and responds to the first client (indicated by “13”).

In the system configuration of the FIG. 8 embodiment, the second client C2 sends similar look-up operations, open operations, and the like (not illustrated) to the first server S1 for File A and similarly is told to use the second server S2 for lock operations. Accordingly, the second client C2 sends a lock operation to the second server S2 (indicated by “14”), where the operation is validated as before (indicated by “15”) and is sent to the BRLM (indicated by “16”). If there is a lock conflict, the BRLM queues the lock behind the lock already recorded for C1.

The exemplary FIG. 8 configuration assumes that the first client then unlocks its File A lock (as indicated by “17”). Because the validation was previously recorded in the second server, there is no need to repeat the validation step, so the unlock operation is passed directly to the BRLM (indicated by “18”), where it also causes the lock to be given to the second client, which is queued for it. The second client is notified by the BRLM through a “post” function (indicated by “19”), which generates a response to the second client (indicated by “20”). A response to the current unlock operation also is sent to the first client (indicated by “21”).
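
The unlock hand-off and the “post” notification can be sketched as below. The `ByteRangeLockManager` class, its `post` callback parameter, and the policy of rescanning the wait queue in FIFO order on every unlock are illustrative assumptions rather than details taken from the description.

```python
from collections import deque
from typing import Callable


def overlaps(a: tuple, b: tuple) -> bool:
    # Inclusive (start, end) byte ranges overlap unless one ends before the other begins.
    return a[0] <= b[1] and b[0] <= a[1]


class ByteRangeLockManager:
    """Hypothetical BRLM: an unlock hands the range to queued waiters via a post callback."""

    def __init__(self, post: Callable[[str, str, tuple], None]):
        self.post = post   # called as post(client, file, range) when a queued lock is granted
        self.held = {}     # file -> list of (client, (start, end))
        self.waiting = {}  # file -> deque of (client, (start, end))

    def _conflicts(self, file: str, client: str, rng: tuple) -> bool:
        return any(c != client and overlaps(r, rng) for c, r in self.held.get(file, []))

    def lock(self, client: str, file: str, rng: tuple) -> str:
        if self._conflicts(file, client, rng):
            self.waiting.setdefault(file, deque()).append((client, rng))
            return "queued"
        self.held.setdefault(file, []).append((client, rng))
        return "granted"

    def unlock(self, client: str, file: str, rng: tuple) -> str:
        # No validation here: it was recorded when the lock was first taken.
        self.held[file].remove((client, rng))
        # Grant waiters that no longer conflict, in FIFO order, and post each one.
        queue = self.waiting.get(file, deque())
        still_waiting = deque()
        while queue:
            waiter, wrng = queue.popleft()
            if self._conflicts(file, waiter, wrng):
                still_waiting.append((waiter, wrng))
            else:
                self.held[file].append((waiter, wrng))
                self.post(waiter, file, wrng)  # the waiter's own response is generated here
        self.waiting[file] = still_waiting
        return "unlocked"  # response to the unlocking client
```

In a usage such as `ByteRangeLockManager(lambda c, f, r: print(c, f, r))`, the callback stands in for whatever mechanism delivers the deferred lock response to the waiting client.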

When the first client closes File A, indicated by “22”, the first server sends an invalidate operation (indicated by “23”) to the second server so that the second server will know that the file is no longer open, will require a validation for the next lock operation, and will free any locks still held by the first client for File A. A response to the close operation is sent to the first client, as indicated by “24”.
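
The effect of the invalidate operation on the task server can be sketched as follows. The `TaskServerState` class and its methods are hypothetical; the sketch shows only the effects named above (forgetting the recorded validation so the next lock must be validated again, and freeing the closing client's locks) and omits waking any clients queued behind the freed locks.

```python
class TaskServerState:
    """Hypothetical task-server bookkeeping touched by the invalidate operation."""

    def __init__(self):
        self.validated = set()  # (client, file) pairs already validated by the repository server
        self.held = {}          # file -> list of (client, start, end) granted locks

    def record_validation(self, client: str, file: str) -> None:
        self.validated.add((client, file))

    def needs_validation(self, client: str, file: str) -> bool:
        return (client, file) not in self.validated

    def invalidate(self, client: str, file: str) -> int:
        # Sent by the repository server when the client closes the file ("23").
        self.validated.discard((client, file))  # the next lock must be validated again
        before = self.held.get(file, [])
        self.held[file] = [lk for lk in before if lk[0] != client]  # free locks still held
        return len(before) - len(self.held[file])  # number of locks freed
```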

It should be clear that similarities exist between the byte range lock off-loading described immediately above and the named pipe illustration described previously. In both cases, some operations are appropriately retained in the repository server, while other operations can be off-loaded to another server because they depend little on the object definition, can occur frequently, and generally involve minimal data transfer. Also, there may be initial open-related and close-related operations that require occasional server-to-server communication, mostly for security or data integrity validation. Such issues will be known to, and can be handled by, those skilled in the art without further explanation.

Other considerations may lead to alternative features for the processors described above in conjunction with the byte file server of the preferred embodiment illustrated in the drawings. For example, it might be desirable to dynamically determine which operations will be off-loaded, or transferred to an alternative server. In particular, it can be advantageous for the server administrator process to determine the operations that will be off-loaded in response to the load of the server. Such a design, for example, might entail off-loading pipe operations up to a first predetermined operating load of the server, as determined by collected system operating statistics, and might entail off-loading byte range locking operations if such statistics indicate that the server has reached a second predetermined operating load. Further, a byte file server might incorporate a staged partitioning in which only named pipe operations are off-loaded up to a first operating load, and both pipe operations and byte range locking operations are off-loaded from the first operating load to a second operating load.
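
A staged, load-driven policy of the kind described above might look like the following sketch. The `OffloadOps` flags, the `operations_to_offload` function, and the example thresholds of 0.5 and 0.8 (expressed as a fraction of server capacity) are hypothetical; the description does not say how operating load is measured or what is off-loaded above the second operating load, so this sketch simply keeps the last stage's setting there.

```python
from enum import Flag


class OffloadOps(Flag):
    """Operation classes a repository server may off-load to a task server."""
    NONE = 0
    NAMED_PIPES = 1
    BYTE_RANGE_LOCKS = 2


# Hypothetical staged partitioning: (upper load bound, operations off-loaded below it).
# Load is assumed to be a fraction of capacity derived from collected operating statistics.
STAGED_POLICY = [
    (0.5, OffloadOps.NAMED_PIPES),                                # up to the first operating load
    (0.8, OffloadOps.NAMED_PIPES | OffloadOps.BYTE_RANGE_LOCKS),  # first to second operating load
]


def operations_to_offload(load: float, stages: list) -> OffloadOps:
    """Return the operation classes to off-load at the given server load."""
    for upper_bound, ops in stages:
        if load < upper_bound:
            return ops
    # The text does not specify behavior beyond the last stage; keep its setting.
    return stages[-1][1]
```

With these example thresholds, a load of 0.3 yields only `OffloadOps.NAMED_PIPES`, while a load of 0.6 yields both flags. Keeping the stages in a table rather than in nested conditionals lets an administrator process add or retune stages without changing the lookup logic.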

The steps described above in connection with the drawings and the pseudo code may be performed by the system elements illustrated in FIG. 1, comprising the client environments and the server, by executing a series of computer-readable instructions contained on a data storage medium, such as a program product storage device, that can be read by the respective system elements. The program product storage device may comprise, for example, a recordable medium such as a floppy disk on which the instructions are recorded. These instructions may instead be tangibly embodied on another DASD (not shown), a magnetic tape, a conventional hard disk drive, electronic read-only memory, an optical storage device, a set of paper “punch” cards, or other forms of non-volatile data storage as will be understood by those skilled in the art. Alternatively, the instructions can be received by the server and distributed to the client environments over the network.

The present invention has been described above in terms of a presently preferred embodiment so that an understanding of the present invention can be conveyed. There are, however, many configurations for file servers not specifically described herein to which the present invention is applicable. The present invention should therefore not be seen as limited to the particular embodiment described herein; rather, it should be understood that the present invention has wide applicability with respect to file servers generally. All modifications, variations, or equivalent arrangements that are within the scope of the attached claims should therefore be considered to be within the scope of the invention.

