The present invention relates to grid computing, in which a distributed network of computers is employed, and in particular to a means of providing services over a grid network.
Grid computing (or the use of a computational grid) may be regarded as the application of the resources of many computers in a network to a single problem at the same time, usually to a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data. The computational Grid aims to facilitate flexible, secure and coordinated resource sharing between participants. In a Grid computing environment many different hardware and software resources have to work together seamlessly. A specific architecture and protocols have been defined for the Grid, and are explained for example in Foster et al., “The Anatomy of the Grid: enabling scalable virtual organisations”, http://www.globus.org/research/papers/anatomy.pdf.
Referring to FIGS. 7 to 9,
Terms and Standards
Grid systems are represented by the OGSA (Open Grid Services Architecture) see I. Foster, C. Kesselman, J. M. Nick, S. Tuecke. “The Physiology of the Grid An Open Grid Services Architecture for Distributed Systems Integration” http://www.globus.org/research/papers/ogsa.pdf;
WSRF (Web Services Resource Framework) is a standard proposal for implementing OGSA: K. Czajkowski, D. Ferguson, I. Foster, J. Frey, S. Graham, T. Maguire, D. Snelling, S. Tuecke. “From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring and Evolution Version 1.11” May, 2004, http://www-106.ibm.com/developerworks/library/ws-resource/ogsi_to_wsrf—1.0.pdf.
OGSA is represented by the standard OGSI (Open Grid Services Infrastructure), S. Tuecke et al: Open Grid Services Infrastructure (OGSI) Version 1.0, June 2003, http://www.globus.org/research/papers/Final_OGSI_Specification_V1.0.pdf, and
GT3 is a reference implementation of OGSI, see Globus Team, Globus Toolkit, http://www.globus.org, and GT4 is a reference implementation of WSRF;
Resource Specification Language (RSL) provides a common interchange language to describe resources. The various components of the Globus Resource Management architecture manipulate RSL strings to perform their management functions in cooperation with the other components in the system: see http://www.globus.org/gram/rsl_spec1.html
WSDL—Web Services Description Language (WSDL) (see Web Services Description Language (WSDL) Version 1.2, http://www.w3.org/TR/wsdl12). WSDL represents the service description layer within a Web service protocol stack for specifying a public interface for a web service.
Condor—A job manager—see D. Thain, T. Tannenbaum, and M. Livny, “Condor and the Grid”, in Fran Berman, Anthony J. G. Hey, Geoffrey Fox, editors, “Grid Computing: Making The Global Infrastructure a Reality”, John Wiley, 2003
Legacy Code
Grid resources can include legacy code programs that were originally implemented to be run on single computers or on computer clusters. Many large industrial and scientific applications are available today that were written well before Grid computing or service-oriented architectures appeared. One of the biggest obstacles to the widespread industrial take-up of Grid technology is the existence of a large amount of legacy code that is not accessible as Grid services. The deployment of these programs in a Grid environment can be very difficult and usually requires significant re-engineering of the original code. Integrating these legacy code programs into service-oriented Grid architectures with the smallest possible effort and the best performance is therefore crucial to more widespread industrial take-up of Grid technology.
There are several research efforts aiming at automating the transformation of legacy code into a Grid service. Most of these solutions are based on the general framework to transform legacy applications into Web services outlined in D. Kuebler, and W. Eibach, Adapting legacy applications as Web services, IBM Developer Works, http://www-106.ibm.com/developerworks/webservices/library/ws-legacy, and use Java wrapping in order to generate stubs automatically. One example of this is presented in Y. Huang, I. Taylor, D. Walker, and R. Davies, Wrapping Legacy Codes for Grid-Based Applications, in Proceedings of the 17th International Parallel and Distributed Processing Symposium (Workshop on Java for HPC), 22-26 Apr. 2003, Nice, France, where the authors describe a semi-automatic conversion of legacy C code into Java using JNI (Java Native Interface). After wrapping the native C application with the JACAW (Java-C Automatic Wrapper) tool, MEDLI (MEdiation of Data and Legacy Code Interface) is used for data mapping in order to make the code available as part of a Grid workflow. Such Java wrapping requires the user to have access to the source code. To implement a particular wrapper for grid-enabling, it is necessary to acquire a subset of code semantics, and these are extracted from the source code itself. Current approaches are based on the information expressed in certain sections of the code (typically known as the header file). In well-formed code, the relevant information is expected to be located in the header file. In practice this is not always the case: crucial information can be buried or “hard-coded” in the body of the source code, and cannot easily be located. An example of this problem is the specification of file location for file parameters. This is a major shortcoming of the approach.
A different approach from wrapping is presented in T. Bodhuin, and M. Tortorella, Using Grid Technologies for Web-enabling Legacy Systems, in Proceedings of the Software Technology and Engineering Practice (STEP), The workshop Software Analysis and Maintenance: Practices, Tools, Interoperability, September 19-21, 2003, Amsterdam, The Netherlands, http://www.bauhaus-stuttgart.de/sam/bodhuin.pdf. This describes an approach to deal with non-decomposable legacy programs using screen proxies and redirecting input/output calls. However, this solution is language dependent and requires modification of the original code. B. Balis, M. Bubak, and M. Wegiel, A Framework for Migration from Legacy Software to Grid Services, In Cracow Grid Workshop 03, Cracow, Poland, December 2003, http://www.icsr.agh.edu.pl/balis/bib/legacy-cgw03.pdf, describes a framework devised specifically for adaptation of legacy libraries and applications to Grid services environments. However, this describes a very high level conceptual architecture; it neither gives a generic tool to do the automatic conversion nor proposes a specific implementation.
It is an object of the present invention to provide a high-level Grid application environment where the end-users can easily and conveniently create complex Grid applications.
It is an object of the present invention to provide a high-level Grid application environment where the end-users can apply any legacy code as a standards compliant Grid service when they create Grid applications.
In a first aspect, the invention provides a Grid management service for deploying legacy code applications on the Grid, the service comprising:
selection means for permitting selection of a desired legacy code application,
process means for creating a legacy code instance in response to said selection;
environment means for defining a legacy code job environment; and
submission means for submitting a job for said desired legacy code application, together with information relating to said job environment, for submission to a job management means that arranges for said job to be executed on Grid resources.
In a second aspect, the invention provides a method of providing legacy code applications as a Grid Service, the method comprising:
selecting a desired legacy code application, and creating, in response to the selection, a legacy code process instance;
defining a legacy code job environment, and
submitting a job for said desired legacy code application, together with information relating to said job environment, for submission to a job management means that arranges for said job to be executed on Grid resources.
The present invention provides a Grid environment where users are able to access predefined Grid services. Beyond using such services, users can also dynamically create and deploy new services in a convenient and efficient way. The present invention provides a means to deploy legacy codes as Grid services without modifying the original code. The present invention may be easily ported to WSRF Grid standards.
In at least a preferred embodiment, the invention operates on the binary code, rather than the source code. It is therefore completely independent of the programming language(s) in which the code was originally developed, and pre-empts the need for any language-based intervention. The subset of code semantics necessary to implement a grid-enabled version of a particular code is essentially the specification of input and output parameters, based on the use of the application. This may be documented (e.g. the user manual) or undocumented (e.g. derived from user experience). The specification of input/output includes the format and location of the parameters.
By its very nature, the specification of the input/output parameters is implicitly user-controlled. This has the advantage that the user can choose to deliberately limit the usability of the code when it is published as a grid service.
The invention, at least in a preferred embodiment, incorporates security methods for authentication and authorisation. It also incorporates mechanisms for implementing “statefulness” of the generated grid service. Specifically, it creates persistent instances of the service, each with its own state, for each call of the service.
The present invention offers a front-end Grid service layer that communicates with the client in order to pass input and output parameters, and contacts a local job manager through Globus MMJFS [(Master Managed Job Factory Service)—Globus Team, Globus Toolkit, http://www.globus.org] to submit the legacy computational job. To deploy a legacy application as a Grid service there is no need for the source code and not even for the C header files, in contrast to the prior art. The user only has to describe the legacy parameters in a pre-defined XML format. The legacy code can be written in any programming language and can be not only a sequential but also a parallel PVM (Parallel Virtual Machine) or MPI (Message Passing Interface) code that uses a job manager like Condor where wrapping can be difficult. The present invention can be easily adapted to other service-oriented approaches like WSRF or a pure Web services based solution. The present invention supports decomposable or semi-decomposable software systems where the business logic and data model components can be separated from the user interface.
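By way of illustration only, the following sketch shows how a legacy code description of this kind might be read. The element and attribute names (legacycode, parameter, io, mandatory, fixed, default), the executable path and the helper parse_descriptor are hypothetical and are not the pre-defined XML format of the invention; only the notions of input and output parameters, mandatory parameters and fixed parameters are taken from the description.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical descriptor: every element and attribute name below is an
# assumption made for this sketch, not the format defined by the invention.
DESCRIPTOR = """\
<legacycode name="traffic-sim" executable="/opt/legacy/bin/sim" jobmanager="Condor">
  <parameter name="network" io="input" mandatory="yes" fixed="no" default="net.dat"/>
  <parameter name="cars" io="input" mandatory="yes" fixed="no" default="10"/>
  <parameter name="trace" io="output" mandatory="no" fixed="yes" default="trace.out"/>
</legacycode>
"""


@dataclass
class Parameter:
    name: str
    io: str          # "input" or "output"
    mandatory: bool  # must have a value before a job may be submitted
    fixed: bool      # may not be changed by the client
    default: str


def parse_descriptor(text):
    """Read the hypothetical descriptor into plain Python values."""
    root = ET.fromstring(text)
    parameters = [
        Parameter(
            name=p.get("name"),
            io=p.get("io"),
            mandatory=p.get("mandatory") == "yes",
            fixed=p.get("fixed") == "yes",
            default=p.get("default", ""),
        )
        for p in root.findall("parameter")
    ]
    return {
        "name": root.get("name"),
        "executable": root.get("executable"),
        "jobmanager": root.get("jobmanager"),
        "parameters": parameters,
    }
```

Calling parse_descriptor(DESCRIPTOR) yields the executable location, the job manager name and the list of parameters, none of which requires access to the source code or header files of the legacy program.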
In order that the present invention be better understood, a preferred embodiment will now be described with reference to the accompanying drawings, wherein:
FIGS. 7 to 9 are representations of the protocol stack for Grid services.
The present invention includes a method by which Legacy Code Applications may be transformed into services for the Grid. Throughout the following description, such method is referred to as GEMLCA (Grid Execution Management for Legacy Code Architecture).
The present invention provides a client front-end OGSI Grid service layer that offers a number of interfaces to submit and check the status of computational jobs, and get the results back. The present invention has an interface described in WSDL that can be invoked by any Grid services client to bind and use its functionality through the Simple Object Access Protocol (SOAP). SOAP is an XML-based protocol for exchanging information between computers (XML is a subset of the general standard language SGML). The general architecture to deploy existing legacy code as a Grid service by means of the present invention is preferably based on OGSI and GT3 infrastructure but can also be applied to other service-oriented architectures. A preferred embodiment provides the following characteristics:
The present invention is a Grid architecture with the main aim of exposing legacy code programs as Grid services without re-engineering the original code and offering a user-friendly interface. The conceptual architecture is shown in
In order to access a legacy code program, the user executes a Grid Service client that creates a legacy code instance with the help of the legacy code factory. Following this, the GEMLCA Resource submits the job to the compute servers through GT3 MMJFS using a job manager, such as Condor.
The invention is composed of a set of Grid services that provides a number of Grid interfaces in order to control the life cycle of the legacy code execution. This architecture can be deployed in several user containers or Tomcat application contexts.
Legacy Code deployment
Thereafter the XML file is stored and is made available to the Resource when a job is submitted.
GEMLCA security and multi-user environment
The invention uses the Grid Security Infrastructure (GSI) [J. Gawor, S. Meder, F. Siebenlist, V. Welch, GT3 Grid Security Infrastructure Overview, February 2004. http://www-unix.globus.org/security/.gt3-security-overview.doc] to enable user authentication and to support secure communication over a Grid network. A client needs to sign its credential and also to work in full delegation mode in order to allow the architecture to work on its behalf. There are two levels of authorisation. The first level is given by the grid-map file mechanism [L. Ramakrishnan. Writing secure grid services using Globus Toolkit 3.0. September 2003, http://www-106.ibm.com/developerworks/grid/library/gr-secserv.html]. If the user is correctly mapped, the second level comes into play, which is given by the set of legacy codes that a Grid Client is allowed to use. This set is composed of a combination of a general list of legacy codes, available to anyone using a specific resource, and a user mapped list of legacy codes, only available to Grid clients mapped to a local user by the grid-map file mechanism. The invention administers the internal behaviour of legacy codes taking into account the requirements of input files and output files in a multi-user environment, and also complies with the security restrictions of the operating systems where the architecture is running. In order to do that, the invention uses itself in a protected mode composed of a set of system legacy codes in order to create and destroy a unique process and job stateful environment only reachable by the local user mapped by the grid-map file mechanism.
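The two levels of authorisation described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the functions parse_grid_map and allowed_legacy_codes, the example subject names and the way the two lists are stored are all hypothetical. The sketch assumes the common grid-map file layout in which each line holds a quoted certificate subject followed by a local user name.

```python
import shlex


def parse_grid_map(text):
    """Map certificate subject -> local user name.

    Assumes one entry per line of the form: "subject DN" localuser
    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def allowed_legacy_codes(subject, grid_map, general_codes, user_codes):
    """Return the set of legacy codes the client may use.

    Level one: the subject must be mapped to a local user.
    Level two: the general list is combined with the list that is
    specific to the mapped local user.
    """
    local_user = grid_map.get(subject)
    if local_user is None:
        return set()  # first level failed: nothing is visible
    return set(general_codes) | set(user_codes.get(local_user, ()))
```

A client whose subject is absent from the grid-map file obtains an empty set, whereas a mapped client obtains the union of the general list and its own user mapped list.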
Grid Client interaction with GEMLCA interfaces
Detailed Description of the Architecture
Referring now to
The front-end layer, called the Grid Services Layer, is published as a set of Grid Services, which is the only access point for a Grid client to submit jobs and retrieve results from a legacy code program. This layer offers the functionality of publishing legacy code programs already deployed on the master node server. A Grid client can create a GLCProcess and a number of GLCJobs per process that are submitted to a job manager. This gives the user extra flexibility by adding the capability of managing several similar instances of the same application using the same Grid service process and varying the input parameters.
The Internal Core Layer is composed of several classes that manage the legacy code program environment and job behaviour.
The GT3 Backend Layer is closely related to Globus Toolkit 3 and offers services to the Internal Layer in order to create a Globus Resource Specification Language file (RSL) [see http://www.globus.org/gram/rsl.html] and to submit and control the job using a specific job manager. This layer essentially extends the classes provided by Globus version 3, offering a standard interface to the Internal Layer. The Layer disconnects the architecture's main core from any third party classes, such as GT3.
More specifically, referring to
Each legacy code is deployed together with a Legacy Code Interface Description File (LCID) (
Using the GLCList Grid Service, a client can retrieve a list of available legacy code programs. A client that meets the security requirements can create a GLCProcess instance by invoking the GLCProcessFactory. The factory uses the legacy code configuration file to create and set the default program environment.
A GLCProcess object represents a legacy code process in this architecture. This process cannot be submitted to any job manager if the GLCEnvironment and all the mandatory input parameters have not been created and updated. A client Grid service can submit a job using the default parameters or change any non-fixed parameter before submission. Any time that a process is submitted, a new GLCJob object is created together with a different GLCEnvironment. The process GLCEnvironment gives the maximum number of jobs that a single client can submit within a process. Each job represents a process instance.
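The submission rules of the preceding paragraph may be illustrated by the following sketch. The class names EnvironmentSketch, JobSketch and ProcessSketch, together with their methods and fields, are hypothetical stand-ins chosen for this example and do not reproduce the actual GLCProcess, GLCJob or GLCEnvironment classes. The sketch shows three behaviours taken from the description: submission is refused while a mandatory input parameter is unset, each submission creates a new job with its own environment, and the process environment bounds the number of jobs.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentSketch:
    values: dict          # parameter name -> current value (None if unset)
    mandatory: frozenset  # names that must be set before submission
    fixed: frozenset      # names the client may not change
    max_jobs: int = 3     # upper bound on jobs within one process

    def missing(self):
        """Names of mandatory parameters that have no value yet."""
        return sorted(n for n in self.mandatory
                      if self.values.get(n) in (None, ""))


@dataclass
class JobSketch:
    job_id: int
    environment: EnvironmentSketch  # private copy taken at submission time


class ProcessSketch:
    def __init__(self, environment):
        self.environment = environment
        self.jobs = []

    def set_parameter(self, name, value):
        if name in self.environment.fixed:
            raise ValueError("parameter %r is fixed" % name)
        self.environment.values[name] = value

    def submit(self):
        missing = self.environment.missing()
        if missing:
            raise RuntimeError("mandatory parameters not set: %s" % missing)
        if len(self.jobs) >= self.environment.max_jobs:
            raise RuntimeError("maximum number of jobs reached")
        # Each job receives its own environment, so later parameter
        # changes on the process do not affect jobs already submitted.
        snapshot = EnvironmentSketch(dict(self.environment.values),
                                     self.environment.mandatory,
                                     self.environment.fixed,
                                     self.environment.max_jobs)
        job = JobSketch(len(self.jobs), snapshot)
        self.jobs.append(job)
        return job
```

Copying the environment at submission time is one simple way to let a client vary an input parameter between two submissions of the same process while keeping the earlier job unchanged.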
The GLCJob uses the GLCEnvironment to create an RSL file using GLCRslFile that is used to submit the legacy code program to a specific job manager.
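As a hedged illustration of this step, the sketch below assembles a job description in the classic string form of RSL (a conjunction of attribute relations introduced by an ampersand). The functions rsl_quote and build_rsl and the example paths are hypothetical and are not part of GLCRslFile. The attribute names used (executable, arguments, directory, stdout, stderr, count, jobType) are assumed from the classic RSL attribute set, and the exact syntax accepted by a given job manager or by GT3 MMJFS may differ.

```python
def rsl_quote(value):
    """Quote a value as an RSL string literal.

    Assumption: a double quote inside a quoted literal is written twice.
    """
    return '"' + str(value).replace('"', '""') + '"'


def build_rsl(executable, arguments=(), directory=None,
              stdout=None, stderr=None, count=1, job_type=None):
    """Build a single-job RSL string from a job environment."""
    relations = [("executable", rsl_quote(executable))]
    if arguments:
        relations.append(
            ("arguments", " ".join(rsl_quote(a) for a in arguments)))
    for name, value in (("directory", directory),
                        ("stdout", stdout),
                        ("stderr", stderr)):
        if value is not None:
            relations.append((name, rsl_quote(value)))
    relations.append(("count", str(int(count))))
    if job_type is not None:
        relations.append(("jobType", rsl_quote(job_type)))
    return "&" + "".join("({}={})".format(n, v) for n, v in relations)


# Example with illustrative values only:
# build_rsl("/opt/legacy/bin/sim", ["net.dat", "10"])
# returns '&(executable="/opt/legacy/bin/sim")(arguments="net.dat" "10")(count=1)'
```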
A Grid Service client can check the general process status or specific job behaviour using the GLCProcess instance. Also, a client can destroy a GLCProcess instance or a specific GLCJob within the process.
Thus
The Core layer has the internal administrative functions of setting the environment for a job, and for creating and handling Grid services, and processing instances.
The Back End Layer interacts with the known middleware Connectivity layer, as shown in
Urban Car Traffic Simulation
The invention described above was demonstrated by deploying a Manhattan road traffic generator, several instances of the legacy traffic simulator and a traffic density analyzer into Grid services. All these legacy codes were executed from a single workflow and the execution was visualised by a Grid portal. The workflow consists of three types of legacy code components:
1. The Manhattan legacy code is an application that generates MadCity-compatible network and turn input files. The MadCity network file is a sequence of numbers representing the road topology of a real road network. The number of columns, rows, unit width and unit height can be set as input parameters. The MadCity turn file is a sequence of numbers representing the junction manoeuvres available in a given road network. Traffic light details are included in this input file.
2. MadCity [A. Gourgoulis, G. Terstyansky, P. Kacsuk, S. C. Winter, Creating Scalable Traffic Simulation on Clusters. PDP2004. Conference Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network based Processing, La Coruna, Spain, 11-13th Feb. 2004] is a discrete time-based traffic simulator. It simulates traffic on a road network and shows how individual vehicles behave on roads and at junctions. The simulator of MadCity models the movement of vehicles using the input road network file. After completing the simulation, the simulator creates a macroscopic trace file.
3. A traffic density analyzer, which compares the traffic congestion of several simulations of a given city and presents a graphical analysis.
The workflow was configured to use five GEMLCA resources, each deployed on one of the UK OGSA test bed sites, and one server on which the P-GRADE portal is deployed. The first GEMLCA resource is installed at the University of Westminster (UK) and runs the Manhattan road network generator (Job0), one traffic simulator instance (Job3) and the final traffic density analyzer (Job6). Four additional GEMLCA resources, on which the traffic simulator is deployed, are installed at the following sites: SZTAKI (Hungary), University of Portsmouth (UK), The CCLRC Daresbury Laboratory (UK), and University of Reading (UK). One instance of the simulator is executed on each of these sites, respectively Job1, Job2, Job5 and Job4. The MadCity network file and the turn file are used as input to each traffic simulator instance. In order to obtain different behaviour in each of these instances, each one was set with a different initial number of cars per street junction, which is one of the input parameters of the program. The output file of each traffic simulation is used as an input file to the traffic density analyzer. The described workflow was successfully created and executed by the Grid portal installed at the University of Westminster.
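The dependency structure of this workflow can be sketched as a small directed graph: the generator (Job0) feeds the five simulator instances (Job1 to Job5), whose outputs feed the analyzer (Job6). The job names and sites below are taken from the description; the dictionaries SITES and DEPENDS_ON and the function execution_stages are hypothetical helpers for illustration and are not how the P-GRADE portal represents workflows.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

# Site on which each job runs, as stated in the description.
SITES = {
    "Job0": "University of Westminster",
    "Job1": "SZTAKI",
    "Job2": "University of Portsmouth",
    "Job3": "University of Westminster",
    "Job4": "University of Reading",
    "Job5": "CCLRC Daresbury Laboratory",
    "Job6": "University of Westminster",
}

SIMULATORS = ["Job1", "Job2", "Job3", "Job4", "Job5"]

# Each job maps to the set of jobs whose output it needs.
DEPENDS_ON = {
    "Job0": set(),                          # Manhattan generator
    **{job: {"Job0"} for job in SIMULATORS},  # traffic simulator instances
    "Job6": set(SIMULATORS),                # traffic density analyzer
}


def execution_stages(depends_on):
    """Group jobs into stages; jobs within a stage may run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

Under these assumptions execution_stages(DEPENDS_ON) yields three stages: the generator alone, then the five simulator instances, which may run concurrently on their respective sites, and finally the analyzer.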