may the force be with you: Harvesting and reusing idle compute cycles

Monday, July 04, 2005

Harvesting and reusing idle compute cycles

How United Devices Grid MP helps this happen at the UT Grid project

Level: Introductory

Ashok Adiga (adiga@tacc.utexas.edu), Research Scientist, Texas Advanced Computing Center
Nina Wilner (new_nina@tacc.utexas.edu), Grid IT Architect, IBM

28 Jun 2005

More on the University of Texas grid project's mission to integrate numerous, diverse resources into a comprehensive campus cyber-infrastructure for research and education. In this article, the authors examine the idea of harvesting unused cycles from compute resources to provide this aggregate power for compute-intensive work. They will also place this concept in context by offering an overview of a popular commercial software package designed to help achieve this task: the United Devices Grid MP platform.

Several early grid computing projects were focused on the idea of harvesting unused cycles from compute resources and providing this aggregated computing power for work that comprised lots of tasks -- from hundreds to millions -- that could be executed individually.

Today, there are several commercial and open source grid computing software packages that support this form of distributed computing on the desktop or other nondedicated computing resources. In this article, we will take a look at a popular commercial software package designed to help execute this function: the United Devices Grid MP platform.

Grid MP has several interesting and unique features, including:

Support for heterogeneous desktops/nodes
Nonintrusive client execution
Tolerance to failures of desktop resources

We will provide an overview of the Grid MP features designed for harvesting idle cycles from nondedicated resources, and we'll describe the types of applications that can effectively use the type of "desktop grid" we're discussing.

Introducing Grid MP
The United Devices Grid MP platform is a commercially available package that can be used to access cycles on large numbers of desktop PCs and workstations, thereby providing a large-scale computing platform suitable for certain classes of high-performance computations. The motivations for using a desktop grid are twofold:

To be able to utilize idle cycles on resources that would otherwise be wasted and, thus, maximize investments in IT resources
To run applications across potentially thousands of desktop PCs concurrently, exceeding the typical sizes of dedicated cluster computing resources and enabling very high-throughput computing

The latter motivation was the driving force behind many successful projects, such as SETI@home and Grid.org, to solve very large compute problems using techniques that would not have been feasible on existing cluster resources due to the cost of the required compute cycles.

Currently, Grid MP is used as a component of the UT Grid campus grid project at the University of Texas at Austin. Although the Grid MP platform supports dedicated resources, such as high-end servers and clusters, the UT Grid deployment currently includes only desktop and workstation resources, typically located in student laboratories or in faculty/staff offices. In this article, we will focus only on these nondedicated types of resources and on features in the platform that make it suitable for this purpose.

Some relevant key features of the Grid MP platform are its:

Lightweight, nonintrusive client with the ability to monitor load thresholds
Secure sandboxing and encryption of data to protect the desktop from the application and vice-versa
Smart scheduling to match jobs to resources and handle client failures
Tools to make it easy for local administrators to manage and set usage policies for their desktop machines

Grid MP provides users with several options for running applications. To run a serial batch job, users can use a command-line submission utility that executes on the local desktop and submits the job to the Grid MP server for execution on a remote desktop. Grid MP also supports batch parallel jobs (MPI jobs, for example), which can be run across machines on the desktop grid. (MPI stands for Message Passing Interface, a standard designed for high performance on massively parallel machines and on workstation clusters.) Since desktops are usually not interconnected with high-speed switches, as is the case with typical clusters, only MPI jobs that are latency-tolerant can be run effectively on the platform.

In Part 1 of the series
Part 1 of this series, Grid in action, was titled "Developing a wide grid" and introduced the vision of the University of Texas grid project: to integrate the numerous and diverse computational, visualization, storage, data, and instrument and device resources at the University of Texas into a comprehensive campus cyber-infrastructure for research and education.

It covered the goals of the project, outlined the project deployment layout, detailed some key conceptual requirements and software components, explained the reasons for choosing the first target discipline and target services to integrate into the grid, identified timely access to resources as the prime benefit the grid confers on users, and laid out a roadmap for integrating more disciplines and services into the campus-wide grid.

The most effective use of the Grid MP system is obtained when running data-parallel applications across thousands of desktops on the grid. (Data parallelism can be generally defined as the concurrent application of a computation to all items in a collection of data, yielding a degree of parallelism that typically scales with problem size.) The platform supports coarse-grained parallelism for large jobs that can be decomposed into several independent pieces, which are then scheduled collectively by the Grid MP server to run on desktops that meet the minimum specified resource requirements. Enabling data-parallel applications on Grid MP typically requires the development of application scripts that manage both the decomposition of the problem into independently schedulable jobs and the subsequent merging of the independent results into a single result. Application developers can register applications to be hosted on Grid MP and create application services that can then be used repeatedly by application users.

The description of the installation and configuration of Grid MP is beyond the scope of this article; this is typically done by United Devices as part of their services. However, we will offer some configuration notes that can be useful when setting up the install.

Let's look at the Grid MP architecture.

Grid MP architecture
The Grid MP system consists of a set of servers providing grid services for administrators, developers, and users, and a collection of compute resources consisting of desktops, workstations, high-end servers, and clusters. Figure 1 provides an overview of the Grid MP architecture.

Figure 1. The United Devices Grid MP architecture

The Grid MP servers provide services for managing resources like compute devices, data, applications, and user workload. Access to these services is provided through a Web services interface called the MP Grid Services Interface (MGSI). Complete documentation of this interface can be found in the MGSI reference guide distributed with the Grid MP Software Developers Toolkit. The Grid MP servers are responsible for:

Authentication of users and devices
Management of data and metadata
Job and workload management
Scheduling to match job components with available resources

Grid MP resources run a lightweight MP Agent that provides controlled access to the desktop while enforcing administrator-specified usage policies. The agent runs applications received from the Grid MP dispatch services in a nonintrusive, secure sandbox. The current version of Grid MP supports Linux®, AIX, Windows®, Mac, and Solaris clients.

The platform provides several options for accessing and managing Grid MP resources. Grid MP has a browser-based Web console that can be used by administrators, developers, and application users to manage all Grid MP resources. Additionally, the architecture supports the deployment of application services and command-line tools that can provide customized functionality for grid users.

Desktop resources can be grouped, and individual groups can be managed by local administrators to set usage policies. Usage policies that can be specified include:

Times when the resources can be used
How much disk space can be used
Priorities for groups of users using the resources
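To make the idea of a usage policy concrete, here is a minimal sketch of how a time window and a disk limit could be checked before work is accepted on a desktop. The class UsagePolicy and its fields are hypothetical and are not part of Grid MP; group priorities are left out to keep the example short.

```java
// Illustrative only: a toy usage policy; the class and its fields are hypothetical.
final class UsagePolicy {
    private final int startHour;      // first hour (0-23) at which grid work may run
    private final int endHour;        // hour (0-23) at which grid work must stop
    private final long maxDiskBytes;  // disk space grid work may use on this desktop

    UsagePolicy(int startHour, int endHour, long maxDiskBytes) {
        this.startHour = startHour;
        this.endHour = endHour;
        this.maxDiskBytes = maxDiskBytes;
    }

    // True when the hour falls inside the allowed window and the disk need fits.
    // A window such as 18 to 8 wraps past midnight; equal hours mean "never".
    boolean allows(int hourOfDay, long diskBytesNeeded) {
        boolean inWindow = (startHour <= endHour)
                ? hourOfDay >= startHour && hourOfDay < endHour
                : hourOfDay >= startHour || hourOfDay < endHour;
        return inWindow && diskBytesNeeded <= maxDiskBytes;
    }
}
```

With a policy of new UsagePolicy(18, 8, 500L * 1024 * 1024), work is accepted overnight but refused at midday or when it needs more than 500 MB of disk.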

Application services and command-line tools can also be provided to enable application users to submit jobs.

Grid MP has several features to support the efficient execution of large data-parallel applications hosted on the Grid MP servers, including application and data caching at desktop nodes, and affinity scheduling to utilize cached files and reduce network traffic overhead. (In this context, affinity scheduling is the process by which the scheduler uses information about previous executions of a computation on a resource and attempts to schedule new computations on the resource that can reuse some of the data or executable code.) Hosted applications can also include the specification of executables for heterogeneous architectures (like Windows, Linux, or Mac). If multiple executables are registered, it is possible that different components of a data-parallel application could be executed on different architectures.
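The affinity idea can be sketched in a few lines. The example below is a toy model under our own assumptions, not the Grid MP scheduler: it tracks which files each device has cached and picks the device that already holds the most files a work unit needs. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy affinity scheduler, not Grid MP's actual algorithm.
final class AffinityScheduler {
    // Device name -> files (executables or data) already cached on that device.
    private final Map<String, Set<String>> cache = new LinkedHashMap<String, Set<String>>();

    void registerDevice(String device, String... cachedFiles) {
        cache.put(device, new HashSet<String>(Arrays.asList(cachedFiles)));
    }

    // Picks the device that already holds the most of the required files.
    // Ties go to the device registered first; returns null if no device exists.
    String pickDevice(Set<String> requiredFiles) {
        String best = null;
        int bestHits = -1;
        for (Map.Entry<String, Set<String>> entry : cache.entrySet()) {
            int hits = 0;
            for (String file : requiredFiles) {
                if (entry.getValue().contains(file)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        if (best != null) {
            // After the work unit runs, its files are assumed to stay cached there.
            cache.get(best).addAll(requiredFiles);
        }
        return best;
    }
}
```

A real scheduler would also weigh device load, availability, and resource requirements; this sketch isolates only the cache-reuse decision.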

Grid MP client resources are nondedicated, unreliable resources since they can be shut off by the desktop owner at any time. The system, therefore, has the ability to detect client failures and reschedule jobs running on a resource when a failure occurs.
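One common way to detect such failures is a heartbeat timeout, sketched below. This is an assumption for illustration; the article does not describe how Grid MP detects failures internally, and every name in the example is hypothetical. Work units assigned to a device that has been silent for too long are moved back to a pending queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative only: heartbeat-based failure detection with rescheduling.
final class FailureMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<String, Long>(); // device -> last contact
    private final Map<String, String> running = new HashMap<String, String>();  // work unit -> device
    private final Deque<String> pending = new ArrayDeque<String>();              // units awaiting a device

    FailureMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void heartbeat(String device, long nowMillis) {
        lastHeartbeat.put(device, nowMillis);
    }

    void assign(String workUnit, String device) {
        running.put(workUnit, device);
    }

    // Requeues every work unit whose device has been silent longer than the
    // timeout (or has never reported) and returns the units that were requeued.
    List<String> sweep(long nowMillis) {
        List<String> requeued = new ArrayList<String>();
        Iterator<Map.Entry<String, String>> it = running.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            String workUnit = entry.getKey();
            Long seen = lastHeartbeat.get(entry.getValue());
            if (seen == null || nowMillis - seen > timeoutMillis) {
                it.remove();
                pending.addLast(workUnit);
                requeued.add(workUnit);
            }
        }
        return requeued;
    }

    // Next work unit waiting to be rescheduled, or null if none is waiting.
    String nextPending() {
        return pending.pollFirst();
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic and easy to test; a production monitor would read a clock instead.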

Running an application on Grid MP
To submit applications from a user's desktop, the Grid MP Software Developer's Kit (SDK) must be downloaded and installed, preferably in a location that can be shared by all users on the desktop machine. The SDK is a package containing documentation, libraries, and tools to help develop and test applications to run on the Grid MP platform.

The user must create a file called .uduserconf in his home directory containing information about the local Grid MP installation and the SDK. A sample of this file is distributed with the SDK as uduserconf.sample (shown in Listing 1).

Listing 1. Sample user configuration file .uduserconf



MGSI_FILESVR_URL   =  https://frio-file-svc.tacc.utexas.edu:443/mgsi/filesvr.cgi
MGSI_XMLRPC_URL    =  https://frio-web-svc.tacc.utexas.edu:443/mgsi/rpc_xmlrpc.fcgi
MGSI_SOAP_URL      =  https://frio-web-svc.tacc.utexas.edu:443/mgsi/rpc_soap.fcgi
MGSI_USERNAME      =  
MGSI_PASSWORD      =  

BUILDMODULE_PATH   =  /usr/local/UDsdk_v4.1/tools/build/buildmodule
BUILDPACKAGE_PATH  =  /usr/local/UDsdk_v4.1/tools/build/buildpkg
LOADER_PATH        =  /usr/local/UDsdk_v4.1/tools/build/loader
MPBATCH_PATH       =  /usr/local/UDsdk_v4.1/tools/mpbatch

MPI_LISTEN_PORT    =  12345-12360

The MGSI_*_URL parameters specify the Grid MP services that will be used to submit jobs; the specific URLs can be obtained from the local Grid MP administrator. The configuration file also contains the user name and password, as well as path variables pointing to utilities in the SDK package. Users can leave the password unspecified in the configuration file and supply it each time they run an application utility.
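An application script may need to read these settings itself. The following is a minimal sketch, assuming the file keeps the simple KEY = value layout of Listing 1, which the standard java.util.Properties parser can read. The class UdUserConf and its methods are hypothetical helpers, not part of the SDK; only the key names come from the listing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the Grid MP SDK: reads .uduserconf-style
// "KEY = value" lines with the standard java.util.Properties parser.
final class UdUserConf {
    private final Properties props = new Properties();

    // Parses configuration text that has already been read from the file.
    static UdUserConf parse(String text) {
        UdUserConf conf = new UdUserConf();
        try {
            conf.props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    // Returns the trimmed value, or null when the key is absent or left blank
    // (for example, a password the user prefers to supply at run time).
    String get(String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        return value.trim();
    }

    // Parses a "low-high" range such as MPI_LISTEN_PORT into {low, high}.
    int[] portRange(String key) {
        String value = get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing range for " + key);
        }
        String[] parts = value.split("-");
        return new int[] {
            Integer.parseInt(parts[0].trim()),
            Integer.parseInt(parts[1].trim())
        };
    }
}
```

Note that Properties treats backslashes as escape characters, so Windows-style paths would need a different parser or doubled backslashes.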

Application support
The platform supports three types of user jobs:

Batch jobs -- Users can use the mpsub command to run the batch jobs in which a single executable is forwarded by the Grid MP system to run on a single remote desktop.
MPI jobs -- Users can submit MPI jobs using the ud_mpirun command. The system selects a set of desktop machines and coordinates the initiation of the MPI application across this set of machines. Currently, MPICH and LAM/MPI are supported.
Data-parallel jobs -- The platform supports coarse-grained parallelism for large jobs that can be decomposed into several independent pieces. Developers can create application scripts to work in conjunction with application executables to implement a data-parallel solution. These applications can then be hosted on the Grid MP and provided as application services available to users.

Batch and MPI jobs can be run on the Grid MP platform without requiring code changes. Data-parallel applications require the creation of application scripts to manage the division of the large problem into independent parts that can be submitted to the Grid MP system for parallel execution, as well as to retrieve and merge the results from the parallel executions to obtain a single result.

As with any batch system, the overhead involved in moving data and executables to and from a remote resource must be considered when evaluating the suitability of an application for the platform. Especially in the case of data-parallel applications -- which typically consist of thousands of independently schedulable pieces of work, or work units -- the benefit of running the work units concurrently must be compared with the overhead needed to move required data to the remote desktops. The ideal data-parallel applications are those that are compute-intensive, but have relatively small input and output data files.
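A back-of-the-envelope model can help with this comparison. The sketch below assumes a deliberately simple picture that is ours, not Grid MP's: work units run in waves across the available desktops, and each unit pays a fixed transfer cost plus its compute time. The class and parameter names are hypothetical.

```java
// Illustrative only: a simplified cost model for deciding whether a
// data-parallel job is worth running on a desktop grid.
final class OverheadModel {
    // Estimated wall-clock seconds when the work units run in waves across
    // the desktops, each unit costing its transfer time plus its compute time.
    static double gridSeconds(int workUnits, int desktops,
                              double computeSec, double transferSec) {
        int waves = (workUnits + desktops - 1) / desktops; // ceiling division
        return waves * (computeSec + transferSec);
    }

    // Ratio of serial run time on one machine (no transfers) to grid run time.
    static double speedup(int workUnits, int desktops,
                          double computeSec, double transferSec) {
        double serialSeconds = workUnits * computeSec;
        return serialSeconds / gridSeconds(workUnits, desktops, computeSec, transferSec);
    }
}
```

Under these assumptions, 1,000 work units on 100 desktops with 600 seconds of compute and 60 seconds of transfer each give a speedup of roughly 91, whereas the same job with only 6 seconds of compute per unit gives a speedup of roughly 9. The model ignores server contention, caching, and desktops that drop out.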

Running batch jobs
To run a batch job, you use the mpsub command, which can be found in the SDK in the tools/mpbatch directory. Several flavors of mpsub are distributed in the SDK for the supported platforms, along with a Perl version that can be used on any desktop that has Perl installed.

Documentation on the usage of mpsub is available in the Applications Users QuickGuide distributed with the SDK or by using the -help option included with mpsub. This is an example of a batch submission:

mpsub -input file1 -output stdout -block myprogname

The Grid MP system forwards the executable file myprogname and the input file file1 to a remote machine, where the executable is invoked using the command-line string supplied to mpsub. The mpsub command blocks until remote execution completes, and the resulting standard output (stdout) is then returned to the submitting desktop. If mpsub is run asynchronously (without the -block option), it returns a job ID that can be used with the mpresult utility to retrieve the results once the remote execution has completed.

Users can submit MPI jobs using ud_mpirun or mpsub. ud_mpirun is a customized mpirun created for Grid MP; it currently supports MPICH V1.25. For MPI applications that have been compiled using other versions of MPI (such as LAM/MPI), users should use mpsub.

Running data-parallel jobs
To set up an application as a data-parallel application, the application developer needs to:

Do a feasibility analysis to see if the application is suitable to be run as a data-parallel application on the Grid MP platform.
Create an application script that takes the users' input, splits the computation into independent parts, and uploads the parts to the Grid MP system.
Retrieve the individual results from the Grid MP system once the independent work units have completed, then merge the results to obtain a single result.
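The split and merge steps above can be sketched locally, without any Grid MP calls. In the example below, runWorkUnit is a stand-in for the executable that would run on a remote desktop, and all class and method names are hypothetical; the workload is a simple text search over a list of records.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the split / run / merge shape of a data-parallel script.
final class SplitMerge {
    // Split step: cuts the records into work units of at most chunkSize records.
    static List<List<String>> split(List<String> records, int chunkSize) {
        List<List<String>> units = new ArrayList<List<String>>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, records.size());
            units.add(new ArrayList<String>(records.subList(i, end)));
        }
        return units;
    }

    // Stand-in for the remote executable: returns the records containing the query.
    static List<String> runWorkUnit(List<String> unit, String query) {
        List<String> matches = new ArrayList<String>();
        for (String record : unit) {
            if (record.contains(query)) {
                matches.add(record);
            }
        }
        return matches;
    }

    // Merge step: concatenates the per-unit results in work-unit order.
    static List<String> merge(List<List<String>> partialResults) {
        List<String> merged = new ArrayList<String>();
        for (List<String> partial : partialResults) {
            merged.addAll(partial);
        }
        return merged;
    }
}
```

In a real application script, the loop that calls runWorkUnit would be replaced by uploading each work unit to the Grid MP system and later retrieving its result.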

The typical usage model for data-parallel applications is for an application developer to develop the scripts and create a "hosted" application on Grid MP that can then be made available to application users. Since the MGSI interface is implemented as a Web service interface, the application scripts can be developed using any language for which a SOAP client library exists.

For an application to be a suitable candidate to run as a data-parallel application on the Grid MP platform, it should have certain properties. The application should be decomposable into parts that can be executed independently of each other, and each part should have a high compute-to-communication ratio. The overhead involved in moving the executable and the input and output data to a remote desktop machine should be offset by a relatively long execution time for the computation.

There are several types of applications usually well suited to this data-parallel solution approach:

Applications that use Monte Carlo methods are typically easy to decompose into independent parts (Monte Carlo methods are algorithms for solving various kinds of computational problems by using random or pseudo-random numbers, instead of deterministic algorithms).
Large database searches can be parallelized by searching parts of the database in parallel.
Evaluation of a population and creation of the next population generation in iterative genetic algorithms can be parallelized.
Exhaustive search techniques can utilize the full aggregated computing power of the grid resources.
Parametric design studies can be parallelized by independently evaluating a model for different combinations of parameter values.
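As a small illustration of the first category, the sketch below estimates pi with a Monte Carlo method split into independent work units. Each unit needs only a seed and a sample count as input and returns a single number, which is the small-input, compute-heavy profile described earlier. The class is hypothetical, runs everything locally, and makes no Grid MP calls.

```java
import java.util.Random;

// Illustrative only: a Monte Carlo estimate of pi split into independent work units.
final class MonteCarloPi {
    // One work unit: counts random points in the unit square that fall
    // inside the quarter circle of radius 1.
    static long runWorkUnit(long seed, long samples) {
        Random rng = new Random(seed);
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                hits++;
            }
        }
        return hits;
    }

    // Merge step: combines the per-unit hit counts into one estimate of pi.
    static double merge(long[] hitsPerUnit, long samplesPerUnit) {
        long totalHits = 0;
        for (long hits : hitsPerUnit) {
            totalHits += hits;
        }
        return 4.0 * totalHits / (hitsPerUnit.length * samplesPerUnit);
    }

    // Runs the units one after another; on a grid each unit would run remotely.
    static double estimate(int units, long samplesPerUnit) {
        long[] hits = new long[units];
        for (int u = 0; u < units; u++) {
            hits[u] = runWorkUnit(u, samplesPerUnit); // the unit index doubles as the seed
        }
        return merge(hits, samplesPerUnit);
    }
}
```

Giving each unit its own seed keeps the units independent and the run reproducible; a production code would choose seeds or generators with more care than consecutive integers.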

The Java™ code snippet in Listing 2 is part of an application script that creates a job on the Grid MP system. The job descriptor is first updated with values for the application ID, state, and priority. This is followed by an MGSI call, createJob, which causes a new job to be created on the Grid MP system and returns the ID of the new job.

Listing 2. Sample Java code to submit a job to Grid MP



// Create Job
job.setApplication_Gid(appGID);
job.setState_Id(1);  // Set Job state to active
job.setPriority(10);  // Low priority
String jID = udmSession.getUdMgsi().createJob(udmSession.getAuthkey(), job);

The SDK contains application script examples in C++, Perl, and the Java language. In general, these sample scripts can be reused with modifications to create application scripts for other applications.

In conclusion
In this article, we've provided an overview of the United Devices Grid MP platform as a tool that supports computing efforts using otherwise idle cycles on nondedicated resources. Although the platform supports batch serial jobs and MPI jobs through a simple command-line interface, the most suitable applications for this platform are data-parallel applications that can be decomposed into many related, but independent tasks that can be run concurrently on the client machines.

We've also offered criteria for choosing applications that will benefit the most from using this type of setup, since this usage model requires the creation of application scripts to split the job into the independent parts and to recombine the results of the parts. It is beneficial to select applications that will be reused frequently by application users once set up for the Grid MP environment. Once the application scripts have been developed, the data-parallel applications can be hosted in the Grid MP system, making usage much easier.

Grid MP provides administration tools that make it easy for individual organizations or departments within an enterprise to configure and set policies for local desktop machines. The security and sandbox features, combined with the minimal resources required to support the platform, make Grid MP an attractive platform for enterprises and universities, benefiting both by maximizing their IT investments.

In our next installment, we'll provide a basic overview of grid meta-schedulers -- systems that balance the workload across a collection of resource managers, effectively creating a cluster of clusters or a hierarchy of schedulers. The grid meta-scheduler coordinates communication between multiple resource managers and operates on business policies and enforces service-level agreements, allowing the resource managers to efficiently manage jobs across local resources under its management. We'll also introduce some considerations for selecting an appropriate meta-scheduler for your project.

Resources

Learn more about the United Devices Grid MP Enterprise platform.
Download the United Devices Software Developers Kit (SDK).
Check out other cycle harvesting projects, such as SETI@home and Grid.org.
Learn more about the UT Grid project, led by the Texas Advanced Computing Center at the University of Texas at Austin, being conducted jointly with IBM.
"Analytics Acceleration Grid Environment" (developerWorks, November 2004), a three-part series, demonstrates cycle harvesting as a part of resource balancing, all described in the context of crafting a complete, effective grid system.
"Geographically dispersed grid" (developerWorks, November 2004), a four-part series, explains how to balance data nodes and computing nodes for the greatest productivity.
"Grid application performance -- find the sweet spot" (developerWorks, September 2004) presents a simple methodology to help you find the balance between units of work and results sets when tuning grid-enabled applications.
"Orchestrating grid workloads -- neither feast nor famine" (developerWorks, September 2004) discusses how resources can be managed into and out of a grid environment using an example infrastructure.

About the authors

Ashok Adiga is a research scientist in the Distributed and Grid Computing Group at the Texas Advanced Computing Center at the University of Texas at Austin. He is a developer for the UT Grid campus cyber-infrastructure project, and has been responsible for deployment and support of a campus desktop grid at the university. He is involved in several software projects that develop and deploy grid middleware.

Nina Wilner is a grid IT architect at IBM in Austin. She is currently working with the University of Texas at Austin to develop a campus cyber-infrastructure project, the UT Grid. She has been with IBM since 1987, when she started working for IBM in Munich, Germany. She has a post-graduate degree in mathematics. Other focus areas are 3-D graphics, 2-D graphics and GUIs, networking and distributed computing, pSeries® and AIX®, as well as life sciences.

source:http://www-128.ibm.com/developerworks/grid/library/gr-harvest/?ca=dgr-lnxw01HarvestingGrid

# posted by dark master : 7/04/2005 10:34:00 AM
