OpenPBS Support
JavaParty supports running applications within a batch system
such as OpenPBS. The support consists of three
components: a batch file creation script, a batch file execution
script, and the JavaParty environment itself. The following
briefly describes how to submit jobs to OpenPBS and how the
internal scripts work, so that you can adjust them to your
needs.
Requirements
JavaParty assumes the following commands are available and
reachable from your path.
- qsub: Part of OpenPBS; submits a job to the queue.
- ssh: Secure shell. This is used internally to spawn the distributed
runtime environment on your cluster when the batch system starts
your job.
Make sure that you can log in from each cluster node to every other
node without typing a password. Otherwise your batch job will fail,
because the distributed runtime environment cannot be started. You
can achieve this by using an appropriate .shosts or
/etc/ssh/shosts.equiv file. Please consult man ssh
for details.
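Both requirements can be checked before submitting a job. The following is a minimal sketch in POSIX sh (not tcsh), not part of the JavaParty distribution. The function names and the SSH_CMD override are hypothetical; BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/bin/sh
# Print every command from the argument list that is not reachable from PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Print every host that refuses a non-interactive login from this node.
# SSH_CMD is a hypothetical override (handy for dry runs); it defaults to ssh.
# BatchMode=yes makes ssh fail instead of prompting for a password.
unreachable_hosts() {
    for host in "$@"; do
        ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null >/dev/null 2>&1 || printf '%s\n' "$host"
    done
}
```

For example, `missing_commands qsub ssh` prints nothing when both commands are found, and `unreachable_hosts node01 node02` (host names are placeholders) prints the hosts that would have asked for a password. The check only covers logins from the node it runs on, so it would have to be repeated on every node to cover the full requirement.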
Quick Tour
In the following it is assumed that your application classes are
compiled to a directory /home/user/classes, your main
class is called mypackage.MainClass, and you are using the
tcsh shell.
Log on to the front-end machine of your cluster, where OpenPBS is
installed.
Set your CLASSPATH environment variable to your
application class path.
setenv CLASSPATH /home/user/classes
Submit your job using the script jpsub that can be
found in the bin/ directory of the JavaParty
distribution.
jpsub -np mypackage.MainClass
If everything went well, you will find files containing the
standard output and standard error of your job in the directory
from which you submitted the job.
standard output: jpsub.o
standard error: jpsub.e
Batch system internals
If something did not work as expected, you may want to consult this
section to adjust the JavaParty batch system support to your
needs. The components of the batch system support are explained
below:
- jpsub: Submits a JavaParty class to the batch system for
execution. After parsing its arguments, it invokes
qsub -l nodes=
from OpenPBS and passes a batch script that is created on the
fly. This batch script restores some environment variables and calls
jpq to actually execute the application under control of
the batch system. The created batch file looks like the following:
#!/bin/tcsh
#
# Restore environment variables
#
setenv CLASSPATH
setenv LD_LIBRARY_PATH
#
# Invoke jpq
#
jpq
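To illustrate what "created on the fly" can mean, here is a minimal sketch in POSIX sh of a generator for such a batch file. It is not the actual jpsub code: the function name make_batch_script is hypothetical, and passing all arguments through to jpq unchanged is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not the actual jpsub code: print a tcsh batch script
# that restores the submitter's environment and then runs jpq.
# Assumption: all arguments are handed to jpq unchanged.
make_batch_script() {
    # The unquoted here-document delimiter lets the shell expand the
    # variables now, so the current values are frozen into the script.
    cat <<EOF
#!/bin/tcsh
#
# Restore environment variables
#
setenv CLASSPATH "${CLASSPATH:-}"
setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH:-}"
#
# Invoke jpq
#
jpq $*
EOF
}
```

Calling `make_batch_script mypackage.MainClass` prints the script to standard output, so it can be inspected before it is handed to qsub. Freezing the values at submission time matters because the batch job does not necessarily inherit the environment of the submitting shell.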
- jpq: Executes a JavaParty class under control of the batch
system. It expects the environment variables PBS_NODEFILE
and PBS_JOBID to be set. PBS_NODEFILE
must point to a file containing the list of hosts to be used
for the computation. This file is created by OpenPBS at the time
the job is started. PBS_JOBID is set to an arbitrary string
that serves as a key to identify all parts of the distributed runtime
environment that belong to this program execution.
jpq starts a runtime manager, a JavaParty virtual machine,
and the main JavaParty class on the local node. It then logs into all
other hosts specified in PBS_NODEFILE using ssh
and starts a JavaParty virtual machine on each of them.
After the main class terminates, the distributed environment is
shut down automatically and the batch job is finished. To learn more
about how to set up a JavaParty runtime environment for a single
application execution using a node file, please look in the
jpq script. In essence, the following commands are executed:
On the master node:
javaparty \
rm -host -port 1099 -code \
-passive -nodefile \
vm -host -port 1099 -code \
-nodename \
exec -host -port 1099 -code \
-killonexit
On each slave node:
javaparty \
vm -host -port 1099 -code -nodename
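The split into one master node and several slave nodes can be derived from the node file alone. The following POSIX sh sketch is not taken from the jpq script; the function name plan_nodes is hypothetical. It assumes that the first entry of the node file names the node on which the batch script runs, and that a host may be listed more than once.

```shell
#!/bin/sh
# Print "master <host>" for the first entry of the node file and
# "slave <host>" for every other distinct host, in order of first appearance.
# Assumption: the first entry names the node that runs the batch script.
plan_nodes() {
    awk '
        # Skip empty lines and hosts that were already seen.
        NF && !seen[$1]++ {
            role = (count++ == 0) ? "master" : "slave"
            print role, $1
        }
    ' "$1"
}
```

Inside a running job, `plan_nodes "$PBS_NODEFILE"` lists where the master commands and the slave commands shown above would be started, which can help when a job hangs during startup.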