RAL Grid User Guide for LHCb
Introduction
The Globus toolkit (version 1.1.3) is installed on 3 platforms at RAL:
Firewalls
The CSF front-end (csflnx01) is visible to external users via ssh/telnet.
To reach heplnx2 and heplnx3 from outside RAL, however, you must first log in to an
intermediate machine which is not behind the RAL firewall. The current choice is:
Globus Certificates
Before you can start work with Globus, you need to obtain a Globus
Certificate. Full instructions can be found on the Globus Project web site at http://www.globus.org. A good
starting point is the Globus Quick Start Guide, which contains most of the essential user documentation.
Basically, you need to:
You should eventually receive an email from the Globus certification authority which contains your certificate. Then, you need to:
You can test your Globus setup by issuing the command:
>globus-setup-test
but see "Known Problems" below.
Starting a Globus Session
Before you can start work, you have to obtain a Globus proxy, which authenticates
you for 12 hours. Run grid-proxy-init; you will be prompted for your pass-phrase as follows:
>grid-proxy-init
Enter PEM pass phrase:
..+++++
................. +++++
The proxy expires automatically after 12 hours. To erase it sooner, use the command:
>grid-proxy-destroy
Submitting Work
Here we assume that you are logged in to heplnx2 or heplnx3 and are submitting
work to RAL-CSF.
An interactive job can be run on CSF using the command:
>globus-job-run csflnx01.rl.ac.uk /bin/echo "Hello World"
As this runs on the CSF front-end, it is best not to use it for heavy work, for which the PBS batch system (which runs on 120 batch machines) is more appropriate. Here is an example of how to submit a script (in your CSF home directory) to run on PBS:
>globus-job-submit csflnx01.rl.ac.uk/jobmanager-pbs /home/csf/gpatrick/myjobs/sicmcv233.job
Once submitted, you should get a response like:
https://csflnx01.rl.ac.uk:3546/8600/966337733/
With this URL, you can query the status of your job using the command:
>globus-job-status https://csflnx01.rl.ac.uk:3546/8600/966337733/
You can retrieve your output via the command:
>globus-job-get-output https://csflnx01.rl.ac.uk:3546/8600/966337733/
In principle, you should be able to retrieve the cached output while your batch job
is still running, but this does not appear to be working at the moment.
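The submit, query and retrieve steps above can be strung together in a small shell sketch. It is not part of the Globus toolkit. It assumes that globus-job-status prints a single state word (PENDING, ACTIVE, SUSPENDED, UNSUBMITTED, DONE or FAILED), which you should check against your installation. The function names wait_for_job and fetch_when_done, and the variables STATUS_CMD, OUTPUT_CMD and POLL_SECONDS, are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: poll a submitted Globus job until it finishes, then fetch its output.
# Assumption: globus-job-status prints one state word such as
# PENDING, ACTIVE, DONE or FAILED.

# Command names are held in variables so they can be swapped for a dry run.
STATUS_CMD="${STATUS_CMD:-globus-job-status}"
OUTPUT_CMD="${OUTPUT_CMD:-globus-job-get-output}"
POLL_SECONDS="${POLL_SECONDS:-60}"   # hypothetical polling interval

# wait_for_job URL
# Polls the job URL; returns 0 on DONE, 1 on FAILED or an unrecognised state.
wait_for_job() {
    contact="$1"
    while :; do
        state=$("$STATUS_CMD" "$contact")
        case "$state" in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
            PENDING|ACTIVE|SUSPENDED|UNSUBMITTED) sleep "$POLL_SECONDS" ;;
            *)      echo "unexpected state: $state" >&2; return 1 ;;
        esac
    done
}

# fetch_when_done URL
# Waits for the job and prints its cached output if it completed.
fetch_when_done() {
    if wait_for_job "$1"; then
        "$OUTPUT_CMD" "$1"
    else
        echo "job did not complete: $1" >&2
        return 1
    fi
}
```

After sourcing these functions, a typical call would pass the URL returned by globus-job-submit, for example: fetch_when_done https://csflnx01.rl.ac.uk:3546/8600/966337733/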
Known Problems
1) globus-setup-test may return some error messages. This is apparently a known bug
in the Globus software. The system version of the command:
/opt/globus/sbin/globus-setup-test
appears to work fine.
2) When submitting jobs, you may get the error message "GRAM Job submission failed because data transfer to the server failed (error code 10)". On previous occasions this has indicated an error in the gridmap file, so it is worth asking the system administrator to check your entry before looking further afield for the problem. Alternatively, you can inspect the gridmap file yourself at /opt/globus/etc/grid-mapfile.
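As a quick way of checking your own entry, the following shell sketch searches the gridmap file for a certificate subject. It assumes that each line of the file has the form "subject" localuser, with the subject in double quotes. The function name gridmap_lookup and the variable GRIDMAP are hypothetical, and the subject shown in the usage note is made up.

```shell
#!/bin/sh
# Sketch: look up the local account mapped to a certificate subject.
# Assumption: each grid-mapfile line looks like
#   "/O=Grid/.../CN=Some Name" localuser

GRIDMAP="${GRIDMAP:-/opt/globus/etc/grid-mapfile}"

# gridmap_lookup SUBJECT
# Prints the matching line(s); returns non-zero if the subject has no entry.
gridmap_lookup() {
    # -F treats the subject as a fixed string, not a regular expression.
    grep -F -- "\"$1\"" "$GRIDMAP"
}
```

For example: gridmap_lookup "/O=Grid/O=Globus/OU=rl.ac.uk/CN=Your Name". No output and a non-zero exit status suggest that your subject is missing from the file or is written differently there.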
Please send any corrections/additions to: g.n.patrick@rl.ac.uk
Last Modified: 16/08/00 14:27