RAL Grid User Guide for LHCb

Introduction
The Globus toolkit (version 1.1.3) is installed on 3 platforms at RAL:

Firewalls
The CSF front-end (csflnx01) is visible to external users via ssh/telnet. However, to connect to heplnx2 and heplnx3 you have to connect through an intermediate machine which is not behind the RAL firewall. The current choice is:

Globus Certificates
Before you can start work with Globus, you need to obtain a Globus Certificate. Full instructions can be found at the Globus Project web site at http://www.globus.org. A good starting point is the Globus Quick Start Guide, which contains most of the essential user documentation.

Basically, you need to:

You should eventually receive an email from the Globus certification authority which contains your certificate.  Then, you need to:

You can test your Globus setup by issuing the command:

>globus-setup-test

but see "Known Problems" below.

Starting a Globus Session
Before you can start work, you have to obtain a Globus proxy, which authenticates you for 12 hours. Run grid-proxy-init and enter your pass-phrase when prompted:

>grid-proxy-init
  Enter PEM pass phrase:
  ..+++++
  ................. +++++

The proxy expires automatically after 12 hours. To remove it earlier, use the command:

>grid-proxy-destroy
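
If you start several sessions a day, a small wrapper can avoid re-entering the pass-phrase when a proxy is still valid. The following is only a sketch: it assumes that grid-proxy-info is installed alongside grid-proxy-init and that it exits with a non-zero status when no valid proxy exists, which has not been checked against the RAL installation. The function name ensure_proxy and the two *_CMD variables are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch: create a proxy only if no valid one exists.
# Assumption: grid-proxy-info exits non-zero when there is no valid proxy.
# PROXY_INFO_CMD and PROXY_INIT_CMD are hypothetical hooks so that the
# commands can be overridden (for example, to use full paths).
PROXY_INFO_CMD="${PROXY_INFO_CMD:-grid-proxy-info}"
PROXY_INIT_CMD="${PROXY_INIT_CMD:-grid-proxy-init}"

ensure_proxy() {
    if $PROXY_INFO_CMD >/dev/null 2>&1; then
        echo "Existing proxy found"
    else
        echo "No valid proxy, running $PROXY_INIT_CMD"
        # Prompts for the PEM pass phrase as shown above.
        $PROXY_INIT_CMD
    fi
}
```

After sourcing the file, call ensure_proxy at the start of a session.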

Submitting Work
Here we assume that you are logged in to heplnx2 or heplnx3 and are submitting work to RAL-CSF.

An interactive job can be run on CSF using the command:

>globus-job-run csflnx01.rl.ac.uk  /bin/echo "Hello World"

This command runs on the CSF front-end, so avoid using it for heavy work. The PBS batch system (which runs on 120 batch machines) is more appropriate for that. Here is an example of how to submit a script (in your CSF home directory) to run on PBS:

>globus-job-submit csflnx01.rl.ac.uk/jobmanager-pbs   /home/csf/gpatrick/myjobs/sicmcv233.job

Once submitted, you should get a response like:

https://csflnx01.rl.ac.uk:3546/8600/966337733/

With this link, you can query your job using the command:

>globus-job-status https://csflnx01.rl.ac.uk:3546/8600/966337733/

You can retrieve your output via the command:

>globus-job-get-output https://csflnx01.rl.ac.uk:3546/8600/966337733/

In principle, you should be able to retrieve the cached output whilst your batch job is actually running, but this does not appear to be working at the moment.
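
Since output cannot currently be fetched while the job is running, one option is to poll the job status and retrieve the output only once the job has finished. The sketch below assumes that globus-job-status prints a single state word such as PENDING, ACTIVE, DONE or FAILED; check what your installation actually prints before relying on it. The function wait_for_job and the variables STATUS_CMD, POLL_SECONDS and MAX_POLLS are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: poll a submitted job until it finishes.
# Assumption: globus-job-status prints one state word (e.g. ACTIVE, DONE).
STATUS_CMD="${STATUS_CMD:-globus-job-status}"
POLL_SECONDS="${POLL_SECONDS:-60}"   # delay between status queries
MAX_POLLS="${MAX_POLLS:-720}"        # give up after this many queries

# wait_for_job <job contact URL>
# Returns 0 on DONE, 1 on FAILED, 2 if MAX_POLLS is reached.
wait_for_job() {
    contact="$1"
    polls=0
    while [ "$polls" -lt "$MAX_POLLS" ]; do
        state=$($STATUS_CMD "$contact")
        echo "state: $state"
        case "$state" in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
        esac
        polls=$((polls + 1))
        sleep "$POLL_SECONDS"
    done
    return 2
}

# Usage, with the job contact URL printed by globus-job-submit:
#   contact=$(globus-job-submit csflnx01.rl.ac.uk/jobmanager-pbs /home/csf/gpatrick/myjobs/sicmcv233.job)
#   wait_for_job "$contact" && globus-job-get-output "$contact"
```

Keep the polling interval generous, as each query contacts the CSF front-end.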

Known Problems
1) globus-setup-test may return some error messages. This is apparently a known bug in the Globus software. The system version of the command:
/opt/globus/sbin/globus-setup-test
appears to work fine.

2) When submitting jobs, you may get the error message "GRAM Job submission failed because data transfer to the server failed (error code 10)". On previous occasions this has indicated an error in the gridmap file, so it is worth asking the system administrator to check your entry before looking further afield for the problem. Alternatively, you can inspect the gridmap file yourself at /opt/globus/etc/grid-mapfile.
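
A quick way to inspect your own entry is to list the lines of the gridmap file that map to your local account. This sketch assumes the usual layout of one entry per line, with the quoted certificate subject followed by the local user name. The helper find_gridmap_entries and the GRIDMAP variable are hypothetical names, not Globus commands.

```shell
#!/bin/sh
# Sketch: list grid-mapfile entries that map to a given local account.
# Assumption: each line has the form  "<certificate subject>" <local user>
GRIDMAP="${GRIDMAP:-/opt/globus/etc/grid-mapfile}"

# find_gridmap_entries <local user name>
find_gridmap_entries() {
    user="$1"
    # Keep lines whose last whitespace-separated field is the user name.
    awk -v u="$user" '$NF == u' "$GRIDMAP"
}

# Usage:
#   find_gridmap_entries "$USER"
```

If nothing is printed, or the subject shown does not match the one in your certificate, ask the system administrator to correct the entry.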


Please send any corrections/additions to: g.n.patrick@rl.ac.uk

Last Modified: 16/08/00 14:27