All of the above applies to well-behaved, interactive programs. However, sometimes you need to use R to analyze your data. In order to do this, you have to hardcode file names into the R script, because these scripts are not interactive. This is a royal pain. However, there is a solution that makes use of HERE documents in bash. HERE documents also exist in perl, and an online tutorial for them in bash is at http://www.tldp.org/LDP/abs/html/here-docs.html. The short of it is that a HERE document can represent a skeleton document at the end of a shell script. Let’s concoct an example. You have 100 data files, labeled data.1 to data.10. Each file contains a single column of numbers, and you want to do some calculation for each of them, using R. Let’s use a HERE document:
#$ -t 1-10
# See comment below about paths to R
if [ -e $OUTFILE ]
rm -f $OUTFILE
# Below, the phrase "EOF" marks the beginning and end of the HERE document.
# Basically, what’s going on is that we’re running R, and suppressing all of
# it’s output to STDOUT, and then redirecting whatever’s between the EOF words
# as an R script, and using variable substitution to act on the desired files.
$PATHTOR/R --quiet --no-save > /dev/null << EOF
So now you can use the cluster to analyze your data – just write the R script within the HERE document, and go from there. As I’ve only just figured this out, some caveats are necessary. If anyone experiments and figures out something neat, let me know. Be aware of the following:
1. In my limited experience, indenting is important for HERE documents. In particular, it seems that the beginning and end (i.e. both lines containing the term EOF in the above example), must be aligned with the left-hand edge of the buffer (i.e. not indented at all). So, if you use a HERE document in a conditional or control statement, be mindful of this.
2. In the mean command, I escaped the dollar sign with a backslash. In my limited experiments, both mean(x\$V1) and mean(x$V1) seem to work. However, escaping the dollar sign for the read.table command prevents the variable substitution from occurring in the shell, causing R to fail, because the input file named $INFILE cannot be found. In other words, escaping in that context causes the HERE doc to pass $INFILE as a string literal to R, rather than the value stored in the shell variable.
3. This is more useful than just array jobs on an SGE system. If you know bash well enough, you can write a shell script that takes a load of arguments, and processes them with a HERE document. This solves a major limitation with R scripts themselves. You can do the same in perl, too, on your workstation, but you must use a shell language on the cluster.
- Sent using Google Toolbar"