Note
Learning Goals:
Texas Advanced Computing Center: TACC
lonestar.tacc.teragrid.org
Log in to the TACC lonestar cluster lonestar.tacc.teragrid.org:
You should have received login details from XSEDE for your new account.
jjv5$ ssh tg801771@lonestar.tacc.teragrid.org
Make sure we are using the bash shell:
login1$ echo $SHELL
/bin/bash
# If needed, change the default shell to bash with chsh -s /bin/bash; chsh -l lists the available shells:
login1$ chsh -l
/bin/sh
/bin/bash
/sbin/nologin
/bin/tcsh
/bin/csh
/bin/ksh
/bin/zsh
Recreate directory structure
Important
All files should be placed in the $WORK directory
Create directories in $WORK:
login2$ cd $WORK
login2$ mkdir quiz homework projects
login2$ ls
homework projects quiz
Create any other directories as needed
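The same layout can also be created from a script. This is a minimal sketch, not part of the original lecture: the helper name `make_course_dirs` is made up for illustration, and the caller is assumed to pass the directory that `$WORK` points to.

```python
from pathlib import Path


def make_course_dirs(base, names=("quiz", "homework", "projects")):
    """Create each named directory under base and return the sorted names found there."""
    base = Path(base)
    for name in names:
        # exist_ok=True makes the call safe to repeat
        (base / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in base.iterdir() if p.is_dir())
```

On lonestar this could be called as `make_course_dirs(os.environ["WORK"])` after `import os`.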
Transfer files from AWS EC2 server to lonestar
Open a second terminal window:
# Log in to the EC2 server
$ ssh ec2-23-20-18-242.compute-1.amazonaws.com
jjv5@ec2-23-20-18-242.compute-1.amazonaws.com's password:
$ cd lectures/
$ ls
week5
$ cd week5/
$ ls
Thurs Tues
$ cd Thurs/
$ ls
# use sftp to connect to lonestar
$ sftp tg801771@lonestar.tacc.teragrid.org
Connecting to lonestar.tacc.teragrid.org...
The authenticity of host 'lonestar.tacc.teragrid.org (129.114.53.21)' can't be established.
RSA key fingerprint is 5c:36:42:99:aa:2d:52:58:70:3a:20:c2:3a:33:e4:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'lonestar.tacc.teragrid.org,129.114.53.21' (RSA) to the list of known hosts.
Password:
# transfer files as needed
sftp> cd lectures
sftp> cd week5
sftp> cd Thurs
sftp> lls
4 example2.py example4.py myNumbers.txt runBlast.sh week5.fa.blast.asn
example1.py example3.py example5.py parseBlast.py week5.fa
sftp> put runBlast.sh
Uploading runBlast.sh to /home1/00921/tg801771/lectures/week5/Thurs/runBlast.sh
runBlast.sh 100% 815 0.8KB/s 00:00
sftp>
Use scp to transfer whole directories
Note
Interactive ftp and sftp clients often lack a recursive option, which makes transferring entire directories with them awkward.
Alternatives are to pack all files into a single tar file or to use a transfer method that supports recursion.
wget and scp support recursion (curl has no recursive option).
For moving large files, Globus Online is preferred: https://www.globusonline.org/
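The single-tar-file approach mentioned in the note can be sketched with Python's standard `tarfile` module. The function names `bundle_directory` and `list_archive` are illustrative assumptions; the resulting archive can then be sent with a single sftp `put`.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack src_dir and everything below it into a gzip-compressed tar file."""
    src_dir = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the directory name, not the full path, in the archive
        tar.add(src_dir, arcname=src_dir.name)
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

After the transfer, the archive can be unpacked on the remote side with `tar -xzf`.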
Secure copy (scp) can recursively copy whole directories:
ip138067:~ jjv5$ ssh tg801771@lonestar.tacc.teragrid.org
Password:
Last login: Tue Mar 6 03:51:47 2012 from ip138067.uvm.edu
------------------------------------------------------------------------------
Welcome to the Lonestar4 Westmere/QDR IB Linux Cluster
Texas Advanced Computing Center, The University of Texas at Austin
------------------------ Disk quotas for user tg801771 ------------------------
| Disk Usage (GB) Limit %Used File Usage Limit %Used |
| /home1 1.1 1.1 98.11 1300 1001000 0.13 |
| /work 40.4 250.0 16.15 58255 500000 11.65 |
-------------------------------------------------------------------------------
login1$
login1$ cd $WORK
login1$ scp -r jjv5@ec2-23-20-18-242.compute-1.amazonaws.com:homework .
jjv5@ec2-23-20-18-242.compute-1.amazonaws.com's password:
Warning
scp will overwrite files by default, without warning
scp can be used to transfer files in either direction:
scp [[user@]host1:]file1 [...] [[user@]host2:]file2
From this host, directory mydir, to other host:
scp -r mydir user@otherhost:/tmp
From remote host, directory mydir, to here:
scp -r user@otherhost:mydir .
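Because scp overwrites silently, a small wrapper can check the destination before copying. This is a sketch that assumes a copy from a remote host into a local directory; `build_scp_command` and `safe_to_copy` are hypothetical helper names, and only the `-r` flag shown above is used.

```python
import os


def build_scp_command(source, dest, recursive=False):
    """Return the scp argument list for copying source to dest."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # copy whole directories
    cmd.extend([source, dest])
    return cmd


def safe_to_copy(remote_source, local_dir="."):
    """Return True if local_dir has no entry with the same name as the remote item."""
    # The remote path follows the last ':' in [user@]host:path
    remote_path = remote_source.rsplit(":", 1)[-1]
    name = os.path.basename(remote_path.rstrip("/"))
    return not os.path.exists(os.path.join(local_dir, name))
```

Once `safe_to_copy` returns True, the argument list can be passed to `subprocess.run(cmd, check=True)`.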
Create a job script
Create the script runHello.sh shown below:
#!/bin/bash
#$ -pe 1way 12 # 12 cores per node - must take them all
#$ -q development # Queue name
#$ -N helloWorld
#$ -A TG-MCB120034
#$ -V # inherit submission env
#$ -j y # combine stderr & stdout into stdout
#$ -o $JOB_NAME.o$JOB_ID # Name of the output file (eg. myMPI.oJobID)
#$ -l h_rt=00:05:00 # Run time (hh:mm:ss)
#$ -M jjv5.jjv5@gmail.com
#$ -m bea
echo "Hello, I am running"
date
hostname
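When several jobs differ only in a few settings, the script header can be generated instead of edited by hand. The sketch below assembles a subset of the `#$` options used in runHello.sh. The function `make_job_script` and its parameter names are assumptions made for illustration and are not part of any TACC tool.

```python
def make_job_script(name, project, commands, queue="development",
                    runtime="00:05:00", slots=12):
    """Return the text of a job script built from the given settings."""
    lines = [
        "#!/bin/bash",
        "#$ -pe 1way %d" % slots,    # parallel environment and core count
        "#$ -q %s" % queue,          # queue name
        "#$ -N %s" % name,           # job name
        "#$ -A %s" % project,        # allocation to charge
        "#$ -V",                     # inherit submission environment
        "#$ -j y",                   # combine stderr and stdout
        "#$ -l h_rt=%s" % runtime,   # run time (hh:mm:ss)
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file gives a script that can be submitted in the same way as a hand-written one.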
Submit the job to the development queue
The queue is specified in the job script itself:
qsub runHello.sh
Monitor the job with the qstat command:
login2$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
479531 0.00000 helloWorld tg801771 qw 02/28/2012 05:10:32 12
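The qstat listing can also be read by a script, for example to check whether a job is still waiting. This sketch assumes the column layout shown above (job-ID, prior, name, user, state, then date and time). `parse_qstat` is a hypothetical helper, and the layout may differ on other systems.

```python
def parse_qstat(text):
    """Return a list of dicts, one per job line in qstat output."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Job lines start with a numeric job ID; header and rule lines do not
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        jobs.append({
            "job_id": fields[0],
            "name": fields[2],
            "user": fields[3],
            "state": fields[4],  # e.g. qw = queued and waiting
            "submitted": fields[5] + " " + fields[6],
        })
    return jobs


sample = """job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------
479531 0.00000 helloWorld tg801771 qw 02/28/2012 05:10:32 12
"""
print(parse_qstat(sample)[0]["state"])  # prints qw
```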
Reading
Exercises
Turn In
transfer file from AWS EC2 server to lonestar
create a shell script to run on lonestar using qsub that does the following
- be sure to use the proper qsub options and resource specifications in your script
- use the development queue to make sure the job runs properly
- when you are sure it runs correctly, change the queue name to ‘normal’
- use qstat to monitor how long it takes before your job runs in the ‘normal’ queue
- leave the script in your $WORK/week_8/Tues homework folder on lonestar
- use week5.fa file (from AWS EC2 server) as input query file
- use 16SMicrobial database as the database
- leave the script and output in your $WORK/week_8/Tues homework folder on lonestar
Note
Learning Goals:
Create a job script
Add basic qsub parameters to an otherwise empty script:
#!/bin/bash
#$ -V # inherit shell environment
#$ -l h_rt=00:05:00 # wall time limit
#$ -q development # run in dev q
#$ -pe 1way 12
#$ -A TG-MCB120034
#$ -N Hello
#$ -cwd
#$ -j y
#$ -M jjv5.jjv5@gmail.com # Mail address
#$ -m bea # send mail when job starts, stops or aborts
#module load blast
echo "Hello"
Warning
qsub recognizes #$ as meaningful. Make sure your commented lines do not begin with #$. For example: #$BLASTN -db ..... will cause qsub to interpret the line as an option string and thus fail. Put a space after the # to correct: # $BLASTN -db ...
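This mistake is easy to catch before submitting. The checker sketched here assumes that every genuine option line has a `-` flag right after the `#$` prefix; the name `find_suspect_lines` is made up for illustration.

```python
def find_suspect_lines(script_text):
    """Return (line_number, line) pairs that start with #$ but do not look like options."""
    suspects = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if not line.startswith("#$"):
            continue
        rest = line[2:].strip()
        # A real qsub option line continues with a flag such as -q or -N
        if not rest.startswith("-"):
            suspects.append((number, line))
    return suspects


script = """#!/bin/bash
#$ -q development
#$BLASTN -db mydb
# $BLASTN -db mydb
"""
print(find_suspect_lines(script))  # prints [(3, '#$BLASTN -db mydb')]
```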
Add comments describing tasks and variables needed:
#!/bin/bash
#$ -V # inherit shell environment
#$ -l h_rt=00:05:00 # wall time limit
#$ -q development # run in dev q
#$ -pe 1way 12
#$ -A TG-MCB120034
#$ -N Hello
#$ -cwd
#$ -j y
#------------------------
#
# James Vincent
# March 8, 2012
#
# Run blast on week5.fa vs 16SMicrobial database
# Reformat output to include Query seq-id, subject seq-id, score and e-value
#
#------------------------
# BLAST programs and variables
# TACC lonestar uses module system to provide blast
module load blast
# Database
DB=$WORK/JSCBIO2710/blast/databases/16SMicrobial
# Query
QUERY=week5.fa
OUTFILE=$QUERY.blast.asn
# BLAST output format: 11 is ASN, 6 is table no header
OUTFMT=11
# BLAST programs loaded by module command
BLASTN=blastn
BLAST_FORMATTER=blast_formatter
BLASTDBCMD=blastdbcmd
# Run blast
# $BLASTN -db $DB -query $QUERY -outfmt $OUTFMT -out $OUTFILE
# Reformat ASN to custom hit table
# $BLAST_FORMATTER -archive $OUTFILE -outfmt "6 qseqid sseqid evalue bitscore" -out $OUTFILE.table
# Parse BLAST output with python program to get best hits
# python parseBlast.py $OUTFILE.table
echo "Hello"
Create a Python script to parse BLAST table output:
#!/usr/bin/env python
"""
James Vincent
Mar 8, 2012
parseBlast.py
Open a text file
loop over lines
split lines into fields
Sum numbers from certain field
"""
import sys
# Get file name
myInfileName = sys.argv[1]
infile = open(myInfileName)
mySum = 0.0
myCount = 0
# loop over each line in the file
for thisLine in infile:
    # BLAST input file has hit lines like this:
    # fmt "6 qseqid sseqid evalue bitscore"
    # 1 gi|219856848|ref|NR_024667.1| 0.0 2551
    myFields = thisLine.strip().split()
    # skip blank lines
    if not myFields:
        continue
    # bit scores may have a decimal part, so parse as float
    thisScore = float(myFields[3])
    # Accumulate scores greater than 2600
    if thisScore > 2600:
        # accumulate scores
        mySum = mySum + thisScore
        # count number of scores matching
        myCount = myCount + 1
infile.close()
# Print sum, count and average
print("Sum is: %s" % mySum)
print("Count is: %s" % myCount)
# Avoid dividing by zero when no score passes the threshold
if myCount > 0:
    print("Average is: %s" % (mySum / myCount))
Create a function to return the score:
#!/usr/bin/env python
"""
James Vincent
Mar 8, 2012
parseBlast.py
Open a text file
loop over lines
split lines into fields
Sum numbers from certain field
"""
import sys
def getScore(blastLine):
    """ parse blast output line and return score """
    # BLAST input file has hit lines like this:
    # fmt "6 qseqid sseqid evalue bitscore"
    # 1 gi|219856848|ref|NR_024667.1| 0.0 2551
    myFields = blastLine.strip().split()
    # bit scores may have a decimal part, so parse as float
    thisScore = float(myFields[3])
    return thisScore
# Get file name
myInfileName = sys.argv[1]
infile = open(myInfileName)
mySum = 0.0
myCount = 0
# loop over each line in the file
for thisLine in infile:
    # skip blank lines
    if not thisLine.strip():
        continue
    thisScore = getScore(thisLine)
    # Accumulate scores greater than 2600
    if thisScore > 2600:
        # accumulate scores
        mySum = mySum + thisScore
        # count number of scores matching
        myCount = myCount + 1
infile.close()
# Print sum, count and average
print("Sum is: %s" % mySum)
print("Count is: %s" % myCount)
# Avoid dividing by zero when no score passes the threshold
if myCount > 0:
    print("Average is: %s" % (mySum / myCount))
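The job script comments mention finding best hits, which the parsing scripts in this section do not yet do. One possible approach is sketched here. It assumes the table has the four columns `qseqid sseqid evalue bitscore` and that the best hit for a query is the one with the highest bit score; `bestHits` is an illustrative name.

```python
def bestHits(lines):
    """Return {query id: (subject id, bit score)} keeping the top score per query."""
    best = {}
    for line in lines:
        fields = line.strip().split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        query, subject, score = fields[0], fields[1], float(fields[3])
        # Keep this hit only if it beats the best score seen so far for the query
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


table = [
    "1 gi|219856848|ref|NR_024667.1| 0.0 2551",  # sample hit line from the script comments
    "1 subjectB 1e-50 200",                      # invented line, for illustration only
]
print(bestHits(table)["1"])  # prints ('gi|219856848|ref|NR_024667.1|', 2551.0)
```

To run this on a real table, pass an open file object, since iterating over it yields one line at a time.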
Reading
Exercises
Turn In