JSC-BIO-2710

Data Intensive Computing for Applied Bioinformatics

Week Thirteen 17-Apr-12

Tuesday

Note

Learning Goals:

  • Write and use Bash for loops
  • Write and use simple Bash functions
  • Create a shell script to manage multiple qsub jobs

Video



Lecture


Class Project files

The class project is detailed here:

Download pyfasta

Install pyfasta:

login2$ cd $WORK/Software
login2$ wget http://pypi.python.org/packages/source/p/pyfasta/pyfasta-0.4.5.tar.gz

--2012-04-17 04:38:48--  http://pypi.python.org/packages/source/p/pyfasta/pyfasta-0.4.5.tar.gz
Resolving pypi.python.org... 82.94.164.168, 2001:888:2000:d::a8
Connecting to pypi.python.org|82.94.164.168|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15150 (15K) [application/octet-stream]
Saving to: `pyfasta-0.4.5.tar.gz'

2012-04-17 04:38:54 (3.40 KB/s) - `pyfasta-0.4.5.tar.gz' saved [15150/15150]

login2$ tar zxf pyfasta-0.4.5.tar.gz
login2$ cd pyfasta-0.4.5
login2$ module load python
login2$ python ./setup.py install --user
running install
Installing pyfasta script to /home1/00921/tg801771/.local/bin

Modify PATH in login environment:

PATH=$PATH:$HOME/.local/bin
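
If the PATH line above goes into a startup file (for example ~/.bashrc, which is an assumption here since these notes do not name the file), it may be evaluated more than once and append the same directory each time. A minimal sketch that adds the directory only when it is missing:

```shell
#!/bin/bash
# Add $HOME/.local/bin to PATH only if it is not already listed.
# The surrounding colons make the match exact, so a directory such as
# /some/other/.local/bin2 does not count as a match.
case ":$PATH:" in
    *":$HOME/.local/bin:"*)
        ;;  # already present, nothing to do
    *)
        PATH="$PATH:$HOME/.local/bin"
        ;;
esac
export PATH

echo "$PATH"
```

After this runs, `which pyfasta` should find the script installed by the setup step, provided the install location reported above is the one in use.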

Split large fasta files with pyfasta


Utilities for working with fasta files

Count number of fasta sequences in a file:

login2$ egrep -c '^>' 5990.fa
8079
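
The same count can be wrapped in a small Bash function so it is easy to reuse inside loops. This is only a sketch: the function name count_seqs and the demo file are made up for illustration, and plain grep is used in place of egrep since the pattern needs no extended syntax.

```shell
#!/bin/bash
# count_seqs FILE: print the number of FASTA records in FILE,
# i.e. the number of header lines that start with ">".
count_seqs() {
    grep -c '^>' "$1"
}

# Demo on a tiny throwaway file (hypothetical data, not 5990.fa).
demo=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n' > "$demo"

n=$(count_seqs "$demo")
echo "$n sequences"

rm -f "$demo"
```

On the class data the call would be `count_seqs 5990.fa`.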

Split large fasta files into multiple smaller fasta files:

login2$ ls
5990.fa
login2$ pyfasta split -n 5 5990.fa
creating new files:
5990.0.fa
5990.1.fa
5990.2.fa
5990.3.fa
5990.4.fa
login2$ ls
5990.0.fa  5990.1.fa  5990.2.fa  5990.3.fa  5990.4.fa  5990.fa        5990.fa.flat  5990.fa.gdx
login2$ rm 5990.fa.flat  5990.fa.gdx

login2$ egrep -c '^>' *.fa
5990.0.fa:1614
5990.1.fa:1620
5990.2.fa:1614
5990.3.fa:1616
5990.4.fa:1615
5990.fa:8079
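
The five per-file counts above add up to 8079, the count for the original file, so no sequences were lost in the split. A for loop can do that check automatically. The sketch below builds small made-up demo files in a temporary directory so it can be run anywhere; for the class data the glob would be 5990.[0-9]*.fa and the original file 5990.fa.

```shell
#!/bin/bash
# Build hypothetical stand-ins for an original FASTA file and its split pieces.
workdir=$(mktemp -d)
printf '>a\nAC\n>b\nGT\n>c\nTT\n' > "$workdir/demo.fa"
printf '>a\nAC\n>b\nGT\n'         > "$workdir/demo.0.fa"
printf '>c\nTT\n'                 > "$workdir/demo.1.fa"

# Sum the header counts over the pieces. The glob requires a digit after
# "demo.", so it matches demo.0.fa and demo.1.fa but not demo.fa itself.
total=0
for f in "$workdir"/demo.[0-9]*.fa
do
    n=$(grep -c '^>' "$f")
    total=$((total + n))
done

original=$(grep -c '^>' "$workdir/demo.fa")

if [ "$total" -eq "$original" ]; then
    echo "OK: pieces contain $total sequences, original contains $original"
else
    echo "MISMATCH: pieces contain $total sequences, original contains $original"
fi

rm -rf "$workdir"
```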

Concatenate files (for example, multiple output alignment files):

login2$ ls -1 alignments*
alignments.5990.0.out.tab
alignments.5990.1.out.tab
alignments.5990.2.out.tab
alignments.5990.3.out.tab
alignments.5990.4.out.tab
alignments.5990.5.out.tab

login2$ cat alignments.5990.0.out.tab alignments.5990.1.out.tab alignments.5990.2.out.tab alignments.5990.3.out.tab alignments.5990.4.out.tab alignments.5990.5.out.tab > alignments.5990.all.out.tab
login2$

Or, with a for loop:

login2$ for i in 0 1 2 3 4 5
> do
> cat alignments.5990.$i.out.tab >> alignments.5990.all.out.tab
> done
login2$ ls -a alignments.5990.*
alignments.5990.0.out.tab  alignments.5990.2.out.tab  alignments.5990.4.out.tab  alignments.5990.all.out.tab
alignments.5990.1.out.tab  alignments.5990.3.out.tab  alignments.5990.5.out.tab
login2$
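
Because >> appends, running the loop a second time would add every input file to the combined file again. One way to guard against that is to empty the output file before the loop starts. The sketch below uses made-up demo files in a temporary directory; the file names only imitate the ones in the listing.

```shell
#!/bin/bash
# Create three hypothetical alignment outputs, one line each.
workdir=$(mktemp -d)
for i in 0 1 2
do
    printf 'query%s\thit%s\n' "$i" "$i" > "$workdir/alignments.demo.$i.out.tab"
done

out="$workdir/alignments.demo.all.out.tab"

# ":" is the shell's no-op command; redirecting it with ">" creates the
# file if needed and truncates it to zero length if it already exists.
: > "$out"

# Append each piece in numeric order.
for i in 0 1 2
do
    cat "$workdir/alignments.demo.$i.out.tab" >> "$out"
done

# grep -c '' counts every line in the combined file.
lines=$(grep -c '' "$out")
echo "combined file has $lines lines"

rm -rf "$workdir"
```

The loop uses explicit indices rather than a glob such as alignments.demo.*.out.tab, because that glob would also match the combined output file.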

Redirect review:

The symbol > on the command line means redirect stdout to a file.
The file to the right of the > symbol will be created.
If a file of that name already exists it will be overwritten.

login2$ wc -l 5990.fa > counts
login2$ cat counts
16158 5990.fa
login2$ wc -l 5990.fa > counts
login2$ cat counts
16158 5990.fa

The symbol >> means redirect stdout and append to the file.
If the file does not already exist it will be created.

login2$ wc -l 5990.fa >> counts
login2$ cat counts
16158 5990.fa
16158 5990.fa
login2$ wc -l 5990.fa >> counts
login2$ cat counts
16158 5990.fa
16158 5990.fa
16158 5990.fa
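
The for loop, function, and redirect pieces from this lecture can be combined into a script that submits one job per split fasta file. What follows is a hedged sketch, not the class solution: the job script name align.job, the INPUT variable, the aln_ job name prefix, and the log file name are all hypothetical, and it assumes a scheduler whose qsub accepts -N (job name) and -v (pass an environment variable), as Grid Engine and PBS do. With DRY_RUN=1 the script only prints the qsub commands, so it can be tried without a scheduler.

```shell
#!/bin/bash
# Set DRY_RUN=0 to really call qsub; 1 only prints the commands.
DRY_RUN=1
JOB_SCRIPT=align.job    # hypothetical job script, not part of the class files

# submit_one FASTA: submit (or print) one job for a single fasta file.
submit_one() {
    local fasta=$1
    local name
    name="aln_$(basename "$fasta" .fa)"   # e.g. demo.0.fa -> aln_demo.0
    if [ "$DRY_RUN" = 1 ]; then
        echo "qsub -N $name -v INPUT=$fasta $JOB_SCRIPT"
    else
        qsub -N "$name" -v INPUT="$fasta" "$JOB_SCRIPT"
    fi
}

# Empty stand-in files so the sketch runs anywhere (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/demo.fa" "$workdir/demo.0.fa" "$workdir/demo.1.fa"

log="$workdir/submitted.log"
: > "$log"    # start with an empty log

# One job per split piece; the glob skips the unsplit demo.fa.
for f in "$workdir"/demo.[0-9]*.fa
do
    submit_one "$f" >> "$log"
done

cat "$log"
njobs=$(grep -c '^qsub ' "$log")
echo "$njobs jobs prepared"

rm -rf "$workdir"
```

The job script would read the file to process from $INPUT. Keeping the qsub call inside a function means the dry-run switch lives in one place.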

Homework

Reading

Exercises

Turn In:

Thursday

Note

Learning Goals:

  • Use Globus Online transfer tool
  • Review XSEDE allocation process
  • Learn how to start AWS EC2 instance

Video


Lecture

Globus Online transfer tool

NSF XSEDE Allocation Process

https://portal.xsede.org/

Log in to the portal

Click the Allocations tab, then the Submit Request submenu

Click the big button “Click to Enter or View a Request”

Amazon Web Services Elastic Compute Cloud - EC2