Note
Learning Goals:
Class Project files
The class project is detailed here:
Download pyfasta
Download pyfasta here:
Install pyfasta:
login2$ cd $WORK/Software
login2$ wget http://pypi.python.org/packages/source/p/pyfasta/pyfasta-0.4.5.tar.gz
--2012-04-17 04:38:48-- http://pypi.python.org/packages/source/p/pyfasta/pyfasta-0.4.5.tar.gz
Resolving pypi.python.org... 82.94.164.168, 2001:888:2000:d::a8
Connecting to pypi.python.org|82.94.164.168|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15150 (15K) [application/octet-stream]
Saving to: `pyfasta-0.4.5.tar.gz'
2012-04-17 04:38:54 (3.40 KB/s) - `pyfasta-0.4.5.tar.gz' saved [15150/15150]
login2$ tar zxf pyfasta-0.4.5.tar.gz
login2$ cd pyfasta-0.4.5
login2$ module load python
login2$ python ./setup.py install --user
running install
Installing pyfasta script to /home1/00921/tg801771/.local/bin
Add the install directory to PATH in your login environment so the shell can find the pyfasta script:
PATH=$PATH:$HOME/.local/bin
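The PATH change above lasts only for the current shell session. A minimal sketch of a version that could go in a login startup file follows. Which file applies (for example ~/.profile or ~/.bashrc) is an assumption that depends on your login shell and how the system is configured. The snippet adds the directory only if it is missing, so sourcing the file repeatedly does not grow PATH.

```shell
# Append $HOME/.local/bin to PATH only if it is not already present.
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;                # already on PATH, nothing to do
  *) PATH="$PATH:$HOME/.local/bin" ;;
esac
export PATH

# Report whether the pyfasta script is visible (it may not be installed here).
if command -v pyfasta >/dev/null 2>&1; then
  echo "pyfasta found at $(command -v pyfasta)"
else
  echo "pyfasta not found on PATH"
fi
```

Wrapping PATH in colons before matching avoids a false match on a longer directory name that merely contains the same text.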
Split large fasta files with pyfasta
pyfasta usage examples:
Utilities for working with fasta files
Count number of fasta sequences in a file:
login2$ egrep -c '^>' 5990.fa
8079
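The count works because every FASTA record starts with exactly one header line beginning with ">". The sketch below shows the same idea on a small made-up file (demo.fa is hypothetical, not part of the class data). It uses plain grep, which gives the same count as egrep for this pattern; recent GNU grep releases print a warning when egrep is used.

```shell
# Build a tiny, made-up FASTA file in a temporary directory.
demo_dir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nACGT\n' > "$demo_dir/demo.fa"

# grep -c prints the number of matching lines; '^>' matches header lines only.
n_headers=$(grep -c '^>' "$demo_dir/demo.fa")

# wc -l counts every line, so it overstates the number of sequences.
n_lines=$(( $(wc -l < "$demo_dir/demo.fa") ))

echo "sequences: $n_headers, lines: $n_lines"
rm -rf "$demo_dir"
```

The pattern is quoted so that the shell does not treat ">" as a redirect.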
Split large fasta files into multiple smaller fasta files:
login2$ ls
5990.fa
login2$ pyfasta split -n 5 5990.fa
creating new files:
5990.0.fa
5990.1.fa
5990.2.fa
5990.3.fa
5990.4.fa
login2$ ls
5990.0.fa 5990.1.fa 5990.2.fa 5990.3.fa 5990.4.fa 5990.fa 5990.fa.flat 5990.fa.gdx
login2$ rm 5990.fa.flat 5990.fa.gdx
login2$ egrep -c '^>' *.fa
5990.0.fa:1614
5990.1.fa:1620
5990.2.fa:1614
5990.3.fa:1616
5990.4.fa:1615
5990.fa:8079
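The five per-file counts above add up to the original total (1614 + 1620 + 1614 + 1616 + 1615 = 8079), which confirms that no sequence was lost in the split. A sketch of automating that check is shown below. It runs on small made-up files (demo.fa, demo.0.fa and demo.1.fa are hypothetical stand-ins) rather than real pyfasta output.

```shell
# Made-up original file and two made-up pieces standing in for split output.
demo_dir=$(mktemp -d)
printf '>a\nAC\n>b\nGT\n>c\nTT\n' > "$demo_dir/demo.fa"
printf '>a\nAC\n>b\nGT\n'        > "$demo_dir/demo.0.fa"
printf '>c\nTT\n'                > "$demo_dir/demo.1.fa"

# Headers in the original file.
total=$(grep -c '^>' "$demo_dir/demo.fa")

# Headers across all pieces. The pattern demo.[0-9]*.fa needs a digit after
# "demo.", so it matches the pieces but not demo.fa itself.
parts=$(cat "$demo_dir"/demo.[0-9]*.fa | grep -c '^>')

if [ "$parts" -eq "$total" ]; then
  echo "OK: $parts of $total sequences accounted for"
else
  echo "MISMATCH: pieces hold $parts sequences, original holds $total"
fi
rm -rf "$demo_dir"
```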
Concatenate files (for example, multiple output alignment files):
login2$ ls -1 alignments*
alignments.5990.0.out.tab
alignments.5990.1.out.tab
alignments.5990.2.out.tab
alignments.5990.3.out.tab
alignments.5990.4.out.tab
alignments.5990.5.out.tab
login2$ cat alignments.5990.0.out.tab alignments.5990.1.out.tab alignments.5990.2.out.tab alignments.5990.3.out.tab alignments.5990.4.out.tab alignments.5990.5.out.tab > alignments.5990.all.out.tab
login2$
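Or use a loop. Because >> appends, remove any existing combined file first, or its old contents will be kept: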
login2$ for i in 0 1 2 3 4 5
> do
> cat alignments.5990.$i.out.tab >> alignments.5990.all.out.tab
> done
login2$ ls -a alignments.5990.*
alignments.5990.0.out.tab alignments.5990.2.out.tab alignments.5990.4.out.tab alignments.5990.all.out.tab
alignments.5990.1.out.tab alignments.5990.3.out.tab alignments.5990.5.out.tab
login2$
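When there are many pieces, a shell wildcard can replace the explicit list of file names. The sketch below uses made-up file names and contents (alignments.demo.N.out.tab is hypothetical). The point to watch is that the combined file must not match the wildcard, otherwise cat would try to read its own output.

```shell
# Create three small made-up alignment files in a temporary directory.
demo_dir=$(mktemp -d)
for i in 0 1 2; do
  printf 'query%s\tsubject%s\n' "$i" "$i" > "$demo_dir/alignments.demo.$i.out.tab"
done

# [0-9]* requires a digit after "alignments.demo.", so the combined file
# (alignments.demo.all.out.tab) is never picked up as an input.
cat "$demo_dir"/alignments.demo.[0-9]*.out.tab > "$demo_dir/alignments.demo.all.out.tab"

n_combined=$(( $(wc -l < "$demo_dir/alignments.demo.all.out.tab") ))
echo "combined lines: $n_combined"
rm -rf "$demo_dir"
```

The shell expands the wildcard in sorted text order, so with ten or more pieces file 10 comes before file 2. That does not matter when the order of the alignment rows is unimportant.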
Redirect review:
The symbol > on the command line means redirect stdout to a file.
The file named to the right of the > symbol is created if it does not exist.
If a file of that name already exists, its contents are replaced.
login2$ wc -l 5990.fa > counts
login2$ cat counts
16158 5990.fa
login2$ wc -l 5990.fa > counts
login2$ cat counts
16158 5990.fa
The symbol >> means redirect stdout and append it to the end of the file. If the file does not exist yet, it is created.
login2$ wc -l 5990.fa >> counts
login2$ cat counts
16158 5990.fa
16158 5990.fa
login2$ wc -l 5990.fa >> counts
login2$ cat counts
16158 5990.fa
16158 5990.fa
16158 5990.fa
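The difference between the two operators can be checked without the class data. The sketch below writes to a temporary file. The last step uses the shell's noclobber option, which makes > refuse to overwrite an existing file; it is shown as an optional safeguard and is not something the class setup is known to enable.

```shell
out=$(mktemp)                 # temporary scratch file

echo "first"  > "$out"        # > replaces the file contents
echo "second" > "$out"        # overwrites again: only "second" remains
after_overwrite=$(cat "$out")

echo "third" >> "$out"        # >> appends: file now holds two lines
n_lines=$(( $(wc -l < "$out") ))

# With noclobber set (here only inside a subshell), > fails on an existing file.
if ( set -o noclobber; echo "fourth" > "$out" ) 2>/dev/null; then
  clobbered=yes
else
  clobbered=no
fi

echo "after overwrite: $after_overwrite, lines after append: $n_lines, clobbered: $clobbered"
rm -f "$out"
```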
Note
Learning Goals:
Globus Online transfer tool
NSF XSEDE Allocation Process
Log in to the portal
Click the Allocations tab, then the Submit Request submenu
Click the big button “Click to Enter or View a Request”
Amazon Web Services Elastic Compute Cloud - EC2