Note
Learning Goals:
BLAST ASN format
write a shell script called week5_Tues.sh:
run BLASTN vs 16SMicrobial database
use file /jscbio2710/week5.fa
make the output ASN format
| reformat the output using blast_formatter
| command to give hit table format
| without headers (-outfmt 6)
completed script:
#!/bin/bash
BLASTN=/mnt/blast/ncbi-blast-2.2.25+/bin/blastn
BLASTFORMATTER=/mnt/blast/ncbi-blast-2.2.25+/bin/blast_formatter
DB=$HOME/blast/databases/16SMicrobial
QUERY=week5.fa
OUTFILE=$QUERY.blast.asn
# /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -db 16SMicrobial -query testThree.fa -outfmt 11 -out testThree.fa.blast.asn
echo "Running BLASTN"
echo "query: $QUERY"
echo "db: $DB"
$BLASTN -db $DB -query $QUERY -outfmt 11 -out $OUTFILE
echo "Finished BLASTN"
$BLASTFORMATTER -archive $OUTFILE -outfmt 6 -out OUTFILE.table
Parsing BLAST output with python
for loops
arrays
reading files line by line
write a python program that reads blast output from table format:
#!/usr/bin/env python
"""
James Vincent
Feb. 14, 2012
Read a BLAST output file in table format
Print only certain columns from the hit table
Table format from outfmt 6 should have these fields:
query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
"""
import sys
# get blast file name from command line
blastFileName = sys.argv[1]
# open the input file (blast file)
blastFile = open(blastFileName)
# loop over each line in the file
for thisLine in blastFile.readlines():
lineParts = thisLine.strip().split('\t')
print '%s\t%s\t%5.1f\t%d'% (lineParts[0],lineParts[1],float(lineParts[2]),int(lineParts[11]))
Reading
Exercises
Turn In
turn in python exercises 18,19
put them in the proper homework directory in your home on the AWS server ( week 5, Tuesday)
write a shell script to run BLAST on the file week5.fa from today’s class vs 16SMicrobial
store BLAST output in ASN format
Query Seq-id, Subject Seq-id, score, evalue
write a python program that parses the reformatted BLAST output table
make the program print the average score from all hits
Note
Learning Goals:
Example 1
for loops:
#!/usr/bin/env python
"""
James Vincent
Feb 16, 2012
Week 5 - Thurs
example1.py
Loop over items in a list
"""
myList = ['a', 'b', 'c', 'd', 'e']
for myItem in myList:
print myItem
output:
a
b
c
d
e
Example 2
lists and slice notation:
#!/usr/bin/env python
"""
James Vincent
Feb 16, 2012
Week 5 - Thurs
example2.py
Print items from a list with slice notation
"""
myList = ['a', 'b', 'c', 'd', 'e']
print "The first item in the list: ",myList[0]
print "The second item in the list: ",myList[1]
print
# Slice notation [n:m] means from the n-th item up to the m-th item,
# but excluding the last item at m
print "Slice [0:2] returns items at "
print "index 0 and index 1 but not at index 2 : ",myList[0:2]
print
# Another way of stating slice notation is [start:end]
# The item and start will be returned up to the end-1 item
print "Slice [2:5] returns items at "
print "index 2, 3 and 4 but not 5: ",myList[2:5]
output:
The first item in the list: a
The second item in the list: b
Slice [0:2] returns items at
index 0 and index 1 but not at index 2 : ['a', 'b']
Slice [2:5] returns items at
index 2, 3 and 4 but not 5: ['c', 'd', 'e']
Example 3
for loops and boolean expressions:
#!/usr/bin/env python
"""
James Vincent
Feb 16, 2012
Week 5 - Thurs
example3.py
Conditionally print items from a list
"""
myList = ['a', 'b', 'c', 'd', 'e']
# loop over items in the list
for thisItem in myList:
if thisItem == 'b':
print "This item should be b: ", thisItem
myNumberList = [1,2,3,4,5,6,7,8,9,10]
for myNum in myNumberList:
if myNum > 4:
print "I am > 4 :", myNum
Example 4
loop over lines in a file:
Create a file called myNumbers.txt:
a 1
b 2
c 3
d 4
e 5
"""
James Vincent
Feb 16, 2012
Week 5 - Thurs
example4.py
Open a text file of numbers
Loop over each line in the file
Conditionally print numbers from each line
"""
import sys
# Get input file name from command line argument:
# ./example4.py infile.txt
myInfileName = sys.argv[1]
infile = open(myInfileName)
# loop over each line in the file
for thisLine in infile.readlines():
print thisLine
ip138067:Thurs jjv5$ ./example4.py myNumbers.txt
a 1
b 2
c 3
d 4
e 5
Example 5
loop over lines and count scores, average of items in each line:
Use same file myNumberstxt:
a 1
b 2
c 3
d 4
e 5
#!/usr/bin/env python
"""
James Vincent
Feb 16, 2012
Week 5 - Thurs
example4.py
Open a text file of numbers
Loop over each line in the file
Compute the average of values in the second
field that are greater than 3
"""
import sys
# Get input file name from command line argument:
# ./example4.py infile.txt
myInfileName = sys.argv[1]
infile = open(myInfileName)
mySum = 0.0
myCount = 0
# loop over each line in the file
for thisLine in infile.readlines():
# Split the ine into a list of fields
myFields = thisLine.strip().split()
# Stor ethe second field as an integer value
thisScore = int(myFields[1])
# If the score is greater than 3, accumulate in our sum
if thisScore > 3:
mySum = mySum + thisScore
myCount = myCount + 1
# After reading all lines print the sum and average
print "Sum of all scores > 3: ", mySum
print "Number of scores > 3: ",myCount
print "Average of scores > 3: ",mySum/myCount
output:
ip138067:Thurs jjv5$ ./example5.py myNumbers.txt
Sum of all scores > 3: 9.0
Number of scores > 3: 2
Average of scores > 3: 4.5
Reading
http://www.openbookproject.net/thinkcs/python/english2e/index.html
http://www.openbookproject.net/thinkcs/python/english2e/ch01.html http://www.openbookproject.net/thinkcs/python/english2e/ch02.html http://www.openbookproject.net/thinkcs/python/english2e/ch03.html
Exercises
Turn In