JSC-BIO-2710

Data Intensive Computing for Applied Bioinformatics

Table Of Contents

Previous topic

Week Four 07-Feb-12

This Page

Week Five 14-Feb-12

Tuesday

Note

Learning Goals:

  • BLAST ASN formatting
  • open files in python
  • parse BLAST output with python
  • for loop
  • split strings

Lecture

BLAST ASN format

write a shell script called week5_Tues.sh:

run BLASTN vs 16SMicrobial database
use file  /jscbio2710/week5.fa
make the output  ASN format
| reformat the output using blast_formatter
| command to give hit table format
| without headers (-outfmt 6)

completed script:

#!/bin/bash

BLASTN=/mnt/blast/ncbi-blast-2.2.25+/bin/blastn
BLASTFORMATTER=/mnt/blast/ncbi-blast-2.2.25+/bin/blast_formatter
DB=$HOME/blast/databases/16SMicrobial
QUERY=week5.fa
OUTFILE=$QUERY.blast.asn

# /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -db 16SMicrobial -query testThree.fa  -outfmt 11  -out testThree.fa.blast.asn

echo "Running BLASTN"
echo "query: $QUERY"
echo "db: $DB"
$BLASTN -db $DB -query $QUERY -outfmt 11 -out $OUTFILE
echo "Finished BLASTN"

$BLASTFORMATTER -archive $OUTFILE -outfmt 6 -out OUTFILE.table

Parsing BLAST output with python

for loops

arrays

reading files line by line

write a python program that reads blast output from table format:

#!/usr/bin/env python

"""

James Vincent
Feb. 14, 2012

Read a BLAST output file in table format
Print only certain columns from the hit table

Table format from outfmt 6 should have these fields:
query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score

"""

import sys

# get blast file name from command line
blastFileName = sys.argv[1]

# open the input file (blast file)
blastFile = open(blastFileName)

# loop over each line in the file
for thisLine in blastFile.readlines():
   lineParts = thisLine.strip().split('\t')
   print '%s\t%s\t%5.1f\t%d'% (lineParts[0],lineParts[1],float(lineParts[2]),int(lineParts[11]))

Homework

Reading

Exercises

  • complete Exercises 18,19 in the python reading above

Turn In

  • turn in python exercises 18,19

  • put them in the proper homework directory in your home on the AWS server ( week 5, Tuesday)

  • write a shell script to run BLAST on the file week5.fa from today’s class vs 16SMicrobial

  • store BLAST output in ASN format

  • reformat the ASN blast output to hit table with only these fields:

    Query Seq-id, Subject Seq-id, score, evalue

  • write a python program that parses the reformatted BLAST output table

  • make the program print the average score from all hits

Thursday

Note

Learning Goals:

  • for loops, lists again
  • boolean expressions
  • conditional statements
  • if statement
  • BLAST with python program analysis

Lecture

Example 1

for loops:

#!/usr/bin/env python

"""
James Vincent
Feb 16, 2012
Week 5 - Thurs

example1.py

Loop over items in a list

"""

myList = ['a', 'b', 'c', 'd', 'e']

for myItem in myList:
   print myItem

output:

a
b
c
d
e

Example 2

lists and slice notation:

#!/usr/bin/env python

"""
James Vincent
Feb 16, 2012
Week 5 - Thurs

example2.py

Print items from a list with slice notation

"""

myList = ['a', 'b', 'c', 'd', 'e']

print "The first item in the list: ",myList[0]

print "The second item in the list: ",myList[1]
print

# Slice notation [n:m] means from the n-th item up to the m-th item,
# but excluding the last item at m

print "Slice [0:2] returns items at "
print "index 0 and index 1 but not at index 2 : ",myList[0:2]
print

# Another way of stating slice notation is [start:end]
# The item and start will be returned up to the end-1 item

print "Slice [2:5] returns items at "
print "index 2, 3 and 4 but not 5: ",myList[2:5]

output:

The first item in the list:  a
The second item in the list:  b

Slice [0:2] returns items at
index 0 and index 1 but not at index 2 :  ['a', 'b']

Slice [2:5] returns items at
index 2, 3 and 4 but not 5:  ['c', 'd', 'e']

Example 3

for loops and boolean expressions:

#!/usr/bin/env python

"""
James Vincent
Feb 16, 2012
Week 5 - Thurs

example3.py

Conditionally print items from a list

"""

myList = ['a', 'b', 'c', 'd', 'e']

# loop over items in the list
for thisItem in myList:

   if thisItem == 'b':
      print "This item should be b: ", thisItem



myNumberList = [1,2,3,4,5,6,7,8,9,10]

for myNum in myNumberList:

   if myNum > 4:
      print "I am > 4 :", myNum

Example 4

loop over lines in a file:

Create a file called myNumbers.txt:
a    1
b    2
c    3
d    4
e    5


"""
James Vincent
Feb 16, 2012
Week 5 - Thurs

example4.py

Open a text file of numbers
Loop over each line in the file
Conditionally print numbers from each line

"""

import sys

# Get input file name from command line argument:
# ./example4.py infile.txt

myInfileName = sys.argv[1]

infile = open(myInfileName)

# loop over each line in the file
for thisLine in infile.readlines():
   print thisLine


ip138067:Thurs jjv5$ ./example4.py myNumbers.txt
a    1

b    2

c    3

d    4

e    5

Example 5

loop over lines and count scores, average of items in each line:

Use same file myNumberstxt:
a    1
b    2
c    3
d    4
e    5


#!/usr/bin/env python

"""
James Vincent
Feb 16, 2012
Week 5 - Thurs

example4.py

Open a text file of numbers
Loop over each line in the file
Compute the average of values in the second
field that are greater than 3

"""

import sys

# Get input file name from command line argument:
# ./example4.py infile.txt

myInfileName = sys.argv[1]

infile = open(myInfileName)

mySum = 0.0
myCount = 0

# loop over each line in the file
for thisLine in infile.readlines():

   # Split the ine into a list of fields
   myFields = thisLine.strip().split()

   # Stor ethe second field as an integer value
   thisScore = int(myFields[1])

   # If the score is greater than 3, accumulate in our sum
   if thisScore > 3:
      mySum = mySum + thisScore
      myCount = myCount + 1

# After reading all lines print the sum and average
print "Sum of all scores > 3:  ", mySum
print "Number of scores > 3:   ",myCount
print "Average of scores > 3:  ",mySum/myCount

output:

ip138067:Thurs jjv5$ ./example5.py myNumbers.txt
Sum of all scores > 3:   9.0
Number of scores > 3:    2
Average of scores > 3:   4.5

Homework

Reading

  • We will begin using an additional python book:

http://www.openbookproject.net/thinkcs/python/english2e/index.html

  • Read the first three sections. These are mostly review of what we have already covered
  • Section 3 contains new material on functions that we have not covered in class

http://www.openbookproject.net/thinkcs/python/english2e/ch01.html http://www.openbookproject.net/thinkcs/python/english2e/ch02.html http://www.openbookproject.net/thinkcs/python/english2e/ch03.html

Exercises

  • complete the short exercises in each section of the reading above
  • if you did not complete the homework from Tues go back and finish it
  • contact the class TA for help if needed

Turn In

  • turn in the exercises at the end of Section 3 only
  • put them in the proper homework directory in your home on the AWS server ( week5/Thurs)
  • the exercises include 3 numbered questions all using a file named tryme3.py
  • name the files tryme3-1.py, tryme3-2.py, tryme3-3.py for each of the three questions
  • the fourth exercise asks you to create a file named import_test.py. Use that file name.