JSC-BIO-2710

Data Intensive Computing for Applied Bioinformatics

Table Of Contents

Previous topic

Week Eight 6-Mar-12

This Page

Week Nine 13-Mar-12

Tuesday

Note

Learning Goals:

  • Parse BLAST output to give only GI number
  • Use python Counter class
  • Install BioPython



 
 
 
 
 
 
 
 
 

Video



Lecture

User defined functions for parsing BLAST output:

#!/usr/bin/env python

"""

James Vincent
Mar 13 , 2012

parseBlast.py

Open a text file
loop over lines
split lines into fields
Sum numbers from certain field

"""

import sys


def getScore(blastLine):
  """ parse blast output line and return score """

  # BLAST input file has hit lines like this:
  # fmt "6 qseqid sseqid evalue bitscore"
  # 1        gi|219856848|ref|NR_024667.1|   0.0     2551
  myFields = blastLine.strip().split()

  thisScore  = float(myFields[3])

  return thisScore

def getGI(blastLine):
   """ parse blast output line and return GI of hit """

   # blast line looks like this:
   # 1        gi|219856848|ref|NR_024667.1|   0.0     2551

   # spit line into fields by white space
   myFields = blastLine.strip().split()

   # split subject (hit) ID field on |
   myIDs = myFields[1].split("|")
   thisGI= int(myIDs[1])

   return thisGI



# Get file name
myInfileName = sys.argv[1]

infile = open(myInfileName)

mySum = 0.0
myCount = 0
# loop over each line in the file
for thisLine in infile.readlines():

  thisScore = getScore(thisLine)
  thisGI = getGI(thisLine)
  print thisGI,thisScore
  # Accumulate scores greater than 3
  if thisScore > 2600:
     # accumulate scores
     mySum = mySum + thisScore
     # count number of scores matching
     myCount = myCount + 1


# Print sum, count and average
print "Sum is: ",mySum
print "Count is: ",myCount
if myCount > 0:
    print "Average is: ",mySum/myCount
else:
    print "No scores"

How to use python Counter:

#!/usr/bin/env python

"""
James Vincent
Mar 13 , 2012

myCounter.py

Demonstrate use of python Counter

http://docs.python.org/library/collections.html
"""

import sys
from collections import Counter

myCounter = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue','red', 'blue', 'red', 'green', 'blue', 'blue']:
   myCounter[word] += 1

print myCounter

print "Top two most common: ", myCounter.most_common(2)

# Sum of all values:
print sum(myCounter.values())

Install BioPython:

login1$ cd $WORK
login1$ mkdir Software
login1$ cd Software
login1$ wget http://biopython.org/DIST/biopython-1.59.tar.gz

  --2012-03-13 04:48:40--  http://biopython.org/DIST/biopython-1.59.tar.gz
  Resolving biopython.org... 208.94.50.58
  Connecting to biopython.org|208.94.50.58|:80... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 8577492 (8.2M) [application/x-gzip]
  Saving to: `biopython-1.59.tar.gz'

  100%[================================================================>] 8,577,492   1.42M/s   in 6.1s

  2012-03-13 04:48:46 (1.34 MB/s) - `biopython-1.59.tar.gz' saved [8577492/8577492]

login1$ tar zxf biopython-1.58.tar.gz
login1$ cd biopython-1.58
login1$ module load python
login1$ python ./setup.py install --user

  running install
  running build
  running build_py
  running build_ext
  running install_lib
  running install_egg_info
  Removing /home1/00921/tg801771/.local/lib/python2.7/site-packages/biopython-1.58-py2.7.egg-info
  Writing /home1/00921/tg801771/.local/lib/python2.7/site-packages/biopython-1.58-py2.7.egg-info

login1$
login1$ python
Python 2.7.1 (r271:86832, Sep 20 2011, 09:42:22)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez
>>>

Homework

Reading

Exercises

  • Follow the exercises in the reading above. Complete as many of them as you can

Turn In

Write a python program to parse BLAST table formatted output:
  • write a function to parse each line and return the GI

  • use a Counter object to count how many times each GI number was seen

  • print the top five GI numbers and their counts at the end of the program

  • use week9.fa as the query file for BLAST

  • You can download week9.fa from this link

    http://bit.ly/yoK27C
    
    You can copy it directly to your lonestar account with the wget
    program. Log in to lonestar and move to a directory where you
    would like to place the file, then call wget:
    
    wget <URL> -O <outputFileName>
    
    Example:
    
    login1$ cd /work/00921/tg801771/JSCBIO2710/lectures/week9/Thurs/
    login1$  wget http://bit.ly/yoK27C -O week9.fa