Note
Learning Goals:
Quiz 3
Introduction To Python
format of python script:
    #!/usr/bin/env python
    print "Hello World"
comments, header:
    #!/usr/bin/env python
    """
    Author: James Vincent
    Date: 07-Feb-12
    This program prints Hello World.
    """
    # print today's message
    print "Hello World"
variable naming: myVar or my_var:
    #!/usr/bin/env python
    """
    Author: James Vincent
    Date: 07-Feb-12
    This program prints Hello World.
    """
    thisMessage = "Hello World"
    print thisMessage
    that_message = "Hello World"
    print that_message
indentation is meaningful:
    #!/usr/bin/env python
    """
    Author: James Vincent
    Date: 07-Feb-12
    This program prints Hello World.
    """
    thisMessage = "Hello World"
    print thisMessage
    # this will fail (the next line is indented unexpectedly)
        that_message = "Hello World"
    print that_message
Homework review
- identify unknown sequence - two hits at 100% ??
BLAST
- meaning of e-value
- what if we make up our own sequence?
- how does changing e-value affect results?
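To make the questions above concrete, here is a minimal sketch of the standard relationship between a bit score and an E-value, E = m * n * 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. The function name and the numbers are hypothetical, chosen only for illustration. BLAST itself uses effective lengths and other corrections, so its reported values will not match this approximation exactly.

```python
def approx_evalue(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers, for illustration only.
query_len = 1000       # m: letters in the query
db_len = 10000000      # n: total letters in the database

for bit_score in (30, 40, 50):
    e = approx_evalue(bit_score, query_len, db_len)
    print("bit score %d -> E ~ %.3g" % (bit_score, e))
```

The E-value estimates how many hits with at least this score would be expected by chance in a database of this size. Higher scores give smaller E-values, and a made-up sequence with no real homolog should only produce hits with large E-values. Lowering the e-value cutoff therefore removes the weakest hits from the results.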
Homework Directories
create homework, quiz, project directories in home directory
all homework goes in its own subdirectory of homework:
    homework/week4/Tues
    homework/week4/Thurs
    homework/week5/Tues
    homework/week5/Thurs
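The layout above can also be created from Python. This is only a sketch: the function name make_course_dirs and its defaults are hypothetical, and the demo writes into a temporary directory rather than your home directory. To create the real directories, pass os.path.expanduser("~") as the base instead.

```python
import os
import tempfile


def make_course_dirs(base, weeks=(4, 5), days=("Tues", "Thurs")):
    """Create quiz, project and homework/weekN/Day directories under base."""
    paths = [os.path.join(base, "quiz"), os.path.join(base, "project")]
    for week in weeks:
        for day in days:
            paths.append(os.path.join(base, "homework", "week%d" % week, day))
    for path in paths:
        # Skip directories that already exist so the function can be re-run.
        if not os.path.isdir(path):
            os.makedirs(path)
    return paths


# Dry run in a temporary directory (an assumption made for safety).
demo_base = tempfile.mkdtemp()
for path in make_course_dirs(demo_base):
    print(path)
```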
Reading
- http://learnpythonthehardway.org/book/ Exercises 5-10 and 13 (skip 11,12)
Exercises
- complete Exercises 5-10 and 13 in the python reading above
Turn In
- make sure you have a directory called homework in your home directory
- make subdirectories under homework for each week and day
- turn in completed exercises from the reading above
- include a descriptive header (in comments) in every python program you write
- write a shell script to run BLAST on the sequence from Homework5 (Thursday, last week) against the 16SMicrobial database (just like the last homework)
- read the BLAST help (-h and -help) to find output format options
- make the output of the BLAST job in hit table format
- find the option for setting e-value
- write a second shell script that runs BLAST again, this time with the e-value set to 0.000001
Note
Learning Goals:
Quiz 4
BLASTN revisited
blast programs are in /mnt/blast/ncbi-blast-2.2.25+/bin on the AWS server
download 16S database: ftp.ncbi.nih.gov/blast/db/16SMicrobial.tar.gz:
    (create $HOME/blast/databases if you don't already have it)
    cd ~/blast/databases
    ftp ftp.ncbi.nih.gov
    cd blast/db
    get 16SMicrobial.tar.gz
uncompress database:
    tar -zxf 16SMicrobial.tar.gz
use blastdbcmd -db 16SMicrobial -entry all to get all fasta sequences:
    /mnt/blast/ncbi-blast-2.2.25+/bin/blastdbcmd -db 16SMicrobial -entry all > 16SMicrobial.fa
collect three sequences from the 16SMicrobial.fa file:
    head -300 16SMicrobial.fa > testThree.fa
edit testThree.fa so it contains three complete sequences
execute blastn -h to get help, find outfmt option:
    /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -help | less
type /outfmt within less to search for the word outfmt
execute blastn:
    /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -db 16SMicrobial -query testThree.fa
BLAST ASN output format
execute blastn again but this time use BLAST archive ASN format -outfmt 11 and an output file name:
    /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -db 16SMicrobial -query testThree.fa -outfmt 11 -out testThree.fa.blast.asn
Reformat BLAST ASN output format
Use the testThree.fa.blast.asn output file to generate a different output format:
    /mnt/blast/ncbi-blast-2.2.25+/bin/blast_formatter -archive testThree.fa.blast.asn -outfmt 7
Put commands in a shell script
Use a variable for blast programs:
    #!/bin/bash
    BLASTN=/mnt/blast/ncbi-blast-2.2.25+/bin/blastn
    BLASTFORMATTER=/mnt/blast/ncbi-blast-2.2.25+/bin/blast_formatter
    DB=$HOME/blast/databases/16SMicrobial
    QUERY=testThree.fa
    OUTFILE=$QUERY.blast.asn
    # /mnt/blast/ncbi-blast-2.2.25+/bin/blastn -db 16SMicrobial -query testThree.fa -outfmt 11 -out testThree.fa.blast.asn
    echo "Running BLASTN"
    echo "query: $QUERY"
    echo "db: $DB"
    $BLASTN -db $DB -query $QUERY -outfmt 11 -out $OUTFILE
    echo "Finished BLASTN"
Parsing BLAST output with python
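As a starting point, here is a minimal sketch of reading the hit table produced with -outfmt 7. It assumes the default twelve tab-separated columns (query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score) and treats lines starting with # as comments. The field names, the function name parse_hit_table, and the sample lines are hypothetical and are not real BLAST results.

```python
FIELDS = ["query", "subject", "pct_identity", "align_len", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]


def parse_hit_table(lines):
    """Return one dict per hit line, skipping comments and blank lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            continue  # not a default 12-column hit line
        hit = dict(zip(FIELDS, values))
        # Convert the columns most often used for filtering to numbers.
        hit["pct_identity"] = float(hit["pct_identity"])
        hit["evalue"] = float(hit["evalue"])
        hit["bit_score"] = float(hit["bit_score"])
        hits.append(hit)
    return hits


# Made-up sample in hit table layout, for illustration only.
SAMPLE = [
    "# BLASTN 2.2.25+",
    "# Query: query1",
    "query1\tsubjectA\t100.00\t1200\t0\t0\t1\t1200\t15\t1214\t0.0\t2217",
    "query1\tsubjectB\t97.50\t40\t1\t0\t1\t40\t20\t59\t2e-05\t50.1",
]

hits = parse_hit_table(SAMPLE)
strong = [h for h in hits if h["evalue"] <= 1e-6]
print("%d hits, %d with e-value <= 1e-6" % (len(hits), len(strong)))
```

To parse real output, save the blast_formatter result to a file and pass the open file object to parse_hit_table in place of SAMPLE.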
Reading
http://learnpythonthehardway.org/book/ Exercises 15,16,17 (go through 11,12 if too hard)
- http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide
- Read through sections 1,2,3 - just review for now, don’t memorize anything
Exercises
- complete Exercises 15,16,17 in the python reading above
Turn In
- turn in python exercises 15,16,17
- put them in the proper homework directory in your home on the AWS server (for week 4, Thursday)
- write a shell script called week4_Thurs.sh:
  - use variables to hold the name and full path of the blastn program, query file and database
  - create a single query file containing the two sequences below
  - run blastn on the query file
  - use the 16SMicrobial database
  - make the output ASN format
  - reformat the output using the blast_formatter command to give hit table format
Query sequences:
>gi|313761029|gb|GU197655.1| Anabaena bergii CHAB1385 16S ribosomal RNA gene, partial sequence
GGGTGAGTAACGCGTAAGAATCTACCTTCAGGTTGGGGACAACCACTGGAAACGGTGGCTAATACCGAAT
GTGCCGAGAGGTGAAAGGCTTGCTGCCTGAAGAAGAGCTTGCGTCTGATTAGCTAGTTGGTGGGGTAAGA
GCCTACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCC
AGACTCCTACGGGAGGCAGCAGTGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATACCGCGT
GAGGGAGGAAGGCTCTTGGGTTGTAAACCTCTTTTCTCAGGGAAGAAGACAATGACGGTACCTGAGGAAT
AAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATCCGGAATGATTGG
GCGTAAAGGGTCCGCAGGTGGTAGTGTAAGTCTGCTGTTAAAGAGTCACGCTCAACGTGATCAAAGCAGT
GGAAACTACACAACTAGAGTACGGTAGGGGCAGAAGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATAT
CAGGAAGAACACCGGTGGCGAAAGCGTTCTGCTAGACCTGTACTGACACTGAGGGACGAAAGCTAGGGGA
GCGAATGGGATTAGATACCCCAGTAGTCCTAGCCGTAAACGATGGATACTAGGTGTGGCTTGTATCGACC
CGAGCCGTACCGTAGCTAACGCGTTAAGTATCCCGCCTGGGGAGTACGCACGCAAGTGTGAAACTCAAAG
GAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCA
AGGCTTGACATGTCGCGAATCTCGATGAAAGTTGAGAGTGCCTTCGGGAACGCGAACACAGGTGGTGCAT
GGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTTTTAGTT
GCCAGCATTAAGTTGGGCACTCTAGAGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAA
GTCAGCATGCCCCTTACGCCTTGGGCTACACACGTACTACAATGCTCCGGACAAAGGGCAGCTACACAGC
GATGTGATGCAAATCTCATAAACCGGAGCTCAGTTCAGATCGAAGGCTGCAACTCGCCTTCGTGAAGGAG
GAATCGCTAGTAATTGCAGGTCAGCATACTGCAGTGAATTCGTTCCCGGGCCTTGTACACACCGCCCGTC
ACACCATGGAAGTTGGTCACGCCCGAAGTCA
>gi|374092814|gb|JQ237773.1| Anabaena tenericaulis 08-10 16S ribosomal RNA gene, partial sequence
GACGGGTGAGTAACGCGTAAGAATCTACCTTCAGGTTGGGGACAACCACTGGAAACGGTGGCTAATACCC
AATGTGCCGAGAGGTGAAAGGCTTGCTGCCTGAAGAAGAGCTTGCGTCTGATTAGCTAGTTGGTGGGGTA
AGAGCCTACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGG
CCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATACCG
CGTGAGGGAGGAAGGCTCTTGGGTTGTAAACCTCTTTTCTCAGGGAAGAACAAAATGACGGTACCTGAGG
AATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATCCGGAATGAT
TGGGCGTAAAGGGTCCGCAGGTGGCATTGTAAGTCTGCTGTTAAAGAGTTTGGCTCAACCAAATAAAAGC
AGTGGAAACTACAAAGCTAGAGTGTGGTCGGGGCAGAGGGAATTCCTGGTGTAGCGGTGAAATGCGTAGA
TATCAGGAAGAACACCGGTGGCGAAGGCGCTCTGCTAGGCCAAGACTGACACTGAGGGACGAAAGCTAGG
GGAGCGAATGGGATTAGATACCCCAGTAGTCCTAGCCGTAAACGATGGATACTAGGCGTAGCTCGTATCG
ACCCGAGCTGTGCCGTAGCTAACGCGTTAAGTATCCCGCCTGGGGAGTACGCAGGCAACTGTGAAACTCA
AAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGATGCAACGCGAAGAACCTTA
CCAAGGCTTGACATGTCACGAATTCCGTTGAAAGATGGAAGTGCCTTCGGGAGCGTGAACACAGGTGGTG
CATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTTTTA
GTTGCCAGCATTAAGTTGGGCACTCTAGAGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGT
CAAGTCAGCATGCCCCTTACGTCTTGGGCTACACACGTACTACAATGCTACGGACAAAGGGCAGCTACAC
AGCGATGTGATGCGAATCTCATAAACCGTAGCTCAGTTCAGATCGAAGGCTGCAACTCGCCTTCGTGAAG
GAGGAATCGCTAGTAATTGCAGGTCAGCATACTGCAGTGAATTCGTTCCCGGGCCTTGTACACACCGCCC
GTCACACCATGGAAGTTGGTCACGCCCGAAGTCGTTACCCCAACCGCAAGGAGGGGGATGCCTAAGGTAG
GACTGATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTACCGGAAGGTGTGGCTGGATCACCTCCTTTT
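Before running blastn, it can help to confirm that the query file really holds two complete FASTA records. The sketch below is one way to check this. The function name read_fasta and the toy records are hypothetical, and the real query file would be read with open() instead of the inline list.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) tuples from FASTA-formatted lines."""
    records = []
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy records standing in for the real query file.
TOY = [
    ">seq1 first toy record",
    "GGGTGAGTAACG",
    "CGTAAGAATC",
    ">seq2 second toy record",
    "GACGGGTGAG",
]

records = read_fasta(TOY)
for header, sequence in records:
    print("%s: %d bases" % (header.split()[0], len(sequence)))
```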