JSC-BIO-2710

Data Intensive Computing for Applied Bioinformatics

Class Project Spring 2012

Metagenomes of Blue-Green Algae

Project Description

Students will determine the species composition of bacterial communities present in water samples taken from blue-green algae blooms in Lake Champlain and other northeastern lakes. Data will be provided in the form of DNA sequences from these samples. Using the skills learned in class, students will transfer the data sets to remote compute clusters, run DNA sequence analyses on these data sets, and summarize the results using computer programs written for this task.

Project Goals

Samples from two time points will be used for the project. Each time point consists of three replicates taken at the same location, at the same time, and processed identically. The first time point is from Yawgoo Pond, RI; the second is from Lake Champlain near Highgate Springs, VT. Students should devise their own questions about these samples based on the skills learned in class. A simple question might be: “Are the three replicates from each time point very similar to each other?”
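
One possible way to put a number on "very similar" is to compare the taxon counts of two replicates with a dissimilarity measure. The sketch below uses Bray-Curtis dissimilarity, which is only one of several reasonable choices and is not prescribed by the project. The taxon names and counts are made up for illustration; in practice they would come from your own summary of the BLAST results.

```python
from collections import Counter


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two taxon -> count mappings.

    Returns 0.0 when the two profiles are identical and 1.0 when they
    share no taxa at all.
    """
    taxa = set(counts_a) | set(counts_b)
    # Sum of the smaller count for every taxon seen in either sample.
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical counts, NOT real project data.
rep1 = Counter({"Microcystis": 50, "Anabaena": 30, "Synechococcus": 20})
rep2 = Counter({"Microcystis": 45, "Anabaena": 35, "Synechococcus": 20})
other = Counter({"Microcystis": 5, "Aphanizomenon": 95})

print("rep1 vs rep2 :", bray_curtis(rep1, rep2))
print("rep1 vs other:", bray_curtis(rep1, other))
```

Replicates that were processed identically would be expected to give values near 0 when compared with each other, and larger values when compared with samples from the other location.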

Your question should involve running BLAST on all of the sample sequence files at least once.
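
As a rough sketch of that requirement, the following Python builds one BLAST command line per sample file. It assumes the BLAST+ `blastn` program and a nucleotide database named `my_16S_db`; both are placeholders chosen for illustration, as are the output directory and the e-value cutoff. If the cluster provides the older `blastall` program instead, the option names differ. The commands are only printed here; on the cluster they would go into a job script.

```python
import os

# The six sample files listed for the project.
SAMPLE_FILES = ["5989.fa", "5990.fa", "5991.fa",
                "R84Y1.fa", "R84Y2.fa", "R84Y3.fa"]


def blast_command(query_path, database, out_dir="blast_results"):
    """Return a blastn (BLAST+) argument list for one FASTA file.

    `database` and `out_dir` are hypothetical names; substitute the
    database and directory you actually use.
    """
    base = os.path.splitext(os.path.basename(query_path))[0]
    out_path = os.path.join(out_dir, base + ".blast.tsv")
    return ["blastn",
            "-query", query_path,
            "-db", database,
            "-out", out_path,
            "-outfmt", "6",      # tabular output, easy to parse in Python
            "-evalue", "1e-5"]   # illustrative cutoff, not a requirement


for name in SAMPLE_FILES:
    print(" ".join(blast_command(name, "my_16S_db")))
```

Tabular output keeps one hit per line with tab-separated fields, which makes the later summary step a matter of splitting lines rather than parsing a report.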

Project Files

All files are located on lonestar.tacc.teragrid.org in:

/work/00921/tg801771/JSCBIO2710/Class_Project

There are three FASTA files for each location/time point (six total), plus a mapping file giving details about each sample.

Files starting with 59 are from Lake Champlain. Files starting with R8 are from RI.

5989.fa

5990.fa

5991.fa

R84Y1.fa

R84Y2.fa

R84Y3.fa

mappingFile.txt
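
The naming convention above can be used to group the files by location before any analysis. The sketch below is one minimal way to do that and to count the sequences in a FASTA file by counting header lines. The function names are made up for this example, and it assumes the files are plain, uncompressed FASTA.

```python
import os


def location_of(filename):
    """Map a sample file name to its location using the file-name prefix."""
    name = os.path.basename(filename)
    if name.startswith("59"):
        return "Lake Champlain"
    if name.startswith("R8"):
        return "RI"
    return None  # e.g. mappingFile.txt is not a sample file


def group_by_location(filenames):
    """Return a dict of location -> list of sample file names."""
    groups = {}
    for name in filenames:
        location = location_of(name)
        if location is not None:
            groups.setdefault(location, []).append(name)
    return groups


def count_fasta_records(path):
    """Count sequences in a FASTA file (one '>' header line per sequence)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


files = ["5989.fa", "5990.fa", "5991.fa",
         "R84Y1.fa", "R84Y2.fa", "R84Y3.fa", "mappingFile.txt"]
for location, names in sorted(group_by_location(files).items()):
    print(location, names)
```

Calling `count_fasta_records` on each file in the project directory gives a quick check that all six files transferred completely before any BLAST jobs are submitted.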

Turn In

  • Job Scripts
  • Job Run Output
  • Python Programs
  • BLAST results
  • Results of Analysis
  • Description of Workflow
  • Final Summary

Create a directory called $WORK/Class_Project_Final on lonestar. Put all files that you will turn in into this directory.

Change the permissions on that directory and everything in it after you have placed all files there:

chmod -R ug+rw $WORK/Class_Project_Final

The final summary should describe the question(s) you sought to answer, your approach to the problem, a description of the workflow or process used and the results that you found. This summary should not be more than two pages. It should not include detailed source code or job scripts.