Students will determine the species composition of bacterial communities in water samples taken from blue-green algae blooms in Lake Champlain and other northeastern lakes. Data will be provided as DNA sequences from these samples. Using the skills learned in class, students will transfer the data sets to remote compute clusters, run DNA sequence analyses on them, and summarize the results using computer programs written for this task.
Samples from two time points will be used for the project. Each time point consists of three replicates taken at the same location, at the same time, and processed identically. The first time point is from Yawgoo Pond, RI; the second is from Lake Champlain near Highgate Springs, VT. Students should devise their own questions about these samples based on the skills learned in class. A simple question might be: “Are the three replicates from each time point very similar to each other?”
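One possible way to make "very similar" concrete is to compare the taxon counts of each pair of replicates with a dissimilarity measure. The sketch below uses Bray-Curtis dissimilarity, which is a choice made here for illustration and is not prescribed by the assignment. It assumes you have already reduced each sample to a dictionary of taxon counts (for example, tallied from BLAST top hits); the function names and the toy counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two taxon -> count dictionaries.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    taxa = set(counts_a) | set(counts_b)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    # Sum of the smaller count for every taxon seen in either sample.
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    return 1.0 - 2.0 * shared / total


def pairwise_dissimilarity(samples):
    """Return {(name_a, name_b): dissimilarity} for every pair of samples."""
    return {
        (a, b): bray_curtis(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are NOT project results.
    toy = {
        "rep1": {"taxonA": 6, "taxonB": 4},
        "rep2": {"taxonA": 5, "taxonB": 5},
        "rep3": {"taxonA": 2, "taxonB": 8},
    }
    for pair, value in pairwise_dissimilarity(toy).items():
        print(pair, round(value, 3))
```

Because raw counts depend on how many sequences each file contains, you may prefer to convert counts to proportions before comparing replicates.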
Your question should involve running BLAST on all of the sample sequence files at least once.
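To run BLAST on all six sample files consistently, it can help to generate the command lines from one template. The following is a minimal sketch that only builds and prints the commands; it does not run them. It assumes the BLAST+ `blastn` program with tabular output (`-outfmt 6`); the database name `my_16S_db`, the e-value cutoff, and the output file naming are placeholders, not values given by the assignment. If the class uses the legacy `blastall` program instead, the flags differ.

```python
import shlex

# The six sample files listed in the project directory.
SAMPLE_FILES = ["5989.fa", "5990.fa", "5991.fa", "R84Y1.fa", "R84Y2.fa", "R84Y3.fa"]


def blast_command(query, database, evalue=1e-5):
    """Build a BLAST+ blastn command (as an argument list) for one query file."""
    # Hypothetical naming scheme: 5989.fa -> 5989.blast.tsv
    out = query.rsplit(".", 1)[0] + ".blast.tsv"
    return [
        "blastn",
        "-query", query,
        "-db", database,          # placeholder database name supplied by caller
        "-evalue", str(evalue),   # assumed cutoff; adjust for your question
        "-outfmt", "6",           # tab-separated, one line per hit
        "-out", out,
    ]


if __name__ == "__main__":
    # Print one command per sample file, e.g. to paste into a job script.
    for fasta in SAMPLE_FILES:
        print(shlex.join(blast_command(fasta, "my_16S_db")))
```

On the cluster, the printed commands would normally go into a job script rather than being run on the login node.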
All files are located on lonestar.tacc.teragrid.org in:
/work/00921/tg801771/JSCBIO2710/Class_Project
There are three FASTA files for each location/time point (six total) and a mapping file giving details about each sample.
Files starting with 59 are from Lake Champlain. Files starting with R8 are from RI.
5989.fa
5990.fa
5991.fa
R84Y1.fa
R84Y2.fa
R84Y3.fa
mappingFile.txt
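As a quick sanity check after copying the files, you might count the sequences in each FASTA file and label each file by location using the prefixes described above. This is a sketch under the assumption that the files are standard FASTA, with one header line starting with `>` per sequence; the helper names `count_fasta_records` and `location_for` are hypothetical.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def location_for(filename):
    """Map a sample file name to its location using the file name prefix."""
    if filename.startswith("59"):
        return "Lake Champlain"
    if filename.startswith("R8"):
        return "RI"
    return "unknown"
```

For example, `location_for("R84Y2.fa")` returns `"RI"`, and `count_fasta_records("5989.fa")` returns the number of sequences in that file when it is present in the current directory.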
Create a directory called $WORK/Class_Project_Final on lonestar. Put all files that you will turn in into this directory.
Change the permissions on that directory and everything in it after you have placed all files there:
chmod -R ug+rw $WORK/Class_Project_Final
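If you want to confirm that the permission change took effect, a small check like the one below may help. It is a sketch, not part of the required workflow: it walks a directory and lists any path that lacks read and write permission for both user and group. The function name `missing_ug_rw` is hypothetical.

```python
import os
import stat

# Permission bits that "ug+rw" is expected to set.
REQUIRED = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP


def missing_ug_rw(root):
    """Return paths under root (including root) lacking user/group read+write."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    return [p for p in paths if (os.stat(p).st_mode & REQUIRED) != REQUIRED]


if __name__ == "__main__":
    # Expands $WORK if it is set in the environment.
    target = os.path.expandvars("$WORK/Class_Project_Final")
    if os.path.isdir(target):
        for path in missing_ug_rw(target):
            print("missing ug+rw:", path)
```

An empty result means every file and subdirectory has the expected user and group read/write bits.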
The final summary should describe the question(s) you sought to answer, your approach to the problem, the workflow or process used, and the results you found. The summary should be no longer than two pages and should not include detailed source code or job scripts.