JSC-BIO-2710

Data Intensive Computing for Applied Bioinformatics


Syllabus Spring 2012


JSC BIO-2710-J01

Spring 2012

Class & Instructor Information

Instructor: James Vincent
Teaching Assistant: Colin Delaney (cdelaney2@mail.smcvt.edu)
Computing Professional: Patrick Clemins
Class: Tuesday & Thursday, 8:30-9:45
Room: WLLL 215
Office Hours: TBD
Office: Bentley 330
Email: James.Vincent@jsc.edu

Prerequisites and Textbooks

There are no course prerequisites for this class. Students are assumed to have no prior experience with computing or bioinformatics.

Students are expected to be generally familiar with using a personal computer: copying and pasting text, opening and closing windows, and saving and moving files.

There are no printed textbooks for this course. We will use online materials as needed. Most material will be handed out in class.

Several of the resources we will use are:

UNIX Tutorial:

http://www.ee.surrey.ac.uk/Teaching/Unix/

Python programming:

http://www.openbookproject.net/thinkcs/python/english2e/

http://docs.python.org/tutorial/

http://learnpythonthehardway.org/book/

Office Hours

Office hours will be held online with the Teaching Assistant. Additional hours may be available in person. Details will be given in class.

Course Description

Much of modern science is carried out with large data sets that require significant computing resources to analyze. This course introduces students to this data-intensive approach to research, to the field of bioinformatics, and to the basics of using remote computers to carry out bioinformatics tasks. The skills acquired apply equally to many other research areas and scientific fields.

This course is especially targeted at early undergraduates, in order to introduce these aspects of modern scientific research early in their science education.

During this course students will learn what the field of bioinformatics is, how it is integral to modern life sciences research, and how bioinformatics research is performed. Students will complete basic bioinformatics tasks, first using web-based tools and then using remote computing resources.

Using remote computers requires learning the basics of the Unix operating system. In addition, students will learn very basic programming skills using the Python programming language.

All subjects will be introduced assuming no prior knowledge. Successful learning will depend primarily on completing exercises designed to provide hands-on practice. These tasks will not be difficult, but they will require steady work and attention to detail. Completing both the in-class group exercises and the assigned homework exercises will be critical to success in this course.

Course Objectives

By the end of the course you should be able to:

* log in to a remote compute cluster,
* create, save, delete and move files on a remote cluster,
* write shell scripts to automate tasks on a remote computer,
* complete DNA sequence comparisons using BLAST,
* write basic Python programs,
* analyze the results of BLAST jobs using a Python program.

Weekly Schedule

The pace of material covered and the order of presentation will be determined in the beginning of the course and adjusted throughout the course based on the ability and performance of the class.

Homework Assignments

Homework will consist of practice exercises assigned every class period. Learning by doing is critical in this course, and the assignments are designed to provide practice for each day outside of class. Students are strongly encouraged to follow the schedule of exercises and practice every day.

Quizzes

Quizzes will be given weekly and, at 300 of the 1000 total class points, will make up a substantial part of the final grade. Each quiz will be similar to the homework exercises assigned during the previous week. Students are encouraged to work together to better understand the material, but all quizzes and tests require individual work.

Class Project

A single class project will be assigned at the start of the course. Students will work in teams of two or three (depending on course enrollment). All students will complete the same project.

Project Description

Students will determine the species composition of bacterial communities present in water samples taken from a blue-green algae bloom in Lake Champlain. Data will be provided in the form of DNA sequences from these samples. Using the skills learned in class, students will transfer the data sets to remote compute clusters, run DNA sequence analyses on these data sets and summarize the results using computer programs written for this task.
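As a rough illustration of the "summarize the results" step, the sketch below shows one way a short Python program might tally BLAST results. It assumes the BLAST jobs were run with tabular output (`-outfmt 6`), where the query ID is column 1, the subject ID column 2 and the bit score column 12. It keeps the highest-scoring hit for each query sequence and counts hits per taxon. The functions `best_hits` and `count_taxa`, the `taxon_of` lookup table and the toy input lines are hypothetical and for illustration only; they are not the project's actual programs or data.

```python
from collections import Counter


def best_hits(lines):
    """Return {query_id: subject_id}, keeping the highest-bitscore hit per query.

    Assumes BLAST tabular output (-outfmt 6): tab-separated, with the query ID
    in column 1, the subject ID in column 2 and the bit score in column 12.
    """
    best = {}  # query_id -> (bitscore, subject_id)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def count_taxa(hits, taxon_of):
    """Count how many queries have their best hit in each taxon.

    taxon_of is a hypothetical lookup from subject ID to taxon name;
    subject IDs missing from it are counted as "unassigned".
    """
    return Counter(taxon_of.get(subject, "unassigned") for subject in hits.values())


if __name__ == "__main__":
    # Made-up toy lines in -outfmt 6 layout, for illustration only.
    toy = [
        "read1\tsubj1\t99.0\t250\t2\t0\t1\t250\t1\t250\t1e-120\t450",
        "read1\tsubj2\t90.0\t250\t25\t0\t1\t250\t1\t250\t1e-80\t300",
        "read2\tsubj2\t98.0\t240\t4\t0\t1\t240\t1\t240\t1e-110\t420",
    ]
    taxon_of = {"subj1": "taxonA", "subj2": "taxonB"}  # hypothetical mapping
    print(count_taxa(best_hits(toy), taxon_of))
```

Keeping only the best hit per query is one simple choice; a real analysis might instead apply an e-value cutoff or consider several top hits.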

Evaluation

Final grades will be determined using the grading criteria outlined below.

Assignment        Frequency     Points each   Total points
15 Quizzes        Weekly        20            300
30 Homeworks      Each class    20            600
1 Team Project                  100           100
Total class points                            1000

Grading Scale

A+   98-100%
A    93-97%
A-   90-92%
B+   87-89%
B    83-86%
B-   80-82%
C+   77-79%
C    73-76%
C-   70-72%
D+   67-69%
D    63-66%
D-   60-62%
F    below 60%
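The arithmetic behind the point totals and the grading scale can be sketched in a few lines of Python. This is an unofficial illustration: `total_possible` and `letter_grade` are hypothetical names, and it assumes that a percentage falling between two listed ranges (for example 97.5%) receives the lower letter grade, since the scale above lists only whole-number ranges.

```python
# Point values from the Evaluation table: name -> (count, points each)
COMPONENTS = {
    "quizzes": (15, 20),
    "homeworks": (30, 20),
    "team_project": (1, 100),
}

# Lower cutoffs from the Grading Scale, highest first.
# Assumption: a percentage between two listed ranges (e.g. 97.5%)
# receives the lower letter grade.
CUTOFFS = [
    (98, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def total_possible():
    """Total class points: sum of count * points for every component."""
    return sum(count * points for count, points in COMPONENTS.values())


def letter_grade(points_earned):
    """Convert earned points into a letter grade using the cutoffs above."""
    percent = 100 * points_earned / total_possible()
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    print(total_possible())     # 1000
    print(letter_grade(905))    # 90.5% -> A-
```

For example, a student earning 905 of the 1000 points has 90.5%, which meets the A- cutoff of 90% but not the A cutoff of 93%.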