JSC-BIO-2710

Data Intensive Computing for Applied Bioinformatics

Week Two 24-Jan-12

Tuesday

Note

Learning Goals:

  • Transfer files with ftp
  • Use some handy UNIX commands: grep, date, echo
  • Login to remote computer
  • Edit a file with vi editor

Lecture

Homework grading:
 

Show work on laptops:

Directory structure from Fig. 1 of PLoS paper

Renamed yeast directory to human

Downloaded CHR_MT fasta file into human directory

Show number of lines in hs_alt_HuRef_chr21.fa
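One common way to show the number of lines in a file is wc -l. The sketch below is only an illustration: it builds a tiny stand-in file (sample.fa and its contents are made up, not the real hs_alt_HuRef_chr21.fa) so that it can be run anywhere.

```shell
#!/bin/bash
# Build a small stand-in FASTA file in a temporary directory.
# sample.fa and its contents are made up for illustration.
workdir=$(mktemp -d)
printf '>demo_sequence\nACGTACGT\nTTGACCAA\n' > "$workdir/sample.fa"

# Reading the file on standard input makes wc print only the number,
# without the file name after it.
line_count=$(wc -l < "$workdir/sample.fa")
echo "Lines in sample.fa: $line_count"
```

For the real chromosome file the same command would be wc -l hs_alt_HuRef_chr21.fa, run from the directory that holds the file.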

Do homework assignment to demonstrate:
 

bash file completion

ls with -R

Overview of ftp, client server protocols:
 
Transfer files with ftp:
 

Walk through download of CHR_MT from NCBI.

View files, search for text in files:
 

Use cat to send contents of file to screen.

Use less to look at contents of file.

Use / to find things with less

Use grep to find string in fasta file.

How many lines contain AAAAAAA?

Read grep man page to determine how to get just a count

Use / again in man page to find word ‘count’
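The grep steps above can be sketched as follows. The file name sample.fa and its sequence lines are made up for illustration; in class the same commands would be run on the downloaded fasta file.

```shell
#!/bin/bash
# Build a small stand-in FASTA file; the name and sequence lines are made up.
workdir=$(mktemp -d)
fasta="$workdir/sample.fa"
printf '>demo_sequence\nACGTAAAAAAAAAAGT\nTTGACCAATTGG\nCCAAAAAAAAAAAT\nGGGGCCCCTTTT\n' > "$fasta"

# Print every line that contains the string.
grep 'AAAAAAA' "$fasta"

# The -c option prints only the number of matching lines.
match_count=$(grep -c 'AAAAAAA' "$fasta")
echo "Lines containing AAAAAAA: $match_count"
```

Note that grep -c counts lines that match, not occurrences, so a line with two separate runs of AAAAAAA is still counted once.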

Login to remote computer:
 

ssh

Login to EC2 instance: ec2-23-20-18-242.compute-1.amazonaws.com

recreate msms directory from PLoS paper

log out, log back in

Intro to vi:

vi vimtutor.txt

ESC :q to quit

Homework

Reading

Exercises

Edit a file with vi editor:
 
  • Login to EC2 instance: ec2-23-20-18-242.compute-1.amazonaws.com

  • copy the file /jscbio2710/vimtutor.txt

  • open the file with vi: vi vimtutor.txt

    Anytime the document refers to the command vimtutor it is the same as: vi vimtutor.txt

  • follow and complete all of Lesson 1 and Lesson 2

  • you can copy the file again from /jscbio2710/vimtutor.txt if you need to start over or just practice some more

Turn In

  • Log in to ec2-23-20-18-242.compute-1.amazonaws.com and complete Lesson 1 and Lesson 2 of the vimtutor.txt document. Leave the file in your home directory.
  • On the EC2 machine, copy the file /jscbio2710/holmes.txt to your home directory. Edit the file with vi and correct the various spelling mistakes or missing letters. Leave the file in your home directory.
  • On the EC2 machine in your home directory use vi to create a new file named week-2-1.txt. In this file write a few sentences about what we have learned so far. Leave this file in your home directory.
  • The three files mentioned above will be copied out of your home directory at 8:30am on Thursday morning. That is the latest the homework can be turned in.
Sign up for an XSEDE account:
 

https://portal.xsede.org/

At the bottom of the sign in box (“Enter the Portal”), below the SIGN IN button, there is a link to create an account. Use this link to create a new account. Send email to me (James.Vincent@jsc.edu) when you have completed this. I will verify through XSEDE.



Thursday

Note

Learning Goals:

  • Create shell scripts
  • Use environment variables in a shell script
  • Run and debug shell scripts

Lecture

What will be covered in class:

Homework grading

Quiz 2

Review of vi:

 vi filename  <ENTER>  start vi

 <ESC>  :q!  <ENTER>  quit without saving changes
 <ESC>  :wq  <ENTER>  quit and save changes.

 <ESC> back to command mode
 <ESC> will cancel an unwanted and partially completed command.

 u   undo last action
 U   undo all changes to line

 move cursor:  h (left)   j (down)   k (up)   l (right)
 0    start of line
 $    end of line
 gg   start of file
 G    end of file
 4G   4th line in file
 15G  15th line in file

 i   insert text
 a   append after the cursor
 A   append at end of line

 x   delete character
 dw  delete word
 d$  delete to the end of line
 dd  delete line

 repeat a motion, prepend with a number:   2w
 Examples:

 5j    move down 5 lines
 3k    move up three lines
 23l   move 23 characters right
 32h   move 32 characters left

      operator   [number]   motion

      operator - what to do, such as  d  for delete
      [number] - optional count to repeat the motion
      motion   - moves over the text to operate on, such as  w (word),
                 $ (to the end of line), etc.

 p   put previously deleted text (for example, a line removed with dd)
 c   change characters
     ce  change to end of word
     c$  change to end of line
 r   replace character
 R   replace characters until <ESC> is pressed

Shell scripts:

#!/bin/bash

echo "Hello World"
echo "Today is: "
date

  • The echo, date commands
  • Make a bash shell script
  • Changing permissions on bash script to make it executable
  • Running the script
  • Introducing errors on purpose
  • Using variables
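The steps in this list can be sketched end to end as below. This is a minimal illustration, not the script written in class: the file name hello.sh and the variable name greeting are made up, and the script is created in a temporary directory so nothing in your home directory is touched.

```shell
#!/bin/bash
# Work in a temporary directory so nothing in the home directory is touched.
workdir=$(mktemp -d)
script="$workdir/hello.sh"   # hello.sh is a made-up name

# Write the script. The quoted 'EOF' stops the outer shell from expanding
# $greeting and $(date) while the file is being written.
cat > "$script" <<'EOF'
#!/bin/bash
greeting="Hello World"
echo "$greeting"
echo "Today is: $(date)"
EOF

# A new file is not executable by default; add execute permission for the owner.
chmod u+x "$script"

# Run the script by its path and capture what it prints.
output=$("$script")
echo "$output"
```

When a script misbehaves, running it as bash -x hello.sh prints each command before it is executed, which makes it easier to see where an error comes from.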

Class Project

Homework

Reading

Exercises

  • Complete all exercises in the Unix Tutorial (parts 3,4,5) above.
  • Make sure you can do the exercises quickly, repeat as needed
Turn In

You will create three shell scripts on the AWS server. Leave them in a directory called homework within your home directory.

  • Create a directory called homework in your home directory on the AWS server we have been using:

    ec2-23-20-18-242.compute-1.amazonaws.com

  • Create a shell script called homework-3a.sh in your homework directory.

  • Make the script a bash shell script

  • Add commands to the script to recreate the directory structure from the PLoS paper Figure 1 given as homework last week (Week 1, Thursday homework).

  • Create another bash script called homework-3b.sh.

  • Add commands to this script to list all the files in the /tmp directory and count how many files there are.

  • Add echo and date commands to create a nicely formatted output report that includes your name, the date, the purpose of this script and finally the output of the commands themselves.

  • Create a bash script called homework-3c.sh.

  • If you don’t already have it, retrieve another copy of human chromosome 21 from ftp.ncbi.nih.gov.

  • Write commands in your shell script to list the size of this file. Make sure the chromosome file is in the homework directory also.

  • Add commands to this script to count the number of lines that contain the string ‘AAAAAAAAAA’

  • Use a variable at the top of your script to hold the name of the chromosome fasta file. Use this variable in the commands above instead of the file name itself.
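The pattern of holding a file name in a variable and building a small report with echo and date can be sketched as below. Everything specific here is an assumption for illustration: sample.fa is a made-up stand-in for the chromosome file, "Your Name" is a placeholder, and the layout of the report is only one possibility.

```shell
#!/bin/bash
# Name of the FASTA file, set once at the top and reused below.
# sample.fa is a made-up stand-in for the real chromosome file.
fasta_file="sample.fa"

# Create a tiny stand-in file in a temporary directory so the sketch runs anywhere.
cd "$(mktemp -d)" || exit 1
printf '>demo_sequence\nCCAAAAAAAAAAAAGT\nACGTACGTACGT\n' > "$fasta_file"

# Report header built from echo and date.
echo "Report prepared by: Your Name"
echo "Date: $(date)"
echo "Purpose: show the size of $fasta_file and count lines with a run of A"
echo

# ls -l shows the file size in bytes among its other columns.
echo "File size:"
ls -l "$fasta_file"

# grep -c counts the lines that contain the string.
echo "Lines containing AAAAAAAAAA:"
poly_a_lines=$(grep -c 'AAAAAAAAAA' "$fasta_file")
echo "$poly_a_lines"
```

Because the file name appears only in the fasta_file assignment, pointing the script at a different file means changing a single line.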