Please note that I cannot answer individual specific queries---I am not a careers adviser. I am, however, happy to tackle questions of general interest to all visitors to the site.
I consider bioinformatics to be a special kind of engineering discipline---it certainly isn't a "pure" science. It has been enormously successful in its short existence and I think its successes have been the result of a practical and rigorous approach which I hope to encourage in anyone interested in entering the field.
This document is not a scientific paper or textbook (yet). You will find blunt opinions here. If you disagree with me about any of the following please tell me. I hope to learn a lot from your inevitable and welcome criticisms.
There is certainly one sense in which I consider myself a pure scientist: I'm open to rational persuasion.
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."
Most biologists talk about "doing bioinformatics" when they use computers to store, retrieve, analyse or predict the composition or the structure of biomolecules. As computers become more powerful you could probably add simulate to this list of bioinformatics verbs. "Biomolecules" include your genetic material---nucleic acids---and the products of your genes: proteins. These are the concerns of "classical" bioinformatics, dealing primarily with sequence analysis.
It is a mathematically interesting property of most large biological molecules that they are polymers; ordered chains of simpler molecular modules called monomers. Think of them as beads or building blocks which, despite having different colours and shapes, all have the same thickness and the same way of connecting to one another. Each monomer molecule is of the same general class, but each kind of monomer has its own well-defined set of characteristics. Many monomer molecules can be joined together to form a single, far larger, macromolecule which has exquisitely specific informational content and/or chemical properties.
According to this scheme, the monomers in a given macromolecule of DNA or protein can be treated computationally as letters of an alphabet, put together in pre-programmed arrangements to carry messages or do work in a cell.
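As a minimal sketch of this "letters of an alphabet" idea, assuming Python and the conventional single-letter codes for the four DNA bases, a sequence can be held as an ordinary string and checked against its alphabet. The names DNA_ALPHABET and composition are my own and purely illustrative.

```python
from collections import Counter

# Conventional single-letter codes for the four DNA bases.
DNA_ALPHABET = set("ACGT")


def composition(sequence, alphabet=DNA_ALPHABET):
    """Count each monomer letter, rejecting letters outside the alphabet."""
    sequence = sequence.upper()
    unknown = set(sequence) - alphabet
    if unknown:
        raise ValueError(f"letters not in alphabet: {sorted(unknown)}")
    return Counter(sequence)


counts = composition("GATTACA")
print(counts["A"], counts["C"], counts["G"], counts["T"])  # prints "3 1 1 2"
```

A protein could be handled the same way by passing a different alphabet, which is really all that "treating monomers as letters" amounts to computationally.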
Computational biologists interest themselves more with evolutionary, population and theoretical biology than with cell and molecular biomedicine. It is inevitable that molecular biology is profoundly important in computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology it seems that computational biologists have tended to prefer statistical models for biological phenomena over physico-chemical ones. This is often wise...
One computational biologist (Paul J Schulte) did object to the above and makes the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells. He points out that biological fluid dynamics is a field of computational biology in itself. He argues that this, and any application of computing to biology, can be described as "computational biology" (see also the "loose" definition of bioinformatics below). Where we disagree, perhaps, is in the conclusion he draws from this---which I reproduce in full:
"Computational biology is not a "field", but an "approach" involving the use of computers to study biological processes and hence it is an area as diverse as biology itself."
Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:
"I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."
"Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information."
Aamir Zakaria, the author of the FAQ, emphasises that medical informatics is more concerned with structures and algorithms for the manipulation of medical data, rather than with the data itself.
This suggests that one difference between bioinformatics and medical informatics as disciplines lies with their approaches to the data; there are bioinformaticists interested in the theory behind the manipulation of that data and there are bioinformatics scientists concerned with the data itself and its biological implications. (I believe that a good bioinformatics researcher should be interested in both of these aspects of the field.)
Medical informatics, for practical reasons, is more likely to deal with data obtained at "grosser" biological levels---that is information from super-cellular systems, right up to the population level---while most bioinformatics is concerned with information about cellular and biomolecular structures and systems.
On both of these points I'd be happy for any medical informatics specialists to correct me.
"the combination of chemical synthesis, biological screening, and data-mining approaches used to guide drug discovery and development"
but this, again, sounds more like a field being identified by some of its most popular (and lucrative) activities, rather than by including all the diverse studies that come under its general heading.
The story of one of the most successful drugs of all time, penicillin, seems bizarre, but the way we discover and develop drugs even now has similarities, being the result of chance, observation and a lot of slow, intensive chemistry. Until recently, drug design always seemed doomed to continue to be a labour-intensive, trial-and-error process. The possibility of using information technology to plan intelligently and to automate processes related to the chemical synthesis of possible therapeutic compounds is very exciting for chemists and biochemists. The rewards for bringing a drug to market more rapidly are huge, so naturally this is what a lot of cheminformatics work is about.
The span of academic cheminformatics is wide and is exemplified by the interests of the cheminformatics groups at the Centre for Molecular and Biomolecular Informatics at the University of Nijmegen in the Netherlands. These interests include:
Databases of existing sequencing data can be used to identify homologues of new molecules that have been amplified and sequenced in the lab. The property of sharing a common ancestor, homology, can be a very powerful indicator in bioinformatics (see below).
They can be assembled. Note that this is one of the occasions when the meaning of a biological term differs markedly from a computational one (see the amusing confusion over the issue at Web-based geek forum Slashdot). Computer scientists, banish from your mind any thought of assembly language. Sequencing can only be performed for relatively short stretches of a biomolecule and finished sequences are therefore prepared by arranging overlapping "reads" of monomers (single beads on a molecular chain) into a single continuous passage of "code". This is the bioinformatic sense of assembly.
They can be mapped (see note)---that is, their sequences can be parsed to find sites where so-called "restriction enzymes" will cut them.
They can be compared, usually by aligning corresponding segments and looking for matching and mismatching letters in their sequences. Genes or proteins which are sufficiently similar are likely to be related and are therefore said to be "homologous" to each other---the whole truth is rather more complicated than this. Such cousins are called "homologues".
If a homologue (a related molecule) exists then a newly discovered protein may be modelled---that is the three dimensional structure of the gene product can be predicted without doing laboratory experiments.
Bioinformatics is used in primer design. Primers are short sequences needed to make many copies of (amplify) a piece of DNA as used in PCR (the Polymerase Chain Reaction).
Bioinformatics is used to attempt to predict the function of actual gene products.
Information about the similarity, and, by implication, the relatedness of proteins is used to trace the "family trees" of different molecules through evolutionary time.
There are various other applications of computer analysis to sequence data, but, with so much raw data being generated by the Human Genome Project and other initiatives in biology, computers are presently essential for many biologists just to manage their day-to-day results.
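To make the bioinformatic sense of "assembly" concrete, here is a toy sketch in Python that greedily merges overlapping reads into a single continuous sequence. It assumes error-free reads that all come from the same strand, which real sequencing data never grants you, and the function names and the minimum overlap of three letters are my own choices for illustration. Real assemblers are far more sophisticated.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left overlaps; keep the pieces separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(assemble(["ACCTGA", "ATGCGT", "CGTACC"]))  # prints ['ATGCGTACCTGA']
```

The greedy strategy is only one of several possible approaches and can be fooled by repeated regions, but it shows why overlapping reads are enough to rebuild a longer passage of "code".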
Molecular modelling / structural biology is a growing field which can be considered part of bioinformatics. There are, for example, tools which allow you (often via the Net) to make pretty good predictions of the secondary structure of proteins arising from a given amino acid sequence, often based on known "solved" structures and other sequenced molecules acquired by structural biologists.
Structural biologists use "bioinformatics" to handle the vast and complex data from X-ray crystallography, nuclear magnetic resonance (NMR) and electron microscopy investigations and create the 3-D models of molecules that seem to be everywhere in the media.
Unfortunately the word "map" is used in several different ways in biology/genetics/bioinformatics. The definition given above is the one most frequently used in this context, but a gene can be said to be "mapped" when its parent chromosome has been identified, when its physical or genetic distance from other genes is established and---less frequently---when the structure and locations of its various coding components (its "exons") are established.
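In the restriction-site sense of the word, mapping boils down to a string search. The sketch below, in Python, looks for the recognition sequence of the enzyme EcoRI (GAATTC) in a made-up stretch of DNA. The function name and the example sequence are hypothetical, and the code only reports where the site occurs, not exactly where within it the enzyme cuts.

```python
def find_sites(sequence, site):
    """Return the 0-based start of every occurrence of `site`, overlaps included."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


ECORI = "GAATTC"  # recognition sequence of the restriction enzyme EcoRI
print(find_sites("ttgaattcagggaattcc", ECORI))  # prints [2, 11]
```

Because GAATTC reads the same on both strands, searching one strand is enough here. For a site that is not its own reverse complement you would need to search the other strand as well.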
What almost all bioinformatics has in common is the processing of large amounts of biologically-derived information, whether DNA sequences or breast X-rays.
From T K Attwood and D J Parry-Smith's "Introduction to Bioinformatics", Prentice-Hall 1999 [Longman Higher Education; ISBN 0582327881]:
"The term bioinformatics is used to encompass almost all computer applications in biological sciences, but was originally coined in the mid-1980s for the analysis of biological sequence data."
From Mark S. Boguski's article in the "Trends Guide to Bioinformatics" Elsevier, Trends Supplement 1998 p1:
"The term `bioinformatics' is a relatively recent invention, not appearing in the literature until 1991 and then only in the context of the emergence of electronic publishing..."...However, some of my role models when I was a graduate student (Margaret O. Dayhoff, Russell F. Doolittle, Walter M. Fitch and Andrew D. McLachlan) had been building databases, developing algorithms and making biological discoveries by sequence analysis since the 1960s---long before anyone thought to label this activity with a special term (if anything it was called `molecular evolution'). Even a relatively new kid on the block, the National Center for Biotechnology Information (NCBI), is celebrating its 10th anniversary this year, having been written into existence by US Congressman Claude Pepper and President Ronald Reagan in 1988. So bioinformatics has, in fact, been in existence for more than 30 years and is now middle-aged."
A gossipy and insightful account of the race to sequence the genome can be found in "The Sequence" by Kevin Davies [Weidenfeld; ISBN 0297646982]. Matt Ridley's "Genome" [Fourth Estate; ISBN 185702835X] is both an interesting layperson's introduction to the issues raised by the bioinformatic revolution and an overview of its biology and enormous scope. If I remember rightly, Ridley's book received a slightly snooty review from Walter Bodmer. This is understandable, since his and Robin McKie's excellent "pre-genomic" guide to the Human Genome Mapping Project, "The Book of Life" [Oxford Paperbacks; ISBN 0195114876] was undeservedly in a remainders bin when I bought my copy a couple of years ago.
If you are a non-biological scientist (or a non-scientist) and are hooked by these, why not go back to the "real beginning" of the race and read James Watson's entertaining and indiscreet memoir of his and Francis Crick's determination of the structure of DNA, "The Double Helix" [Penguin; ISBN 0140268774]---now updated with an introduction by media don Steve Jones.
Nigel Barber at Peterborough Regional College in the UK recommends Gary Zweiger's "Transducing the Genome" [McGraw-Hill Professional Publishing: ISBN 0071369805]. The summary at Amazon makes it sound a tad pretentious, but all the reviews seem pretty positive so it might be worth a read.
Bioinformatics.org's very own Jeff Bizzarro recommends Dan Gusfield's "Algorithms on Strings, Trees and Sequences" [Cambridge, 1997 ISBN 0-52158-519-8], Richard Durbin, S. Eddy, A. Krogh, G. Mitchison "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" [Cambridge, 1997 ISBN 0-52162-971-3] (which I think is one of the clearest and most comprehensive guides to alignment algorithms) and---for that full "computers-to-biology conversion"--- Geoffrey M. Cooper "The Cell: A Molecular Approach" [ASM Press, 1996 ISBN 0-87893-119-8]. Jeff Ames writes that a second edition of this book is now available [Sinauer Associates, Incorporated, 2000 ISBN 0-87893-106-6] and that this version---if you can find it in the shops---comes with a CD.
If you're coming to the subject as a computer user with a biological background, looking to exploit the many tools available, you might want to try Terry Attwood and David Parry-Smith's "Introduction to Bioinformatics" [Longman Higher Education; ISBN 0582327881], or Des Higgins and Willie Taylor's "Bioinformatics: Sequence Structure and Databanks" [Oxford University Press; ISBN 0199637903]. Bioinformatics.org also recommends Cynthia Gibas and Per Jambeck's "Developing Bioinformatics Skills" [O'Reilly, 2001 ISBN 1-56592-664-1].
Stuart Brown recommends his own book "Bioinformatics: A Biologist's Guide to Biocomputing and the Internet" [Eaton Pub Co; ISBN: 188129918X]. If he sends me a review copy I might recommend it too ;-) .
Further suggestions for this section are welcome.
Tom Smith and Don Emmeluth have produced a nice little exploration of bioinformatics using NCBI resources and tools. (I suspect that they might have a dry sense of humour too. If you visit the root page of this Web tree you will find a page of such comprehensively tasteless geekiness that you will either laugh yourself stupid or be put off bioinformatics for life.)
I recently stumbled upon a promising set of online lecture notes currently under construction by B. Steipe at the Genzentrum (Gene Center) at the Ludwig-Maximilians-Universität München (University of Munich).
C. J. Schwarz of the Department of Statistics and Actuarial Science, Simon Fraser University has produced a course in "Statistics for the Life Sciences" which is accompanied by a set of sound, online html handouts. They aren't the prettiest, but they're some of the best. (Though his "paradigm of statistics" mnemonic "TRRGET" is completely inconsistent with his explanation of what the letters stand for... If anyone can enlighten me I'd be pleased to know what I'm failing to understand.)
Here is a great guide to a whole array of statistical learning/teaching resources prepared by Juha Puranen of the University of Helsinki (English).
Once you've worked your way through that you might like to see some scanning electron microscope images of some of the structures you've read about taken by members of John Heuser's lab.
Of historical interest only now, I guess, is the legendary "Pedro's Molecular Biology Search and Analysis Tools".
"...established to foster the broad bioinformatics community and the UK research community in particular. Its purpose is to facilitate the transfer of knowledge and expertise through conferences, workshops, a newsletter and the use of the world wide web. CCP11 is funded by the BBSRC and is hosted at the MRC Human Genome Mapping Project Resource Centre HGMP-RC located on the Wellcome Trust Genome Campus, Cambridge."
Jennifer Steinbachs runs compbiology.org which is a general computational biology site as well as being a portal to her own work.
BioPlanet is well worth visiting, though I have to say I have no idea who runs it or what its precise status (commercial, personal, for-fun) as a Web site is.
This resource focuses on complete, full-time degree programmes rather than on individual study modules. Curating a list of the latter would be a full-time job. You can go to other places, however, if you are looking for short courses. Thanks to various contributors, including Wentian Li who pointed me to this list at Rockefeller which is mirrored at various other sites. And to Humberto Ortiz Zuazaga for mailing me a link to the ICSB, where you can find this list. In the UK The Bioinformatics Resource (part of the BBSRC's CCP11 project) maintains (among many other resources) lists of (mainly) British Masters and PhDs in bioinformatics. If you have any suggestions or updates please contact me with them. You can publicize your course and offer a public service at the same time.
If you know of any other bioinformatics courses on the African continent please feel free to mail me about them.
Stanford University M.S./PhD. in BioMedical Informatics
Thanks to Momchil Georgiev for the information that the University of California at San Diego offers a Bioinformatics graduate programme and to Dana Brehm that there is now a new bachelor's program, to quote her:
"[This is an] undergraduate, interdisciplinary program for undergraduates leading to a B.S. degree. The new Bioinformatics major is offered by the Division of Biology, and the departments of Chemistry/Biochemistry, Computer Science and Engineering, and Bioengineering. A student may choose to major in Bioinformatics in any one of the four departments or division. The Division of Biology currently offers two Bioinformatics courses, and with the advent of the cross-disciplinary major, even more courses are going to be taught 2002-03 and 2003-04."
University of California, Irvine Informatics in Biology and Medicine
David Delong wrote to me to point out that the College of Natural and Agricultural Sciences at the University of California, Riverside is developing a "Center in Genomics and Bioinformatics" which will offer a PhD curriculum in genomics and bioinformatics from academic year 2001-2002 onwards.
Catherine Velazquez says that the University of California, Santa Cruz will start a new undergraduate BS course in bioinformatics in the fall of 2001. They also have made public their proposal for an MS in Bioinformatics.
If you know of any other bioinformatics courses on the American continent please feel free to mail me about them.
According to Rahul Agrawal, the Indian Institute of Technology Delhi, New Delhi provides courses in Biochemical Engineering and Biotechnology. He adds that another branch of the Institute, IIT Kharagpur also provides various courses in this area.
There is an Advanced (Graduate) Diploma in Bioinformatics in the Bioinformatics Centre at the Jawaharlal Nehru University.
Madurai Kamaraj University in Madurai, India claims to have been the first in the country to initiate a bioinformatics programme and advanced diploma in bioinformatics at its School of Biotechnology.
The University of Pune, Maharashtra, India offers an Advanced Diploma in Bioinformatics at the Bioinformatics Centre.
Lam Ah Wah wrote to tell me that the Nanyang Technological University (NTU) starts a BioInformatics undergraduate and part-time post-graduate MSc course in Jul 2002. Be warned: their Web site has a hideous frame/window based "portal" which breaks half a dozen rules of good interface design. I couldn't find pages about the actual courses---perhaps you can?
If you know of any other bioinformatics courses in Asia please feel free to mail me about them.
You can obtain a Graduate Certificate in Bioinformatics from Curtin University of Technology in Western Australia.
As of 2001 Flinders University in Adelaide offers a Bachelor of Science in Bioinformatics.
The Biochemistry Department of La Trobe University in Victoria also offers an undergraduate course in Bioinformatics.
The University of New South Wales in Sydney offers an undergraduate program in Bioinformatics.
Sydney University in New South Wales offers a Bachelor of Science in Bioinformatics.
If you know of any other bioinformatics courses in Australasia please feel free to mail me about them.
The Department of Engineering at the Katholieke Universiteit Leuven offers a Master of Bioinformatics degree.
The Universität Tübingen (University of Tübingen) also offers Bioinformatik. Here are their own Frequently Asked Questions (in German only) about studying bioinformatics there.
Apart from this, adds Daniel Nilsson, there is only one other "pure" bioinformatics course in Sweden: the MSc in Bioinformatics Engineering in Uppsala. There are also opportunities to study bioinformatics on the "normal" biotech courses in Gothenburg, Linköping and Umeå. In Gothenburg, the School of Mathematical and Computing Sciences at Chalmers offers an MSc. programme in bioinformatics---thanks to Samuel Hargestam.
Two pioneering university institutions are Birkbeck College in the University of London, a British centre with a proud tradition in educating working and/or mature students to the highest academic standards and a superb X-ray crystallography group and York University whose Department of Biology offers Masters courses and PhDs in both computational biology and biomolecular science. Other universities have bioinformatics groups actively involved in the teaching of their biology/molecular biology undergraduate courses, including, for example, courses at Leeds University where there are also MRes studentships available. Manchester University also teaches bioinformatics to its undergraduates as well as offering a taught MSc. course in the subject. University College London (UCL) also offers a final year undergraduate course: "Bioinformatics: Genes, Proteins and Computers".
Imperial College recently displaced Oxford (at least temporarily) from second place in various "charts" of the "best" universities in the UK. [Disclaimer: I was a graduate student at Imperial and teach on two graduate courses there.] From next year the Department of Biochemistry at Imperial is offering a new MSc in Computational Genetics and Bioinformatics. (Oxford itself hasn't yet deigned to recognize the field with a degree course. [Disclaimer: I was an undergraduate there.])
Thank you to David Parkinson for pointing out to me that for the past two years Sheffield Hallam University has offered an MSc/PGDip in Bioinformatics at its Graduate School in Science, Engineering and Technology.
Other UK Bioinformatics courses include:
the various graduate programmes offered by the University of Exeter MSc/MRes in Bioinformatics.
University of Glasgow MRes in Bioinformatics.
University of Liverpool M.Sc., Postgraduate Diploma and Postgraduate Certificate in Biosystems & Informatics
University of Nottingham Master of Philosophy in Molecular Biology with Bioinformatics
In April 2002 City University's Bioinformatics group is moving---along with its PhDs---to the University of Glasgow Department of Computer Science. Thanks to Will Bachelor for alerting me to the existence of this group.
If you know of any other bioinformatics courses in Europe please feel free to mail me about them.
This section is opinionated, partly because there are people in the field, both computer scientists and biologists, who I would love to provoke (or convert). If you are a newcomer, and especially if you come from one of bioinformatics' component pure disciplines, I hope my ranted warnings will help you to avoid the mistakes of your predecessors---and I write as one of the mistaken. David S. Roos put it well in his recent review in the journal Science:
"Lack of familiarity with the intellectual questions that motivate each side can also lead to misunderstandings. For example, writing a computer program that assembles overlapping expressed sequence tags (EST) sequences may be of great importance to the biologist without breaking any new ground in computer science. Similarly, proving that it is impossible to determine a globally optimal phylogenetic tree under certain conditions may constitute a significant finding in computer science, while being of little practical use to the biologist."
If you are a high school student / sixth former, think about taking an interdisciplinary computational biology or bioinformatics bachelor's degree of the sort offered at, for example, Manchester University in the UK or UPenn in the States. Don't worry if you can't find a place on such a course or there isn't one nearby; perhaps the best way to approach this subject is from two sides. Do a bachelor's degree in one area while taking a healthy interest in the other---or (if you can afford to) complement a first degree in one part of the discipline with a second degree in the second.
If you already have a degree in a biological discipline there are similar Master's courses---both interdisciplinary (e.g. Birkbeck's in London) and conversion type courses---for biologists or others to learn computer science, for example.
If you are currently doing a computer science or biology PhD, try to take advantage of the opportunity to take courses in the "other" discipline.
Of all the computing courses available it is most important that you have a proper introduction to the UNIX operating system. Most current bioinformatics software (especially the free stuff) runs on "open" platforms like UNIX and the Web. UNIX is elegant, powerful and frustrating. Master it and you will save a lot of time.
Learn some maths. Basic statistics, logic/set theory and a little calculus would be my recommendation. Many practising biologists have little or no grasp of elementary concepts like statistical significance, permutations and combinations and the principles of good experimental design. Logic will come in handy at the very least if you want to query databases in an intelligent way.
If you're interested in development, learn a real programming language: Pascal, C(++), Java or Fortran.
Perl and HTML are the stuff that holds the Web together. A grasp of these is essential for a lot of the Web/database work being done by many bioinformaticists at the moment.
Good old BASIC can be very useful as an introduction to programming or as a tool in its own right, but none of these latter languages is built to crunch numbers and tackle real world biological problems---which isn't to say people don't try...
Quantitative scientists talk about their interest in studying some aspect of "God's mind". Biologists are interested in "Mother Nature's body". If you want to win Nature over you are going to have to meet her in the flesh. You are as likely to be useful to biologists working in isolation at the keyboard as you are to conceive with your clothes on. Desk-bound bioinformaticists have written code that has turned out to be popular with biologists, but almost always because they have collaborated with biologists.
"MoBi" was the bioinformatics of its day; desperately fashionable, the province of new, higher-paid practitioners and considered with slight suspicion by more traditional biologists. It was once a great achievement to sequence a modest stretch of DNA, now it's a job for robots. Today the technology is very well established. Scientists can buy molecular biology kits to perform the sort of genetic manipulations that would make your parents' jaws drop. Some of the kits are so simple your parents' parents could use them (with a modest amount of training and supervision).
Despite the profusion of commercial kits, there is still a requirement for real skill in molecular biology and the general level of scientific understanding required to be a good biological scientist---rather than just completing a practical class---doesn't come easy. Living matter, the stuff you have to work with, is unpredictable and responds slowly---except when it's dying. Even supposedly fast-growing bacteria can take a long time to yield up their secrets.
Even now, as the focus of biomedical research shifts from molecular biology back to cell biology and protein biochemistry, it's well worth offering yourself up as a volunteer for some vacation work in a molecular biology lab. The term "molecular biology" is now more often used to refer to the technological tools it provides biology in general rather than to fundamental research in the field itself. Those tools are common to a vast array of different kinds of research, from archaeology to zoology.
Protein (bio)chemistry is experiencing a revival. Proteins are still more delicate and fussy than nucleic acids. The same advice that applies to molecular biology applies to protein biochemistry. That stuff bioinformatics people refer to as "wet lab science" is much harder than it looks.
You might find it more difficult to get access to a good protein lab than a good molecular biology lab and do protein science with real wizards, but the very least you can do is read about the theoretical aspects of the subject.
For insights into the principles of protein structure, try, for example, Carl Branden and John Tooze's "Introduction to Protein Structure" [Garland ISBN 0-8153-2305-0]. Physicists in particular might find the lack of general unifying principles in this area overwhelming. Unfortunately there's no substitute for acquiring a "feel" for the subject by examining a lot of examples. Still the most critical stages in the successful prediction of protein structure from sequence are those requiring human intervention.
Thomas E. Creighton has been responsible for a range of standard texts on protein chemistry. If you are working in a protein lab you are likely to come across his "Protein Function : A Practical Approach" [ISBN 019963615X] and the rather more expensive and theoretical "Proteins : Structures and Molecular Properties" [ISBN 071677030X].
It's a worn quote, but worth repeating:
"The mechanisms that bring evolution about certainly need study and clarification. There are no alternatives to evolution as history that can withstand critical examination. Yet we are constantly learning new and important facts about evolutionary mechanisms. Nothing in biology makes sense except in the light of evolution."
Theodosius Dobzhansky in "American Biology Teacher" vol.35
Darwin's theory is one of the simplest and most misunderstood in science. Start with a good layperson's introduction, Richard Dawkins' "The Selfish Gene" (and remember: it's a metaphor, stupid) or Steve Jones' "Almost Like a Whale", a paraphrasing of Darwin's original "On the Origin of Species". All biologists agree on the underlying principles, but they are nearly ready to kill one another over the details. After reading a decent book on evolutionary biology you should have at least a handful of good questions. Now you are ready to take a class in the subject. Take your questions with you. You'll probably start an argument---or a fight.
You may have already generated your own sequence data experimentally. In this case you are likely to want to find sequences which are identical or similar (and therefore possibly related) to yours. The task is then one of similarity search.
In a symbolic sequence each base or residue monomer in each sequence is represented by a letter. The convention is to print the single-letter codes for the constituent monomers in order in a fixed font (from the N-most to C-most end of the protein sequence in question or from 5' to 3' of a nucleic acid molecule). This is based on the assumption that the combined monomers are evenly spaced along the single dimension of the molecule's primary structure. From now on I shall refer to an alignment of two protein sequences.
Every element in a trace is either a match or a gap. Where a residue in one of two aligned sequences is identical to its counterpart in the other the corresponding amino-acid letter codes in the two sequences are vertically aligned in the trace: a match. When a residue in one sequence seems to have been deleted since the assumed divergence of the sequence from its counterpart, its "absence" is labelled by a dash in the derived sequence. When a residue appears to have been inserted to produce a longer sequence a dash appears opposite in the unaugmented sequence. Since these dashes represent "gaps" in one or other sequence, the action of inserting such spacers is known as gapping.
A deletion in one sequence is symmetric with an insertion in the other. When one sequence is gapped relative to another a deletion in sequence a can be seen as an insertion in sequence b. Indeed, the two types of mutation are referred to together as indels. If we imagine that at some point one of the sequences was identical to its primitive homologue, then a trace can represent the three ways divergence could occur (at that point).
A trace can represent a substitution:
    AKVAIL
    AKIAIL
A trace can represent a deletion:
    VCGMD
    VCG-D
A trace can represent an insertion:
    GS-K
    GSGK
For obvious reasons I do not represent a silent mutation.
Traces may represent recent genetic changes which obscure older changes. Here I have only represented point mutations for simplicity. Actual mutations often insert or delete several residues.
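The way a two-row trace is read can be sketched in a few lines of Python. This assumes, as the description of gapping does, that the first row is the older sequence and the second is derived from it, and that a dash marks a gap. The function name and labels are my own, chosen for illustration.

```python
def classify_columns(ancestral, derived):
    """Label each column of a two-row trace; the first row is taken as the older sequence."""
    if len(ancestral) != len(derived):
        raise ValueError("both rows of a trace must be the same length")
    labels = []
    for a, d in zip(ancestral, derived):
        if a == "-" and d == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if d == "-":
            labels.append("deletion")      # residue lost from the derived sequence
        elif a == "-":
            labels.append("insertion")     # residue gained by the derived sequence
        elif a == d:
            labels.append("match")
        else:
            labels.append("substitution")  # aligned but different residues
    return labels


print(classify_columns("VCGMD", "VCG-D"))
# prints ['match', 'match', 'match', 'deletion', 'match']
```

Swap the two rows and every deletion becomes an insertion, which is exactly the symmetry that leads people to lump the two together as indels.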
Robotic technology is employed in the preparation of most arrays. The DNA sequences are bound to a surface such as a nylon membrane or glass slide at precisely defined locations on a grid. Using an alternate method, some arrays are produced using laser lithographic processes and are referred to as biochips or gene chips. The composition of DNA on the arrays is of two general types:
This resource has also been mirrored, without credit or any attempt to link to the Open Content Licence, at the so-called "National Bioinformatics Institute". If you are thinking of handing over money for their "certification" you can draw your own conclusions about their standing from this fact.
The first version of this resource was prepared when I was responsible for bioinformatics in the Section for Cell and Molecular Biology at the Institute of Cancer Research (the ICR) in London.
I am now a bioinformatics specialist at the HGMP-RC, part of the Proteomics Group and am supported by the Medical Research Council. This page does not represent their views, but I will happily read your criticisms. Although I may act on your advice I take no responsibility for anything that might happen if you browse here.