Additional BLAST Resources
This tutorial
was designed as a basic introduction to using BLAST and interpreting BLAST
results. To learn more about BLAST, check out the following NCBI resources
used as references for this tutorial:
BLAST Search Options Guide
BLAST provides
several options for narrowing or modifying a search. Several of the options
presented on the protein-protein
BLAST page and the formatting BLAST page (accessible after submitting
a BLAST query) are explained below. Each search option on these pages links
to a BLAST Help page that includes a brief description of the option.
Search:
Besides pasting sequence data into the search box, you can also submit
query sequences by entering sequence identifier numbers such as accession
numbers or gi's. For descriptions of what accession numbers and gi's are,
see the Glossary
of Bioinformatics Terms.
Set Subsequence:
Lets you limit your query to a particular portion of your sequence. For
example, if you want to limit the query so that only the region between
amino acid residues 50 and 150 is compared with other protein sequences,
simply enter 50 into the From box and 150 into the To box.
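As a rough illustration of what the From and To boxes select, the sketch below extracts residues 50 through 150 from a protein string in Python. It assumes positions are counted from 1 and that both ends are included, as the example above suggests. The function name `subsequence` and the toy sequence are hypothetical.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, counting from 1 and including both ends."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Python slices are 0-based and exclude the end index, hence start - 1.
    return seq[start - 1:end]


# Toy 200-residue query made by repeating a 20-letter amino acid alphabet.
query = "ACDEFGHIKLMNPQRSTVWY" * 10
region = subsequence(query, 50, 150)
print(len(region))  # 101 residues: positions 50 through 150 inclusive
```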
Choose Database:
Choose from among the following protein sequence databases:
NR
- Default setting - All non-redundant translations of CDS (coding sequences)
of GenBank nucleotide sequences as well as amino acid sequences from Protein
Data Bank (PDB), SwissProt, Protein Information Resource (PIR), and Protein
Research Foundation (PRF) in Japan. See our Genome
Database Guide for more information about these databases. Non-redundant
means that the same sequence or translation in more than one database should
be listed only once in the BLAST output.
swissprot
- Only protein sequences from the last major release of Swiss-Prot protein
sequence database. No updates to Swiss-Prot sequences are included.
pat -
Protein sequences derived from the Patent division of GenBank.
yeast
- Translations of Yeast (Saccharomyces cerevisiae) genomic CDS (coding
sequences).
ecoli
- Translations of Escherichia coli genomic CDS (coding sequences).
PDB -
Protein sequences derived from 3-dimensional structures at Protein Data
Bank (PDB). See our Genome
Database Guide for more information about PDB.
Drosophila genome - Drosophila genome proteins provided by Celera and
the Berkeley Drosophila Genome Project (BDGP).
month - Sequences in the NR database that are new or have been revised
in the last 30 days.
Do CD Search: Checking this box will compare the query sequence with the
Conserved Domain Database. A domain is a protein section that has a distinct
evolutionary origin and function. CD Search is carried out by default for
each protein-protein BLAST query. BLAST search results will include a link
to CD-Search results if this box is checked. For more information about
CD Search, see the CDD
Home Page.
Options for Advanced Blasting
Limit by entrez query: This option can be used to specify search criteria
for limiting or refining BLAST searches. Any query statement that can be
submitted to an Entrez database can be entered into the first box. For
example, you could enter mouse[ORGN] OR rat[ORGN] to include only protein
sequences from mice or rats. A specific organism also may be chosen using
the "Select from" drop-down box on the right. For more information on formulating
an entrez query, see Refining
Your Search from the Entrez Help Document.
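A query like the one above can also be assembled programmatically. The Python sketch below joins organism names with OR and the [ORGN] field tag used in the example. The helper name `organism_query` is hypothetical, and the resulting string is simply what you would type into the Limit by entrez query box.

```python
def organism_query(organisms) -> str:
    """Build an Entrez query that matches any of the given organisms."""
    organisms = list(organisms)
    if not organisms:
        raise ValueError("at least one organism is required")
    # Each term gets the [ORGN] field tag; OR keeps hits from any listed organism.
    return " OR ".join(f"{name}[ORGN]" for name in organisms)


print(organism_query(["mouse", "rat"]))  # mouse[ORGN] OR rat[ORGN]
```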
Choose filter:
Low complexity - This option is checked by default. This filter masks
portions of the query sequence that have low complexity (e.g., a long
string of the same amino acid or nucleotide). For a protein sequence
query, the filter replaces a low-complexity region with a string of
X's (e.g., XXXXXXXXXXXXX); for a nucleotide sequence query, it uses a
string of N's. Low-complexity regions can result in high scores that reflect compositional
bias rather than significant position-by-position alignment (Wootton
& Federhen, 1996). Filtering is applied only to the query sequence
(or its translation products), not to database sequences.
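Mask for lookup table only - This option for advanced searchers applies
the mask only while BLAST constructs its lookup table, so that no hits are
seeded from masked regions; the alignment extensions themselves are performed
without masking. This experimental option is likely to change in the future.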
Mask lower case - Select this option to choose for yourself which parts
of the query sequence are filtered when it is compared with database
sequences. Enter the query sequence into the search box in uppercase
characters, and write the regions to be filtered in lowercase characters.
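To make the masking options above more concrete, here is a minimal Python sketch of two toy functions for protein queries. `mask_lowercase` replaces lowercase residues with X, mirroring the Mask lower case convention. `mask_runs` masks long runs of a single repeated residue, which is only the simplest kind of low-complexity region; it is not the algorithm BLAST itself uses. The function names and the run-length threshold of 6 are assumptions chosen for illustration.

```python
import re


def mask_lowercase(seq: str, mask_char: str = "X") -> str:
    """Replace every lowercase residue with mask_char; uppercase is kept."""
    return "".join(mask_char if ch.islower() else ch for ch in seq)


def mask_runs(seq: str, min_run: int = 6, mask_char: str = "X") -> str:
    """Mask runs of one repeated residue that are at least min_run long."""
    # (.)\1{n,} matches a character followed by n or more copies of itself.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


# Seven lowercase residues in the middle are replaced by seven X's.
print(mask_lowercase("MKVLA" + "agsttps" + "LLGH"))

# A run of nine Q's exceeds the threshold and is replaced by nine X's.
print(mask_runs("MKT" + "Q" * 9 + "LVE"))
```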
Expect: All sequences retrieved during a BLAST search must have an Expect
value (E Value) lower than the number specified by this option. The Expect
value is the number of hits with an equal or better score that would be
expected to occur in the database by chance. The default Expect value is 10.
Since hit sequences with Expect values closer to zero are more statistically
significant, you may want to set this option to 1 or to some decimal value.
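The effect of the Expect setting can be pictured as a simple cutoff applied to a list of hits. The Python sketch below keeps only hits whose E Value is below a chosen threshold. The `Hit` record, the subject IDs, and the E Values are made-up illustration data, not real BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    e_value: float


def filter_by_expect(hits, threshold=10.0):
    """Keep hits with an E value lower than threshold, most significant first."""
    kept = [h for h in hits if h.e_value < threshold]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical hits; IDs and E values are invented for this example.
hits = [Hit("subjA", 3e-40), Hit("subjB", 0.5), Hit("subjC", 7.2)]

print([h.subject_id for h in filter_by_expect(hits)])       # all three pass the default of 10
print([h.subject_id for h in filter_by_expect(hits, 1.0)])  # ['subjA', 'subjB']
```

Lowering the threshold trades sensitivity for confidence: fewer hits are reported, but those that remain are less likely to be chance matches.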
Other "Options
for Advanced Blasting," such as composition-based statistics, Word size,
Matrix, PSSM, Other Advanced, and PHI Pattern, are designed for more advanced
BLAST users. For our purposes, these options should be left at their default
values. For more information about these advanced options, see BLAST
help.
Format
Show
Graphical
Overview - This option is selected by default. In BLAST results, this
option provides a graphic depiction of how the similar sequences retrieved
from the databases (the subject sequences) line up with the query sequence
(the thick red line at the top). The score of each alignment is indicated
by one of five different colors as defined in the Color Key for Alignment
Scores shown at the top of the graphical overview.
Linkout
- Also selected by default. If this box is unchecked, no links from BLAST
results to other NCBI databases are provided.
NCBI-gi - Also selected by default. This option displays the NCBI-GI
(GenBank Identifier, a number unique to each sequence) for each hit sequence
included in the output. Each NCBI-GI links to the subject sequence's record
in the NCBI sequence databases.
Format -
Leave the drop-down menu beside the NCBI-GI option set to the default ALIGNMENT.
Other selections in the drop-down menu (PSSM and Bioseq)
are for more advanced users. To view the graphical overview, the HTML
(default) setting should be selected from the second drop-down menu in
the Format option. Selecting "Plain Text" from the drop-down menu will
present BLAST output in a more printer-friendly format; the graphical overview
feature, however, will be omitted and all hyperlinks deactivated.
Number of
Descriptions
- Restricts the number of matching-sequence descriptions reported.
The default limit is 100 descriptions.
Alignments
- Restricts the number of alignments (default alignment type is pairwise)
between query and subject sequences included in the BLAST results. The
default limit is 50.
Alignment View
For examples of some of the following formats, see NCBI's Examples
of Alignment Formats.
Pairwise
- Default setting for alignment view in which the query sequence's
full length is lined up, amino acid by amino acid, with the full length
of each retrieved database sequence. When comparing DNA sequences using
BLAST, the query sequence's nucleotides are matched up with those of each
database sequence.
Query-anchored
with identities - Rather than a pairwise alignment, this is a type
of multiple alignment. In this view, a query-sequence segment (for example,
amino acids 1 through 60) is displayed with the corresponding section of
each retrieved sequence listed below it. Each query-sequence segment begins
with the number 1 at the far left, while each database-sequence segment
begins with its corresponding gi (GenBank identifier) at the far left.
Identities are displayed as dashes, with mismatches as single-letter amino
acid abbreviations.
Query-anchored
without identities - This multiple alignment view is similar to query-anchored
with identities; each match, however, is indicated by the single-letter
amino acid abbreviation instead of a dash.
Hit Table: Presents all BLAST results in a table that summarizes some
of the following information for each subject sequence retrieved: subject
ID, % identity between query and each subject sequence, alignment length,
number of mismatches, number of gap openings, E Value, and bit score.
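Two of the views above can be sketched in a few lines of Python. `query_anchored` renders one subject segment beneath a query segment, either with identities shown as dashes or with every residue spelled out; it assumes two already-aligned, ungapped segments of equal length. `parse_hit_row` reads one tab-separated hit table row. The function names, the column order (taken from the order in which the fields are listed above), and the sample row are all assumptions for illustration, and real hit tables may contain additional columns.

```python
HIT_FIELDS = (
    "subject_id", "percent_identity", "alignment_length",
    "mismatches", "gap_openings", "e_value", "bit_score",
)


def query_anchored(query: str, subject: str, with_identities: bool = True) -> str:
    """Render an aligned subject segment as it would appear beneath the query."""
    if len(query) != len(subject):
        raise ValueError("aligned segments must have the same length")
    if not with_identities:
        return subject  # every residue is shown by its single-letter code
    # Matches become dashes; mismatches keep the subject's letter.
    return "".join("-" if q == s else s for q, s in zip(query, subject))


def parse_hit_row(line: str) -> dict:
    """Split one tab-separated row into named, typed fields (assumed layout)."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(HIT_FIELDS):
        raise ValueError(f"expected {len(HIT_FIELDS)} columns, got {len(values)}")
    row = dict(zip(HIT_FIELDS, values))
    for name in ("alignment_length", "mismatches", "gap_openings"):
        row[name] = int(row[name])
    for name in ("percent_identity", "e_value", "bit_score"):
        row[name] = float(row[name])
    return row


print(query_anchored("MKVLA", "MRVLG"))         # -R--G
print(query_anchored("MKVLA", "MRVLG", False))  # MRVLG

# Invented row: subject ID, % identity, length, mismatches, gaps, E value, bit score.
row = parse_hit_row("subjA\t98.5\t200\t3\t0\t1e-50\t410")
print(row["subject_id"], row["percent_identity"], row["e_value"])
```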
The Limit results
by entrez query option is described above.
The Format for PSI-BLAST and Expect value range options are designed for
more advanced BLAST users (see BLAST help).