Additional BLAST Resources

This tutorial was designed as a basic introduction to using BLAST and interpreting BLAST results. To learn more about BLAST, check out the following NCBI resources used as references for this tutorial:


BLAST Search Options Guide
 

BLAST provides several options for narrowing or modifying a search. Several of the options presented on the protein-protein BLAST page and the formatting BLAST page (accessible after submitting a BLAST query) are explained below. Each search option on these pages links to a BLAST Help page that includes a brief description of the option. 

Search: Besides pasting sequence data into the search box, you can also submit query sequences by entering sequence identifier numbers such as accession numbers or gi's. For descriptions of what accession numbers and gi's are, see the Glossary of Bioinformatics Terms.

Set Subsequence: Lets you limit your query to a particular portion of your sequence. For example, if you want to limit the query so that only the region between amino acid residues 50 and 150 is compared with other protein sequences, simply enter 50 into the From box and 150 into the To box.
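
To make the From/To arithmetic concrete, here is a minimal Python sketch of the same idea. It assumes that positions are counted from 1 and that both endpoints are included, which is the usual convention for numbering residues. The function name subsequence and the sample sequence are hypothetical; they are not part of BLAST.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq (1-based, both ends included)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("need 1 <= start <= end <= len(seq)")
    # Python slices are 0-based and end-exclusive, so shift the start by one.
    return seq[start - 1:end]


query = "M" + "A" * 199            # hypothetical 200-residue protein
region = subsequence(query, 50, 150)
print(len(region))                 # 101 residues: positions 50 through 150
```

Note that a From of 50 and a To of 150 selects 101 residues, not 100, because both endpoints are counted.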

Choose Database: Choose from among the following protein sequence databases:

NR - Default setting - All non-redundant translations of CDS (coding sequences) of GenBank nucleotide sequences as well as amino acid sequences from Protein Data Bank (PDB), SwissProt, Protein Information Resource (PIR), and Protein Research Foundation (PRF) in Japan. See our Genome Database Guide for more information about these databases. Non-redundant means that a sequence or translation that appears in more than one database should be listed only once in the BLAST output.

swissprot - Only protein sequences from the last major release of the Swiss-Prot protein sequence database. No updates to Swiss-Prot sequences are included.

pat - Protein sequences derived from the Patent division of GenBank.

yeast - Translations of Yeast (Saccharomyces cerevisiae) genomic CDS (coding sequences).

ecoli - Translations of Escherichia coli genomic CDS (coding sequences).

PDB - Protein sequences derived from 3-dimensional structures at Protein Data Bank (PDB). See our Genome Database Guide for more information about PDB.

Drosophila genome - Drosophila genome proteins provided by Celera and the Berkeley Drosophila Genome Project (BDGP).

month - Sequences in the NR database that are new or have been added in the last 30 days.

Do CD Search: Checking this box will compare the query sequence with the Conserved Domain Database. A domain is a section of a protein that has a distinct evolutionary origin and function. CD Search is carried out by default for each protein-protein BLAST query. BLAST search results will include a link to CD-Search results if this box is checked. For more information about CD Search, see the CDD Home Page.

Options for Advanced Blasting

Limit by entrez query: This option can be used to specify search criteria for limiting or refining BLAST searches. Any query statement that can be submitted to an Entrez database can be entered into the first box. For example, you could enter mouse[ORGN] OR rat[ORGN] to include only protein sequences from mice or rats. A specific organism may also be chosen using the "Select from" drop-down box on the right. For more information on formulating an Entrez query, see Refining Your Search from the Entrez Help Document.
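
If you need to restrict a search to several organisms, the query string follows the same pattern as the mouse/rat example above. The short Python sketch below only builds that text for pasting into the box; the helper name organism_query is hypothetical, and the sketch does not contact Entrez or BLAST.

```python
def organism_query(*organisms: str) -> str:
    """Join organism names into one Entrez query using the [ORGN] field tag."""
    if not organisms:
        raise ValueError("at least one organism is required")
    return " OR ".join(f"{name}[ORGN]" for name in organisms)


print(organism_query("mouse", "rat"))   # mouse[ORGN] OR rat[ORGN]
```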

Choose filter:

Low complexity - This option is checked by default. The filter masks portions of the query sequence that have low complexity (e.g., a long run of the same amino acid or nucleotide). In a protein sequence query, the filter replaces a low-complexity region with a string of X's (e.g., XXXXXXXXXXXXX); in a nucleotide sequence query, it uses a string of N's. Low-complexity regions can result in high scores that reflect compositional bias rather than significant position-by-position alignment (Wootton & Federhen, 1996). Filtering is applied only to the query sequence (or its translation products), not to database sequences.

Mask for lookup table only - This option for advanced searchers is used in constructing the lookup table used by BLAST. This experimental option is likely to change in the future.

Mask lower case - Select this option to choose for yourself which parts of the query sequence are filtered when it is compared with database sequences. Enter the query sequence into the search box in uppercase characters, with the areas to be filtered typed in lowercase characters.
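
The effect of masking can be illustrated with a small Python sketch. This is a toy, not the filter BLAST actually runs: it assumes that "low complexity" means nothing more than a run of identical letters, whereas the real filter uses a more sophisticated measure. The function names, the min_run threshold, and the sample sequences are all hypothetical.

```python
import re


def mask_homopolymers(seq: str, min_run: int = 8, mask_char: str = "X") -> str:
    """Replace each run of min_run or more identical letters with mask_char."""
    # (.) captures one letter; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


def mask_lowercase(seq: str, mask_char: str = "X") -> str:
    """Replace lowercase letters (regions marked by the user) with mask_char."""
    return "".join(mask_char if ch.islower() else ch for ch in seq)


protein = "MKVL" + "Q" * 10 + "AGT"        # made-up sequence with a run of Q's
print(mask_homopolymers(protein))          # the ten Q's become ten X's
print(mask_lowercase("MKVLaaggttAGT"))     # MKVLXXXXXXAGT
```

For a nucleotide query, passing mask_char="N" mirrors the string of N's described above.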

Expect: All sequences retrieved during a BLAST search must have an Expect value (E Value) lower than the number specified by this option. The Expect value is the number of hits with a similar or better score that would be expected to occur in a search of the database by chance. The default Expect value is 10. Since hit sequences with Expect values closer to zero are more statistically significant, you may want to set this option to 1 or to a smaller decimal value.
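
The Expect cutoff behaves like a simple filter on the list of hits. The Python sketch below shows that behavior under stated assumptions: the Hit record, the function filter_by_expect, and every score and E Value in the sample data are made up for illustration and do not come from a real search.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    bit_score: float
    e_value: float


def filter_by_expect(hits, threshold=10.0):
    """Keep hits whose E Value is below threshold, most significant first."""
    kept = [h for h in hits if h.e_value < threshold]
    return sorted(kept, key=lambda h: h.e_value)


hits = [  # made-up values for illustration only
    Hit("subj_A", 250.0, 1e-60),
    Hit("subj_B", 35.2, 0.8),
    Hit("subj_C", 28.1, 7.5),
]

# With the default cutoff of 10 all three hits pass; with 1, subj_C is dropped.
for h in filter_by_expect(hits, threshold=1.0):
    print(h.subject_id, h.e_value)
```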

Other "Options for Advanced Blasting," such as composition-based statistics, Word size, Matrix, PSSM, Other Advanced, and PHI Pattern, are designed for more advanced BLAST users. For our purposes, these options should be left to their default values. For more information about these advanced options, see BLAST help.


Format

Show
Graphical Overview - This option is selected by default. In BLAST results, this option provides a graphic depiction of how the similar sequences retrieved from the databases (the subject sequences) line up with the query sequence (the thick red line at the top). The score of each alignment is indicated by one of five different colors as defined in the Color Key for Alignment Scores shown at the top of the graphical overview. 

Linkout - Also selected by default. If this box is unchecked, no links from BLAST results to other NCBI databases are provided. 

NCBI-gi - Also selected by default. This option displays the NCBI-GI (GenInfo Identifier, a number unique to each sequence) for each hit sequence included in the output. Each NCBI-GI links to the subject sequence's record in the NCBI sequence databases.

Format - Leave the drop-down menu beside the NCBI-GI option set to the default ALIGNMENT. Other selections in the drop-down menu (PSSM and Bioseq) are for more advanced users. To view the graphical overview, the HTML (default) setting should be selected from the second drop-down menu in the Format option. Selecting "Plain Text" from the drop-down menu will present BLAST output in a more printer-friendly format; the graphical overview feature, however, will be omitted and all hyperlinks deactivated.

Number of
Descriptions - Restricts the number of matching-sequence descriptions reported. The default limit is 100 descriptions. 

Alignments - Restricts the number of alignments (default alignment type is pairwise) between query and subject sequences included in the BLAST results. The default limit is 50. 

Alignment View
To see some of the following formats, see NCBI's Examples of Alignment Formats.

Pairwise - Default setting for alignment view, in which each aligned region of the query sequence is lined up, amino acid by amino acid, with the matching region of a retrieved database sequence. When comparing DNA sequences using BLAST, the query sequence's nucleotides are matched up with those of each database sequence.

Query-anchored with identities - Rather than a pairwise alignment, this is a type of multiple alignment. In this view, a query-sequence segment (for example, amino acids 1 through 60) is displayed with the corresponding section of each retrieved sequence listed below it. Each query-sequence segment begins with the number 1 at the far left, while each database-sequence segment begins with its corresponding gi (GenInfo Identifier) at the far left. Identities are displayed as dashes, with mismatches as single-letter amino acid abbreviations.

Query-anchored without identities - This multiple alignment view is similar to query-anchored with identities; each match, however, is indicated by the single-letter amino acid abbreviation instead of a dash. 

Hit Table - Presents all BLAST results in a table that summarizes information such as the following for each subject sequence retrieved: subject ID, % identity between query and each subject sequence, alignment length, number of mismatches, number of gap openings, E Value, and bit score.
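
Because a hit table is plain tabular text, it is easy to read into a program. The Python sketch below assumes a tab-delimited table with twelve columns in the order shown in COLUMNS and with comment lines beginning with "#". That layout is an assumption, so check it against the table you actually receive. The column names, the function parse_hit_table, and the sample row are hypothetical.

```python
import csv
import io

# Assumed column order; verify it against your own hit table before relying on it.
COLUMNS = ["query_id", "subject_id", "pct_identity", "alignment_length",
           "mismatches", "gap_openings", "q_start", "q_end",
           "s_start", "s_end", "e_value", "bit_score"]
INT_FIELDS = ["alignment_length", "mismatches", "gap_openings",
              "q_start", "q_end", "s_start", "s_end"]
FLOAT_FIELDS = ["pct_identity", "e_value", "bit_score"]


def parse_hit_table(text):
    """Turn tab-delimited hit-table text into a list of dicts, one per hit."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue                      # skip blank lines and comment lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(COLUMNS, fields))
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


sample = ("# made-up example row, not real BLAST output\n"
          "query1\tsubj_A\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-60\t250.0\n")
for hit in parse_hit_table(sample):
    print(hit["subject_id"], hit["pct_identity"], hit["e_value"])
```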

The Limit results by entrez query option is described above. Format for PSI-BLAST and Expect value range options are designed for more advanced BLAST users (see BLAST help).