What are Databases?
A database is most often thought of as a two-dimensional table, or matrix. The columns are known as fields; the rows are known as records. In this database example, each person is a record, and the individual pieces of information about that person are the fields. There are 5 records, with 7 fields for each record.
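To make the row-and-column picture concrete, here is a minimal Python sketch that stores a small table as a list of dictionaries: each dictionary is one record, and each key is one field. The names, field labels, and phone numbers are hypothetical and are not the data from the database example.

```python
# Hypothetical sample data: each dict is one record (row),
# and each key is one field (column).
people = [
    {"first": "Ann", "last": "Lee", "street": "Oak St", "tel": "203-555-0142"},
    {"first": "Bob", "last": "Ruiz", "street": "Elm St", "tel": "713-555-0187"},
    {"first": "Cara", "last": "Moss", "street": "Pine Rd", "tel": "617-555-0119"},
]

# Every record shares the same fields, so the first record lists them all.
fields = list(people[0])

print(f"{len(people)} records x {len(fields)} fields")  # 3 records x 4 fields
print(people[1]["street"])  # one cell: the "street" field of the second record -> Elm St
```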

The beauty of a database is that it can be searched. Usually this is done by entering a word or phrase. The word or phrase is known as a query, and the box it is typed into is called a query box. If this database example were queried for the word "rat", we would get 11 "hits" across all the fields (3+3+0+1+2+2).
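The idea of counting hits field by field can be sketched in a few lines of Python. This is only an illustration under one stated assumption: a "hit" here is any case-insensitive occurrence of the query inside a field value, so "rat" also matches "Pirates". Real search systems usually match whole words or indexed terms instead. The records and field names are hypothetical.

```python
# Hypothetical records; "name", "street", and "club" are illustrative fields.
records = [
    {"name": "Pat Ratcliff", "street": "Rat Row", "club": "Pirates"},
    {"name": "Kim Stone", "street": "Ratner Ave", "club": "Muskrats"},
    {"name": "Lee Brat", "street": "Main St", "club": "Rattlers"},
]

def count_hits(records, query):
    """Tally case-insensitive substring occurrences of `query`, per field."""
    q = query.lower()
    hits = {}
    for record in records:
        for field, value in record.items():
            hits[field] = hits.get(field, 0) + str(value).lower().count(q)
    return hits

hits = count_hits(records, "rat")
print(hits)                # {'name': 2, 'street': 2, 'club': 3}
print(sum(hits.values()))  # 7 hits in all the fields
```

The "Pirates" hit is a small example of the "noise" that an unrestricted query can pick up.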

In the ENTREZ databases, a query can be refined by using Limits. In the database example, we could use "rat" as a query but limit the search to only the "street" field. In this case we would get only 3 hits, not 11. If instead we had limited the search to the "street" OR "golf club" fields, we would have gotten 7 hits.
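A field limit can be modeled as an optional set of field names to search: leaving it out searches everything, and listing several fields searches field1 OR field2. This is a hypothetical Python sketch, not how ENTREZ implements Limits; it reuses the same assumed substring matching and made-up records as a small stand-in for the database example.

```python
# Hypothetical records (same illustrative fields as before).
records = [
    {"name": "Pat Ratcliff", "street": "Rat Row", "club": "Pirates"},
    {"name": "Kim Stone", "street": "Ratner Ave", "club": "Muskrats"},
    {"name": "Lee Brat", "street": "Main St", "club": "Rattlers"},
]

def count_hits(records, query, fields=None):
    """Count case-insensitive occurrences of `query`.

    If `fields` is given, only those fields are searched (a field limit);
    naming several fields searches field1 OR field2 OR ...
    """
    q = query.lower()
    total = 0
    for record in records:
        for field, value in record.items():
            if fields is None or field in fields:
                total += str(value).lower().count(q)
    return total

print(count_hits(records, "rat"))                      # 7 (no limit)
print(count_hits(records, "rat", {"street"}))          # 2 (street only)
print(count_hits(records, "rat", {"street", "club"}))  # 5 (street OR club)
```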

Limits are especially useful when searching very large or redundant databases. If we queried a database of corporate executives and lawyers for "rat", tens of thousands of hits would turn up. In that case we could reduce the number of hits to perhaps only a few thousand by limiting the search to golf clubs in Houston.
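Limiting a search to "golf clubs in Houston" combines two conditions with AND: the query must appear in the club field, and the record's city must be Houston. The Python sketch below illustrates that with hypothetical records and field names; unlike the earlier hit counts, it counts matching records rather than individual occurrences.

```python
# Hypothetical records; "club" and "city" are illustrative field names.
records = [
    {"name": "A. Ratliff", "city": "Houston", "club": "Muskrat Links"},
    {"name": "B. Stone", "city": "Dallas", "club": "Muskrat Links"},
    {"name": "C. Brat", "city": "Houston", "club": "Cypress Greens"},
    {"name": "D. Rattray", "city": "Houston", "club": "Pirate Hills"},
]

def limited_search(records, query, in_field, **required):
    """Return records whose `in_field` contains `query` (case-insensitive)
    AND whose other fields exactly equal every value given in `required`."""
    q = query.lower()
    return [
        r for r in records
        if q in str(r.get(in_field, "")).lower()
        and all(r.get(field) == value for field, value in required.items())
    ]

# No limit: any record with "rat" anywhere in it.
anywhere = [r for r in records if any("rat" in str(v).lower() for v in r.values())]
# Limited: "rat" in the club field, and only for records in Houston.
limited = limited_search(records, "rat", "club", city="Houston")

print(len(anywhere), len(limited))   # 4 2
print([r["name"] for r in limited])  # ['A. Ratliff', 'D. Rattray']
```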

In some cases, the data in a field may be organized using a set of key terms. The list of these terms (usually alphabetical) is an Index. In the database example, the "Tel #" field holds 5 telephone numbers, which are indexed by area code. Thus, if we included "203" as one of our search terms, we would narrow the search to only two records.
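An index like the area-code one can be sketched as a dictionary that maps each key term to the records containing it. Browsing the index then means listing its terms in sorted order, and a search on "203" goes straight to the matching records without scanning every row. The names and numbers below are hypothetical (fictional 555-01xx numbers), and `build_area_code_index` is an illustrative helper, not part of any real system.

```python
from collections import defaultdict

# Hypothetical "Tel #" values for five records.
people = [
    {"last": "Lee", "tel": "203-555-0142"},
    {"last": "Ruiz", "tel": "713-555-0187"},
    {"last": "Moss", "tel": "203-555-0119"},
    {"last": "Park", "tel": "617-555-0163"},
    {"last": "Diaz", "tel": "713-555-0150"},
]

def build_area_code_index(records):
    """Map each area code (first three characters of "tel") to the
    positions of the records that use it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record["tel"][:3]].append(position)
    return index

index = build_area_code_index(people)

# Browsing the index: its key terms, in sorted order.
print(sorted(index))  # ['203', '617', '713']

# Searching on "203" jumps straight to the matching records.
print([people[i]["last"] for i in index.get("203", [])])  # ['Lee', 'Moss']
```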

Entrez allows you to browse the indexes by which records and data are stored. Checking the index of key terms used in a particular field makes a search much more efficient because:

  • useful key terms can be identified, which decreases the chance of missing something relevant.
  • additional key terms can be found, which further decreases the chance of missing something relevant.
  • irrelevant key terms can be checked and dropped from the search, which eliminates "noise".
EXERCISE (15 minutes)
  1. Open an Excel workbook and set up a table with 10 records and 6 fields (for example FIRST, LAST, TEL #, TOWN, STATE, and RELATIONSHIP).
  2. Fill in the table with the names and telephone numbers of 10 people from your cell phone's contact list. Then complete each record by filling in the "town", "state", and "relationship" fields.
  3. Select all the cells of the table, including the header row.
  4. From the "Data" menu, choose "Sort". A dialog box will open.
    • Select the "Ascending" and "Header row" radio buttons.
    • Click the "Sort by" pull-down menu and select "relationship".
    • Click the "Then by" pull-down menu and select "LAST".
    • Click "OK".
  5. Observe what happens when:
    • a descending sort is done.
    • the order of the two sort keys is reversed (LAST first, then relationship).
    • the "No header row" radio button is selected.
  6. Open a Word document and EXPLAIN why storing data in a database makes it easier to search for and retrieve only the data you want.
  7. PRINT (1) your Excel database and (2) your answer to step 6, and hand them in to your instructor.
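The Sort dialog in the exercise can be mirrored in Python as an analogy; Excel itself is not involved. In this sketch, `sort_rows` is a hypothetical helper: the first key decides the order and the second breaks ties, `has_header` plays the role of the "Header row" option, and text is compared in lowercase so capitalization does not affect the order. The contact data is made up.

```python
# Hypothetical contact table laid out like the exercise: header row first.
rows = [
    ["FIRST", "LAST", "TEL", "TOWN", "STATE", "RELATIONSHIP"],
    ["Ann", "Lee", "555-0142", "Hartford", "CT", "friend"],
    ["Bob", "Ruiz", "555-0187", "Houston", "TX", "family"],
    ["Cara", "Moss", "555-0119", "Boston", "MA", "friend"],
    ["Dev", "Park", "555-0163", "Austin", "TX", "work"],
    ["Eve", "Diaz", "555-0150", "Hartford", "CT", "family"],
]
LAST, RELATIONSHIP = 1, 5  # column positions

def sort_rows(rows, keys, descending=False, has_header=True):
    """Sort by several columns; keys[0] decides first, keys[1] breaks ties.

    With has_header=True the first row stays on top and is not sorted;
    with has_header=False it is sorted as if it were an ordinary record.
    """
    header, body = (rows[:1], rows[1:]) if has_header else ([], rows)
    body = sorted(body, key=lambda r: tuple(r[k].lower() for k in keys),
                  reverse=descending)
    return header + body

def last_names(table):
    return [r[LAST] for r in table]

print(last_names(sort_rows(rows, [RELATIONSHIP, LAST])))
# ['LAST', 'Diaz', 'Ruiz', 'Lee', 'Moss', 'Park']
print(last_names(sort_rows(rows, [RELATIONSHIP, LAST], descending=True)))
# ['LAST', 'Park', 'Moss', 'Lee', 'Ruiz', 'Diaz']
print(last_names(sort_rows(rows, [LAST, RELATIONSHIP])))
# ['LAST', 'Diaz', 'Lee', 'Moss', 'Park', 'Ruiz']
print(last_names(sort_rows(rows, [RELATIONSHIP, LAST], has_header=False)))
# ['Diaz', 'Ruiz', 'Lee', 'Moss', 'LAST', 'Park']
```

In this sketch `descending=True` flips both keys at once, whereas Excel's dialog lets each key have its own direction. The last line shows why "No header row" matters: the header is sorted in among the records.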
The ENTREZ Nucleotide database is organized the same way; a simplified version of its structure follows. The Nucleotide database contains all the DNA, RNA, STS, EST, and other sequences archived by NCBI. Each sequence is a separate record. The fields are:
  1. Accession Number
  2. Author (the submitter of the sequence, or the author of the paper in which the sequence was reported)
  3. Gene Name
  4. Organism
  5. Protein Name
  6. Publication Date
This is a very short list, meant only as a simple illustration of the NCBI databases. As will become evident, there are actually many more fields than this!
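The six-field record can be modeled as a Python dataclass to show how a field-limited search applies to sequence records. This is purely a hypothetical illustration: the records and placeholder accession numbers are invented, `search` is an illustrative helper, and nothing here queries NCBI or uses the actual Entrez search syntax.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class NucleotideRecord:
    """Hypothetical, simplified record with the six fields listed above."""
    accession: str
    author: str
    gene_name: str
    organism: str
    protein_name: str
    publication_date: date

# Made-up records; the accession numbers are placeholders, not real entries.
records = [
    NucleotideRecord("XX000001", "Smith J", "Ins1", "Rattus norvegicus",
                     "insulin", date(1999, 5, 1)),
    NucleotideRecord("XX000002", "Jones K", "INS", "Homo sapiens",
                     "insulin", date(2003, 2, 14)),
    NucleotideRecord("XX000003", "Smith J", "Alb", "Rattus norvegicus",
                     "albumin", date(2005, 9, 30)),
]

def search(records, query, field):
    """Return accession numbers of records whose `field` contains `query`."""
    q = query.lower()
    return [r.accession for r in records if q in str(getattr(r, field)).lower()]

print(search(records, "insulin", "protein_name"))  # ['XX000001', 'XX000002']
print(search(records, "rattus", "organism"))       # ['XX000001', 'XX000003']

# Combining a field search with a publication-date limit.
rat_since_2000 = [r.accession for r in records
                  if "rattus" in r.organism.lower()
                  and r.publication_date >= date(2000, 1, 1)]
print(rat_since_2000)  # ['XX000003']
```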