WWW Entrez Help
Note! This brief guide is intended for those who are generally
familiar with database searching. If you are new to Entrez, be sure to read
the section on Special Features below.
Getting Started
WWW Entrez allows you to retrieve molecular biology data and bibliographic
citations from the NCBI's integrated
databases. These include:
- DNA sequences from GenBank, EMBL, and DDBJ
- Protein sequences from Swiss-Prot, PIR, PRF, PDB, and translated protein
sequences from the DNA sequence databases
- Genome and chromosome mapping data
- Three-dimensional protein structures derived from PDB, and incorporated into
NCBI's Molecular Modeling
Database (MMDB)
- PubMed bibliographic database containing citations
for nearly 9 million biomedical articles from the National Library of Medicine's MEDLINE and
pre-MEDLINE databases
Basic PubMed Search
To search PubMed without worrying about fancy features, select "Basic
Search" from the Entrez Home Page. If
you are on the PubMed home page, you already have a Basic Search
form in front of you.
Enter the term or terms that you wish to search on, separating terms
by spaces, and press the return key or the "Search" button. This will take
you immediately to the Document Summary Page (described
below), where you can review the results of your search.
Finding all terms that begin with a given word
Placing an asterisk at the end of a term will cause Entrez to search for
all terms that begin with that word; for instance "bacter*" will find
all terms that begin with the letters "bacter", e.g. bacteria, bacterium,
bacteriophage, etc. Multi-word phrases, in which a space follows the
truncated letters, will NOT be included; for instance, "infection*" will
find "infections" but not "infection control".
Forcing Entrez to search for a phrase
Entrez will do its best to find logical groupings in your input.
For instance, if you enter "Lipman DJ Genomics", Entrez will recognize
that "Lipman DJ" is the name of an author and will convert your
search into
"Lipman DJ" AND Genomics
It may happen that Entrez fails to find a phrase that you think is vital
to a search. For instance, if you enter
brca 1
Entrez will not recognize that this is all one item and will search
for "brca" and "1" separately. Since the latter is a numeral and is
not indexed in the title and abstract fields, it will likely not find
what you want. You can circumvent this by putting quotes (") around
the words that Entrez is failing to recognize, e.g.
"brca 1"
Important! It is usually best to let Entrez do your grouping
for the most accurate retrieval, and to use quotes only when Entrez has found
nothing because it failed to group words properly. Forcing Entrez to
group words will often result in "no documents found". This does not mean that
the phrase you are looking for does not exist; rather, it was not indexed
as a group.
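The effect of quotes on grouping can be imitated with Python's standard `shlex` module, which splits on spaces but keeps double-quoted text together. This is only a model of how quotes change tokenization; the helper name `tokenize` is hypothetical, and Entrez's own phrase recognition (such as spotting "Lipman DJ" as an author without quotes) is not reproduced here.

```python
import shlex


def tokenize(query: str) -> list[str]:
    """Split a query on whitespace, keeping double-quoted phrases together."""
    return shlex.split(query)


print(tokenize("brca 1"))                # → ['brca', '1']
print(tokenize('"brca 1"'))              # → ['brca 1']
print(tokenize('"Lipman DJ" Genomics'))  # → ['Lipman DJ', 'Genomics']
```

Note that `shlex` also gives single quotes and backslashes a special meaning, so input such as Crohn's disease raises `ValueError` and would need extra handling before this could serve as a real query tokenizer.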
Searching for all terms that begin with a given string
All of the terms that begin with a given string can be searched on
by appending '*' to the end of the term.
For example, "baker*[auth]" would find all of the author names
that begin with 'baker'.
Note! If the use of a '*' character results in too long a list
of terms to process efficiently (more than a hundred or so), Entrez
will not perform the search and will so inform you.
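Both truncation rules on this page, skipping multi-word phrases and refusing expansions that grow too long, can be modeled together over a sorted term list: binary search finds the first candidate, and the scan stops as soon as the prefix no longer matches. This is a minimal sketch, not Entrez's implementation; `expand_truncated`, `TooManyTerms`, and the sample terms are hypothetical, the limit of 100 is only an assumption based on "a hundred or so", and it models single-word text terms rather than the author field.

```python
from bisect import bisect_left


class TooManyTerms(Exception):
    """Raised when a truncated term would expand to too many index terms."""


def expand_truncated(pattern: str, sorted_terms: list[str], limit: int = 100) -> list[str]:
    """Expand a pattern such as "bacter*" against a sorted term index.

    A term matches if it starts with the prefix and the rest of it contains
    no space, so multi-word phrases are skipped. The limit is an assumption.
    """
    if not pattern.endswith("*"):
        return [pattern] if pattern in sorted_terms else []
    prefix = pattern[:-1]
    matches = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        if " " in term[len(prefix):]:
            continue  # a phrase such as "infection control"
        matches.append(term)
        if len(matches) > limit:
            raise TooManyTerms(f"{pattern!r} matches more than {limit} terms")
    return matches


# Made-up sample index, for illustration only.
index = sorted(["bacteria", "bacteriophage", "bacterium", "infection",
                "infection control", "infections", "influenza"])

print(expand_truncated("bacter*", index))     # → ['bacteria', 'bacteriophage', 'bacterium']
print(expand_truncated("infection*", index))  # → ['infection', 'infections']
```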
Searching by identifier
If you want to look up a citation or citations by identifier (MEDLINE UID,
PubMed ID, sequence GI, or the like), just enter "UID" followed by
the identifier(s) that you want. For example:
UID 88055872
will find the citation with MEDLINE UID 88055872.
For Experts
Expert users of Entrez can, if they wish, enter a full boolean expression
in the search box. See Entering a Complex Boolean
Expression below.
All of the Advanced Search capabilities are still available in Basic mode;
they are just hidden.
Advanced Search
Entering a Search Term
To search a database, select the appropriate one from the
Entrez Home Page.
You will then see the search screen for that database; each database's
screen has its own set of search fields.
Select the field and mode under which you want to search, enter
the term you want to search for in the box given, and then
press the Search button.
Many browsers will allow you to submit the term you want to search
for simply by pressing "return" after typing in your term. Try it.
Search Fields
There are a number of search fields available in the WWW Entrez
databases. Some of the fields are
found in all five databases; others are not.
Each field contains the following information:
- Accession contains the accession number of the
sequence, assigned to the nucleotide, protein, structure, or genome record by a
sequence database builder.
- Affiliation contains the institutional affiliation
and address of the primary author, and sometimes of other authors.
- Author Name contains the list of authors for a
paper in the literature. In the Protein and Nucleotide databases, the
authors listed are those of the MEDLINE articles to which a sequence
is linked. The format for author names is the last name, followed by
a space and the first initial(s), without periods. For example,
Jacob F. Marley would be "Marley JF" and Ebenezer Scrooge would be
"Scrooge E". Initials may be omitted when searching.
- E. C. Number is a number assigned by the
Enzyme Commission to designate a particular enzyme.
- Feature Key is a keyword denoting a particular
DNA feature.
- Gene Symbol is the standard name for a given gene.
If you cannot find a gene using Gene Symbol, try using
All Fields or Text Words instead.
- Journal Title is the name of the journal where
the record was published. Journal names are stored in the database
in abbreviated form; for instance, the Journal of Biological
Chemistry is stored as "J Biol Chem". If you are not sure how a
journal name is abbreviated, use List Terms mode to browse the
journal titles.
- Keywords allows you to search using special index
terms from a controlled vocabulary associated with the GenBank,
EMBL, DDBJ, Swiss-Prot, PIR, PRF, or PDB databases. If you are not
familiar with the keywords used in these databases, this field
may not be useful to you, although using List Terms mode will let you
see what the terms look like.
- MEDLINE UID is the MEDLINE Unique Identifier of
a given citation.
- MeSH Terms includes all of the terms in the
Medical Subject Headings, a controlled vocabulary of keywords used to
index MEDLINE. Each MEDLINE citation is given a group of MeSH terms
that relate to the subject of the paper from which it is drawn.
Frequently, MeSH terms will have an additional term, called a
"subheading", which further defines how the MeSH term relates
to the article it is associated with. This subheading is appended
to the MeSH term, e.g. "pneumonia diagnosis". Searching on the MeSH
term (here, pneumonia) will retrieve all of the articles that use that MeSH
term, whether they have subheadings or not. Use the subheading terms if you
require more specificity than the MeSH term allows.
Note: MeSH terms searched for using the MeSH Terms or MeSH Major Topic fields
are automatically "exploded" by WWW Entrez; that
is, all terms which are logical subsets of the term entered are
included. For instance, "pneumococcal infections" includes
"streptococcus pneumoniae". MeSH terms found using the "All Fields"
search are NOT exploded.
- MeSH Major Topic includes all MeSH Terms (see
above) that are marked as being of major importance to this
record by the MeSH indexers.
- Modification Date contains the date that
the record was placed into Entrez, in the same year/month/day format
as Publication Date (see below).
- Page Number is the number of the first
journal page that the article appears on.
- Property is one or more keywords that denote what
type of sequence this record contains.
- Publication Date contains the date that
the article was published (for PubMed citations) or the date
that the record was added to GenBank (for sequence records), in the
format year/month/day, e.g. 1984/10/06. A year alone (e.g. "1984")
will retrieve all records for that year, without regard to month; a year
and month (e.g. "1984/03") will retrieve all records for that month.
- PubMed ID is the PubMed Identifier of
a given citation.
- Organism contains the scientific and common
names for the organisms associated with protein and nucleotide
sequences. Organism names are "exploded" much like MeSH terms;
for instance, searching on "mammalia" will find all entries indexed
under any mammal.
- Protein Name contains the name of the protein
that this sequence is associated with. The common name of a protein
may not be indexed under this field; if you cannot find a particular
protein using this field, try All Fields or Text Words.
- SeqId is the special string identifier, similar
to a FASTA identifier, for a given sequence.
- Substance contains the names of any chemicals
associated with this record from the Chemical Abstracts Service (CAS)
registry and the MEDLINE Name of Substance field.
- Text Words includes all of the "free text"
associated with a record, specifically:
- MEDLINE records: the title and abstract.
- Protein records: the definition, comment, protein name, and
protein description.
- Nucleotide records: the definition, comment, gene name, and gene
description.
- Title Words includes only those words found
in the title or definition line of a record.
- Volume is the number of the journal volume this
article appears in.
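Three of the field behaviors above are mechanical enough to sketch in Python: the author-name format, the year/month/day date prefixes, and the "explosion" of MeSH terms and organism names to everything beneath them. These are illustrative helpers with hypothetical names (`format_author`, `date_matches`, `explode`), not Entrez code, and the hierarchy in the example is a tiny made-up stand-in for MeSH or the organism taxonomy.

```python
def format_author(full_name: str) -> str:
    """Turn "Jacob F. Marley" into the index form "Marley JF".

    Assumes the last word is the surname; compound surnames are not handled.
    """
    parts = full_name.replace(".", " ").split()
    surname, given = parts[-1], parts[:-1]
    initials = "".join(name[0].upper() for name in given)
    return f"{surname} {initials}" if initials else surname


def date_matches(query: str, record_date: str) -> bool:
    """True if a "1984", "1984/03", or "1984/03/15" query covers record_date.

    Whole year/month/day components are compared, not raw string prefixes.
    """
    wanted = query.split("/")
    return record_date.split("/")[:len(wanted)] == wanted


def explode(term: str, children: dict[str, list[str]]) -> set[str]:
    """Return term plus every term below it in a (made-up) hierarchy."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


# Toy hierarchy for illustration; the real MeSH tree and taxonomy are far larger.
tree = {"mammalia": ["primates", "rodentia"],
        "primates": ["homo sapiens"],
        "rodentia": ["mus musculus"]}

print(format_author("Jacob F. Marley"))      # → Marley JF
print(date_matches("1984/03", "1984/10/06"))  # → False
print(sorted(explode("mammalia", tree)))
# → ['homo sapiens', 'mammalia', 'mus musculus', 'primates', 'rodentia']
```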
The MEDLINE UID, PubMed ID, and Sequence ID
fields retrieve records differently than other fields do. To use
them, enter one or more Unique Identifier
numbers in the Term box. If you enter more than one, separate
them by spaces or commas. Select the appropriate field (MEDLINE UID,
PubMed ID, or Sequence ID), and press Search.
The entries specified will be treated as if they were a search term,
and will be referred to as {List of Articles} by Entrez.
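Splitting a typed list of identifiers on spaces and commas, as described above, takes one regular expression. `parse_uid_list` is a hypothetical helper that assumes purely numeric identifiers, like MEDLINE UID 88055872 used earlier on this page; anything non-numeric raises `ValueError`.

```python
import re


def parse_uid_list(text: str) -> list[int]:
    """Split identifiers separated by spaces and/or commas into integers."""
    return [int(tok) for tok in re.split(r"[\s,]+", text.strip()) if tok]


print(parse_uid_list("88055872, 88055873 88055874"))  # → [88055872, 88055873, 88055874]
```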
Finding all terms that begin with a given word
Placing an asterisk at the end of a term will cause Entrez to search for
all terms that begin with that word; for instance "bacter*" will find
all terms that begin with the letters "bacter", e.g. bacteria, bacterium,
bacteriophage, etc.
Forcing Entrez to search for a phrase
Entrez will do its best to find logical groupings in your input.
For instance, if you enter "Lipman DJ Genomics", Entrez will recognize
that "Lipman DJ" is the name of an author and will convert your
search into
"Lipman DJ" AND Genomics
It may happen that Entrez fails to find a phrase that you think is vital
to a search. For instance, if you enter
brca 1
Entrez will not recognize that this is all one item and will search
for "brca" and "1" separately. Since the latter is a numeral and is
not indexed in the title and abstract fields, it will likely not find
what you want. You can circumvent this by putting quotes (") around
the words that Entrez is failing to recognize, e.g.
"brca 1"
It is usually best to let Entrez do your grouping for most accurate
retrieval, and to use quotes only when Entrez has failed to find
anything because of a grouping error.
Note! If a quoted phrase is not found, that does NOT mean that
the phrase is not in the database; it usually just means that Entrez did
not recognize this as a phrase and thus did not index it. You should
remove the quotes and try again.
Expert users of Entrez can, if they wish, enter a full boolean expression
in the term box. See Entering a Complex Boolean
Expression below.
Search Modes
WWW Entrez allows you to enter terms for searching in several
different ways.
- In List Terms mode, when you enter a term,
Entrez displays the list of available terms for that field,
starting at the first term which begins with the characters that you
entered. You can then select one or more terms to add to your search.
For example, to see the text words beginning with "pneum", you would
enter "pneum" in the term box, select "Text Words" and "List Terms", then
press Search. List Terms mode thus allows you to
browse through the terms in any given field. This can be very
useful if you are not sure how something is spelled.
- In Automatic mode, the term or terms that
you enter are immediately added to your search. If you enter more than one
word, Entrez will try to group them appropriately into terms and
add each to your query, finding those articles that have
every one of the terms. For instance, if you entered
"central nervous system", the terms "central", "nervous", and
"system" would be added to your query. If Entrez groups the words
incorrectly, or fails to group words that belong together, you can place
one or more words in quotes (") to force Entrez to group them as you wish.
Choosing a Term in List Terms Mode
If a term is entered in the term box using List Terms
Mode and the Search button pressed, a list of the
terms that begin
with the characters entered in the term box will be presented. For
instance, if "pneum" were entered (with the field selector on
"All Fields"), the resulting list would appear under the heading
"Available terms in the field(s): All Fields (Total Records)".
After each term is the number of articles that the term appears in.
To pick one or more of the terms in the "Available terms" list,
highlight them and press Select; the terms will then
be added to your search and to the Chosen List.
If you want to look at another list of terms altogether, simply
reenter the new term in the term box as before and press
Search.
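List Terms mode behaves like opening a sorted dictionary at the first entry that begins with the characters entered, showing each term with its record count. The sketch below models that with binary search over made-up counts; `list_terms`, the page size, and the counts are hypothetical, and whether Entrez's real list continued past the typed prefix (as this sketch does) is an assumption.

```python
from bisect import bisect_left


def list_terms(start: str, counts: dict[str, int], page_size: int = 5) -> list[tuple[str, int]]:
    """Return up to page_size (term, record count) pairs, beginning at the
    first indexed term that sorts at or after `start`."""
    terms = sorted(counts)
    i = bisect_left(terms, start)
    return [(t, counts[t]) for t in terms[i:i + page_size]]


# Made-up counts, for illustration only.
counts = {"pneuma": 3, "pneumatic": 5, "pneumococcal": 12,
          "pneumonia": 40, "pneumothorax": 7, "pnictide": 1}

print(list_terms("pneum", counts, page_size=3))
# → [('pneuma', 3), ('pneumatic', 5), ('pneumococcal', 12)]
```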
Your Chosen List of Terms
As you enter or select terms,
the terms will be added to your search and also placed into a list
at the bottom of your screen; this list is called the
Chosen List.
For example, if you had entered the term "pneumonia", and then
entered "cytomegalo*", the Chosen List would show both terms under the
heading "Modify Current Query", with a Term column and a (Total Records)
column.
Entrez automatically calculates the intersection of the terms you
enter and displays the resultant search statement at the top of
the screen, calculating the number of records to retrieve. The
terms included in the search are highlighted in the Chosen List.
In the above example, there are 42 articles that contain both the
word "pneumonia" and also a word that begins with the characters
"cytomegalo". Once you have entered terms of interest, you can
do any of the following:
- If the number of documents is reasonably small, press the
Retrieve button to see a listing of the records
your search has chosen; see
Retrieving Documents below.
- Select and/or deselect terms in the Chosen List until the terms you
wish to include in the search are highlighted, then press the
Search button. The system will then create a
new search statement based upon only the highlighted terms,
according to the type of evaluation you have selected. Here is what
each of the evaluation types does:
- Intersection (AND): only those records that
contain all of the terms specified are returned by
the search. This is abbreviated to AND in the search statement.
- Union (OR): those records that contain
any of the terms specified are returned. This is
abbreviated to OR.
- Difference (BUTNOT): those records that contain
the uppermost term but not any of the lower terms are returned.
This is abbreviated to BUTNOT.
Terms or expressions which are combined using Modify Query
are grouped into a single entity and placed on a
separate line in the Chosen List. This permits you to combine terms
flexibly in many ways.
Note that the Retrieve button will continue to
retrieve your old search until you tell the system to update your
search using the Search button in Modify Query.
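The three evaluation types map directly onto set operations on the record IDs that each highlighted term retrieves. In this sketch the `evaluate` function and the ID sets are hypothetical; BUTNOT follows the wording above, keeping the records of the uppermost term that appear in none of the lower terms.

```python
def evaluate(mode: str, postings: list[set[int]]) -> set[int]:
    """Combine the record-ID sets of the highlighted terms, uppermost first."""
    if not postings:
        return set()
    if mode == "AND":     # Intersection: records containing every term
        return set.intersection(*postings)
    if mode == "OR":      # Union: records containing any term
        return set.union(*postings)
    if mode == "BUTNOT":  # Difference: uppermost term minus all lower terms
        first, *rest = postings
        return first - set().union(*rest)
    raise ValueError(f"unknown evaluation type: {mode!r}")


pneumonia = {1, 2, 3, 4}  # hypothetical record IDs
cytomegalo = {3, 4, 5}
for mode in ("AND", "OR", "BUTNOT"):
    print(mode, sorted(evaluate(mode, [pneumonia, cytomegalo])))
# prints: AND [3, 4], then OR [1, 2, 3, 4, 5], then BUTNOT [1, 2]
```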
Retrieving Documents
When the number of documents that satisfy your query is reasonably
small, press the Retrieve button to view them.
This produces a listing containing each document's title, author,
and publication year. This listing is called the Document Summary
Page and is described below.
If the number of documents that your query retrieves is large,
a box will appear indicating the maximum number of articles that
will be displayed. You can change this number to whatever is
suitable. If you cannot or do not choose to display all of the
articles that your search has found, the articles you do see will
be the most recent ones in the database.
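The display limit behaves like sorting the retrieved records newest first and keeping the first N. This is a sketch under the assumption that "more recent" refers to a date stored with each record; the text does not say which date Entrez used, and `records_to_display` and the sample data are hypothetical.

```python
from datetime import date


def records_to_display(records: list[tuple[int, date]], limit: int) -> list[int]:
    """Return the IDs of at most `limit` records, newest first."""
    newest_first = sorted(records, key=lambda rec: rec[1], reverse=True)
    return [rec_id for rec_id, _ in newest_first[:limit]]


found = [(101, date(1996, 5, 1)), (102, date(1998, 2, 9)), (103, date(1997, 11, 30))]
print(records_to_display(found, limit=2))  # → [102, 103]
```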
The Document Summary Page
Once you have pressed the Retrieve button, WWW Entrez
will display a listing of information on the documents that your
search has found. This permits you to browse through the retrieved
list of documents easily. Once you have determined which documents
in the list are of interest, you can view them, individually
or as a group.
Viewing Documents
Each document can be viewed in any of several "formats", each of
which is good for some purpose. The best way to decide what
format best suits you for any given purpose is to experiment with
them and see what they look like. In general, "Citation" format
is best for viewing MEDLINE records, "GenPept" for viewing Protein
records, and "GenBank" for viewing Nucleotide records.
To view a single document in PubMed, select the link at the top of the
document. This will show you the document in Citation format, and allow you
to select other formats therefrom. To view a single document in the
other databases, select the format you wish to view from the choices
below the summary information.
To view several documents at once, select the documents you wish to view
by selecting their checkboxes. If you want to view all of the documents on
the page, don't select any of them. Then pick the
type of report you want from the popup box at the top of the
screen and press "Display".
Viewing Formats
Viewing formats available include:
For PubMed articles:
- Citation - The Title, Abstract, MeSH terms, and Substance information in
an article.
- Abstract - The Title and Abstract only of an article.
- ASN.1 - The article in ASN.1 format.
- MEDLINE - The article in MEDLARS format.
For Protein and Nucleotide records:
- GenBank/GenPept - The standard GenBank or GenPept flatfile.
- Report - GenBank report format.
- ASN.1 - ASN.1 format.
- FASTA - FASTA format.
- Graphic view - The graphical view of the entry, with alignment information.
For Structure records:
- Structure Summary - Basic information about the structure. Choose
this format to view the structure in 3-D.
- ASN.1 - ASN.1 format.
For Genome records:
- Graphic view - The graphical view of the entry, with alignment information.
- ASN.1 - ASN.1 format.
Saving Documents
When you view Document Reports, you will be given the option to save your
documents in a number of formats. The Macintosh/PC/UNIX popup permits you to
select the basic file format you desire, while the Text/HTML/MIME popup
modifies the output for different uses, as follows:
- Text format removes all HTML tags and breaks lines at
80 columns.
- HTML format leaves the HTML tags in, for use in a
browser.
- MIME format sends a file of GenBank MIME type. This is
useful only if you have a GenBank MIME viewer installed and configured properly.
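The Text option's two steps, dropping HTML tags and breaking long lines at 80 columns, can be approximated with Python's standard `html.parser` and `textwrap` modules. This is a hypothetical client-side sketch (`to_plain_text` is not an NCBI function); it keeps short lines, including their indentation, unchanged and wraps only lines longer than the limit.

```python
import textwrap
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def to_plain_text(html: str, width: int = 80) -> str:
    """Drop HTML tags, then break any line longer than `width` columns."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines: list[str] = []
    for line in "".join(parser.chunks).splitlines():
        # Short lines (and their indentation) are kept; long ones are wrapped.
        lines.extend([line] if len(line) <= width else textwrap.wrap(line, width))
    return "\n".join(lines)


print(to_plain_text("<p>Definition: <i>example</i> record</p>"))  # → Definition: example record
```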
Getting Document Neighbors and Links
One of the most helpful features of Entrez is the ability to
find documents which are similar to a document you
are interested in. These related documents are called
neighbors. For more details on what
neighbors are, how they are calculated, and how to use them, see
Special Features below.
To retrieve the neighbors or links for a given record or set of
records, the procedure is the same as for viewing records, above.
To view a single document's neighbors or links, view the document
and select the button at the top that indicates the type of neighbor/link
that you want to see.
To view several documents' neighbors or links at once, select
the documents by pressing the checkboxes next to the documents you
want (as above, select nothing to see them all). Then select the type of
neighbor or link you want from the
popup box at the top of the screen and press "Display".
Outside Links
Some documents have links to outside resources. These will appear as buttons
at the top of the document report. They include:
- {Journal Name} - the WWW page for this journal article.
- OMIM - Online Mendelian Inheritance in Man, the NCBI/JHU genetics text.
- UMBBD - The University of Minnesota Biodegradation/Biocatalysis Database.
- AGIS - The Department of Agriculture DNA database.
- PGR - The Plant Gene Registry.
- MGD - The Jackson Laboratory Mouse Genome Database.
and many others.
For Experts Only
This section explains features of WWW Entrez that may be of interest
to users with very specific needs. Most users do not need to be familiar with
the items in this section.
Entering Complex Boolean Expressions
A search can be performed by specifying the terms to search, their fields,
and the boolean operations to perform on them, all at once. Use the
following syntax:
term [field] operator
term [field] ... (etc.)
term is the term string that you wish to search on.
field is an Entrez field designation, which can be:
- for PubMed : one of AFFL, ALL, AUTH, ECNO, JOUR, MESH, MAJR, PAGE, PDAT,
PTYP, SUBS, TITL, WORD, or VOL.
- for Protein : one of ACCN, AUTH, ECNO, GENE, JOUR, KYWD, MDAT, ORGN,
PDAT, PROP, PROT, SQID, SLEN, SUBS, or WORD.
- for Nucleotide : one of ACCN, AUTH, ECNO,
FKEY, GENE, JOUR, KYWD, MDAT, ORGN, PDAT, PROP, PROT, SQID, SLEN, SUBS,
or WORD.
- for Structure : one of ACCN, AUTH, JOUR, SUBS, or WORD.
- for Genomes : one of ACCN, AUTH, ECNO, GENE, JOUR, ORGN, PROP, PROT,
or WORD.
where WORD = text word, TITL = title word, MESH = MeSH term,
MAJR = MeSH major topic, AUTH = author name, AFFL = affiliation,
ALL = all fields, JOUR = journal name, ECNO = E.C. Number, GENE = gene name,
DATE = publication year, PDAT = publication/creation date,
MDAT = modification date, PAGE = first page, VOL = volume,
KYWD = keyword, ORGN = organism, ACCN = accession number, PROT = protein name,
SQID = SeqId, SLEN = sequence length, SUBS = substance, PROP = property,
FKEY = feature key, and PTYP = publication type.
operator is any of:
- AND (intersection)
- OR (union)
- BUTNOT (difference).
Note: Boolean expressions are normally processed left to right. If
you wish part of your boolean expression to be processed out of
order, enclose it in parentheses.
An example of a boolean expression: to find the articles in the Journal of
Biological Chemistry that contain the term "p21" in their text, enter
J Biol Chem [JOUR] AND p21 [WORD]
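To show how left-to-right evaluation and parentheses interact, here is a small Python evaluator over a made-up in-memory index that maps (term, field) pairs to sets of record IDs. It is a sketch of the semantics described above, not Entrez's parser: the tokenizer is deliberately simple (operators must be upper case and every term needs a [field] tag), case-insensitive term lookup is an assumption, and `search`, `tokenize`, and the index contents are hypothetical.

```python
import operator
import re

# Operators from the list above, applied to sets of record IDs.
OPS = {"AND": operator.and_, "OR": operator.or_, "BUTNOT": operator.sub}

# One token: a parenthesis, an operator, or "term [FIELD]".
TOKEN = re.compile(r"\s*(\(|\)|\b(?:AND|OR|BUTNOT)\b|[^()\[\]]+?\s*\[\w+\])")
OPERAND = re.compile(r"(.+?)\s*\[(\w+)\]")


def tokenize(expr: str) -> list[str]:
    tokens, pos, expr = [], 0, expr.rstrip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse near {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def search(expr: str, index: dict[tuple[str, str], set[int]]) -> set[int]:
    """Evaluate expr strictly left to right; parenthesized parts come first."""
    tokens, pos = tokenize(expr), 0

    def operand() -> set[int]:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends too early")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        m = OPERAND.fullmatch(tok)
        if m is None:
            raise ValueError(f"expected 'term [field]', got {tok!r}")
        term, field = m.groups()
        return set(index.get((term.strip().lower(), field.upper()), set()))

    def expression() -> set[int]:
        nonlocal pos
        result = operand()
        while pos < len(tokens) and tokens[pos] in OPS:
            op = OPS[tokens[pos]]
            pos += 1
            result = op(result, operand())
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


index = {  # made-up postings
    ("j biol chem", "JOUR"): {1, 2, 3},
    ("p21", "WORD"): {2, 3, 4},
    ("apoptosis", "WORD"): {3, 5},
}
print(sorted(search("J Biol Chem [JOUR] AND p21 [WORD]", index)))  # → [2, 3]
# Left to right: (p21 OR apoptosis) AND J Biol Chem
print(sorted(search("p21 [WORD] OR apoptosis [WORD] AND J Biol Chem [JOUR]", index)))  # → [2, 3]
# Parentheses first: p21 OR (apoptosis AND J Biol Chem)
print(sorted(search("p21 [WORD] OR (apoptosis [WORD] AND J Biol Chem [JOUR])", index)))  # → [2, 3, 4]
```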
Specifying A Range of Terms
Another special expression is the range. You may use the
syntax:
term1:term2
to specify all of the terms in the term list for a given field
from term1 to term2, inclusive. For instance, to find all
Protein entries that have a sequence length between 19,000 and 20,000 residues,
you would go to the protein database, select the "sequence length" field, and
enter:
019000:020000
The leading zero is necessary because the sequence length terms are all
six-digit integers. When in doubt, use "List terms" to see the terms in a list;
the range operator will use the terms in the order that they appear.
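Because the range operator walks the term list in its stored order, it behaves like slicing a sorted list of strings between two bounds. The sketch below assumes terms compare as plain strings, which also shows why the leading zero matters: without padding, a length of 1,950 sorts between "19000" and "20000". `term_range` and the sample lengths are hypothetical.

```python
from bisect import bisect_left, bisect_right


def term_range(low: str, high: str, sorted_terms: list[str]) -> list[str]:
    """Return every term t with low <= t <= high, keeping list order."""
    return sorted_terms[bisect_left(sorted_terms, low):bisect_right(sorted_terms, high)]


sizes = [950, 1950, 19000, 19750, 20000, 120000]  # made-up sequence lengths
padded = sorted(f"{n:06d}" for n in sizes)        # six-digit terms, as described above
unpadded = sorted(str(n) for n in sizes)

print(term_range("019000", "020000", padded))  # → ['019000', '019750', '020000']
print(term_range("19000", "20000", unpadded))  # → ['19000', '1950', '19750', '20000']
```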
Special Features
What makes Entrez more powerful than many services is that most of its records
are linked to other records, both within a given database (such as PubMed)
and between databases. Links within a database are called "neighbors".
PubMed neighbors are determined by comparing the Text and MeSH
terms of each article, using a powerful algorithm
that determines just how well the article matches every other article. The
best matches for any article are saved, and you can retrieve them using the
"Related Articles" button at the top of the article report.
Protein and Nucleotide
neighbors are determined by performing similarity searches with
the BLAST algorithm on the amino acid or DNA sequence in the entry; the
results are saved as above.
What this means is that if you find one or a few documents that match what
you are looking for, pressing the "Related Articles/Sequences" button will
find a great many more documents that are likely to be relevant, in order
from most useful to least. This allows you to find what you want with
much greater speed and accuracy: instead of having to flip through
thousands of documents to assure yourself that nothing germane to your
query was missed, you can find just a few, then look at their neighbors.
Try this feature out and see how it works for you; you may well wonder
how you got along without it!
In addition, some documents are linked to others for reasons other than
computed similarity. For instance, if a protein sequence was published
in a PubMed article, the two will be linked to one another.
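The idea of precomputed neighbors, scoring every record against every other record and saving the best matches, can be sketched in a few lines. To keep it short, this sketch swaps in Jaccard similarity over word sets; that is NOT the algorithm PubMed uses, nor BLAST, and `precompute_neighbors`, `keep`, and the sample titles are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of distinct words that two records share (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precompute_neighbors(docs: dict[int, str], keep: int = 2) -> dict[int, list[int]]:
    """For each record, save the IDs of its `keep` best-scoring other records."""
    words = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    neighbors: dict[int, list[int]] = {}
    for doc_id, mine in words.items():
        scored = sorted(
            ((jaccard(mine, theirs), other_id)
             for other_id, theirs in words.items() if other_id != doc_id),
            key=lambda pair: (-pair[0], pair[1]),  # best score first, ties by ID
        )
        neighbors[doc_id] = [other_id for _, other_id in scored[:keep]]
    return neighbors


docs = {  # made-up titles
    1: "cytomegalovirus pneumonia in transplant patients",
    2: "pneumonia after bone marrow transplant",
    3: "cytomegalovirus infection in transplant recipients",
    4: "crystal structure of a bacterial enzyme",
}
print(precompute_neighbors(docs)[1])  # → [3, 2]
```

A real system would presumably also discard very low scores so that an unrelated record never shows up as a neighbor; this sketch simply keeps the top `keep` for every record.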
How to use the WWW Entrez Genome Viewer
The WWW Entrez Genome database takes you to a graphic view that can be
used to find the specific area of a genome that you are interested in and
view its component sequences. Here are detailed
instructions on how to use these
features.
How to use the WWW Entrez Structure Viewer
The WWW Entrez Structure database takes you to a summary page that can be
used to load the 3-D structure that you want into a viewer in order
to manipulate it. Here is a description
of the MMDB structure database and instructions on how to do this.
For More Assistance
If you have found a bug or are still confused, please e-mail
the NCBI Help Desk
and we will be happy to assist you.
Thanks!
Credits: Brandon Brylawski