Introduction to Bioinformatics

Bioinformatics analyzes large sets of biological data in order to answer biological questions, formulate hypotheses, and build models of the underlying biological processes.



Applications of bioinformatics

  • Medicine
  • Research
  • Pharmaceuticals
  • Biotechnology

Introduction to Bioinformatics

see also Definitions, Glossaries, and Dictionaries
see also Recommended Reading

The tremendous interest in bioinformatics, a new discipline at the intersection of molecular biology and computer science, is fueled by the excitement surrounding the sequencing of the human genome and the promise of a new era in which genomic research dramatically improves the human condition. Advances in detection and treatment of disease and the production of genetically engineered foods are among the most often mentioned benefits. Bioinformatics is a fertile new area for programmers. As the eminent computer scientist Donald Knuth is often quoted as saying: "Biology easily has 500 years of exciting problems to work on" (Doernberg 1993).

The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as:

"Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline...There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."

Damian Counsell's Bioinformatics FAQ (2001) puts it more simply. "I would say most biologists talk about 'doing bioinformatics' when they use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules. As computers become more powerful you could probably add simulate to this list of bioinformatics verbs. 'Biomolecules' include your genetic material (nucleic acids) and the products of your genes: proteins."

While the terms bioinformatics and computational biology are often used interchangeably, medical informatics is another field entirely. "Medical informatics generally deals with 'gross' data, that is information from super-cellular systems, right up to the population level, while bioinformatics tends to be concerned with information about cellular and biomolecular structures and systems." (Counsell 2001)

For more information, see the Definitions, Glossaries, and Dictionaries and the Recommended Reading sections of this guide.



Definitions, Glossaries, and Dictionaries

see also Introduction to Bioinformatics
see also Guides, Tutorials and Primers

Definitions

A quick review of basic genetic terms and concepts will help in understanding the sequence databases. The NCBI Genetics Review site is highly recommended: it gives a particularly good overview of the concepts and lists good references for additional information ({http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/}). The following terms are central to understanding bioinformatics:

Nucleotide:
One of the structural components, or building blocks, of DNA and RNA. A nucleotide consists of a base (one of four chemicals: adenine, guanine, cytosine, and thymine in DNA or uracil in RNA), a molecule of sugar (deoxyribose in DNA, ribose in RNA), and a molecule of phosphoric acid.
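
As a rough illustration of the base alphabets described above, the following Python sketch checks that a string uses only the four DNA base letters, rewrites it in the RNA alphabet (uracil in place of thymine), and counts how often each base occurs. The function names and the example sequence are made up for this guide and do not come from any of the sites listed here.

```python
from collections import Counter

# One-letter codes for the four DNA bases:
# adenine, cytosine, guanine, thymine.
DNA_BASES = set("ACGT")


def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (T becomes U)."""
    dna = dna.upper()
    unknown = set(dna) - DNA_BASES
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return dna.replace("T", "U")


def base_counts(seq: str) -> dict:
    """Count how many times each base letter occurs in a sequence."""
    return dict(Counter(seq.upper()))


# Hypothetical example sequence, chosen only for illustration.
example_dna = "ATGGCCATTGTA"
example_rna = dna_to_rna(example_dna)
example_counts = base_counts(example_dna)
```

Printing example_rna and example_counts shows the same sequence written with U instead of T, along with a simple measure of its base composition.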

Gene:
A length of DNA which codes for a particular protein, or in certain cases a functional or structural RNA molecule (from PhRMA Genomics Lexicon {http://genomics.phrma.org/lexicon/}). Less than 5% of the human genome codes for proteins. The rest consists of non-coding sequences, which may have other functions.
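
To make "codes for a particular protein" more concrete, here is a minimal Python sketch that reads a DNA coding sequence three bases (one codon) at a time and looks up the amino acid for each codon. It is a toy under stated assumptions: the table holds only the handful of codons the example needs (the full standard genetic code has 64), and the names CODON_TABLE, translate, and toy_gene are invented for this guide.

```python
# Partial codon table: only the codons used in the toy example, plus the
# three stop codons (marked "*"). This is NOT the full genetic code.
CODON_TABLE = {
    "ATG": "M",  # methionine (also the usual start codon)
    "GCC": "A",  # alanine
    "ATT": "I",  # isoleucine
    "GTA": "V",  # valine
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(coding_dna: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes.

    Reading stops at the first stop codon; leftover bases that do not
    form a complete codon are ignored.
    """
    coding_dna = coding_dna.upper()
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3]
        # Raises KeyError for any codon missing from the partial table.
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical five-codon "gene", chosen only for illustration.
toy_gene = "ATGGCCATTGTATAA"
toy_protein = translate(toy_gene)
```

Real genes are far longer and, in eukaryotes, are usually interrupted by non-coding stretches, so practical work relies on established sequence-analysis tools and a complete codon table rather than a sketch like this one.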

Genome:
The complete gene complement of an organism, contained in a set of chromosomes (in eukaryotes), in a single chromosome (in bacteria), or in a DNA or RNA molecule (in viruses) (from Academic Press Dictionary of Science and Technology {http://www.harcourt.com/dictionary/}).

Genomics:
Operationally defined as investigations into the structure and function of very large numbers of genes undertaken in a simultaneous fashion (from What is Genomics? {http://www.genomecenter.ucdavis.edu/what.html}). Genetics looks at single genes, one at a time, as a snapshot. Genomics is trying to look at all the genes as a dynamic system, over time, and determine how they interact and influence biological pathways and physiology, in a much more global sense (from Basic Genetics & Genomics http://www.genomicglossaries.com/content/Basic_Genetic_Glossaries.asp).

Proteome:
The complement of proteins expressed by an organism, tissue or cell type (from Proteomes and Proteomics). The concept of the proteome is fundamentally different to that of the genome: while the genome is virtually static and can be well defined for an organism, the proteome continually changes in response to external and internal events.

Proteomics:
The study of the full set of proteins encoded by a genome: the characterisation of patterns of gene expression at the protein level, or of the link between proteins and genomes. Proteomics encompasses many different approaches to protein study, from bioinformatics of the protein content of genomes to large-scale direct analysis of complicated protein mixtures, and the definition of proteins' properties, interactions and modifications (from Proteomes and Proteomics {http://www.mrc-dunn.cam.ac.uk/pages/proteomes.html}).



Glossaries and Dictionaries

Science Magazine: Functional Genomics Resources: "Finding the right word: A guide to some useful online glossaries" Post-genomics, biotech and bioinformatics - {http://www.sciencemag.org/feature/plus/sfg/education/glossaries.dtl#postgenomics}
An excellent selective list, ranked by the site's editors, of the ten "best" online glossaries. See also glossaries on related topics at this site.

Access Excellence Graphics Gallery - http://www.accessexcellence.org/AB/GG/
"Graphics Gallery is a series of labeled diagrams with explanations representing the important processes of biotechnology. Each diagram is followed by a summary of information, providing a context for the process illustrated."

Genomics Glossary - http://www.genomicglossaries.com/
Actually a collection of several glossaries and taxonomies, including a Bioinformatics Glossary at http://www.genomicglossaries.com/content/Bioinformatics_gloss.asp. The Scout Report and Science Magazine give this resource very high praise, but this author found the site to be cluttered and difficult to navigate, although the content is very good.

Human Genome Project Information Glossary - {http://www.ornl.gov/sci/techresources/Human_Genome/glossary/}
A useful glossary of genetics terms from the DOE Human Genome Program that you can both browse and search.

National Human Genome Research Institute (NHGRI) Glossary of Genetic Terms - {http://www.genome.gov/glossary.cfm}
This is sometimes called the "talking glossary" since audio clips allow you to hear definitions and longer explanations given by an expert. Try it with the word "nucleotide." Illustrations are also sometimes available.

PhRMA Genomics Lexicon - {http://genomics.phrma.org/lexicon/}
This extensive glossary is sponsored by the Pharmaceutical Research and Manufacturers of America. Also provides links to other dictionaries and glossaries.

News/Keeping Current

Southwest Biotechnology and Informatics Center (SWBIC): News - {http://www.nbif.org/links/1.20.php}
Annotated directory of news sites, many focusing on bioinformatics (scroll down past the long table of contents to see the content). A good "launch pad" to news sites.

Genomics Today - {http://genomics.phrma.org/today/}
A daily headline news service that provides links to genomics news on other sites. It culls the relevant headlines from a wide variety of sources including wire services, newspapers, Yahoo, selected web sites, and university news sites. Sponsored by the Pharmaceutical Research and Manufacturers of America.

GNN: Genome News Network - {http://www.genomenewsnetwork.org/index.php}
Good source for news on scientific, as opposed to business, aspects of bioinformatics. Bioinformatics news is clearly marked. The short news summaries are to be commended for giving the full citation to the original scientific article at the end of each news piece. In addition to news there are also featured articles and a few educational links.

The Scientist - http://www.the-scientist.com/
Frequent coverage of bioinformatics news. Registration is free; once registered, you are automatically e-mailed the table of contents of each biweekly issue.