Escherichia coli is a common, though by no means the most abundant, bacterial inhabitant of the human colon. It also lives in the intestines of many other animals, both wild and domestic.
Most strains of E. coli do not cause disease, although some frequently cause travelers' diarrhea, and E. coli is the most common cause of urinary tract infections. One strain, designated O157:H7, is particularly virulent and has been responsible for several dangerous outbreaks among people who ate contaminated food (usually undercooked hamburger).
Drinking water is tested for the presence of E. coli and related bacteria not because these bacteria are particularly dangerous, but because their presence indicates contamination by sewage, and sewage may contain organisms (e.g., Salmonella, hepatitis A virus) that are dangerous.
E. coli is one of the most thoroughly studied of all living things. It is a favorite organism for genetic engineering because cultures of it can be made to produce large quantities of the product of an introduced gene. Several important drugs (insulin, for example) are now manufactured in E. coli. However, E. coli cannot attach sugars to proteins, so proteins that require such sugars (e.g., glycoprotein hormones and clotting factors) must be made in eukaryotic cells, such as yeast cells or mammalian cells grown in culture.
The fact that E. coli lives in the human intestine has raised fears that genetically engineered versions might escape from the laboratory (or factory), take up residence in humans, and produce a product that might be harmful. For this reason, genetic engineering is done only on strains of E. coli that have been deliberately weakened so that they cannot survive for long in humans.
The complete genome sequence of a harmless laboratory strain of E. coli (K-12) was reported in the 5 September 1997 issue of Science. The genome consists of a single DNA molecule containing 4,639,221 base pairs, which encode 4,288 proteins and 89 RNAs. Many of the genes were already known, and the function of many others can be deduced from their similarity to known genes.
The complete sequence of the pathogenic strain O157:H7 was reported in the 25 January 2001 issue of Nature. It contains 5,416 genes in 5.44 × 10⁶ base pairs of DNA. Remarkably, these include 1,387 genes that are not present in its harmless laboratory relative E. coli K-12 (and K-12 has 528 genes that are not found in O157:H7). Thus two strains of the same species differ in some 25% of their genes. Compare this with the difference between the human and chimpanzee genomes, which is probably no more than 1%!
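As a rough check on the "some 25%" figure, the short Python sketch below divides the number of O157:H7 genes that are absent from K-12 by the total number of genes in O157:H7. It assumes that this is the comparison the percentage refers to; counting differently (for example, also including the 528 genes found only in K-12) would give a somewhat different value. The variable and function names are illustrative only.

```python
# Gene counts taken from the text above.
O157_TOTAL_GENES = 5416  # total genes reported for O157:H7
O157_ONLY = 1387         # O157:H7 genes not present in K-12
K12_ONLY = 528           # K-12 genes not present in O157:H7 (shown for reference, not used below)


def percent(part: int, whole: int) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


# Assumption: the "25%" is the share of O157:H7's genes that K-12 lacks.
o157_unique_pct = percent(O157_ONLY, O157_TOTAL_GENES)
print(f"{o157_unique_pct:.1f}% of O157:H7 genes are not found in K-12")
```

Run as written, it prints 25.6%, consistent with the rounded figure in the text.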