8 November 2007

Genome Analysis Left Wanting

by Kate Melville

The sequencing and comparison of 12 fruit fly genomes - involving hundreds of scientists from more than 100 institutions in 16 countries - has been completed, but the project has revealed considerable flaws in the way researchers identify genes.

"We've made huge progress in recent years with many genomes, including humans, but a lot of the problems can't be solved by simply dumping data into a computer and having truth and light come out the other end," said project leader Thomas Kaufman, from Indiana University. "One of the things we've learned from this project is that when you compare a lot of different but related genomes, you are more likely to see the genes that are buried in all that A-C-T-G mush." This was one of the key insights to come out of the four-year project: any individual species' genome is far easier to resolve when related genomes are compared with it.

The researchers deliberately chose a wide variety of fruit flies for study, guessing correctly that a diverse set would make both the similarities and the differences among the 12 species' genes easier to identify. Some of the Drosophila species the scientists studied are closely related to D. melanogaster, some are not. Some of the flies occupy very specialized ecological niches, such as D. sechellia, which has evolved a unique ability to detoxify the fruit of the Seychelles' noni tree. The other 10 species the consortium examined were D. pseudoobscura, D. simulans, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. willistoni, D. virilis, D. grimshawi, and the cactus-loving D. mojavensis.

In comparing the 12 genomes, the scientists found 1,193 new protein-coding genes and hundreds of new functional elements, including regulatory sequences that determine how quickly genes are expressed, and genes that encode functional RNAs such as small nuclear RNAs. They also learned that certain genes appear to be evolving faster than others, such as the genes associated with smell and taste, sex and reproduction, and defenses against pathogens.

A vexing problem for scientists is finding genes and other important DNA sequences in heterochromatin, tightly packed areas of chromosomes presumed to experience little expression. Heterochromatin is common in animal genomes. "The heterochromatin is very hard to analyze," Kaufman said. "Studies show heterochromatin changes the most. It's full of intermediate- and full-repeat sequences. And there are genes buried in this stuff."

Procedures for locating the genes that encode proteins are pretty well established. The lingering problem for genome biologists is locating genes whose parts are interrupted repeatedly, as well as locating genes that do not code for proteins. When many related genomes are compared, these sorts of genes become relatively easy to locate. Genes that do important things for cells or tissues are more likely to be "conserved" over time; that is, they don't change much despite millions of years of mutations, so they stand out against the faster-changing sequence around them.
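
The notion of conservation described above can be illustrated with a toy sketch in Python. It is not the consortium's method, and the sequences are invented for the example. It assumes the sequences have already been aligned to the same length, scores each position by how many sequences share the most common base, and reports stretches where every sequence agrees. The function names, the threshold and the minimum run length are all hypothetical choices.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common base at each position.

    Assumes the sequences are already aligned; gap characters, if present,
    are counted like any other symbol.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def conserved_runs(scores, threshold=1.0, min_length=4):
    """Return (start, end) half-open index pairs for runs scoring >= threshold."""
    runs = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new conserved run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(scores) - start >= min_length:
        runs.append((start, len(scores)))
    return runs


# Invented toy alignment: four short sequences standing in for four species.
toy_alignment = [
    "ACGTTGCAAT",
    "TCGTTGCAGA",
    "GCGTTGCATC",
    "CAGTTGCACG",
]

scores = column_conservation(toy_alignment)
runs = conserved_runs(scores)
print(runs)
```

In this made-up alignment the middle stretch is identical in all four sequences while the ends vary, so only the middle is reported. Real comparative analyses work on whole-genome alignments and use statistical models of how sequences evolve rather than a simple identity count.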

One of two articles about the project in Nature, by computational biologist Matthew Hahn, noted that although all 12 Drosophila species have about the same number of genes (14,000), the genomes are more dynamic than one might expect. "The highest turnover in gene number occurs in genes involved in sex and reproduction," Hahn said. "Our results demonstrate that the apparent stasis in total gene number among species has masked rapid turnover in individual gene gain and loss. It is likely that this evolutionary revolving door has played a large role in shaping the morphological, physiological, and metabolic differences among species."

Source: Indiana University
Pic courtesy Justin Kumar