A new mathematical approach for analyzing the complex patterns of natural mutation in DNA will, according to its developers, help biologists understand how mutation contributes to evolutionary change in mammals. The researchers, Phil Green and Dick Hwang, from the Howard Hughes Medical Institute, published a description of their new analytical approach and an initial application in the Proceedings of the National Academy of Sciences.
“Understanding naturally occurring mutations has been of great interest because mutations are major drivers of evolution,” said Green. “However, it’s surprising how little is still known about their causes.”
Previous studies have revealed a number of biases in the rates of different types of mutational change. These arise in part from the innate biochemical characteristics of the four DNA nucleotide units – adenine, guanine, cytosine and thymine – that affect their vulnerability to modification and the accuracy with which they are replicated when cells divide. Particular nucleotide sequences, for example cytosine-guanine (CpG) dinucleotides, form “hotspots” – regions that are particularly vulnerable to alterations that convert one nucleotide to another, causing mutations.
“Apart from their intrinsic interest, we think understanding the underlying mutation patterns better will also help us in finding the functionally important features in the genome. Basically, it’s a signal-to-noise issue, where the naturally occurring mutations are the ‘noise’ and the functional parts of the genome are the ‘signal’. The better we understand the noise, the better job we can do of understanding the signals.”
To begin to understand the patterns of DNA changes that result from neutral mutation, Hwang and Green developed a new version of a statistical technique that they call Bayesian Markov chain Monte Carlo sequence analysis. The technique enables them to feed in sequence information from genomes of different organisms and discern patterns that can distinguish models of mutational mechanisms. According to Green, the statistical approach offers a powerful way to analyze models that are very difficult or impossible to solve analytically. “Until recently,” he said, “the state of the art in the molecular evolution field was to use models that people knew were gross over-simplifications, but had the merit that you could solve them analytically. Without doing too much computation, you could make estimates of mutation rates of various sorts. However, the cost of that simplified approach was a model that really is unrealistic.”
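To give a feel for what Markov chain Monte Carlo does, here is a minimal Python sketch of a Metropolis sampler. It is a toy and not Hwang and Green's model: it estimates a single substitution probability from a hypothetical count of changed sites (`k` out of `n`), assuming a uniform prior and a binomial likelihood. The function names, the counts and the tuning values are all illustrative assumptions.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalized log posterior for a substitution probability p,
    given k changed sites out of n and a uniform prior on (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, steps=20000, burn_in=2000, step_size=0.01, seed=0):
    """Random-walk Metropolis sampler; returns the samples kept after burn-in."""
    rng = random.Random(seed)
    p = 0.5  # arbitrary starting value
    current = log_posterior(p, k, n)
    samples = []
    for i in range(steps):
        proposal = p + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


# Hypothetical data: 30 changed sites observed among 1000 aligned sites.
samples = metropolis(k=30, n=1000)
posterior_mean = sum(samples) / len(samples)
print(f"estimated substitution probability: {posterior_mean:.4f}")
```

This one-parameter case can be solved exactly, which makes it easy to check the sampler. The appeal of the sampling approach described above is that the same accept-or-reject loop still works when the model has many interacting parameters and no closed-form solution.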
In particular, he said, the standard model treated all positions in the sequence as evolving independently of each other, rather than taking into account context effects, in which the identity of neighboring nucleotides influences the nature and rate of mutations.
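A context effect can be illustrated with a short Python sketch that tallies changes by the identity of the neighboring nucleotides instead of pooling all sites together. The two sequences below are invented for illustration, and the helper `count_by_context` is hypothetical. A real analysis would also have to deal with unknown ancestral sequences, repeated changes at the same site and the shape of the species tree, none of which this sketch attempts.

```python
from collections import Counter


def count_by_context(ancestral, derived):
    """Tally interior sites by their three-base context in the ancestral
    sequence (left neighbor, base, right neighbor), and count how many
    sites in each context differ in the derived sequence."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned and of equal length")
    sites = Counter()
    changes = Counter()
    for i in range(1, len(ancestral) - 1):
        context = ancestral[i - 1:i + 2]
        sites[context] += 1
        if ancestral[i] != derived[i]:
            changes[context] += 1
    return sites, changes


# Invented example: both changes are C-to-T at a C followed by G (a CpG site).
ancestral = "ACGTACGTTCGA"
derived = "ATGTACGTTTGA"

sites, changes = count_by_context(ancestral, derived)
rates = {context: changes[context] / n for context, n in sites.items()}
for context in sorted(rates):
    print(context, sites[context], changes[context], rates[context])
```

A model that treats every position independently would report a single pooled rate for these ten interior sites. Splitting by context shows that, in this made-up example, all of the change sits at cytosines that precede a guanine.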
“While a few other investigators have been working on how to take into account context effects, I think we are doing it in a more rigorous, more complete way,” said Green. Without such a rigorous approach, he said, models of evolution could give erroneous results regarding the effects of mutation.
“I think the more realistic you can make the model, the less likely you are to be led astray by drawing conclusions that really had more to do with the deficiencies of your model than with the underlying reality,” said Green.
Hwang and Green tested their analytical approach by using it to compare the sequences of corresponding genome segments from 19 mammalian species, including human, horse, lemur, rat, rabbit, hedgehog and armadillo. Such comparisons among species across the mammalian evolutionary tree can yield insight into how mutational patterns have changed over evolutionary time. They focused their analysis on a 1.7 million base pair DNA segment known as the “greater cystic fibrosis transmembrane conductance regulator region,” which was sequenced in the 19 mammals by Eric Green and his colleagues at the National Human Genome Research Institute. To concentrate on the neutrally evolving DNA, Hwang and Phil Green excluded the genes from those segments and compared what was left.
According to Green, the comparison of context-dependent mutation in the segments across the species revealed that the CpG mutations, unlike other mutation types, accumulated in a regular clock-like fashion. The analysis also distinguished other sources of naturally occurring mutations and their variation due to biological and biochemical influences, and appears to offer some insight into factors such as generation time and population size that have varied in mammalian evolution.
Green said that by contributing to a better understanding of naturally occurring mutations, the technique would help in understanding both how genetic disease arises and how evolution has occurred.
A next step, he said, will be to extend the analysis to sites on the genome that are not evolving neutrally. This should help identify genomic regions that were not previously recognized to be of functional importance, said Green. Also, he said, such analyses could offer considerable insight into how patterns of natural selection have varied across different species in the course of evolution.
“A more complex model of the neutral process should start to pay for itself in exploring these phenomena, because you’re frequently looking for relatively subtle effects,” said Green.