Copy Number Variants (and Thanksgiving Dinner)
Scientists at the Wellcome Trust Sanger Institute in Cambridge reported some ground-breaking findings in human genetics yesterday. According to their study, the human genome varies far more between individuals than previously thought.
Our genome is about 3 billion DNA base pairs long. This sequence of letters (A, C, G, and T) is responsible for everything from the development of a fertilized egg into a fetus to the hormonal changes of puberty and the graying of our hair later in life. DNA accomplishes this dizzying variety of tasks by providing the instructions for making proteins - molecular factories that carry out most of the chemistry that takes place in our cells.
For a non-scientific analogy, think about a cookbook. It is filled with recipes that use wildly different ingredients to accomplish tasks like deep-frying a turkey or making creamy mashed potatoes, yet each of these recipes (the actual instructions on the page) is made up of the same letters of the alphabet. Although we need 26 such letters in English, nature pulls off a language with only four. Words in a recipe, like "boil" and "broil", have very similar letters, but their instructions lead to wildly different results.
We assumed, perhaps naively, that each and every human being on the planet had a very similar genome. Everyone has the recipe for a liver and two kidneys, so what makes us different and unique must be tiny little changes in individual letters. Changing this A to a C makes my eyes blue and yours brown. You get the idea.
Without sequence data from lots of humans, we couldn't say otherwise, and it seemed like a nice explanation for our uniqueness. It now looks like this isn't the case. According to the current study, instead of each of us having DNA that is 99.9% identical, that number is closer to 99.5%. That may sound like a small change, but it means the fraction of our DNA that differs is five times larger than we thought. Even more interesting is the new type of difference that this team of scientists discovered.
Rather than differences in individual letters of our genetic sequence (termed "single-nucleotide polymorphisms"), these are regions whose copy number varies. The team found 1,447 such regions that varied between individuals, making up 12% of the human genome. The most amazing of these differences is the observation that some people have multiple copies of a certain sequence of DNA.
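To get a feel for the size of these numbers, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted in this post (a genome of roughly 3 billion base pairs, 99.9% versus 99.5% identity, and 1,447 regions covering 12% of the genome). The "average region" line is my own rough division and assumes the regions don't overlap, so treat it as a ballpark, not a result from the paper.

```python
# Back-of-the-envelope arithmetic using the figures quoted in this post.
GENOME_BP = 3_000_000_000  # approximate genome length in base pairs


def bases_differing(per_thousand_different: int) -> int:
    """Base pairs that differ, given the differing share in parts per thousand."""
    return GENOME_BP * per_thousand_different // 1000


old_estimate = bases_differing(1)  # 99.9% identical -> 0.1% different
new_estimate = bases_differing(5)  # 99.5% identical -> 0.5% different

cnv_bp = GENOME_BP * 12 // 100     # 12% of the genome in variable regions
avg_region_bp = cnv_bp // 1447     # rough mean size, assuming no overlap

print(f"Old estimate of differing bases: {old_estimate:,}")
print(f"New estimate of differing bases: {new_estimate:,}")
print(f"Bases in copy number variable regions: {cnv_bp:,}")
print(f"Rough average region size: {avg_region_bp:,}")
```

In other words, the jump from 99.9% to 99.5% takes us from about 3 million differing letters to about 15 million.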
Let's go back to our cookbook analogy to think about this. Imagine that you have to make every recipe in your book today - duplicates included. Your cookbook might have three recipes for pumpkin pie, while mine has only one. With three pies, you'd be better off than me. (Who doesn't like pie?)
It turns out that the number of times a particular sequence of DNA appears in your genome can have really significant effects. Studies have linked copy number variations to an increased risk of HIV infection, and to Parkinson's and Alzheimer's disease.
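If you like to think in code, copy number can be pictured with a toy example: the same stretch of sequence shows up once in one "genome" and three times in another. The sequences and names below (SEGMENT, the flanks, count_copies) are made up purely for illustration, and plain string counting is not how the study detected variation; real variable regions are tens of thousands of letters long and are found with laboratory methods.

```python
# Toy illustration of copy number: one segment, present 1x or 3x.
# All sequences here are invented for the example.


def count_copies(genome: str, segment: str) -> int:
    """Count non-overlapping occurrences of `segment` in `genome`."""
    return genome.count(segment)


SEGMENT = "GATTACA"      # stand-in for a long stretch of DNA (the pie recipe)
FLANK_LEFT = "CCGT"      # surrounding sequence shared by both people
FLANK_RIGHT = "TTAG"

my_genome = FLANK_LEFT + SEGMENT * 1 + FLANK_RIGHT    # one copy
your_genome = FLANK_LEFT + SEGMENT * 3 + FLANK_RIGHT  # three copies

print("My copies:  ", count_copies(my_genome, SEGMENT))
print("Your copies:", count_copies(your_genome, SEGMENT))
```

Every individual letter in the two toy genomes is "spelled" correctly, so a letter-by-letter comparison of the recipe itself would find no typos. The difference is entirely in how many times the recipe appears.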
The paper, published yesterday in Nature, includes the first map of these variable regions in the human genome. The authors were only able to detect regions of variation that were around 50,000 base pairs long. With improved methods that can look for smaller regions, they guess that they would find many, many more regions of our genetic code that make us unique individuals. Whether these regions are in fact responsible for such terrible diseases as Parkinson's, and whether we can start to control them, only time will tell. The current work is certainly the first step.
For additional coverage, see: Reuters, The Independent, Wellcome Trust Press Release, Nature Newsblog, and Slashdot.