09-24-2012, 02:52 PM   #2
gdpawel
Senior Member
 
 
Join Date: Aug 2006
Location: Pennsylvania
Posts: 1,080
About the Cancer Genome Atlas

The field of genomics is caught in a data deluge, as reported by Andrew Pollack of the New York Times (“DNA Sequencing Caught in Deluge of Data,” December 1, 2011). DNA sequencing is becoming faster and cheaper, but as a result the ability to determine DNA sequences is starting to outrun the ability of researchers to store, transmit and, especially, analyze the data.

“Data handling is now the bottleneck,” said David Haussler, director of the center for biomolecular science and engineering at the University of California, Santa Cruz. “It costs more to analyze a genome than to sequence a genome.”

That could delay the day when DNA sequencing is routinely used in medicine. The cost of determining a person’s complete DNA blueprint is expected to fall below $1,000, but that long-awaited threshold excludes the cost of making sense of that data, which is becoming a bigger part of the total cost.

“The real cost in the sequencing is more than just running the sequencing machine,” said Mark Gerstein, professor of biomedical informatics at Yale. “And now that is becoming more apparent.”

The lower cost, along with increasing speed, has led to a huge increase in how much sequencing data is being produced. There will probably be 30,000 human genomes sequenced by the end of 2011, up from a handful a few years ago, according to the journal Nature.

In a few cases, human genomes are being sequenced to help diagnose mysterious rare diseases and treat patients. But most are being sequenced as part of studies. The federally financed Cancer Genome Atlas is sequencing the genomes of thousands of tumors and of healthy tissue from the same people, looking for genetic causes of cancer.

And DNA is just part of the story. To truly understand biology, researchers are gathering data on the RNA, proteins and chemicals in cells. That data can be even more voluminous than data on genes. And those different types of data have to be integrated. There are giant piles of data and no way to connect them.

There is now so much raw data that re-analyzing it is becoming infeasible. So researchers will increasingly store just the final results. In the case of human genomes, they might store even less: only the differences between a particular genome and some reference genome.
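
To make the "store only the difference" idea concrete, here is a rough Python sketch. It is a toy illustration and not how any real sequencing pipeline works: it assumes the two sequences are the same length and differ only by single-base substitutions, and the function names and sample sequences are made up for the example.

```python
def diff_against_reference(reference, genome):
    """Return (position, base) pairs where genome differs from reference.

    Toy assumption: equal-length sequences, substitutions only.
    Real genomes also have insertions and deletions, which this ignores.
    """
    if len(reference) != len(genome):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (i, base)
        for i, (ref_base, base) in enumerate(zip(reference, genome))
        if ref_base != base
    ]


def reconstruct(reference, differences):
    """Rebuild the full sequence from the reference plus stored differences."""
    sequence = list(reference)
    for position, base in differences:
        sequence[position] = base
    return "".join(sequence)


# Made-up sequences, for illustration only.
reference = "ACGTACGTAC"
genome = "ACGTTCGTAA"

differences = diff_against_reference(reference, genome)
restored = reconstruct(reference, differences)
```

Only the short list of differences has to be kept for each person; the full sequence can be rebuilt from the shared reference whenever it is needed.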