Bioinformatics and Genomics: From Mendel to Sanger and beyond

Mendel introduced the idea of inheritance through alleles, discrete units of heredity that were later shown to reside on chromosomes. Linkage studies of families furthered the understanding of inheritance and genetics. Three decades ago, Frederick Sanger introduced a method for determining the precise nucleotide sequence of DNA. He would go on to win his second Nobel Prize for this pioneering method, and the human genome would be sequenced using the same method more than twenty years later. Recent developments in sequencing technology, dubbed "second-generation" sequencing, massively parallelize sequencing but are still surpassed by the Sanger method in reliability and read length.

The rapid sequencing of numerous genomes has led to many important discoveries. The search for correlations between genetic variation and (complex) human disease, however, has left much to be desired. With a few exceptions, the common disease / common variant hypothesis has not found support in genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs). In one recent GWAS, over 100,000 individuals were genotyped at ~2.6 million SNP locations (either directly genotyped or imputed), which were then tested for correlations with four factors associated with coronary artery disease: total cholesterol (TC), LDL-C, HDL-C, and triglycerides (TG). While the study identified new loci associated with these risk factors, the combined power of all of these explanatory loci accounts for only ~25-30% of the genetic variation in the population.
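To make the "fraction of genetic variation explained" figure concrete, here is a minimal sketch of how such a number can be tallied under a simple additive model. It is not the method used in the study: the function names, the allele frequencies, the effect sizes, and the heritability value are all hypothetical placeholders, and the loci are assumed to be independent with the trait standardized to unit variance.

```python
# Illustrative only: every number below is a made-up placeholder,
# not a value taken from the study discussed above.

def variance_explained(freq, beta):
    """Trait variance contributed by one biallelic SNP under an additive
    model with Hardy-Weinberg proportions: 2 * p * (1 - p) * beta**2.
    beta is the per-allele effect in trait standard-deviation units."""
    return 2.0 * freq * (1.0 - freq) * beta ** 2

def fraction_of_genetic_variance(loci, heritability):
    """Sum the per-locus contributions (assumes independent loci) and
    express the total as a fraction of the trait's heritability."""
    total = sum(variance_explained(p, b) for p, b in loci)
    return total / heritability

# Hypothetical (allele frequency, effect size) pairs and heritability.
hypothetical_loci = [(0.30, 0.10), (0.15, 0.08), (0.45, 0.05)]
hypothetical_h2 = 0.5

share = fraction_of_genetic_variance(hypothetical_loci, hypothetical_h2)
print(f"{share:.1%} of genetic variance explained")
```

Because each locus contributes in proportion to the square of its effect size, many loci of small effect add up slowly, which is one way to see why even a very large study can leave most of the heritable variation unexplained.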

What lessons can be learned from this massive GWAS? Perhaps that GWAS by themselves, no matter how large the population studied or how many SNPs are genotyped, will never be able to explain all or even most of the genetic variation in complex diseases like cardiovascular disease (CVD) or diabetes and their risk factors. Even with this important limitation, GWAS genotyping studies give clues as to possible causative pathways in complex disease. The use of these fast, high-throughput technologies must still be balanced with other methods of verification in wet labs. Unfortunately, as the price of sequencing goes down and its speed goes up, the vast mound of "great ideas" for potential target genes will continue to grow and outstrip the hard, detailed science that confirms a causative role in (and a potential fix for) the disease.

Putting the pessimism aside, an integration of new sequencing technologies will yield new insight into the possible molecular mechanisms of heritable, non-communicable disease. This will mean creating and integrating high-throughput, parallelized technologies for other aspects of the genome, such as DNA methylation and histone spacing, packing, and modification with either methyl or acetyl groups.

While the generation of "great ideas" almost always outstrips the time and resources required to test them, the discrepancy between these two aspects of research is growing like an ever-widening chasm. The ability to design and perform the experiments that provide the greatest insight (the most bang for the buck) will become increasingly desirable. Intuition about data sets will still be important, but thorough training and experience with bioinformatics toolsets will be essential. A new breed of biologist is becoming increasingly valuable: a person who has a deep understanding of the mechanisms underlying a biological phenomenon, who can translate that mechanism into data and analyze it using computational power, and who can interpret the results and translate them into real-world solutions. These three skills have always been important for scientists, but with the growing amount of data and the increasing computational power to process it, the complexity of the problems requires much more thought. Let the challenge begin!

Sunday, August 29, 2010
