The NCI Genomic Data Commons (GDC) launched on June 6, 2016. You can find some of the details in the NCI's press release:
"With the GDC, NCI has made a major commitment to maintaining long-term storage of cancer genomic data and providing researchers with free access to these data," said NCI Acting Director Douglas Lowy, M.D. "Importantly, the explanatory power of data in the GDC will grow over time as data from more patients are included, and ultimately the GDC will accelerate our efforts in precision medicine."
The GDC website is gdc.nci.nih.gov, and more than a thousand unique visitors use it each day.
The GDC was developed at the Center for Data Intensive Science (CDIS) at the University of Chicago in collaboration with the Ontario Institute for Cancer Research (OICR) over a period of two years. It hosts approximately 5 PB of genomic and associated clinical data for the research community.
The data is harmonized, meaning that all of it is uniformly processed with a common set of bioinformatics pipelines so that results can be compared more meaningfully. When different bioinformatics groups at different centers use different pipelines to compute genomic variants, it can be quite challenging to remove the batch effects that confound the analysis when the resulting variant files are analyzed together.
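To make the batch-effect problem concrete, here is a toy sketch in Python. It is not the GDC's pipeline: the variant names, allele fractions, and thresholds are all made up for illustration, and a real variant caller is far more involved than a single cutoff. It only shows how two centers running different settings on the same sample can disagree, and how the disagreement disappears once both use one common pipeline.

```python
from typing import Callable, Dict, Set

# Hypothetical variant allele fractions (VAF) observed in one tumor sample.
# These values are invented for illustration only.
SAMPLE_VAF: Dict[str, float] = {
    "chr1:1000A>T": 0.45,
    "chr2:2000G>C": 0.12,
    "chr3:3000C>T": 0.07,
    "chr4:4000T>G": 0.02,
}


def make_caller(min_vaf: float) -> Callable[[Dict[str, float]], Set[str]]:
    """Return a toy 'pipeline' that calls a variant when its VAF >= min_vaf."""
    def call(vafs: Dict[str, float]) -> Set[str]:
        return {variant for variant, vaf in vafs.items() if vaf >= min_vaf}
    return call


# Two centers analyze the SAME sample with different (assumed) thresholds.
pipeline_center_a = make_caller(0.05)
pipeline_center_b = make_caller(0.10)

calls_a = pipeline_center_a(SAMPLE_VAF)
calls_b = pipeline_center_b(SAMPLE_VAF)

# Variants called by only one center: a difference caused by the pipeline,
# not by the biology of the sample.
batch_artifact = calls_a ^ calls_b

# Harmonized: both centers' data are reprocessed with one common pipeline.
harmonized_a = pipeline_center_a(SAMPLE_VAF)
harmonized_b = pipeline_center_a(SAMPLE_VAF)
harmonized_difference = harmonized_a ^ harmonized_b

print("Unharmonized difference:", sorted(batch_artifact))
print("Harmonized difference:", sorted(harmonized_difference))
```

In this toy case the only disagreement between the two centers comes from the threshold each one chose, which is the kind of confounding that uniform processing is meant to avoid.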
The GDC hosts what is probably one of the largest harmonized genomic datasets in the world.