Genomic medicine holds great promise to transform the medical profession and individualize health care. Technological advancements such as massively parallel genomic sequencing have made it possible to produce large amounts of genomic data within a reasonable timeframe and at a relatively low cost (Mardis, 2008; Horvitz and Mitchell, 2010; Koboldt et al., 2010; Kahn, 2011).
Projects such as the ClinVar and ClinGen initiatives, funded by the National Institutes of Health (NIH), are expanding our understanding of the clinical significance of genomic data through the adjudication of genomic variants and the methodical annotation of the genome (NIH Staff, 2013). Yet challenges remain in how best to interpret, reuse, and share the data (Ahalt et al., 2014; Global Alliance to Enable Responsible Sharing of Genomic and Clinical Data, 2013; Data and Informatics Working Group, NIH BD2K Initiative, 2012).
Those challenges include the need for new technologies to capture, store, and update two kinds of information: annotations, which provide critical clinical interpretations of genomic data, and metadata, which attribute provenance or “ownership” and record the history of a given data set (e.g., biological sources, laboratory processing steps, transformation and analysis steps, and estimates of validity and reliability).
Herein, we describe two solutions—CAroliNa Variant Annotation Store (CANVAS) and Annotation Bot (AnnoBot)—that together provide version-controlled annotation and metadata to aid in the clinical interpretation of genomic variant data.