E.6 Genome Analysis
Version 1.1 April 2023
The goal of ICGC ARGO is to advance discovery; therefore, nucleic acid analysis must go significantly beyond the assessment of limited gene sets using panels such as those currently performed in most clinical diagnostic laboratories. Recognizing the constraints imposed by biospecimen quantity, quality and fixation method, the minimum requirement is one of the approaches detailed below, complemented where appropriate with methodologies listed in Boxes 4 and 5 below. The exact composition of analyses for each individual project may be defined through discussion with the ICGC ARGO Management Committee. Because technologies are evolving rapidly and novel platforms are in development, the recommendations will need to be reviewed on a regular basis. It is also preferable to avoid being unduly prescriptive about study designs and platform choices, as several different approaches could be appropriate to deliver the same ultimate goals.
It is critical to the overall success of ICGC that datasets obtained from one class of cancer (generated using a specific technology platform) are comparable to the datasets obtained from other programs and projects (even if generated using a different technology). It is particularly important, therefore, that members contribute sequencing data of the highest quality, including sufficient sequencing depth to detect a high proportion of somatic mutations in each of the samples interrogated. ICGC members are expected to deliver data derived from a combination of the following types of genomic sequencing analyses:
- Whole genome sequencing (see Box 1)
- Whole exome sequencing (see Box 2)
- Clinical genome targeted sequencing (see Box 3)
- Transcriptome sequencing (see Box 4)
Paired tumour and normal samples are expected for whole genome and whole exome sequencing; however, unpaired (tumour-only) samples are accepted for targeted sequencing data sets. The exact composition of analysis types per project will be defined through discussion with the ICGC ARGO Management Committee. However, each case should have a transcriptome, to allow broad pooled data analyses, and a discovery genome sequencing approach.
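The composition rules above can be sketched as a per-case check. This is a minimal illustration only: the `Analysis` record, the analysis-type labels ("WGS", "WXS", "targeted", "RNA-seq") and the `check_case` function are hypothetical names, not part of any ICGC ARGO schema, and the sketch assumes that any of the three genome sequencing types satisfies the genome sequencing requirement.

```python
from dataclasses import dataclass

# Hypothetical labels for the four analysis types listed above.
GENOME_TYPES = {"WGS", "WXS", "targeted"}
# Assumption: a matched normal is expected for whole genome and whole exome only.
PAIRED_REQUIRED = {"WGS", "WXS"}


@dataclass(frozen=True)
class Analysis:
    analysis_type: str  # one of GENOME_TYPES or "RNA-seq"
    has_normal: bool    # True when a matched normal sample was sequenced


def check_case(analyses):
    """Return a list of problems for one case; an empty list means none found."""
    problems = []
    types = {a.analysis_type for a in analyses}
    if "RNA-seq" not in types:
        problems.append("missing transcriptome")
    if not types & GENOME_TYPES:
        problems.append("missing genome sequencing analysis")
    for a in analyses:
        if a.analysis_type in PAIRED_REQUIRED and not a.has_normal:
            problems.append(f"{a.analysis_type} requires a matched normal")
    return problems
```

A case with paired whole genome data and a transcriptome returns an empty list, while a tumour-only targeted panel on its own is reported as missing a transcriptome. The actual composition for a project remains subject to discussion with the Management Committee.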
It is envisaged that mutational catalogues for each tumour type or subtype will include the full range of somatic variant types, including single base substitutions, insertions, deletions, copy number changes, and structural variants including translocations and gene disrupting events.
Box 1. Guidelines on whole genome sequencing analyses
The aim of whole genome sequencing is to capture high-quality information on all variant classes across all genomic features, including small variants, gene and chromosome level copy number alterations, structural variants and mutational signatures. Sequence coverage calculations should be based on a target of ≥90% of somatic alterations identified in each sample. The required coverage should take into account tissue cellularity and, where applicable to the specific research project, detection of subclonal somatic variants. Experience indicates that at least 60-fold median coverage of the tumour sample will be required (with minimum 30-fold median coverage of the germline sample), with higher tumour coverage required for samples where the tumour cellularity is low.
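To illustrate why lower tumour cellularity calls for higher coverage, the following sketch estimates the chance of observing a somatic variant under a simple binomial model. Everything here is an assumption made for illustration rather than an ICGC ARGO method: the variant is heterozygous in a diploid region, so its expected allele fraction is purity × cancer cell fraction / 2; sequencing errors and mapping biases are ignored; and a variant counts as detected when at least `min_alt_reads` reads support it (the default of 3 is hypothetical).

```python
from math import comb


def detection_power(depth, purity, min_alt_reads=3, ccf=1.0):
    """Probability of seeing at least `min_alt_reads` variant reads at `depth`.

    purity: fraction of tumour cells in the sample (0 < purity <= 1).
    ccf: cancer cell fraction carrying the variant (1.0 = clonal).
    """
    if not 0 < purity <= 1 or not 0 < ccf <= 1:
        raise ValueError("purity and ccf must be in (0, 1]")
    vaf = purity * ccf / 2  # expected allele fraction, heterozygous and diploid
    p_miss = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_miss


def min_depth_for_power(purity, target=0.90, **kwargs):
    """Smallest depth at which detection_power reaches `target`."""
    depth = 1
    while detection_power(depth, purity, **kwargs) < target:
        depth += 1
    return depth
```

Comparing `min_depth_for_power(0.8)` with `min_depth_for_power(0.2)` shows the required depth rising as cellularity falls, and passing `ccf=0.3` shows the additional depth needed for subclonal variants. Real coverage planning would also need to account for copy number, uneven coverage and caller behaviour.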
Box 2. Guidelines on whole exome sequencing
Sequencing should include all coding exons and splice junctions along with other genomic regions of biological interest (e.g. regulatory regions). Sequence coverage calculations should be based on a target of ≥90% of somatic alterations identified in each sample.
Box 3. Guidelines on clinical genome targeted sequencing
Where whole genome sequencing is either unaffordable or not feasible (for example, when analysing formalin-exposed DNA, when tissue availability is limited, or for low-cellularity tumours), a well-designed targeted capture sequencing assay is the best option. This approach is able to report on all variant classes (including small coding and non-coding variants, copy number alterations and structural variants) and is compatible with small amounts of FFPE-derived DNA. The objective should be to capture ≥90% of the relevant genomic information for each tumour type sequenced. For targeted capture sequencing, either paired tumour-normal sequencing or tumour-only sequencing is acceptable, provided that in the latter situation, analysis pipelines contain sufficient steps to filter the majority of germline events. The ICGC ARGO Management Committee should be consulted to discuss assay specification, metadata requirements and analysis pipelines, to ensure they meet the requirements of ICGC ARGO.
Box 4. Guidelines on transcriptome sequencing
It is recommended that transcriptome analysis should include expression of all protein coding genes, with consideration given to the coverage of important non-coding features such as microRNAs. Analysis of the transcriptome may be more critical in some cancer types than in others, for example in breast and colorectal cancer where clinically useful classification systems have been described. Various technology platforms are available, including commercial options, some of which are compatible with RNA obtained from formalin-exposed tissue.
E.6.1 Germline analysis
For whole genome and whole exome sequencing, analysis of a tumour-normal pair remains the state-of-the-art for filtering germline polymorphisms. For targeted capture sequencing, either paired tumour-normal sequencing or tumour-only sequencing is acceptable, provided that in the latter situation, analysis pipelines contain sufficient steps to filter the majority of germline events.
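One common kind of germline filtering step for tumour-only data removes variants that are frequent in a reference population. The sketch below is a hedged illustration, not a prescribed ICGC ARGO pipeline: the variant dictionaries, the `population_af` field and the 0.001 default threshold are all hypothetical, and the sketch assumes population allele frequencies have already been annotated from a source chosen by the project.

```python
def filter_likely_germline(variants, max_population_af=0.001):
    """Keep variants whose population allele frequency is at or below the cutoff.

    variants: iterable of dicts with an optional "population_af" key
              (hypothetical field; missing means the variant was not
              observed in the reference population).
    """
    kept = []
    for variant in variants:
        population_af = variant.get("population_af", 0.0)
        if population_af > max_population_af:
            continue  # common in the population, so likely germline
        kept.append(variant)
    return kept


calls = [
    {"id": "v1", "population_af": 0.12},     # common polymorphism
    {"id": "v2", "population_af": 0.00001},  # very rare in the population
    {"id": "v3"},                            # not seen in the population
]
candidate_somatic = [v["id"] for v in filter_likely_germline(calls)]
```

A frequency filter of this kind removes common polymorphisms but lets rare or private germline variants through, which is why it can only filter the majority of germline events and why a matched normal remains preferable where available.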
E.6.2 Quality control
The ICGC consortium is dependent on a high-quality genomic dataset. Members are encouraged to put in place processes and mechanisms to ensure the quality of the samples and data provided, covering both the laboratory and sequencing steps. Routine quality control workflows are applied at various stages of genomic data processing, including the pre-alignment, post-alignment and post-variant-calling stages. These workflows identify samples and analyses that fail quality thresholds. ARGO programs are expected to review the Quality Control reports provided regularly and to communicate the results of each review to the Data Coordination Center in a timely manner. All genomic data that is released must pass all Quality Control thresholds; failed analyses on which no action is taken will be held from moving through the embargo release stages.
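The gating logic described above can be sketched as follows. The stage names mirror the three stages named in the text, but the function, the status strings and the input format are hypothetical and do not describe the Data Coordination Center's actual implementation.

```python
# Stage names taken from the text; the identifiers themselves are illustrative.
QC_STAGES = ("pre_alignment", "post_alignment", "post_variant_calling")


def release_status(qc_results):
    """Decide whether an analysis may move through the embargo release stages.

    qc_results maps a stage name to True (passed) or False (failed);
    a stage absent from the mapping has not been evaluated yet.
    Returns (status, stages) where stages lists the stages behind the status.
    """
    failed = [stage for stage in QC_STAGES if qc_results.get(stage) is False]
    pending = [stage for stage in QC_STAGES if stage not in qc_results]
    if failed:
        return ("held", failed)        # at least one threshold failed
    if pending:
        return ("pending", pending)    # QC has not finished
    return ("releasable", [])          # every stage passed
```

For example, an analysis that passes pre-alignment checks but fails post-alignment checks is reported as `("held", ["post_alignment"])` until the program acts on the failure.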
E.6.3 Metadata Requirements
It is mandatory that standard molecular data is submitted with valid metadata. Details on metadata requirements for each data type can be found on the ICGC ARGO Docs Site.