E.5 Quality Standards of Data
In its first phase, the ICGC overcame the major challenge of generating collections of high-quality tumor samples; committed partners and funding agencies invested substantial effort and funds to ensure that genomic data were generated from the highest-quality samples. In this next generation of the ICGC, clinical data is deemed to be the major challenge: obtaining, curating and harmonising large sets of detailed clinical data from a multitude of tumour types and programs globally. As there is currently no standardized clinical trials data set that spans all cancer types, data fields and values will be adopted that explicitly suit ARGO requirements. Although the first phase of ICGC intended to require such clinical data, in practice this was accomplished in only a few projects. The high standards for clinical annotation are known from the outset, and projects will be required to allow for significant investment, resources and effort to build the processes needed to curate and submit comprehensive clinical data sets.
The Tissue and Clinical Annotation Working Group has developed sample acquisition and quality metrics for clinical data and will harmonize data standards, including clinical nomenclature and data values. Existing policies on the standards and quality of samples will continue to apply, with modifications to reflect ICGC ARGO.
POLICY: Every project will adhere to the following recommendations regarding quality of samples:
- Tumor types should be defined using the existing international standards of the WHO (including ICD-10 and ICD-O). If novel molecular subtypes are studied, these should be defined with sufficient detail
- All samples will have to be reviewed by two or more reference pathologists. This assessment will need to be performed on stained sections of the very same tissue piece from which biomolecules will be purified. Histological examination has to be documented and respective high-resolution digital images have to be stored and made available i) to those studying the given samples and ii) on a dedicated web-page for open access. The Molecular Pathology Working Group will provide guidance
- Patient-matched control samples, representative of the germline genome, are mandatory to discern “somatic” from “inherited” mutations. For solid tumors, the mononuclear cell fraction from peripheral blood is the ideal source, while for hematological malignancies skin biopsies (or lymphocytes from patients in remission) are recommended.
POLICY: Every project will adhere to the following recommendations regarding quality and submission of clinical data:
- Member programs and projects commit to submitting the Mandatory Clinical Data set for each participant. The mandatory data elements are required to address clinically relevant analyses within as well as across entities. These data points constitute the critical elements of clinical correlation to allow harmonization of diverse ICGC ARGO programs, and will be required as a minimum. All of these data points are commonly acquired both in cohort-based studies (patients studied outside of clinical trials, such as observational and longitudinal studies, retrospective or prospective) and in clinical trials and, therefore, are in principle available. Project leads are required to ensure that projects can meet the standard for the mandatory data points. Missing or incomplete data points need to be well-justified and approved by the DCC
- Further information regarding data submission is contained in the Data Management Policy. Status of completion of Clinical Data Submission will be available to users through the individual member project/program dashboard.
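As a rough illustration of the completeness requirement above, the following sketch checks one participant record against a list of mandatory fields and reports those that are empty without an approved justification. The field names, the `MANDATORY_FIELDS` list and the `approved_exceptions` argument are all hypothetical; the actual Mandatory Clinical Data set and the DCC approval process are defined by ICGC ARGO, not by this example.

```python
# Hypothetical field names, for illustration only; the real mandatory
# data elements are defined by ICGC ARGO.
MANDATORY_FIELDS = ["primary_diagnosis", "age_at_diagnosis", "vital_status"]


def missing_mandatory(record, approved_exceptions=frozenset()):
    """Return mandatory fields that are absent or empty and have no approved exception."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        # Treat None and blank strings as "not provided".
        is_empty = value is None or (isinstance(value, str) and not value.strip())
        if is_empty and field not in approved_exceptions:
            missing.append(field)
    return missing


record = {"primary_diagnosis": "C25.0", "age_at_diagnosis": 61, "vital_status": ""}
print(missing_mandatory(record))
print(missing_mandatory(record, approved_exceptions={"vital_status"}))
```

In this sketch an approved exception only suppresses the report for that field; how justifications are recorded and approved is left open.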
Box 1. Guidelines regarding the quality and submission of clinical data:
- Acquisition of follow-up information is highly recommended on an annual basis for collection of updated treatment and outcome information. This will inform subsequent interpretation of ICGC data and clinical correlations
- Clinical data will be submitted to ICGC ARGO using vocabulary based wherever possible on international standards, such as ICD (WHO), AJCC, or from widely used matrices (in particular those used by the Genomic Data Commons, IARC and others) to allow co-aggregation with data from these sources
- Generation of an Extended Data Set has been proposed consisting of additional variables that are recommended for the analysis of biological processes that are considered hallmarks of cancer etiology and progression. These data points will encompass detailed lifestyle, predictive and prognostic factors, family history information and additional treatment and response data along the trajectory of individual therapies. Regulated clinical trials, and projects where deeper clinical data is available, are encouraged to complete this data set
- Use of the value “Not applicable” should be avoided where data points are absent or incomplete. Its use should be limited to instances where a data variable is not routinely reported in the disease setting
- Ensure, where appropriate, the sustainability of the data submitted through both archiving and using appropriate identification and retrieval systems
- Member projects and leads should facilitate a process for demonstrating the traceability of data, including Good Documentation Practices, and this process should be documented in the program or institutional Standard Operating Procedures (SOPs)
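The guideline on “Not applicable” could be checked along the following lines. This is a minimal sketch that assumes each project keeps a list of variables that are not routinely reported in its disease setting; the `allowed_fields` argument and the field names are hypothetical and not part of any ICGC ARGO specification.

```python
NOT_APPLICABLE = "Not applicable"


def flag_not_applicable(record, allowed_fields):
    """Return, in sorted order, fields set to 'Not applicable' outside the allowed list."""
    return sorted(
        field
        for field, value in record.items()
        if isinstance(value, str)
        # Compare case-insensitively and ignore surrounding whitespace.
        and value.strip().lower() == NOT_APPLICABLE.lower()
        and field not in allowed_fields
    )


# Hypothetical record: tumour_grade is assumed not to be routinely reported here.
record = {"tumour_grade": "Not applicable", "vital_status": "not applicable"}
print(flag_not_applicable(record, allowed_fields={"tumour_grade"}))
```

Flagged fields are candidates for review: the value may be standing in for absent or incomplete data rather than for a variable that genuinely does not apply.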
Box 2. Guidelines regarding the quality standards of samples:
- Histological examination will have to be documented and respective digital images be stored and made available to those studying the given tumor entity. Specifically the degree of 1) necrosis; 2) debris; 3) inflammatory tissue; and 4) fibrosis are to be assessed
- Standard Operating Procedures (SOPs) for freezing samples will be those established by WHO/IARC (“Common Minimum Technical Standards and Protocols for Biological Resource Centres dedicated to Cancer Research” by the World Health Organization - International Agency for Research on Cancer (WHO-IARC), working group reports Vol. 2, 2007)
- As a basis for the exchange of tissue specimens between countries with different national regulations that need to be respected, a coordinating rule has been formulated on the basis of the ‘home-country principle’
- Although many types of macromolecules should be isolated, priority should be given to the isolation of high quality DNA (which is also valid for some epigenomic analyses)
- The quality of the isolated classes of macromolecules needs to be controlled by standardized procedures used by all members of the ICGC. The choice of these tests will be defined by an ICGC working group
- Controls for transcriptomic and epigenomic analyses may require site-matched tissue control samples. This aspect must be dealt with in the recommendations of the tumor-specific expert panel