E.5 Quality Standards of Data
Version 1.1 July 2020
In its first phase, the ICGC overcame the major challenge of generating collections of high-quality tumour samples: committed partners and funding agencies invested substantial effort and funds to ensure genomic data were generated from the highest quality samples. In this next generation of the ICGC, clinical data is deemed to be the major challenge: obtaining, curating and harmonising large sets of detailed clinical data from a multitude of tumour types and programs globally. As there is currently no standardized clinical trials data set that spans all cancer types, data fields and values will be adopted that explicitly suit ARGO requirements. While the first phase of ICGC intended to require such clinical data, in practice this was accomplished within only a few projects. The high standards for clinical annotation are known from the outset, and projects will be required to allow for significant investment, resources and effort to build the necessary processes to curate and submit comprehensive clinical data sets.
The Tissue and Clinical Annotation Working Group has developed sample acquisition and quality metrics for clinical data and will harmonize standards for data, including clinical nomenclature and data values. Existing policies regarding standards and quality of samples will continue to be adopted, with modifications to reflect ICGC-ARGO.
POLICY: Every project will adhere to the following recommendations regarding quality of samples:
- Tumour types should be defined using the existing international standards of the WHO (including ICD-10 and ICD-O). If novel molecular subtypes are studied, these should be defined with sufficient detail.
- All samples will have to be reviewed by two or more reference pathologists. This assessment will need to be performed on stained sections of the very same tissue piece from which biomolecules will be purified. Histological examination has to be documented and respective high-resolution digital images have to be stored and made available i) to those studying the given samples and ii) on a dedicated web-page for open access. The Molecular Pathology Working Group will provide guidance.
- Patient-matched control samples, representative of the germline genome, are mandatory to discern “somatic” from “inherited” mutations. For solid tumours, the mononuclear cell fraction from peripheral blood is the ideal source, while for hematological malignancies skin biopsies or lymphocytes from patients in remission are recommended.
POLICY: Every project will adhere to the following recommendations regarding quality and submission of clinical data:
- Member programs and projects commit to submitting the Mandatory Clinical Data set for each participant. The mandatory data elements are required to address clinically relevant analyses within as well as across entities. These data points constitute the critical elements of clinical correlation to allow harmonization of diverse ICGC-ARGO projects, and will be required as a minimum. All of these data points are commonly acquired both in cohort-based studies (studies of patients outside of clinical trials, such as observational and longitudinal studies, whether retrospective or prospective) and in clinical trials and, therefore, are in principle available. Project leads are required to ensure that projects can meet the standard for the Mandatory data points. Missing or incomplete data points need to be well justified and approved by the DCC.
- Further information regarding data submission is contained in the Data Management Policy and on the ICGC ARGO Documentation site. Status of completion of Clinical Data Submission will be available to users through the individual member program dashboard.
Box 1. Guidelines regarding the quality and submission of clinical data:
- Acquisition of follow-up information is highly recommended on an annual basis for collection of updated treatment and outcome information. This will inform subsequent interpretation of ICGC data and clinical correlations.
- Clinical data will be submitted to ICGC ARGO using controlled vocabulary as detailed in the Data Dictionary, which has been developed in consultation with programs and is based, wherever possible, on international standards, such as ICD (WHO) and AJCC, or on widely used matrices (in particular those used by the Genomic Data Commons, IARC and others) to allow co-aggregation with data from these sources. The Data Dictionary defines the clinical data model and includes rigorous validation performed as quality control steps at the time of submission.
- Generation of an Extended Data Set is under way, consisting of additional variables that are recommended for the analysis of biological processes that are considered hallmarks of cancer etiology and progression. These data points will encompass detailed lifestyle, predictive and prognostic factors, family history information and additional treatment and response data along the trajectory of individual therapies. Data sets will likely be developed within specific tumour groups, and completion of this extended data is encouraged for regulated clinical trials or where deeper clinical data is available.
- All Core data must have a valid value submitted for all fields for a clinical data submission to be classified as complete. A donor must be clinically complete before any molecular analysis files are released to program members. Specifically:
  - A donor must have a donor file submitted with all core fields provided.
  - A donor must have at least one primary diagnosis with all core fields provided.
  - A donor must have at least one tumour and one normal specimen submitted.
  - For each registered specimen, a donor must have all specimen core fields provided.
- Exemptions may exist where a data element is not applicable to a particular tumour type. These cases must be documented appropriately through the submission process following the guidelines of the Data Dictionary.
- Ensure, where appropriate, the sustainability of the data submitted through both archiving and using appropriate identification and retrieval systems.
- Member projects and leads should facilitate a process for demonstrating the traceability of data, including Good Documentation Practices, and this process should be documented in the program or institutional Standard Operating Procedures (SOPs).
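The clinical completeness rules in Box 1 (donor file, primary diagnosis, tumour and normal specimens, specimen core fields) can be sketched as a simple check. This is an illustrative sketch only, not the ICGC ARGO validation code: the field names, the record layout and the function names are hypothetical, and the actual core fields and exemption handling are those defined in the Data Dictionary.

```python
# Hypothetical core field names, for illustration only; the real core fields
# are defined in the ICGC ARGO Data Dictionary.
DONOR_CORE = ("donor_id", "vital_status")
DIAGNOSIS_CORE = ("diagnosis_id", "cancer_type_code")
SPECIMEN_CORE = ("specimen_id", "tumour_normal", "specimen_type")


def missing_fields(record, core_fields):
    """Return the core fields that are absent or empty in a record."""
    return [f for f in core_fields if record.get(f) in (None, "")]


def completeness_problems(donor):
    """Return a list of completeness problems; an empty list means complete.

    Assumption: a documented exemption is submitted as an explicit
    non-empty value, so it counts as provided here.
    """
    problems = []

    # Rule 1: the donor file has all core fields.
    for f in missing_fields(donor.get("donor", {}), DONOR_CORE):
        problems.append(f"donor: missing {f}")

    # Rule 2: at least one primary diagnosis has all core fields.
    diagnoses = donor.get("primary_diagnoses", [])
    if not any(not missing_fields(d, DIAGNOSIS_CORE) for d in diagnoses):
        problems.append("no primary diagnosis with all core fields")

    # Rule 3: at least one tumour and one normal specimen are submitted.
    specimens = donor.get("specimens", [])
    kinds = {s.get("tumour_normal") for s in specimens}
    if "Tumour" not in kinds:
        problems.append("no tumour specimen")
    if "Normal" not in kinds:
        problems.append("no normal specimen")

    # Rule 4: every registered specimen has all specimen core fields.
    for s in specimens:
        for f in missing_fields(s, SPECIMEN_CORE):
            problems.append(f"specimen {s.get('specimen_id', '?')}: missing {f}")

    return problems


# Example donor with made-up values.
donor = {
    "donor": {"donor_id": "D1", "vital_status": "Alive"},
    "primary_diagnoses": [{"diagnosis_id": "PD1", "cancer_type_code": "C25.3"}],
    "specimens": [
        {"specimen_id": "S1", "tumour_normal": "Tumour", "specimen_type": "Primary tumour"},
        {"specimen_id": "S2", "tumour_normal": "Normal", "specimen_type": "Normal"},
    ],
}
print(completeness_problems(donor))  # → []
```

Returning a list of problems rather than a single true/false value is a design choice made here so that a submitter could see which rule a donor fails; molecular files would be released only when the list is empty.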
Box 2. Guidelines regarding the quality standards of samples:
- Histological examination will have to be documented, and the respective digital images stored and made available to those studying the given tumour entity. Specifically, the degree of 1) necrosis; 2) debris; 3) inflammatory tissue; and 4) fibrosis is to be assessed.
- Standard Operating Procedures (SOPs) for freezing samples will be those established by WHO/IARC (“Common Minimum Technical Standards and Protocols for Biological Resource Centres dedicated to Cancer Research”, World Health Organization - International Agency for Research on Cancer, WHO-IARC working group reports Vol. 2, 2007).
- As a basis for the exchange of tissue specimens between countries with different national regulations that need to be respected, a coordinating rule has been formulated on the basis of the ‘home-country principle’.
- Although many types of macromolecules should be isolated, priority should be given to the isolation of high quality DNA (which is also valid for some epigenomic analyses).
- The quality of the isolated classes of macromolecules needs to be controlled by standardized procedures used by all members of the ICGC. The choice of these tests will be defined by an ICGC working group.
- Controls for transcriptomic and epigenomic analyses may require site-matched tissue control samples. This aspect must be dealt with in the recommendations of the tumour-specific expert panel.