E.5 Quality Standards of Data
Version 1.3 June 2023
The ICGC in its first phase overcame the major challenge of generating collections of high-quality tumor samples, and committed partners and funding agencies invested substantial effort and funds to ensure genomic data were generated from the highest quality samples. In this next generation of the ICGC, clinical data is deemed to be the major challenge: obtaining, curating and harmonising large sets of detailed clinical data from a multitude of tumour types and programs globally. As there is currently no standardized clinical trials data set that spans all cancer types, data fields and values will be adopted that explicitly suit ARGO requirements. While the first phase of ICGC was intended to require such clinical data, in practice this was accomplished in only a few projects. The high standards for clinical annotation are known from the outset, and projects will be required to allow for significant investment, resources and effort to build the processes necessary to curate and submit comprehensive clinical data sets.
The Tissue and Clinical Annotation Working Group (2018-2021) developed quality metrics for clinical data, and the more recently formed Clinical and Metadata Working Group will harmonize the standardisation and implementation of clinical nomenclature and data values.
POLICY: Every project will adhere to the following recommendations regarding quality of samples:
- Tumor types should be defined using the existing international standards of the WHO (including ICD-10 and ICD-O). If novel molecular subtypes are studied, these are welcomed and should be defined with sufficient detail.
- All samples are ideally reviewed by a reference pathologist. Histological examination should be documented and reflected in the clinical data submission. Where high-resolution digital images are generated, these are to be made available to the ICGC community where possible. The Pathology Working Group will provide guidance as this work evolves.
- Patient-matched control samples, representative of the germline genome, are the optimal standard for discerning “somatic” from “inherited” mutations. However, unmatched pairs are also accepted from targeted or panel sequencing used within the clinical setting. Please consult with the Management Committee regarding targeted sequencing. For solid tumors, the most appropriate type of germline material is at the discretion of the programs, provided sufficient detail about the source is documented through the clinical data submission.
POLICY: Every project will adhere to the following recommendations regarding quality and submission of clinical data:
- Member programs and projects commit to submitting the Core Clinical Data set for each participant. The mandatory Core data elements are required to address clinically relevant analyses within as well as across entities. These data points constitute the critical elements of clinical correlation to allow harmonization of diverse ICGC ARGO projects, and will be required as a minimum.
- All Core data fields must have a valid value submitted for a clinical data submission to be classified as complete. This can be achieved through the program's submission of data or through a clinical data exception (see the Clinical Data Exceptions Policy section below). A donor must be clinically complete before any molecular analysis files are released to program members. After molecular data and clinical data are deemed complete, the embargo period formally commences.
- In certain circumstances (such as within regulatory clinical trials, blinded studies or where other restrictions exist) programs may release partial clinical data. This data would be deemed complete under the Exceptions process as outlined below. At the time of partial data release the embargo period formally commences. If new or updated data is added, the embargo period is not reset, and the original starting time remains. More information around data release is available in E.3 Publication Policy.
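The completeness and embargo rules in the policy points above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the field names in `CORE_FIELDS`, the `DonorRecord` type and both functions are hypothetical and are not taken from the Data Dictionary or from any ARGO software, and "valid value" is simplified to "non-empty value".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical core field names; the real list is defined in the Data Dictionary.
CORE_FIELDS = ("vital_status", "primary_site", "age_at_diagnosis")


@dataclass
class DonorRecord:
    values: dict                                            # submitted field -> value
    approved_exceptions: set = field(default_factory=set)   # fields covered by an approved exception
    molecular_complete: bool = False
    embargo_start: Optional[date] = None


def is_clinically_complete(donor: DonorRecord) -> bool:
    """True when every core field has a value or an approved exception."""
    for name in CORE_FIELDS:
        has_value = donor.values.get(name) not in (None, "")
        if not has_value and name not in donor.approved_exceptions:
            return False
    return True


def update_embargo(donor: DonorRecord, today: date) -> None:
    """Start the embargo once molecular and clinical data are complete.

    The start date is set only once, so later additions or updates
    do not reset it.
    """
    if (
        donor.embargo_start is None
        and donor.molecular_complete
        and is_clinically_complete(donor)
    ):
        donor.embargo_start = today
```

In this sketch, calling `update_embargo` a second time after new data arrives leaves `embargo_start` unchanged, which mirrors the rule that the original starting time remains.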
Guidelines regarding the quality and submission of clinical data:
- Program leads are required to ensure that the core data elements are reviewed and that any concerns are discussed with the Clinical and Metadata Working Group upon enrollment. The core data elements are commonly acquired in cohort-based studies (patients studied outside of clinical trials, such as observational and longitudinal studies, retrospective or prospective) and in clinical trials and are, therefore, available in principle. However, given the diversity, nuance and geographic spread of locations and program types, there will be circumstances where data is not applicable, missing or not available, and programs may submit an exception for certain clinical data elements for review by the Clinical and Metadata Working Group.
- Acquisition of follow-up information is highly recommended on an annual basis for collection of updated treatment and outcome information. This will inform subsequent interpretation of ICGC data and clinical correlations.
- Clinical data will be submitted to ICGC ARGO using controlled vocabulary as detailed in the Data Dictionary, which has been developed in consultation with programs and, wherever possible, is based on international standards such as ICD (WHO) and AJCC, or on widely used matrices (in particular those used by the Genomic Data Commons, IARC and others), to allow co-aggregation with data from these sources. The Data Dictionary defines the clinical data model and includes rigorous validation performed as quality control steps at the time of submission.
- Extended optional data elements have been implemented that are recommended for the analysis of biological processes considered hallmarks of cancer etiology and progression. These data points encompass detailed lifestyle, predictive and prognostic factors, family history information, and additional treatment and response data along the trajectory of individual therapies. Data sets will likely be further developed within specific tumour groups, and regulated clinical trials, or programs where deeper clinical data is available, are encouraged to complete this extended data.
- Exemptions may exist where a data element is not applicable to a particular tumour type. These cases must be documented appropriately through the submission process following the guidelines of the Data Dictionary.
- Programs should ensure, where appropriate, the sustainability of the data submitted through both archiving and the use of appropriate identification and retrieval systems.
- Member projects and leads should facilitate a process for demonstrating the traceability of data, including Good Documentation Practices, and this should be documented in the program or institutional Standard Operating Procedures (SOPs).
- Further information regarding data submission is contained in the Data Management Policy, and through the ICGC ARGO Documentation site. Status of completion of Clinical Data Submission will be available to users through the individual member program dashboard.
Guidelines regarding the quality standards of samples:
- Histological examination should be documented, and the corresponding digital images stored and made available to those studying the given tumor entity. Specifically, the degree of 1) necrosis; 2) debris; 3) inflammatory tissue; and 4) fibrosis is to be assessed.
- Standard Operating Procedures (SOPs) for freezing samples will be those established by WHO/IARC (“Common Minimum Technical Standards and Protocols for Biological Resource Centres dedicated to Cancer Research”, World Health Organization - International Agency for Research on Cancer (WHO-IARC), working group reports Vol. 2, 2007).
- Although many types of macromolecules should be isolated, priority should be given to the isolation of high quality DNA and RNA (which is also valid for some epigenomic analyses).
- Sample processing specifics are documented through the clinical data submission process.
Accepted Sample Types
The sources of cohorts of patients that would constitute ICGC-ARGO projects may include:
- Biospecimens from participants enrolled in active clinical trials;
- Analyses of banked samples from past clinical trials;
- Analyses of samples from clinically well-annotated cohorts that satisfy ICGC-ARGO clinical data requirements;
- Longitudinal cohort studies;
- Autopsy studies with detailed clinical data;
- Population-based studies with detailed clinical and lifestyle data;
- Real World Data acquired through health systems.
Clinical Data Exceptions Policy
As clinical data forms a central part of the ICGC ARGO mission, its management and governance are critical to ensure the balance between maximum engagement and program requirements. Due to the comprehensive nature of the ARGO clinical data model, it is accepted that some groups will need margin for exceptions, particularly in cases involving retrospective data, disease-specific circumstances, or the unavailability or inaccessibility of data. ICGC has a standard set of criteria and a consistent approach to assessing applications for exceptions, as outlined below.
Within the Core Data set there are critical elements that are not subject to exceptions. These include key donor attributes and clinical endpoints such as treatment response and survival data, and fields that have technical limitations and are tied to validation rules. As these elements are vital to answering ARGO’s research questions and maintaining quality control, cases missing this information would be excluded.
Exceptions are rare and granted on a case-by-case basis. Thresholds may exist due to technical capability.
Programs submit a form containing a standard set of details on the rationale for the request and the numbers involved. Requests are then reviewed and discussed centrally, with clinical expertise involved.
Exceptions that relate to conditions inherent to a tumour type (e.g. tumour grade in blood cancers) will be built into the validation rules and will not need to be submitted as exceptions.
Projects which are prospective in nature or are regulatory grade clinical trials are expected to meet the requirements for all Core clinical data elements and are discouraged from submitting exceptions.
There are two main types of exceptions: 1) Program Level exceptions and 2) Donor Level exceptions. Some programs may require one type of exception; some may require both.
Program Level Exceptions
Program Level exceptions are submitted when a data element is missing, unavailable or not applicable across all donors in the program. Under program-wide exceptions, programs are then permitted to submit an otherwise non-permissible value for these fields, such as unknown or not applicable.
Note that there are some fields within the clinical data model where exceptions are not permitted. These are fields tied to validation rules that enforce data integrity checks.
Donor Level Exceptions
Donor Level exceptions are submitted when a data element is missing, unavailable or not applicable for only specific donors within the program. Under donor level exceptions, programs are then permitted to submit an otherwise non-permissible value for these fields, such as unknown or not applicable.
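How the two exception types might interact with field-level restrictions can be sketched as follows. This is an illustrative model only: the field names in `NON_EXCEPTABLE_FIELDS`, the placeholder values, the `ProgramExceptions` type and both functions are hypothetical and do not describe the actual ARGO validation code.

```python
from dataclasses import dataclass, field

# Hypothetical examples of fields that can never receive an exception.
NON_EXCEPTABLE_FIELDS = {"vital_status", "survival_time"}
# Placeholder values that an approved exception unlocks (assumed spelling).
PLACEHOLDER_VALUES = {"unknown", "not applicable"}


@dataclass
class ProgramExceptions:
    program_level: set = field(default_factory=set)   # fields excepted for all donors
    donor_level: dict = field(default_factory=dict)   # donor_id -> set of excepted fields


def placeholder_allowed(exc: ProgramExceptions, donor_id: str, field_name: str) -> bool:
    """Decide whether a placeholder value may be submitted for this donor and field."""
    if field_name in NON_EXCEPTABLE_FIELDS:
        return False                                   # exceptions are never permitted here
    if field_name in exc.program_level:
        return True                                    # applies to every donor in the program
    return field_name in exc.donor_level.get(donor_id, set())


def passes_placeholder_check(exc: ProgramExceptions, donor_id: str,
                             field_name: str, value: str) -> bool:
    """Reject placeholder values that no approved exception covers.

    Non-placeholder values pass this check; their validation against the
    Data Dictionary is outside the scope of this sketch.
    """
    if value.strip().lower() in PLACEHOLDER_VALUES:
        return placeholder_allowed(exc, donor_id, field_name)
    return True
```

The check on `NON_EXCEPTABLE_FIELDS` comes first so that a restricted field stays restricted even if it is mistakenly listed in an exception.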
Process For Submitting Exceptions:
- Programs submit the Clinical Data Exceptions Form prior to commencing data submission. Instructions are provided on the form; specifically, programs are to outline the individual field/s for which they are requesting an exception, the reasons for the exception and the alternative value to be documented. Note that the form may be subject to revisions as the exceptions process evolves. Please be sure to refer back to this policy when submitting data.
- Requests are reviewed and discussed by the Clinical and Metadata Working Group, and a decision is reached. The review will consider the type of program (questions being asked, retrospective vs prospective data, etc.), the value of the dataset, the rationale for the exception (whether it is legitimate for the tumour type, country, etc.) and the potential impact on the overall data set if the field is not provided.
- Outcomes are communicated to applicants with a full justification for the decision.
- Approval is forwarded to the DCC, where technical edits are put in place to allow the exception. This is logged and documented.
- The DCC provides confirmation to the program/applicant to allow data submission.
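The steps above can be read as a simple linear workflow. The sketch below models them as states with allowed transitions; the state names, the `ALLOWED` table and the `advance` function are hypothetical labels chosen for illustration and are not part of any ICGC ARGO system.

```python
from enum import Enum


class RequestState(Enum):
    SUBMITTED = "form submitted by program"
    UNDER_REVIEW = "under review by working group"
    APPROVED = "approved, outcome communicated"
    DECLINED = "declined, outcome communicated"
    IMPLEMENTED = "technical edits applied by DCC"
    CONFIRMED = "confirmation sent to program"


# Which states may follow each state; declined and confirmed are terminal.
ALLOWED = {
    RequestState.SUBMITTED: {RequestState.UNDER_REVIEW},
    RequestState.UNDER_REVIEW: {RequestState.APPROVED, RequestState.DECLINED},
    RequestState.APPROVED: {RequestState.IMPLEMENTED},
    RequestState.IMPLEMENTED: {RequestState.CONFIRMED},
    RequestState.DECLINED: set(),
    RequestState.CONFIRMED: set(),
}


def advance(current: RequestState, target: RequestState) -> RequestState:
    """Return the new state, or raise if the step skips part of the process."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from '{current.value}' to '{target.value}'")
    return target
```

Under this model, data submission that relies on an exception would begin only once a request reaches `CONFIRMED`.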