The Hartwig Medical Foundation database is the world’s largest database of metastatic tumor data, obtained through Whole Genome Sequencing (WGS) in combination with clinical data. The tumor types in the database, and their relative proportions, reflect the incidence of those solid tumor types in the Netherlands. In addition to the molecular data, the database holds tumor characteristics as well as treatment data and outcomes. The data are of high quality because a completely uniform process is applied to all samples, from biopsy to bioinformatics analysis.
For every patient, DNA from a tumor biopsy and from a tube of blood is analysed through WGS. Germline variants are determined from the blood sample, and comparing the blood with the biopsy reveals the tumor-specific variants. These include single nucleotide variants (SNVs), insertions and deletions (INDELs), structural variants (SVs) and copy number alterations (CNAs), all identified with state-of-the-art open source bioinformatics tools. The blood DNA is sequenced to an average depth of 30 to 40 reads per position, and the biopsy DNA to an average depth of 90 to 120 reads.
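As a toy illustration of the blood-versus-biopsy comparison, the Python sketch below keeps only the variants called in the biopsy that are absent from the blood sample. The `Variant` type, the `tumor_specific` function and the example coordinates are hypothetical and chosen for illustration only. The actual pipeline relies on dedicated open source callers that weigh read-level evidence, so this simple set difference is a simplification, not the method used by Hartwig Medical Foundation.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A variant call identified by position and alleles (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_specific(tumor_calls: Iterable[Variant],
                   germline_calls: Iterable[Variant]) -> List[Variant]:
    """Return variants found in the biopsy but not in the blood sample."""
    germline = set(germline_calls)
    # Preserve the order of the tumor calls while dropping germline variants.
    return [v for v in tumor_calls if v not in germline]


# Made-up example calls; the coordinates carry no biological meaning.
blood = [Variant("1", 1000, "A", "G"), Variant("2", 500, "C", "T")]
biopsy = [
    Variant("1", 1000, "A", "G"),    # also present in blood: germline
    Variant("7", 140, "A", "T"),     # biopsy only: tumor-specific SNV
    Variant("17", 75, "G", "GA"),    # biopsy only: tumor-specific insertion
]

somatic = tumor_specific(biopsy, blood)
for variant in somatic:
    print(variant)
```

In this sketch the variant shared by both samples is treated as germline and removed, leaving the two biopsy-only calls as tumor-specific.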
Biopsies are normally taken immediately before treatment; for a limited number of patients, a second biopsy taken after the initial treatment is also available.
More information on the technical details and performance of the WGS analysis, the sample requirements and an overview of all validation experiments can be found in the Hartwig Medical OncoAct Technical Information document.
All code used by Hartwig Medical Foundation for its IT pipeline (bioinformatic analyses) can be accessed on GitHub in two repositories: one with the pipeline code and one with tools developed in-house.
The figure below shows the distribution of tumor types in the database.
The database currently contains WGS data of around 4,000 metastatic tumor samples distributed among a broad range of tumor types. In addition to the WGS data, the database contains a large amount of clinical data. Access to data in the database can be acquired by a formal Data Access Request.
A subset of the data (somatic variants of 2,399 samples) extracted from the database is accessible through a graphical interface (portal). This subset corresponds to the data used in the Priestley et al. pan-cancer paper in Nature (pre-print available). The portal allows the user to browse the data at different levels (see the example queries on the right side of the portal webpage).