Clinical potential of whole-genome data linked to mortality statistics in patients with breast cancer in the UK: a retrospective analysis


Publication: The Lancet Oncology

07 October 2025

Daniella Black, Helen Ruth Davies, Gene Ching Chiek Koh, Lucia Chmelova, Marko Cubric, Georgia Chalivelaki Chan, Andrea Degasperi, Jan Czarnecki, Ping Jing Toong, Yasin Memari, James Whitworth, Salome Jingchen Zhao, Yogesh Kumar, Shadi Basyuni, Giuseppe Rinaldi, Scott Shooter, Vladyslav Dembrovskyi, Rosie Davies, Maria Chatzou Dunford, Ellen Copson, Carlo Palmieri, Åke Borg, John Ambrose, Catey Bunce, Alona Sosinsky, Prabhu Arumugam, Matthew Arthur Brown, Johan Staaf, Nicholas Turner, Serena Nik-Zainal

 

Background


Breast cancer is the most frequently diagnosed cancer in women. Survival is generally considered favourable, yet some patients remain at risk of early death. We aimed to assess whether comprehensive whole-genome sequencing (WGS) linked to mortality data could add prognostic value to existing clinical measures and identify patients who might respond to targeted therapeutics.



Methods


In this integrative, retrospective analysis, 2445 breast cancer tumours (any stage and molecular subtype) were analysed, collected from 2403 patients recruited through 13 National Health Service Genomic Medicine Centres or hospitals in England affiliated to the 100 000 Genomes Project (100kGP) between 2012 and 2018. 2208 (90%) cases were linked with clinical data; mortality data were obtained for 1188 patients. Following high-depth WGS of tumour and matched normal DNA, comprehensive WGS profiling was performed, seeking driver mutations, mutational signatures, and compound algorithmic scores for homologous recombination repair deficiency (HRD), mismatch repair deficiency, and tumour mutational burden. Data from 1803 additional patients with breast cancer from three independent cohorts were used to validate various findings. To evaluate the prognostic value of WGS features, univariable and multivariable Cox regression was performed on data from patients with stage I–III, ER-positive, HER2-negative breast cancer, with a cancer-specific mortality endpoint (around 5-year follow-up).



Findings


Among 2445 tumours in the 100kGP breast cancer cohort, genomic characteristics with immediate personalised medicine potential were observed in 656 (26·8%), including features reporting HRD (298 [12·2%] total cases and 76 [6·3%] ER-positive, HER2-negative cases), highly individualised driver events, mutations underpinning resistance to endocrine therapy, and mutational signatures indicating therapeutic vulnerabilities. 373 (15·2%) cases had WGS features with potential for translational research, including compromised base excision repair and non-homologous end-joining dependency. Structural variation burden (hazard ratio 3·9 [95% CI 2·4–6·2]; p<0·0001), high levels of APOBEC signatures (2·5 [1·6–4·1]; p<0·0001), and TP53 drivers (3·9 [2·4–6·2]; p<0·0001) were prognostic independently of customary clinical measures (age at diagnosis, stage, and grade) in patients with ER-positive, HER2-negative breast cancer. A prognosticator was developed for ER-positive, HER2-negative breast cancer capable of identifying patients who require either increased intervention or therapy de-escalation, and the framework was validated in the independent Swedish Cancerome Analysis Network-Breast (SCAN-B) dataset.



Interpretation


Breast cancer genomes are rich in predictive and prognostic value. A two-step model is proposed for effective clinical application. First, the identification of candidates for targeted therapies or clinical trials using highly individualised genomic markers. Second, for patients without such features, the implementation of enhanced prognostication using genomic features alongside existing clinical decision-making factors.

The long-term effects of chemotherapy on normal blood cells

Publication: Nature Genetics

10 July 2025

Emily Mitchell, My H. Pham, Anna Clay, Rashesh Sanghvi, Nicholas Williams, Sandra Pietsch, Joanne I. Hsu, Hyunchul Jung, Aditi Vedi, Sarah Moody, Jingwei Wang, Daniel Leongamornlert, Michael Spencer Chapman, Ellie Dunstone, Anna Santarsieri, Alex Cagan, Heather E. Machado, E. Joanna Baxter, George Follows, Daniel J. Hodson, Ultan McDermott, Gary J. Doherty, Inigo Martincorena, Laura Humphreys, Krishnaa Mahbubani, Kourosh Saeb-Parsy, Koichi Takahashi, Margaret A. Goodell, David Kent et al.

Abstract

Several chemotherapeutic agents act by increasing DNA damage in cancer cells, triggering cell death. However, there is limited understanding of the extent and long-term consequences of collateral DNA damage in normal tissues. To investigate the impact of chemotherapy on mutation burdens and the cell population structure of normal tissue, we sequenced blood cell genomes from 23 individuals aged 3–80 years who were treated with a range of chemotherapy regimens. Substantial additional somatic mutation loads with characteristic mutational signatures were imposed by some chemotherapeutic agents, but the effects were dependent on the drug and blood cell types. Chemotherapy induced premature changes in the cell population structure of normal blood, similar to those caused by normal aging. The results show the long-term biological consequences of cytotoxic agents to which a substantial fraction of the population is exposed as part of disease management, raising mechanistic questions and highlighting opportunities for the mitigation of adverse effects.

View Publication 

 

A redefined indel taxonomy reveals insights into mutational signatures

Publication: Nature Genetics

Gene Ching Chiek Koh, Arjun Scott Nanda, Giuseppe Rinaldi, Soraya Boushaki, Andrea Degasperi, Cherif Badja, Andrew Marcel Pregnall, Salome Jingchen Zhao, Lucia Chmelova, Daniella Black, Laura Heskin, João Dias, Jamie Young, Yasin Memari, Scott Shooter, Jan Czarnecki, Matthew Arthur Brown, Helen Ruth Davies, Xueqing Zou & Serena Nik-Zainal

10 April 2025

In cancer genetics, small insertions and deletions (called InDels) have not been researched as widely as substitutions, although both can contribute to cancer. Researchers created genetically identical CRISPR-edited human cell models in which the repair of DNA after replication was disrupted (the disrupted components included mismatch repair and replicative enzymes). The mutational trail left by these defects was uncovered, but the existing classification could not tell these mutations apart from more general background mutations.

To address this, a new way of classifying InDels was developed that picks up unusual genetic sequences and very long genetic sequences, allowing InDels to be classified into 89 subtypes. Using the information collected in the 100,000 Genomes Project, 37 InDel signatures were found, 27 of which were new. In addition to this finding, a new tool called PRRDetect was developed that allows tumours to be classified, with possible implications for immunotherapy, a way of treating cancerous tumours.

 

View Publication

Identification of plasma proteomic markers underlying polygenic risk of type 2 diabetes and related comorbidities

Publication: Nature

Douglas P. Loesch, Manik Garg, Dorota Matelska, Dimitrios Vitsios, Xiao Jiang, Scott C. Ritchie, Benjamin B Sun, Heiko Runz, Christopher D. Whelan, Ruey R. Holman, Robert J. Mentz, Filipe A. Moura, Stephen D. Wiviott, Marc S Sabatine, Miriam S Udler, Ingrid A. Gause-Nilsson, Slavé Petrovski, Jan Oscarsson, Abhishek Nag, Dirk S. Paul & Michael Inouye.

03 March 2025

Genomics can provide insight into the etiology of type 2 diabetes and its comorbidities, but assigning functionality to non-coding variants remains challenging. Polygenic scores, which aggregate variant effects, can uncover mechanisms when paired with molecular data. Here, we test polygenic scores for type 2 diabetes and cardiometabolic comorbidities for associations with 2,922 circulating proteins in the UK Biobank. The genome-wide type 2 diabetes polygenic score associates with 617 proteins, of which 75% also associate with another cardiometabolic score. Partitioned type 2 diabetes scores, which capture distinct disease biology, associate with 342 proteins (20% unique). In this work, we identify key pathways (e.g., complement cascade), potential therapeutic targets (e.g., FAM3D in type 2 diabetes), and biomarkers of diabetic comorbidities (e.g., EFEMP1 and IGFBP2) through causal inference, pathway enrichment, and Cox regression of clinical trial outcomes. Our results are available via an interactive portal (https://public.cgr.astrazeneca.com/t2d-pgs/v1/).
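As a rough illustration of what "aggregating variant effects" means, a polygenic score is commonly computed as a weighted sum of a person's effect-allele dosages (0, 1 or 2 copies per variant). The sketch below is not the study's pipeline; the variant IDs, weights and the function name `polygenic_score` are hypothetical and chosen only for illustration.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages.

    Only variants present in both mappings contribute; variants missing
    from `dosages` are skipped (an assumption made for this sketch).
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical per-variant effect weights (for example, log odds ratios).
weights = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}

# Hypothetical genotype for one person: copies of the effect allele.
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(dosages, weights)
print(round(score, 2))  # approximately 0.19
```

A score like this can then be tested for association with each measured protein level, which is the general kind of analysis the abstract describes.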

 

View Publication

Genome-wide characterization of circulating metabolic biomarkers

Publication: Nature

Minna K. Karjalainen, Savita Karthikeyan, Clare Oliver-Williams, Eeva Sliz, Elias Allara, Wing Tung Fung, Praveen Surendran, Weihua Zhang, Pekka Jousilahti, Kati Kristiansson, Veikko Salomaa, Matt Goodwin, David A. Hughes, Michael Boehnke, Lilian Fernandes Silva, Xianyong Yin, Anubha Mahajan, Matt J. Neville, Natalie R. van Zuydam, Renée de Mutsert, Ruifang Li-Gao, Dennis O. Mook-Kanamori, Ayse Demirkan, Jun Liu, China Kadoorie Biobank Collaborative Group, Estonian Biobank Research Team, FinnGen, …Johannes Kettunen

6 March 2024

Summary

Genome-wide association analyses using high-throughput metabolomics platforms have led to novel insights into the biology of human metabolism. This detailed knowledge of the genetic determinants of systemic metabolism has been pivotal for uncovering how genetic pathways influence biological mechanisms and complex diseases. Researchers present a genome-wide association study for 233 circulating metabolic traits quantified by nuclear magnetic resonance spectroscopy in up to 136,016 participants from 33 cohorts. 

View publication

Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets

Publication: Nature Immunology

Jing Hua Zhao, David Stacey, Niclas Eriksson, Erin Macdonald-Dunlop, Asa K Hedman et al

18 July 2023


Aberrant inflammatory responses play a role in the pathogenesis of many diseases, including autoimmune conditions, cardiovascular diseases and cancers. In this study of genetic influences on inflammation-related proteins, an international team conducted a genome-wide association study of 91 plasma proteins in ~15,000 participants within the SCALLOP Consortium.

Having identified 180 gene-protein associations, they integrated these with gene expression and disease genetics data to provide insights into disease aetiology, implicating FGF5 in hypertension and cardiovascular disease, and lymphotoxin-α in multiple sclerosis.

The team identified both shared and distinct effects of specific proteins across immune-mediated diseases, including directionally discordant functions for CD40 in rheumatoid arthritis versus multiple sclerosis and inflammatory bowel disease, and a role for CXCL5 in the aetiology of ulcerative colitis (UC) but not Crohn's disease.

These results provide a powerful resource to understand the role of chronic inflammation in a wide range of diseases and facilitate future drug target prioritisation.

View publication

Substantial somatic genomic variation and selection for BCOR mutations in human induced pluripotent stem cells

Publication: Nature Genetics

Foad J. Rouhani, Xueqing Zou, Petr Danecek, Cherif Badja, Tauanne Dias Amarante, Gene Koh, Qianxin Wu, Yasin Memari, Richard Durbin, Inigo Martincorena, Andrew R. Bassett, Daniel Gaffney & Serena Nik-Zainal

11 August 2022


Summary

DNA damage caused by factors such as ultraviolet radiation affects nearly three-quarters of all stem cell lines derived from human skin cells, say Cambridge researchers, who argue that whole genome sequencing is essential for confirming if cell lines are usable. Read the full news story.

View publication

Refinements and considerations for trio whole-genome sequence analysis when investigating Mendelian diseases presenting in early childhood

Publication: HGG Advances

Courtney E. French, Helen Dolling, Karyn Mégy, Alba Sanchis-Juan, Ajay Kumar, Isabelle Delon, Matthew Wakeling, Lucy Mallin, Shruti Agrawal, Topun Austin, Florence Walston, Soo-Mi Park, Alasdair Parker, Chinthika Piyasena, Kimberley Bradbury, Sian Ellard, David H. Rowitch, Lucy Raymond

24 May 2022


Summary

More than a third of severely sick babies referred for rapid whole genome sequencing received a vital genetic diagnosis. Results from the latest Cambridge genomic study, supported by NIHR Cambridge BRC and NIHR BioResource, confirm rapid whole genome sequencing (WGS) as an effective early test to aid diagnosis in severely ill children. Read the full story.

View publication

The NHS England 100,000 Genomes Project – Feasibility and utility of centralised genome sequencing for children with cancer

Publication: British Journal of Cancer

Jamie Trotman, Ruth Armstrong, Helen Firth, Claire Trayers, James Watkins, Kieren Allinson, James C. Nicholson, G. A. Amos Burke, Sam Behjati, Matthew J. Murray, Catherine E. Hook, Patrick Tarpey

22 April 2022


Summary

As part of the national 100,000 Genomes Project, researchers recruited 36 children across 23 different solid tumour types. Whole genome sequencing (WGS) data from paired tumour (fresh-frozen tissue) and matched normal (blood) samples were analysed. The results for each case were clinically reviewed at the Cambridge paediatric oncology Genomic Tumour Advisory Board (GTAB), and a formal report of the results was written.

View publication 

A systematic CRISPR screen defines mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage

Publication: Nature Cancer

Xueqing Zou, Gene Ching Chiek Koh, Arjun Scott Nanda, Andrea Degasperi, Katie Urgo, Theodoros I. Roumeliotis, Chukwuma A. Agu, Cherif Badja, Sophie Momen, Jamie Young, Tauanne Dias Amarante, Lucy Side, Glen Brice, Vanesa Perez-Alonso, Daniel Rueda, Celine Gomez, Wendy Bushell, Rebecca Harris, Jyoti S. Choudhary, Genomics England Research Consortium, Josef Jiricny, William C. Skarnes & Serena Nik-Zainal

26 April 2021


Summary

A new way to identify tumours that could be sensitive to particular immunotherapies has been developed using data from thousands of NHS cancer patient samples sequenced through the 100,000 Genomes Project.  The MMRDetect clinical algorithm makes it possible to identify tumours that have ‘mismatch repair deficiencies’ and then improve the personalisation of cancer therapies to exploit those weaknesses.

View publication
