I have broad research interests and experience in bioinformatics, cancer genomics and data analytics. These research areas mainly involve developing and applying bioinformatics and computational approaches to analyse large-scale cancer datasets to uncover novel diagnostic and prognostic biomarkers. I also lead the Cancer Research UK Barts Centre Bioinformatics Core Facility.
ACSNI: An unsupervised machine-learning tool for prediction of tissue-specific pathway components using gene expression profiles. Patterns (2021) 2(6), 100270. PMID: 34179848
The genomic landscape of actinic keratoses. Journal of Investigative Dermatology (2021) 141(7), 1664-1674.e7. PMID: 33482222
Applications and analysis of targeted genomic sequencing in cancer studies. Computational and Structural Biotechnology Journal (2019) 17, 1348-1359. PMID: 31762958
The genomic landscape of cutaneous SCC reveals drivers and a novel azathioprine associated mutational signature. Nature Communications (2018) 9(2), 3667. PMID: 30202019
My main research interests lie in developing and applying bioinformatics and computational approaches to analyse large-scale cancer datasets to uncover novel diagnostic and prognostic features. In particular, I am interested in applying machine learning and AI algorithms to integrate multi-omics and clinicopathological data, in order to derive diagnostic and prognostic tools for patient stratification.
I also lead the CRUK Barts Centre Bioinformatics Core Service.
Biomedical science, especially cancer research, is increasingly data driven, as new bioanalytical techniques deliver ever more data about DNA, RNA, proteins, metabolites and the interactions between them, at both the whole-tissue and single-cell levels. Given the growing volume of omics data ("big data"), the key challenges are to analyse these large-scale datasets, to interpret the results accurately and thoroughly, and to identify "driver" events and predictive biomarkers in tumour development and progression.
Our research interests include the following:
Cancer genomics and evolution
Focusing on large-scale multi-omics datasets, we develop analytical pipelines and use machine learning and data integration techniques to identify novel driver events, molecular subtypes, and diagnostic and prognostic signatures in cancer development and progression. Using bulk tissue RNA-seq data, we also investigate the immune and stromal landscape of tumours and derive signatures for patient subgrouping and stratification. We are currently working on multi-omics datasets of cutaneous and oesophageal squamous cell carcinoma. We also investigate the clonal evolutionary patterns of these tumours to better understand how clonal and subclonal architecture affects patients' clinical features.
Noncoding sequence variants and RNA genes in cancer
Using publicly available whole-genome, ChIP-seq and RNA-seq data, we investigate functionally important noncoding mutations and dysregulated long noncoding RNAs in pancreatic and ovarian cancer. Using big-data and bioinformatic approaches, we first identify the top novel candidates, which are then taken to the lab for in vitro validation using high-throughput screening (e.g., STARR-seq) and CRISPR/Cas9.
Single cell analytics
We have constructed a cross-package toolkit, named IBRAP (https://github.com/connorhknight/IBRAP), that provides a comprehensive workflow from data pre-processing to automatic annotation of cell types and enables users to interchange analytical components and individual methods. IBRAP also provides benchmarking metrics that distinguish the performance of different pipelines, so that a pipeline can be tailored to each dataset in single-cell studies. We are currently using IBRAP to construct normal reference maps from publicly available single-cell data.
Computational histopathology and imaging analysis using AI
Despite recent advances in understanding the molecular pathogenesis of many cancers, disease assessment is still based on clinical and histopathological staging, with few objective prognostic biomarkers. A rapid, simple and cost-effective tool that augments clinicopathologic staging and allows clinicians to stratify patients according to their risk of progression is a priority for translational research.
We are currently developing deep learning-based tools to automatically extract core histological features from digitised whole-slide images and map them to molecular and clinical features in cutaneous and oesophageal squamous cell carcinoma. We aim to create a risk stratification tool that can be incorporated into the routine pathology workflow, with the goal of significantly improving patient outcomes.
274 Homologous recombination deficiency scores in AK and cSCC are associated with tumor-immune phenotype. Thomson J, Healy E, Strid J et al. Journal of Investigative Dermatology (2023) 143(10) s47
O20 Targeting the defective CoA pathway to improve erythropoiesis in SF3B1-mutant MDS-RS patients. Philippe C, Mian S, Maniati E et al. Leukemia Research (2023) 128(10) 107133
IBRAP: integrated benchmarking single-cell RNA-sequencing analytical pipeline. Knight CH, Khan F, Patel A et al. Briefings in Bioinformatics (2023) 24(7)
Longitudinal expression profiling identifies a poor risk subset of patients with ABC-type diffuse large B-cell lymphoma. Bewicke-Copley F, Korfi K, Araf S et al. Blood Advances (2023) 7(7) 845-855
Vitamin B5 and succinyl-CoA improve ineffective erythropoiesis in SF3B1-mutated myelodysplasia. Mian SA, Philippe C, Maniati E et al. Science Translational Medicine (2023) 15(7)
Germline ERCC excision repair 6 like 2 (ERCC6L2) mutations lead to impaired erythropoiesis and reshaping of the bone marrow microenvironment. Armes H, Bewicke-Copley F, Rio-Machin A et al. British Journal of Haematology (2022) 199(7) 754-764
Replication stress generates distinctive landscapes of DNA copy number alterations and chromosome scale losses. Shaikh N, Mazzagatti A, De Angelis S et al. Genome Biology (2022) 23(7)
Deep Multi-Omics Profiling in Cytogenetically Poor-Risk AML. Rio-Machin A, Bewicke-Copley F, Zheng J et al. Blood (2022) 140(10) 1030-1032
Longitudinal Single Cell Analyses Reveal the Co-Evolutionary Dynamics of the Tumor and Microenvironment Accompanying Follicular Lymphoma Transformation. Perrett M, Pickard L, Kumar E et al. Blood (2022) 140(10) 748-749
ERG activity is regulated by endothelial FAK coupling with TRIM25/USP9x in vascular patterning. D'Amico G, Fernandez I, Gómez-Escudero J et al. Development (Cambridge) (2022) 149(7)
Postdoctoral Bioinformaticians
PhD Students
Academic Clinical Fellow
Former lab members
I received my first degree in biological engineering at Shanghai Jiao Tong University. This was followed by an MSc in quantitative genetics and genome analysis and a PhD in evolutionary genetics, studying comparative genomics and the evolution of noncoding sequences in Drosophila, both at the University of Edinburgh. I then joined Rothamsted Research as a postdoc working on plant genomics and genetic linkage mapping as part of the international Brassica rapa genome project. I moved to Barts Cancer Institute, Queen Mary University of London, as a bioinformatician in 2010 to work on cancer genomics and biomarker discovery in the bioinformatics core. I became a Lecturer in Bioinformatics and group leader in 2016, and have led the CRUK Barts Centre Bioinformatics Core Facility since 2018. I was promoted to Senior Lecturer in 2019.
I am Programme Director for the Cancer Genomics & Data Sciences MSc Programme at BCI, Queen Mary University of London.