The past decades have brought about a revolution in cancer biology and clinical care. Large-scale, cost-efficient, high-throughput genomics approaches at the bulk and single-cell levels, applied to tens of thousands of samples and millions of single cells, have enabled researchers to study cancer progression and treatment response at unprecedented resolution. In addition, phenotypic data on tumor and microenvironmental cells, such as growth rates, migration, invasion, and the kinetics of cell-cell interactions, are now being obtained for multiple tumor types, treatments, and in vivo models. This revolution enables a precise understanding of the history of human cancer, its diversification during tumor progression, and the bottleneck effects of treatments. It has also heralded the era of precision medicine, in which patients’ tumors are molecularly profiled and matched to the best treatment options, be it immunotherapy, targeted agents, radiation, chemotherapy, or a combination thereof.

The advent of such datasets enables groundbreaking scientific discovery and individualized patient care. However, analyzing human tumors at such high resolution requires novel methods that sift through the noise of Big Data and focus attention on the biological nodes, pathways, and networks associated with treatment response, so that the best therapeutic options can be designed and selected for individual patients. Statistical and bioinformatic methods are widely used to analyze such datasets. In many cases, however, these methods are limited to identifying associations, for instance between molecular nodes and outcomes, and cannot establish causal relationships. To make mechanistic connections between inputs and outputs in large complex systems, approaches from applied mathematics and machine learning can be used. The resulting models allow for a systematic analysis of scenarios, hypotheses, and counterfactuals, and may help derive a quantitative, mechanistic understanding of cancer treatment response. Our lab has focused on integrative analyses using approaches from applied mathematics, bioinformatics, statistics, and machine learning to address important questions in cancer research and developmental biology. Examples of current projects include:

  • Copy number variation (CNV) is the phenomenon in which a cell gains or loses copies of parts of its genome. CNV events range in size from a few thousand base pairs to whole chromosomes and even the entire genome. Because genomic copy number is highly correlated with gene expression levels, cancer cells that acquire CNVs can access a wide variety of expression states, including invasive and drug-resistant states. When CNVs are acquired during a tumor’s lifespan remains an open question, and we aim to determine whether CNVs occur in bursts of several events at once or accumulate gradually over time. By leveraging single-cell DNA and single-cell RNA sequencing data, we analyze the patterns of CNVs found in single cancer cells across many cancer types and determine the most likely model of evolution for each cancer type.
  • Understanding the diverse trajectories and cell fates associated with a time series of cell state transitions remains a significant challenge, particularly when tracing lineages at the single-cell level with single-cell RNA sequencing (scRNAseq) data. To tackle this issue, we employ a mathematical inference approach called optimal transport analysis. This approach allows us to infer ancestor distributions and reconstruct the most probable past trajectories of cells that adopt different fates. We then use these ancestor distributions to compute comprehensive statistics of cellular signatures along the predicted trajectories of cells that end up in distinct states.
  • Immune checkpoint blockade (ICB) is a cancer therapy that enhances the activity of tumor-killing cells of the adaptive immune system, and it has produced durable remissions and clinical success in various cancers. However, patient outcomes vary considerably. Several biomarkers, including tumor mutation burden, PD-L1 expression, and microsatellite instability, are FDA-approved for selecting patients with solid malignancies for ICB treatment. Because of the tumor-intrinsic and -extrinsic heterogeneity of the tumor immune microenvironment, these biomarkers have variable predictive power across cancers. Large amounts of patient data are now generated from multiple sources, such as next-generation sequencing of DNA, RNA, T cell and B cell receptors, and the microbiota, as well as liquid biopsies and clinical images. We are developing machine learning algorithms that integrate these multi-omics and multi-modal data to understand how cancer heterogeneity relates to immune resistance and patient outcomes, and to identify novel targets that improve patient response and survival.
  • We are interested in how therapy schedules can be optimized when cancer cells respond to treatment by entering a drug-tolerant persister state. As a model system in which such persistence is common, we study non-small cell lung cancer. In collaboration with the Hata lab, we design and perform experiments that allow us to develop a mechanistic mathematical description of the dynamics of persistence. This entails characterizing the ways in which the persister state is treatment-specific, so as to identify treatment combinations that target as many cancer cells as possible. Moreover, we use these experimental data to parameterize branching process models that describe how cells switch back from the persister state to a treatment-sensitive state when they are not exposed to treatment for some time. Building on these dynamics, we develop therapy schedules that use treatment pauses, during which cells re-sensitize, to make subsequent treatment cycles more effective.
  • Cancer progression requires evasion of immune-mediated elimination. The earliest stage of ductal breast cancer, in which the myoepithelium and basement membrane encase the tumor cells, is called ductal carcinoma in situ (DCIS). DCIS is a critical point in the lifespan of a breast tumor: the events occurring at this stage dictate the tumor’s fate and whether it progresses to lethal disease. We are devising statistical approaches based on multi-modal single-cell sequencing data and spatial profiling to understand the mechanisms driving this transition, and to identify clinically actionable interventions that stop tumors from progressing to invasive disease and prevent the tumor microenvironment from becoming immunosuppressive.
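The ancestor inference described in the optimal transport project can be illustrated with a small sketch. It assumes entropy-regularized optimal transport solved with Sinkhorn iterations, uniform cell masses at both time points (no proliferation or death, unlike growth-aware variants such as Waddington-OT), and toy two-dimensional coordinates standing in for expression profiles. The function names `sinkhorn_coupling` and `ancestor_distribution` are hypothetical and are not taken from the lab's code.

```python
import numpy as np


def sinkhorn_coupling(x_early, x_late, epsilon=0.1, n_iter=500):
    """Entropy-regularized OT coupling between two snapshots of cells.

    x_early: (n, d) coordinates of cells at the earlier time point
    x_late:  (m, d) coordinates of cells at the later time point
    Returns an (n, m) matrix; entry (i, j) is the mass transported
    from early cell i to late cell j.
    """
    n, m = len(x_early), len(x_late)
    # Squared Euclidean cost between every early/late pair, rescaled to [0, 1]
    cost = ((x_early[:, None, :] - x_late[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    kernel = np.exp(-cost / epsilon)
    # Assumption: every cell carries equal mass (no growth or death)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ancestor_distribution(coupling, fate_mask):
    """Normalized weights over early cells that send mass to the late cells in fate_mask."""
    weights = coupling[:, fate_mask].sum(axis=1)
    return weights / weights.sum()


# Toy example: two early clusters, each drifting toward one late "fate"
rng = np.random.default_rng(0)
early = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
late = np.vstack([rng.normal([1.0, 0.0], 0.3, (20, 2)),
                  rng.normal([5.0, 6.0], 0.3, (20, 2))])
coupling = sinkhorn_coupling(early, late)
fate_a = np.arange(len(late)) < 20  # late cells in the first fate
ancestors_a = ancestor_distribution(coupling, fate_a)
print("ancestral mass on first early cluster:", ancestors_a[:20].sum())
```

Statistics of cellular signatures along a trajectory could then be computed as averages of per-cell scores weighted by `ancestors_a`. With real data, the marginals would usually be adjusted for estimated growth rates rather than kept uniform.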
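The treatment-pause idea in the persister project can be sketched with the expected (mean) dynamics of a two-type branching process of sensitive (S) and persister (P) cells. All rates below are invented placeholders rather than values fitted to experiments, the drug is assumed to induce S→P switching, re-sensitization (P→S) is assumed to happen only off drug, and persisters are assumed not to divide. Because the means of a linear multi-type branching process obey a linear ODE, the sketch integrates that ODE with small forward-Euler steps instead of simulating individual cells. The names `Rates` and `expected_counts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Per-day rates for one drug condition (all values are hypothetical)."""
    birth_s: float       # division of sensitive cells
    death_s: float       # death of sensitive cells
    to_persister: float  # switching S -> P (mu)
    to_sensitive: float  # re-sensitization P -> S (nu)
    death_p: float       # death of persister cells (persisters assumed not to divide)


ON_DRUG = Rates(birth_s=0.1, death_s=0.6, to_persister=0.05, to_sensitive=0.0, death_p=0.01)
OFF_DRUG = Rates(birth_s=0.5, death_s=0.1, to_persister=0.0, to_sensitive=0.2, death_p=0.01)


def expected_counts(schedule, s0=1e6, p0=0.0, dt=0.01):
    """Expected sensitive and persister cell numbers after a treatment schedule.

    schedule: list of (days, on_drug) blocks applied in order.
    Integrates dS/dt = (b_S - d_S - mu) S + nu P and dP/dt = mu S - (d_P + nu) P
    with forward Euler steps of size dt.
    """
    s, p = s0, p0
    for days, on_drug in schedule:
        r = ON_DRUG if on_drug else OFF_DRUG
        for _ in range(round(days / dt)):
            ds = (r.birth_s - r.death_s - r.to_persister) * s + r.to_sensitive * p
            dp = r.to_persister * s - (r.death_p + r.to_sensitive) * p
            s, p = s + dt * ds, p + dt * dp
    return s, p


continuous = expected_counts([(28, True)])
pulsed = expected_counts([(5, True), (2, False)] * 4)
print("continuous (S, P):", continuous)
print("5 days on / 2 days off (S, P):", pulsed)
```

Which schedule leaves fewer cells depends entirely on the chosen rates: pauses let persisters re-sensitize but also let sensitive cells regrow. A mean-field sketch like this also ignores stochastic extinction, which a full branching-process simulation (for example, with the Gillespie algorithm) would capture.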