PhD opportunity on machine learning of multi-omics data (past)

We are advertising a BBSRC/EASTBIO PhD project on Machine Learning of multi-omics data. This project is joint with Guido Sanguinetti; please share it with anyone who might be interested.

To apply, first email Edward Wallace for more information! More application details here. Application deadline was 4th July 2018 - contact us if you’re interested in anything similar!

This funding is only available to UK/EU students.

Model-based machine learning of multi-omics data

What scientific question will you investigate?

How do cells change their gene expression to respond to a changing environment? How do we turn massive “multi-omics” data - measurements of many different kinds of molecular states in cells - into an accurate quantitative picture of changing gene expression patterns? This PhD project will develop artificial intelligence and machine learning methods to quantify multi-omics data, and apply them to sequencing datasets to understand how fungal cells dynamically regulate RNA expression and processing.

The project necessarily addresses the key technical problem of normalization. How do you compare counts of molecules per cell between two very different groups of cells? For example, the number of messenger RNA molecules per cell varies hugely between different growth states of fungi, including the pathogen Cryptococcus neoformans. Current methods, which assume that most RNA molecules don’t change in count, cannot accurately detect this variation. This project will develop rigorous methods to compare mRNA counts across growth states using external reference “spike-in” whole cells and RNAs.
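To make the normalization problem concrete, here is a minimal toy sketch in Python. It is an illustration only, not the methods the project will develop: the gene names, the spike-in identifier, and the read counts are all invented, and it assumes the same amount of spike-in was added per cell in both samples. In this toy example the "fast" cells carry twice as much of every mRNA per cell as the "slow" cells, but were sequenced at half the depth.

```python
# Toy illustration: total-count normalization vs spike-in normalization.
# All names and numbers below are hypothetical, chosen for the example.

SPIKE_IDS = {"spike_1"}  # assumed identifier for the external spike-in RNA


def endogenous(counts, spike_ids=SPIKE_IDS):
    """Return only the counts for the cells' own genes (no spike-ins)."""
    return {g: n for g, n in counts.items() if g not in spike_ids}


def normalize_by_total(counts, spike_ids=SPIKE_IDS):
    """Express each gene as a fraction of all endogenous reads.

    This implicitly assumes the total mRNA per cell is the same
    in every sample.
    """
    genes = endogenous(counts, spike_ids)
    total = sum(genes.values())
    return {g: n / total for g, n in genes.items()}


def normalize_by_spike_in(counts, spike_ids=SPIKE_IDS):
    """Express each gene relative to the summed spike-in reads.

    This assumes the same amount of spike-in was added per cell.
    """
    spike_total = sum(n for g, n in counts.items() if g in spike_ids)
    return {g: n / spike_total for g, n in endogenous(counts, spike_ids).items()}


def fold_changes(reference, treatment):
    """Per-gene ratio of treatment to reference normalized values."""
    return {g: treatment[g] / reference[g] for g in reference}


# Read counts: the fast sample has 2x mRNA per cell but half the depth,
# so its raw endogenous counts happen to match the slow sample.
slow = {"gene_A": 1000, "gene_B": 3000, "spike_1": 1000}
fast = {"gene_A": 1000, "gene_B": 3000, "spike_1": 500}

total_fc = fold_changes(normalize_by_total(slow), normalize_by_total(fast))
spike_fc = fold_changes(normalize_by_spike_in(slow), normalize_by_spike_in(fast))

print("total-count fold change:", total_fc)
print("spike-in fold change:   ", spike_fc)
```

Total-count normalization reports no change for either gene, because a uniform doubling leaves every gene's share of the library unchanged. Normalizing to the spike-in recovers the two-fold increase. Real data add noise, variable spike-in recovery, and gene-specific effects, which is where model-based methods come in.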

How do you compare different molecular states in the same group of cells? For example, we have measurements of RNA in different conditions, and also of a sub-population of RNA that is regulated by a specific protein, in the budding yeast Saccharomyces cerevisiae. The project will develop quantitative models of the RNA-protein interactions, and apply them to these measurements to understand how distinct RNAs are regulated as conditions change.

What training will you receive?

You will receive expert training in machine learning, Bayesian modeling, bioinformatics/next-generation sequencing, and RNA biology. Your project will develop fundamental data science skills, and you will have the opportunity to take short courses to build other specific skills as needed.

You will have the opportunity to work with experimentalists in the Wallace lab to design new experiments to test the results of your computational work.

You will complete an industrial placement, spending 3 months working with scientists at a company to apply machine learning methods to their sequencing data.

What could you do afterwards?

Completing this project will build the skills to tackle a range of problems in quantitative biology and beyond. There is huge demand for people who can combine theoretical and practical insights to make sense of big data. You will be particularly well-equipped to tackle analogous quantitative questions in biology, extending beyond the gene expression questions directly addressed here towards single-cell sequencing, proteomics/metabolomics, and microbiome research.

What kind of student would fit the project?

We are seeking someone with a strong interest in developing models that bring insight into quantitative biology. This is an interdisciplinary project that brings together ideas from theoretical statistics/machine learning, bioinformatics/programming, and gene expression/RNA biology; it would be sensible to have a strong background in one of these, and demonstrable interest in the other two.

References

Growth Rate-Dependent Global Amplification of Gene Expression. Niki Athanasiadou, Benjamin Neymotin, Nathan Brandt, Darach Miller, Daniel Tranchina, David Gresham. https://doi.org/10.1101/044735

Spike-in quantification with synthetic RNAs, not with whole cells.

BASiCS: Bayesian Analysis of Single-Cell Sequencing Data. Catalina A. Vallejos, John C. Marioni, Sylvia Richardson. https://doi.org/10.1371/journal.pcbi.1004333

Spike-in for single-cells, again with synthetic RNAs.

Quality control of transcription start site selection by nonsense-mediated-mRNA decay. Christophe Malabat, Frank Feuerbach, Laurence Ma, Cosmin Saveanu, Alain Jacquier. https://doi.org/10.7554/eLife.06722

Whole cell spike-in to compare yeast mutants in an RNA decay pathway.

Kinetic CRAC uncovers a role for Nab3 in determining gene expression profiles during stress. Rob van Nues, Gabriele Schweikert, Erica de Leau, Alina Selega, Andrew Langford, Ryan Franklin, Ira Iosub, Peter Wadsworth, Guido Sanguinetti, Sander Granneman. https://doi.org/10.1038/s41467-017-00025-5

RNA-protein binding measurements in a stress timecourse, statistical modeling.

Modeling and analysis of RNA-seq data: a review from a statistical perspective. Wei Vivian Li, Jingyi Jessica Li. https://arxiv.org/abs/1804.06050

Review of statistics of RNA-seq.