Machine Learning Coffee seminar: "Learning of Ultra High-Dimensional Potts Models for Bacterial Population Genomics" Jukka Corander, UH & UiO


Weekly seminars held jointly by Aalto University and the University of Helsinki.

13.11.2017 / 09:15 - 10:00
Seminar room Exactum D122, Gustaf Hällströmin katu 2B, 02150, Helsinki, FI

Helsinki region machine learning researchers start the week with an exciting machine learning talk. The aim is to gather people from different fields of science who share an interest in machine learning. Porridge and coffee are served at 9:00, and the talk begins at 9:15. The venue for this talk is seminar room Exactum D122, Kumpula.

Subscribe to the mailing list where seminar topics are announced beforehand.

Learning of Ultra High-Dimensional Potts Models for Bacterial Population Genomics

Jukka Corander
Professor of Statistics, University of Helsinki and University of Oslo

Abstract:

The potential for genome-wide modeling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has earlier been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10000-100000 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here we introduce a novel inference method (SuperDCA) which employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 100000 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA thus holds considerable potential in building understanding about numerous organisms at a systems biological level.


See the next talks at the seminar webpage.

Please spread the news and join us for our weekly habit of beginning the week with an interesting machine learning talk!

Welcome!