Machine Learning Coffee seminar: "Correlation-Compressed Direct Coupling Analysis" Erik Aurell, KTH-Royal Institute of Technology

Weekly seminars held jointly by Aalto University and the University of Helsinki.

27.11.2017 / 09:15 - 10:00
seminar room Exactum D122, Gustaf Hällströmin katu 2B, Helsinki, FI

Helsinki region machine learning researchers will start the week with an exciting machine learning talk. The aim is to gather people from different fields of science with an interest in machine learning. Porridge and coffee are served at 9:00 and the talk will begin at 9:15. The venue for this talk is seminar room Exactum D122, Kumpula.

Subscribe to the mailing list where seminar topics are announced beforehand.

Correlation-Compressed Direct Coupling Analysis

Erik Aurell
Professor of Biological Physics, KTH-Royal Institute of Technology

Abstract:

Direct Coupling Analysis (DCA) is a powerful tool for finding pairwise dependencies in large biological data sets. It amounts to inferring the coefficients of a probabilistic model in an exponential family, and then using the largest inferred coefficients as predictors for the dependencies of interest. The main computational bottleneck is the inference. As described recently by Jukka Corander in this seminar series, DCA has been done on bacterial whole-genome data, at the price of significant compute time and investment in code optimization.

We have investigated whether DCA can be sped up by first filtering the data on correlations, an approach we call Correlation-Compressed Direct Coupling Analysis (CC-DCA). The computational bottleneck then moves from DCA to the more standard task of finding a subset of the most strongly correlated vectors in large data sets. I will describe the results obtained so far, and outline what it would take to do CC-DCA on whole-genome data in humans and other higher organisms.
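For readers new to the idea, the two-stage structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see arXiv:1710.04819 for the actual method). It assumes binary data, a simple "keep the columns with the largest absolute correlation to any other column" filter, and a naive mean-field (regularized inverse covariance) estimate standing in for the DCA inference step. The function names, the `ridge` parameter, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def correlation_compress(X, n_keep):
    """Filter step (assumed form): keep the n_keep columns whose strongest
    absolute correlation with any other column is largest."""
    C = np.nan_to_num(np.corrcoef(X, rowvar=False))  # constant columns -> 0
    np.fill_diagonal(C, 0.0)
    score = np.abs(C).max(axis=0)
    keep = np.argsort(score)[::-1][:n_keep]
    return np.sort(keep)

def mean_field_couplings(X, ridge=0.1):
    """Inference step (stand-in): naive mean-field couplings, J = -C^{-1},
    with a ridge term added so the covariance matrix is invertible."""
    C = np.cov(X, rowvar=False)
    C = C + ridge * np.eye(C.shape[0])
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)
    return J

def cc_dca(X, n_keep, n_top):
    """Filter on correlations, infer couplings on the retained columns only,
    and return the n_top pairs with the largest absolute coupling."""
    keep = correlation_compress(X, n_keep)
    J = mean_field_couplings(X[:, keep])
    iu, ju = np.triu_indices(len(keep), k=1)
    order = np.argsort(np.abs(J[iu, ju]))[::-1][:n_top]
    # Map indices in the compressed matrix back to original column indices.
    return [(int(keep[iu[o]]), int(keep[ju[o]]), float(J[iu[o], ju[o]]))
            for o in order]

# Synthetic example: 40 independent binary columns, except that column 7
# is a noisy copy of column 3 (about 5% of its entries are flipped).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 40)).astype(float)
flip = rng.random(500) < 0.05
X[:, 7] = np.where(flip, 1.0 - X[:, 3], X[:, 3])

pairs = cc_dca(X, n_keep=10, n_top=5)
for i, j, coupling in pairs:
    print(i, j, round(coupling, 3))
```

In this toy setting the inference runs on a 10-column matrix instead of the full 40 columns, so its cost depends on `n_keep` rather than on the original number of columns, and the work shifts to computing the correlations used in the filter.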

This is joint work with Chen-Yi Gao and Hai-Jun Zhou, available as arXiv:1710.04819.

See the next talks at the seminar webpage.

Please spread the news and join us for our weekly habit of beginning the week with an interesting machine learning talk!

Welcome!