Morten Rasmussen

 

With modern genomic techniques it is now possible to measure the ecological composition of, for instance, the gut, the skin or the airways, based on sequencing of amplified marker genes or assembly of sequences from the entire genome. In contrast to typical chemometric datasets, these data are counts, inherently compositional, and very sparse. This naturally introduces challenges in the initial quality control of the data and in the subsequent statistical modeling. Sequencing of the genomic material, coupled with structured databases, makes it possible not only to name individual bacteria, but also to infer which enzymes they produce and which biochemical pathways are present. This structural knowledge makes it possible to pursue integration of e.g. microbiome and metabolomics data via a bioinformatics, database-driven angle as opposed to a data-driven chemometrics one.

 

This talk will introduce how microbiome data are obtained and point out some challenges in this regard. The structural knowledge obtained via sequencing opens an avenue of possibilities in terms of data modeling. Specifically, we will revisit common multivariate chemometric techniques such as Canonical Correlation Analysis for the integration of microbiome and metabolomics data, and use the structural knowledge to softly enforce certain conditions on the model.

 

Mini-CV:

Morten Rasmussen is Associate Professor at the University of Copenhagen. His research focuses on the development and application of data-driven mathematical and statistical methods for the modeling of complex biological systems, with special emphasis on metabolomics and the microbiome within the area of clinical epidemiology. He is the holder of the first Bruce Kowalski Award in Chemometrics, awarded to him in 2014. He has published over 50 papers in subject areas such as multivariate data analysis, chemometrics and systems biology.

Lieven De Lathauwer

 

In applications that involve matrix/tensor decompositions, it is often a good idea to exploit available “structure”. We discuss various types of structure and recent advances in their exploitation.

First, the entries of factor matrices are often not completely arbitrary. For instance, factors are expected to be reasonably smooth, they may contain a number of peaks, etc. We will elaborate on possibilities that emerge if factors admit a good approximation by polynomials, rational functions, sums-of-exponentials, … The use of such models has potentially many advantages. It may allow a (significant) reduction of computational and storage requirements, so that (much) larger data sets can be handled. It may increase the signal-to-noise ratio, as noise is fitted less well by the model. Some models come with factorizations that are essentially unique, even in cases where only matrix data are available.

Second, we focus on low multilinear rank structure, i.e., we consider data tensors that have a small Tucker core. Orthogonal Tucker compression is widely used as a preprocessing step in CANDECOMP/PARAFAC/CP analysis, significantly speeding up the computations. However, for constrained CP analysis its use has so far been rather limited. For instance, in a CP analysis that involves nonnegative factors, an orthogonal compression would break the nonnegativity. We will discuss a way around this.
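An orthogonal Tucker compression of the kind used as a preprocessing step can be sketched in a few lines. The example below is a truncated higher-order SVD applied to a synthetic tensor of exactly low multilinear rank; all sizes and ranks are illustrative:

```python
import numpy as np

def unfold(t, mode):
    # Mode-n unfolding: the mode-n fibers become the columns of a matrix.
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def tucker_compress(t, ranks):
    # Truncated HOSVD: one orthogonal factor per mode from the leading left
    # singular vectors of the unfolding, then project to get the small core.
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = np.einsum('abc,ai,bj,ck->ijk', t, *factors)
    return core, factors

# Hypothetical tensor of multilinear rank (3, 3, 3), built from a small core.
rng = np.random.default_rng(1)
g = rng.normal(size=(3, 3, 3))
a, b, c = (rng.normal(size=(d, 3)) for d in (40, 50, 60))
t = np.einsum('ijk,ai,bj,ck->abc', g, a, b, c)

core, factors = tucker_compress(t, (3, 3, 3))
t_hat = np.einsum('ijk,ai,bj,ck->abc', core, *factors)
err = np.linalg.norm(t_hat - t) / np.linalg.norm(t)  # exact for this rank
```

A constrained CP analysis would then be run on the small core rather than on the full tensor; as noted above, preserving constraints such as nonnegativity under the orthogonal compression requires extra care.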

Third, we focus on the analysis of multi-set data, in which coupling induces structural constraints besides the classical constraints on individual factors.

Fourth, we discuss (surprising) implications for the analysis of partially sampled data / data with missing entries. 

In connection with chemometrics, aspects that we pay special attention to include: smoothness and peaks in spectra, nonnegativity of spectra and concentrations, large-scale problems, new grounds for factor uniqueness, data that are incomplete (e.g. because of scattering), and data fusion.

[1] Sidiropoulos N., De Lathauwer L., Fu X., Huang K., Papalexakis E. and Faloutsos C., "Tensor Decomposition for Signal Processing and Machine Learning", IEEE Transactions on Signal Processing, 2017, to appear.

[2] Vervliet N., Debals O., Sorber L., Van Barel M. and De Lathauwer L., "Tensorlab 3.0", available online, Mar. 2016. URL: http://www.tensorlab.net/.

 

Mini-CV:

Lieven De Lathauwer received the Master’s degree in electromechanical engineering and the Ph.D. degree in applied sciences from KU Leuven, Belgium, in 1992 and 1997, respectively. From 2000 to 2007 he was Research Associate of the French CNRS. He is currently Professor at KU Leuven, affiliated both with the Group Science, Engineering and Technology of Kulak, and with the STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics of the Electrical Engineering Department (ESAT). His research concerns the development of tensor tools for mathematical engineering. It centers on the following axes: (i) algebraic foundations, (ii) numerical algorithms, (iii) generic methods for signal processing, data analysis and system modelling, and (iv) specific applications. He was general chair of Workshops on Tensor Decompositions and Applications (TDA 2005, 2010, 2016). In 2015 he became Fellow of the IEEE for contributions to signal processing using tensor decompositions. In 2017 he became Fellow of SIAM for fundamental contributions to theory, computation, and application of tensor decompositions. Algorithms have been made available as Tensorlab (www.tensorlab.net) (with N. Vervliet, O. Debals, L. Sorber and M. Van Barel).

URL: http://www.esat.kuleuven.be/stadius/person.php?id=22

Marieke Timmerman

 

In many experiments, data are collected on a large number of variables. Typically, the manipulations involved yield differential effects on subsets of variables and on the associated individual differences. The key challenge is to unravel the nature of these differential effects. An effective approach to achieve this goal is to analyse the data with a combined additive and bilinear model. Well-known examples are principal component analysis, reduced rank regression analysis and simultaneous component analysis (SCA). In this talk, I will show how various existing models fit into the model framework, and discuss the type of insights that can be obtained with the different variants. I will discuss in more detail simultaneous component modeling, where the multivariate data are structured in blocks with respect to the different levels of the experimental factors. Here, it is important to recognize that the dominant sources of variance in the observed data may differ across blocks. This can be accommodated via SCA, presuming that the structure is equal across all blocks, or clusterwise SCA, which aims at identifying both similarities and differences in structure between the blocks. As with any component analysis, (clusterwise) SCA results depend heavily on the scaling applied. I will discuss the basic principles relevant for scaling, yielding the tools for a rational selection of scaling for the data-analytic problem at hand. To illustrate the power of the approach, I will present analysis results from real-life data, and show that insight can be obtained into multivariate experimental effects in terms of similarities and differences across individuals. The latter is highly relevant for subtyping.
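The block-wise simultaneous component idea can be sketched as follows. This minimal variant (SCA with loadings equal across blocks) stacks the centred blocks and fits a single set of loadings, while each block keeps its own scores; all data are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example: two blocks (e.g. two experimental conditions) share
# the same 2-component loading structure but differ in score variance.
p, r = 10, 2
loadings = np.linalg.qr(rng.normal(size=(p, r)))[0]   # true common loadings
blocks = []
for n, scale in [(30, 1.0), (40, 3.0)]:
    scores = scale * rng.normal(size=(n, r))
    blocks.append(scores @ loadings.T + 0.05 * rng.normal(size=(n, p)))

# SCA: stack the centred blocks and fit one component model; the loadings
# are common to all blocks, the scores remain block-specific.
stacked = np.vstack([b - b.mean(axis=0) for b in blocks])
_, _, vt = np.linalg.svd(stacked, full_matrices=False)
common = vt[:r].T                                     # shared loadings (p x r)
block_scores = [(b - b.mean(axis=0)) @ common for b in blocks]
```

Clusterwise SCA would instead search for groups of blocks, each with its own set of common loadings, and scaling decisions would be applied before the stacking step.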

 

Mini-CV:

Marieke Timmerman (http://www.rug.nl/staff/m.e.timmerman/) is a professor in multivariate data analysis at the Heymans Institute for Psychological Research at the University of Groningen, the Netherlands. Her research focuses on the development of models for multivariate data with complex structures, to achieve an understanding of the processes underlying these data. Her research interests are data reduction methods, including multi-set models, latent variable modelling and classification.

Edwin Lughofer

 

The presenter will introduce a new paradigm in the calibration and design of chemometric models from (FT-)NIR spectra. As opposed to batch off-line calibration through classical statistical methods (such as PLSR, PCR and several extensions) or more general machine-learning-based methods (such as support vector machines, neural networks, fuzzy systems), evolving chemometric models can serve as the core engine for updating calibration models incrementally and fully automatically in on-line or even in-line installations. Such updates may become indispensable whenever system dynamics or non-stationary environmental influences cause significant changes in the process. Models trained in batch off-line mode then easily become outdated, leading to severe deterioration of their quantification accuracy, which may in turn badly influence the (supervision of the) whole chemical process. An approach for updating chemometric models quickly, and ideally at the lowest possible cost in terms of additional target measurements, will be presented in this talk. It is based on PLS-fuzzy models, where the fuzzy models are trained on the score space spanned by the latent variables. This leads to a new form of non-linear PLSR with embedded piece-wise local predictors, which have granular characteristics and even offer some interpretability. The update of the models will comprise

  • Recursive parameter adaptation to adapt to permanent process changes and to increase model significance and accuracy (especially when models are off-line calibrated only on a handful of data).
  • Evolution of new model components (rules) on the fly in order to account for variations in the process, such as new operation modes or system states, which require a change in the model’s ‘non-linearity degree’.
  • Incremental adaptation of the PLS space in order to address a shift in the importance of wavelengths (on the target) over time.
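The first of these updates, recursive parameter adaptation, can be sketched for a single local linear predictor via exponentially weighted recursive least squares. This is an illustrative sketch under assumed names and parameters, not the presenter's exact algorithm:

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.99):
    # One recursive least-squares step for a local linear predictor (e.g.
    # the consequent of one fuzzy rule, with x a regressor in score space).
    # A forgetting factor lam < 1 down-weights old samples so the model
    # can track gradual process changes.
    Px = P @ x
    k = Px / (lam + x @ Px)               # gain vector
    theta = theta + k * (y - x @ theta)   # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam       # update the inverse covariance
    return theta, P

# Track a drifting relationship on streaming data (synthetic).
rng = np.random.default_rng(3)
theta, P = np.zeros(2), 1e3 * np.eye(2)
true = np.array([1.0, -0.5])
for t in range(500):
    if t == 250:
        true = np.array([2.0, 0.5])       # abrupt process change at t=250
    x = np.array([1.0, rng.normal()])
    y = x @ true + 0.01 * rng.normal()
    theta, P = rls_update(theta, P, x, y)
```

After the change, the forgetting factor lets the coefficient estimate converge to the new process parameters within a few hundred samples; evolving new rules and adapting the PLS space itself go beyond this per-rule update.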

In order to reduce target measurements during on-line usage, the model update will only take place upon drift (= change) alarms (induced by incremental drift indicators), and then with actively selected samples; to this end, a single-pass active learning paradigm respecting feature-space exploration will be exploited. In order to omit target measurements for model adaptation completely, unsupervised adaptation strategies for fuzzy models and a new variant of PLS, termed domain-invariant PLS (di-PLS), will be demonstrated.

The talk will be concluded with two real-world applications from the chemical industry (melamine resin and viscose production), where the evolving chemometric models have been successfully installed and used; some results will be presented.

 

Mini-CV:

Edwin Lughofer received his PhD-degree from the Johannes Kepler University Linz (JKU) in 2005. He is currently Key Researcher with the Fuzzy Logic Laboratorium Linz / Department of Knowledge-Based Mathematical Systems (JKU) in the Softwarepark Hagenberg, see www.flll.jku.at/staff/edwin/.

He has participated in several basic and applied research projects on the European and national level, with a specific focus on topics of Industry 4.0 and FoF (Factories of the Future). He has published around 170 publications in the fields of evolving fuzzy systems, machine learning and vision, data stream mining, chemometrics, active learning, classification and clustering, fault detection and diagnosis, quality control, and predictive maintenance, including 60 journal papers in SCI-expanded impact journals, a monograph on ’Evolving Fuzzy Systems’ (Springer) and an edited book on ’Learning in Non-stationary Environments’ (Springer). In sum, his publications have received 2900 citations, achieving an h-index of 33. He is associate editor of the international journals IEEE Transactions on Fuzzy Systems, Evolving Systems, Information Fusion, Soft Computing, and Complex and Intelligent Systems; he was the general chair of the IEEE EAIS 2014 conference in Linz, the publication chair of IEEE EAIS 2015, 2016 and 2017, and an area chair of the FUZZ-IEEE 2015 conference in Istanbul. He has co-organized around 20 special issues and special sessions in international journals and conferences. In 2006 he received the best paper award at the International Symposium on Evolving Fuzzy Systems, in 2013 the best paper award at the IFAC Conference on Manufacturing Modelling, Management and Control (800 participants), and in 2016 the best paper award at the IEEE Intelligent Systems Conference.

 

Peter Wentzell

 

The field of chemometrics can be considered to be relatively mature and its history is reflected in the evolution of its constituent methodologies and their application both within and outside of chemistry.  Some of these have enjoyed continuous widespread popularity, while others have been relegated to the shadows after a brief incandescence, and still others have enjoyed a rebirth for a variety of reasons.  The success or failure of a given methodology is driven by a variety of factors that include academic culture, practical need, active promotion, commercial availability, simplicity, relevance to mainstream or niche applications, demonstrated advantages over established methods, and (conversely) demonstrated redundancy with established methods.

This presentation will attempt to provide a broad view of the evolution of chemometric tools in general, with some specific exemplars relevant to the current state of the art.  In addition to bibliometric trends of various chemometric methodologies that include both those with resilience (e.g. partial least squares, PLS) and those that have been more transient (e.g. Kalman filtering), a more rigorous discussion of some methods will be considered.  These include maximum likelihood principal components analysis (MLPCA), projection pursuit analysis (PP), independent component analysis (ICA) and factor analysis (FA).  It is hoped that this will provoke a discussion on where chemometrics has been and where it is headed.

 

Mini-CV:

Peter Wentzell is a Professor in the Department of Chemistry at Dalhousie University in Halifax, Nova Scotia.  He completed his PhD with Dr. Stan Crouch at Michigan State University and carried out post-doctoral work at the University of British Columbia before taking up his current position in 1989.  He has also spent sabbaticals at the University of Washington with Dr. Bruce Kowalski (1996) and in the Biology Department at the University of New Mexico (2003).  He has been involved in Chemometrics research for more than 25 years and served as North American Editor of Chemometrics and Intelligent Laboratory Systems for eight years.  Although he has a wide range of research interests, his principal focus has been on understanding and utilizing measurement errors in multivariate analysis, as well as on the development of new tools for exploratory analysis.  His distinctions include the Eastern Analytical Symposium Award for Outstanding Achievements in Chemometrics (2015), the Journal of Chemometrics Kowalski Prize (2014) and the Dalhousie Faculty of Science Award for Excellence in Teaching (2010).

Peter Filzmoser

 

Robust methods for multiple linear regression have been under development since the 1970s. Nowadays, methods are available that are robust to a high amount of contamination while still being highly efficient when the usual model assumptions are valid. Moreover, fast algorithms have been developed and implemented in the standard statistical software environments.

For high-dimensional data, particularly if the number of explanatory variables is higher than the number of observations, robust methods are mainly available in the context of Partial Least Squares (PLS) regression. However, these methods lose their predictive power if the high-dimensional data contain many noise variables which are not related to the response. It is desirable that the corresponding regression coefficients be zero, so that their contribution to the prediction is suppressed. This is possible with an L1 penalty term added to the objective function, as is done in LASSO regression, leading to so-called sparsity of the vector of regression coefficients.
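The effect of the L1 penalty can be illustrated with ordinary LASSO regression (this shows the sparsity mechanism only, not the robust sparse PLS method itself); all data below are synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)

# Hypothetical high-dimensional setting: 200 predictors, of which only the
# first 5 relate to the response; n < p, as is common with spectral data.
n, p = 50, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]
y = X @ beta + 0.1 * rng.normal(size=n)

# The L1 penalty drives the coefficients of noise variables to exactly
# zero, suppressing their contribution to the prediction.
model = Lasso(alpha=0.1).fit(X, y)
n_nonzero = int(np.sum(model.coef_ != 0))
```

Most of the 195 noise coefficients are set exactly to zero, whereas a plain least-squares or PLS fit would assign them all small nonzero values; the robustness part then concerns replacing the squared-error objective by an outlier-resistant one.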

We will present a robust and sparse PLS method and compare it to standard methods on simulated and real data sets. Moreover, an extension of the method to a two-group classification method that is sparse and robust will be outlined. The methods are available in the R package sprm.

 

Mini-CV:
Peter Filzmoser is a full professor at the statistics department of the Vienna University of Technology. His main research interests include robust statistics, methods for compositional data analysis, statistical computing, and R. For more information, see: http://www.statistik.tuwien.ac.at/public/filz/.

Follow this link to download the First Announcement of the ICRM2017 symposium, to be held 10-14 September 2017 in Berg en Dal, The Netherlands:

Download First Announcement ICRM2017

We are delighted to announce that the registration form for ICRM2017 is available.

ICRM2017 Fee and Registration

ICRM2017 greatly appreciates the support from its sponsors.
 

 

Standard ICRM templates for Word or LaTeX are available for abstracts. Please email the filled-out templates to our conference secretary at ICRM2017@DutchChemometricsSociety.nl. The deadline for abstract submissions is June 1st, 2017.

The scientific committee will review all abstracts and will contact you by email to inform you whether your submission has been accepted. Note that only submissions from registered participants are taken into the review process. The reviews will be completed by July 1st, 2017.

Please specify in the email whether you want to apply for an oral (contributed speaker) or a poster presentation. Be aware that there is room for only a limited number of oral presentations in the ICRM2017 program.

Click here for a Word template: Word

Click here for a LaTeX template: Latex