E24: Advanced Chemometrics without Equations (or Hardly Any)

One-Day Course 
Date to be announced; 8:30am – 5:00pm

Dr. Neal Gallagher, Eigenvector Research, Wenatchee, WA


Advanced Chemometrics without Equations (ACWE) takes up where our popular Chemometrics without Equations (CWE) course leaves off. It is assumed that participants will have a working knowledge of Principal Components Analysis (PCA) and regression with Partial Least Squares (PLS). ACWE concentrates on improving chemometric models via 1) advanced preprocessing methods, 2) variable selection, and 3) semi-automatic machine learning (semi-auto ML) for model generation.

The critical difference between inadequate and successful chemometric models is often data preprocessing, i.e., what is done to the data before using PCA, PLS, etc. The goal of preprocessing is to remove variation not related to the problem of interest so that the variation of interest is more evident and can be more easily modeled. The variables selected, e.g., spectral regions, can also greatly affect the success of the application. ACWE focuses on advanced preprocessing methods for improving models. Variable selection techniques are also considered, along with the effect of preprocessing and variable selection on the robustness of the final models. The course concludes with an introduction to semi-auto ML which, with the help of the user, generates and tests many preprocessing and variable selection options and presents the user with a number of candidate models.

Advanced Chemometrics without Equations (or Hardly Any) is designed for those who wish to explore the problem-solving power of chemometric tools but are discouraged by the high level of mathematics found in many software manuals and texts. Course emphasis is on proper application and interpretation of chemometric methods as applied to real-life problems. The objective is to teach in the simplest way possible so that participants will be better chemometrics practitioners and managers.

1. Introduction
   a. Brief review of PCA
   b. Brief review of PLS regression
2. Advanced Preprocessing
   a. What are the goals of preprocessing?
   b. Mean- and median-centering, autoscaling
   c. Normalization and standard normal variate (SNV)
   d. Savitzky-Golay smoothing and filtering
   e. Generalized least squares (GLS) weighting
   f. Multiplicative scatter correction (MSC)
   g. Extended multiplicative scatter correction (EMSC)
3. Variable Selection
   a. Why do variable selection?
   b. Knowledge-based selection
   c. Model-based selection, e.g. using loadings
   d. Interval PLS (iPLS)
4. Semi-Auto ML for Model Generation
   a. Why automatic model generation?
   b. The semi-auto ML workflow
   c. Evaluating candidate models: overfit vs. prediction error
   d. Other considerations: complexity, robustness
   e. Final model selection

Dr. Neal B. Gallagher, PLS_Toolbox co-author and co-founder of Eigenvector Research, Inc., holds a doctorate in Chemical Engineering and has experience in a wide variety of applications spanning chemical process monitoring, hyperspectral image analysis, anomaly detection, quantification and classification, regression modeling, and analytical instrument development. He has extensive teaching experience, including at Eigenvector University and in dozens of chemometrics courses.