E20: Machine Learning for Chemometricians: ANNs, SVMs, XGBoost and other Non-linear Methods for Calibration and Classification

One-Day Course 
Date to be announced; 8:30am – 5:00pm

Dr. Barry Wise, Eigenvector Research, Manson, WA
Dr. Donal O’Sullivan, Eigenvector Research, Manson, WA

Linear methods, such as PLS regression, work well for a wide range of problems of chemical interest, but there are times when the relationships between variables are complex and require non-linear modeling methods. This course focuses on a number of methods that have shown great utility in modeling chemical systems. The course starts with a discussion of linearizing transforms. It then shows how Locally Weighted Regression (LWR) and Hierarchical Models (HM) can handle non-linearity by using linear sub-models. More difficult non-linear relationships can be handled using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Gradient-boosted Ensemble methods (XGBoost) for both regression and classification analysis. These methods are explained in detail and the meta-parameters associated with them are discussed. The course includes hands-on computer time.

This one-day course is aimed at engineers, chemists and other scientists who want to be able to analyze their own laboratory or process data and develop their own data models. The course is especially well suited for those with an interest in process analytical technology (PAT) in the pharmaceutical and chemical process industries. This course serves individuals with a need to develop predictive models such as analytical instrument calibrations, sample classification and soft sensor models. No prior knowledge is needed for this course, although some knowledge of basic chemometric methods (PCA, PLS, etc.) is useful.

1. Introduction
a. Why non-linear methods?
b. How linear methods deal with non-linear data
2. Variable Transformations
a. Log, sqrt, etc.
b. Augmenting with non-linear transforms
3. Factor-Based Transforms
a. PCA scores and augmenting
b. Polynomial PLS
4. Locally Weighted Regression
a. Weighted regression
b. Distance measures
c. Basing models on PCA scores
5. Hierarchical Models
a. Dividing regressions into domains
6. Support Vector Machines
a. Classification and regression models
7. Artificial Neural Networks
a. Classification and regression models
8. Gradient Boosted Decision Trees
a. Classification and regression ensemble models
9. Choosing the Right Method
a. Prediction skill
b. Computational performance
c. Deployment options

Dr. Barry M. Wise, PLS_Toolbox creator and President and cofounder of Eigenvector Research, holds a doctorate in Chemical Engineering and has experience in a wide variety of applications spanning chemical process monitoring, modeling and analytical instrument development. He has extensive teaching experience, having presented over 100 chemometrics courses, and has coauthored over 50 peer-reviewed articles, book chapters and patents. Dr. Wise is the winner of the 2001 EAS Award for Achievements in Chemometrics and the 2019 Wold Medal for his pioneering achievements in Process Chemometrics and dedication to the proliferation of Chemometrics.

Dr. Donal O’Sullivan holds B.S. and M.S. degrees in Applied Mathematics and a Ph.D. in Atmospheric Sciences. He is a senior software developer at Eigenvector Research and has implemented and taught many non-linear methods.