What is Multivariate Data Analysis (MVDA)?

Multivariate Data Analysis (MVDA) is a set of tools for analysing and assessing more than one variable simultaneously. It is ideally suited to complex, high-dimensional data such as that generated by:
- Multichannel Scientific Equipment.
- Complex Modern Process Equipment.
- Financial and Stock Market Data.
- Sensory Assessments.
MVDA is highly graphical in its approach and provides a wealth of diagnostic statistics. These are used to assess the quality of the models developed and to provide assurance for results generated by predictive analytics.
MVDA methods encompass:
- Exploratory Data Analysis (EDA) and Data Mining.
- Regression and Predictive Analytics.
- Classification Methods and Preventative Maintenance.
Multivariate Methods of Analysis

Univariate and Multiple Linear Regression (MLR) methods are simple least-squares approaches to data analysis, typically used when the number of variables is small (1-10 variables). MLR can model the most important variables together with their interaction and square terms.
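
As a minimal sketch of the idea above, the following Python snippet fits a two-variable model with interaction and square terms by ordinary least squares. The function name `fit_mlr_with_interaction`, the synthetic data and the coefficient values are illustrative assumptions, not part of any particular MVDA package.

```python
import numpy as np

def fit_mlr_with_interaction(x1, x2, y):
    """Least-squares fit of
    y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + b22*x2^2."""
    design = np.column_stack(
        [np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2]
    )
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Synthetic, noise-free data (assumed for illustration only).
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 30)
x2 = rng.uniform(-1, 1, 30)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + 0.3 * x1**2

coeffs = fit_mlr_with_interaction(x1, x2, y)
print(np.round(coeffs, 3))  # order: b0, b1, b2, b12, b11, b22
```

With only a handful of variables the design matrix stays small and well conditioned, which is why this approach is normally reserved for low-dimensional problems.
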
Principal Component Analysis (PCA) is arguably the most powerful Exploratory Data Analysis (EDA) method available. It models the similarities between objects based on the variables measured and is used to find patterns and trends in complex data sets.
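
One common way to compute such a model is a singular value decomposition of the mean-centred data. The sketch below is a simplified illustration under that assumption; `pca_svd` and the synthetic data are hypothetical and are not taken from any specific software.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of mean-centred data via SVD.
    Returns scores (objects), loadings (variables) and the
    fraction of variance explained by each retained component."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Synthetic data: 6 measured variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(40, 6))

scores, loadings, explained = pca_svd(X, n_components=2)
print(np.round(explained, 4))
```

Plotting the first score column against the second gives the familiar scores plot in which similar objects cluster together.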

Partial Least Squares Regression (PLSR) does not suffer from the collinearity issues that affect MLR, and it can be used with very large numbers of variables (>> 1,000). It is used to build predictive models from multichannel instrument data, replacing tedious and complicated reference laboratory analyses.
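
To illustrate how a model can be fitted with far more variables than objects, here is a hedged sketch of single-response PLS (PLS1) using the NIPALS-style deflation steps. The function `pls1_fit` and the synthetic "spectra" are assumptions made for this example; production software adds validation and diagnostics that are omitted here.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weight: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        qa = yr @ t / tt              # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - qa * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

# 20 objects, 2000 highly collinear variables driven by 2 hidden factors.
rng = np.random.default_rng(2)
T = rng.normal(size=(20, 2))
X = T @ rng.normal(size=(2, 2000))
y = T @ np.array([1.0, -2.0])

coef, intercept = pls1_fit(X, y, n_components=2)
y_hat = X @ coef + intercept
print(float(np.max(np.abs(y_hat - y))))  # residual of the noise-free fit
```

An ordinary least-squares fit would be ill-posed for this 20 x 2000 matrix, whereas the two PLS components describe it fully.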

Soft Independent Modelling of Class Analogy (SIMCA) is a disjoint classification method based on a library of individual PCA models, each developed on a known object class. Unlike many classification methods, SIMCA is highly graphical and can also detect when an object does not belong to any class.
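
The sketch below shows the basic mechanics in simplified form: one PCA model per class, and a new object is accepted by a class only if its residual distance to that model is within a limit. The class `ClassPCAModel`, the function `simca_classify` and the use of an empirical 95% quantile as the limit are assumptions made for brevity; SIMCA implementations typically use statistically derived critical limits instead.

```python
import numpy as np

class ClassPCAModel:
    """PCA model of one known class plus a residual-distance limit."""

    def __init__(self, X, n_components, quantile=0.95):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.loadings = Vt[:n_components].T
        # Assumed limit: empirical quantile of the training residuals.
        self.limit = np.quantile(self.residual_distance(X), quantile)

    def residual_distance(self, X):
        Xc = np.atleast_2d(X) - self.mean
        residual = Xc - Xc @ self.loadings @ self.loadings.T
        return np.linalg.norm(residual, axis=1)

def simca_classify(models, x):
    """Return every class whose model accepts x (possibly none)."""
    return [name for name, m in models.items()
            if m.residual_distance(x)[0] <= m.limit]

# Two synthetic classes that vary along different directions.
rng = np.random.default_rng(3)
dir_a = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
dir_b = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
class_a = np.outer(rng.normal(size=30), dir_a) + 0.05 * rng.normal(size=(30, 5))
class_b = 5.0 + np.outer(rng.normal(size=30), dir_b) + 0.05 * rng.normal(size=(30, 5))

models = {"A": ClassPCAModel(class_a, 1), "B": ClassPCAModel(class_b, 1)}
typical_a = models["A"].mean
outlier = np.array([20.0, -20.0, 20.0, -20.0, 20.0])

print(simca_classify(models, typical_a))
print(simca_classify(models, outlier))
```

Because each class is modelled independently, an object can be accepted by one class, by several, or by none, which is how objects of an unknown type are flagged.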

Partial Least Squares Discriminant Analysis (PLS-DA) is an extension of the PLS2 algorithm in which each object class is defined by a binary descriptor. An object is assigned to a class when its predicted value for that class is close to 1, within specified confidence intervals.
Univariate Methods of Analysis

Normality Testing is typically the first step before applying other parametric statistical tests to a data set. A common method is the Shapiro-Wilk test: a variable that shows no significant difference from the theoretical normal distribution is treated as normally distributed.
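
As a small illustration, the snippet below applies `scipy.stats.shapiro` to two deterministic example data sets. The helper `shapiro_check`, the 0.05 significance level and the data themselves are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

def shapiro_check(values, alpha=0.05):
    """Return (no_significant_departure_from_normality, p_value)."""
    statistic, p_value = stats.shapiro(values)
    return bool(p_value > alpha), float(p_value)

# Idealised quantiles of a normal and of an exponential distribution.
probabilities = (np.arange(1, 51) - 0.5) / 50
bell_shaped = stats.norm.ppf(probabilities)
skewed = stats.expon.ppf(probabilities)

bell_ok, bell_p = shapiro_check(bell_shaped)
skewed_ok, skewed_p = shapiro_check(skewed)
print(bell_ok, skewed_ok)
```

A result of `True` only means that no significant departure from normality was detected at the chosen level; it is not proof that the variable is normal.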

Fisher’s F-Test is a parametric test that compares the variances of two samples or populations. It is typically performed before Student’s t-Test, because the equivalence of variances determines whether two sample/population means can be fairly compared to each other.
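
SciPy does not provide a dedicated two-sample variance-ratio function, so a sketch of the test can be built from the F distribution directly. The function `f_test_equal_variances`, the two-sided convention and the measurement values are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def f_test_equal_variances(a, b):
    """Two-sided F-test on the ratio of two sample variances.
    The larger variance goes in the numerator so that F >= 1."""
    var_a, var_b = np.var(a, ddof=1), np.var(b, ddof=1)
    if var_a >= var_b:
        f_stat, dfn, dfd = var_a / var_b, len(a) - 1, len(b) - 1
    else:
        f_stat, dfn, dfd = var_b / var_a, len(b) - 1, len(a) - 1
    p_value = min(1.0, 2.0 * stats.f.sf(f_stat, dfn, dfd))
    return f_stat, p_value

# Hypothetical replicate measurements; b has five times the spread of a.
a = np.array([10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0])
b = np.array([10.5, 9.5, 11.0, 9.0, 10.0, 10.5, 9.5, 10.0])

f_stat, p_value = f_test_equal_variances(a, b)
f_same, p_same = f_test_equal_variances(a, a)
print(round(f_stat, 2), p_value < 0.05)
```

A small p-value suggests the variances differ, in which case a t-test that does not assume equal variances is usually the safer choice.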

Student’s t-Test is a parametric test that compares a sample or population mean to a target value (one-sample t-test), compares two sample/population means (two-sample t-test), or assesses the difference between two paired sets of data (paired t-test). These tests are used to determine whether a statistically significant difference exists between the quantities being compared.
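
The three variants map onto three SciPy functions. The example below is a sketch using made-up measurements (`before`, `after`) and an assumed target value of 12.0; in practice only the variant that matches the experimental design would be used.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements on the same six objects, before and after a change.
before = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])
after = np.array([12.6, 12.4, 12.8, 12.5, 12.5, 12.7])

# One-sample: is the mean of `before` different from the target 12.0?
one_sample = stats.ttest_1samp(before, popmean=12.0)

# Two-sample: are the two means different, assuming equal variances?
two_sample = stats.ttest_ind(before, after, equal_var=True)

# Paired: is the mean of the object-wise differences different from zero?
paired = stats.ttest_rel(before, after)

for name, result in [("one-sample", one_sample),
                     ("two-sample", two_sample),
                     ("paired", paired)]:
    print(name, result.pvalue < 0.05)
```

The paired test works on the object-wise differences, so it can detect a consistent shift even when the spread between objects is comparatively large.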

Analysis of Variance (ANOVA) is a parametric test that compares the within-set variance to the between-set variance for three or more sample sets, each comprising a number of replicate measurements. It is used to determine whether at least one sample set can be differentiated from the others in the statistical sense, and it provides important information regarding the repeatability of measurements on a set of objects.
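
The within versus between comparison can be written out explicitly for the one-way case. The sketch below uses a hypothetical helper `one_way_anova` and invented replicate data, and cross-checks the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """One-way ANOVA; returns (F, p_value, within-set mean square)."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within   # pooled replicate variance
    f_stat = ms_between / ms_within
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value, ms_within

# Three sample sets with four replicates each (invented values).
groups = [
    np.array([5.1, 5.0, 4.9, 5.2]),
    np.array([5.0, 5.1, 4.9, 5.0]),
    np.array([6.0, 6.1, 5.9, 6.2]),
]

f_stat, p_value, ms_within = one_way_anova(groups)
reference = stats.f_oneway(*groups)
print(round(f_stat, 1), p_value < 0.05)
```

The within-set mean square is the pooled variance of the replicates, which is the quantity that reflects repeatability.
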
Multivariate Statistical Process Control (MSPC)

Multivariate Statistical Process Control (MSPC) is the practical application of models developed using Multivariate Data Analysis (MVDA) to real-world problems. Examples include:
- Process Control Using Process Analytical Technology (PAT).
- Multiple Sensor Assessment Using Internet of Things (IoT).
- Forecasting Share Prices and Market Trends Based on Multiple Real Time Inputs.
MSPC is highly graphical in its approach and provides detailed outputs that can be used for Early Event Detection (EED), allowing process shutdowns to be prevented before they occur. The same data can also be used for Root Cause Analysis.
MSPC outputs can also be condensed into Traffic Light Signals based on multiple variables all meeting specific criteria simultaneously. Any deviation signals a process engineer or process scientist to investigate further using the detailed outputs of the multivariate models.
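
One way such a traffic light could be derived is sketched below: a PCA model is built on in-control reference data, and each new observation is summarised by two statistics, a Hotelling-type T² (distance within the model) and a Q residual (distance from the model). The class `MspcMonitor`, the choice of these two statistics, and the empirical 95% and 99% limits are all assumptions for illustration rather than a prescribed MSPC implementation.

```python
import numpy as np

class MspcMonitor:
    """PCA-based monitor with green/amber/red output (illustrative)."""

    def __init__(self, X_ref, n_components):
        self.mean = X_ref.mean(axis=0)
        self.scale = X_ref.std(axis=0, ddof=1)
        Z = (X_ref - self.mean) / self.scale
        _, s, Vt = np.linalg.svd(Z, full_matrices=False)
        self.loadings = Vt[:n_components].T
        self.score_var = s[:n_components] ** 2 / (len(Z) - 1)
        t2, q = self.statistics(X_ref)
        # Assumed limits: empirical quantiles of the reference data.
        self.warn = (np.quantile(t2, 0.95), np.quantile(q, 0.95))
        self.alarm = (np.quantile(t2, 0.99), np.quantile(q, 0.99))

    def statistics(self, X):
        Z = (np.atleast_2d(X) - self.mean) / self.scale
        scores = Z @ self.loadings
        t2 = np.sum(scores**2 / self.score_var, axis=1)
        q = np.sum((Z - scores @ self.loadings.T) ** 2, axis=1)
        return t2, q

    def traffic_light(self, x):
        t2, q = (v[0] for v in self.statistics(x))
        if t2 > self.alarm[0] or q > self.alarm[1]:
            return "red"
        if t2 > self.warn[0] or q > self.warn[1]:
            return "amber"
        return "green"

# In-control reference data: 4 sensors driven by 2 hidden factors plus noise.
rng = np.random.default_rng(4)
latent = rng.normal(size=(100, 2))
X_ref = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4))

monitor = MspcMonitor(X_ref, n_components=2)
normal_obs = monitor.mean
fault_obs = monitor.mean + np.array([50.0, 0.0, 0.0, 0.0]) * monitor.scale

print(monitor.traffic_light(normal_obs), monitor.traffic_light(fault_obs))
```

When the signal is amber or red, the underlying T² and Q values, together with the per-variable residuals, are the detailed outputs an engineer would inspect next.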