Model Validation
Transparent performance metrics from real neuroscience data
Our Models vs Real Lab Data
We don't just claim our models work — we test them against real neuroscience data. Below are our current validation results from published datasets. As we expand our training data and collect customer outcome data, these numbers will improve.
Validation Results
AUC 0.746, Accuracy 70.9%
- Dataset: NeuMa (Georgiadis et al., 2023, Nature Scientific Data)
- What: 41 participants viewing real supermarket brochures with EEG + eye-tracking. Participants selected products they intended to buy.
- Task: Predict Buy vs NoBuy from EEG activity during ad viewing
- Segments: 405 labeled EEG segments (251 Buy, 154 NoBuy)
- Published: Nature Scientific Data, 2023
AUC 0.705, Accuracy 60.8%
- Dataset: GSR Mental Workload Collection
- What: 44 real galvanic skin response recordings during high vs low cognitive workload tasks
- Task: Distinguish high vs low mental workload from physiological arousal
R² 0.842
- Dataset: Synthetic (grounded in Pieters & Wedel 2004, Itti & Koch 1998)
- What: 2,000 synthetic samples with feature weights derived from published eye-tracking research
- Task: Predict attention capture score from visual properties
R² 0.781
- Dataset: Synthetic (grounded in Cahill & McGaugh 1998, Elliot & Maier 2014)

R² 0.758
- Dataset: Synthetic (grounded in Paivio 1973, Hunt 1995)

R² 0.853
- Dataset: Synthetic (grounded in Mayer & Moreno 2003, Reber et al. 2004)
What These Numbers Mean
AUC (Area Under the ROC Curve) measures how well a model separates two classes: 0.5 is random chance, 1.0 is perfect. Our NeuMa purchase intent model's AUC of 0.746 means that, given a randomly chosen Buy segment and a randomly chosen NoBuy segment, it ranks the Buy segment higher roughly 75% of the time. The model was trained on real brain recordings captured during real advertising viewing.
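That pairwise-ranking interpretation of AUC can be computed directly. Here is a minimal pure-Python sketch; the scores below are toy values chosen for illustration, not outputs of our actual NeuMa model:

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win; equivalent to the area under the ROC curve.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for Buy and NoBuy segments
buy = [0.9, 0.8, 0.7, 0.4]
nobuy = [0.6, 0.3, 0.2]
print(round(auc(buy, nobuy), 3))  # 0.917 (11 of 12 pairs ranked correctly)
```

The double loop is O(n·m) and fine for a sketch; production metric libraries compute the same quantity from sorted ranks.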
R² (coefficient of determination) measures how much of the variance in the target score a model explains. Our attention model's R² of 0.842 means it accounts for 84% of the variance in attention capture scores on our synthetic benchmark.
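The R² calculation itself is short. A minimal sketch with made-up numbers (not our benchmark data):

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)       # total variance around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained residual
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]   # toy target attention scores
y_pred = [1.1, 1.9, 3.2, 3.8]   # toy model predictions
print(round(r_squared(y_true, y_pred), 2))  # 0.98
```

An R² of 1.0 means predictions match targets exactly; 0.0 means the model does no better than always predicting the mean.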
For context, classical machine-learning approaches to DEAP emotion classification from EEG typically report 55–65% accuracy. Our NeuMa model reaches 70.9% accuracy on a neuromarketing-specific task, against a 62% majority-class baseline (251 of 405 segments are Buy), suggesting that domain-specific training data matters.
Validation Roadmap
- NeuMa EEG purchase intent validation (AUC 0.746)
- GSR cognitive workload validation (AUC 0.705)
- Research-grounded synthetic models (5 models, R² 0.76–0.85)
- Tufts fNIRS cognitive load model (68 participants, in progress)
- Saliency heatmap vs MIT eye-tracking benchmark (planned)
- Customer outcome validation: prediction vs actual campaign performance (collecting data)
- Hardware validation: Emotiv EEG + Tobii eye-tracking comparison (Month 5–6)
Data Transparency
All validation datasets are publicly available:
- NeuMa: figshare.com (Georgiadis et al., 2023)
- DEAP: Queen Mary University of London
- GSR Workload: Kaggle open dataset
Our model code and training scripts are documented on our methodology page.