Model Validation

Transparent performance metrics from real neuroscience data

Our Models vs Real Lab Data

We don't just claim our models work — we test them against real neuroscience data. Below are our current validation results from published datasets. As we expand our training data and collect customer outcome data, these numbers will improve.

Validation Results

Purchase Intent Prediction (Validated)

AUC: 0.746
Accuracy: 70.9%
Dataset: NeuMa (Georgiadis et al., 2023, Nature Scientific Data)
What: 41 participants viewing real supermarket brochures with EEG + eye-tracking. Participants selected products they intended to buy.
Task: Predict Buy vs NoBuy from EEG activity during ad viewing
Segments: 405 labeled EEG segments (251 Buy, 154 NoBuy)
Published: Nature Scientific Data, 2023
Cognitive Load Detection (Validated)

AUC: 0.705
Accuracy: 60.8%
Dataset: GSR Mental Workload Collection
What: 44 real galvanic skin response recordings during high vs low cognitive workload tasks
Task: Distinguish high vs low mental workload from physiological arousal
Attention Prediction (Research-grounded)

R²: 0.842
Dataset: Synthetic (grounded in Pieters & Wedel 2004, Itti & Koch 1998)
What: 2,000 synthetic samples with feature weights derived from published eye-tracking research
Task: Predict attention capture score from visual properties
Emotional Engagement (Research-grounded)

R²: 0.781
Dataset: Synthetic (grounded in Cahill & McGaugh 1998, Elliot & Maier 2014)
Memory Encoding (Research-grounded)

R²: 0.758
Dataset: Synthetic (grounded in Paivio 1973, Hunt 1995)
Cognitive Load (Visual) (Research-grounded)

R²: 0.853
Dataset: Synthetic (grounded in Mayer & Moreno 2003, Reber et al. 2004)

What These Numbers Mean

AUC (Area Under ROC Curve) measures how well a model distinguishes between two classes. An AUC of 0.5 is random chance, 1.0 is perfect. Our NeuMa purchase intent model at 0.746 means that, given a random Buy segment and a random NoBuy segment, the model ranks the Buy segment higher roughly 75% of the time. It was trained on real brain recordings during real advertising viewing.
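The ranking interpretation of AUC can be made concrete in a few lines. This is an illustrative sketch with made-up scores and labels, not our actual NeuMa evaluation pipeline:

```python
def auc(scores, labels):
    """AUC = probability that a randomly chosen positive (Buy) example
    receives a higher score than a randomly chosen negative (NoBuy) one.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy model outputs (e.g. predicted Buy probability) and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]    # 1 = Buy, 0 = NoBuy

print(round(auc(scores, labels), 3))  # → 0.889
```

Note that AUC depends only on the ordering of the scores, which is why it is a standard metric when class proportions are imbalanced, as in the 251 Buy vs 154 NoBuy split above.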

R² (coefficient of determination) measures how much of the variance in the target score a model explains. Our attention model at R² 0.842 means it explains 84% of the variance in attention scores across the validation samples.
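For readers who want the definition spelled out, R² compares the model's squared errors against those of simply predicting the mean. A minimal sketch with toy numbers (not our validation data):

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: the fraction of variance in y_true
    explained by the predictions, relative to a mean-only baseline."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)       # variance around the mean
    ss_res = sum((y, p) and (y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

# Toy attention scores and close predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 6.9, 9.2]

print(round(r_squared(y_true, y_pred), 3))  # → 0.991
```

An R² of 0 means the model is no better than always predicting the average score; negative values are possible when it does worse than that baseline.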

For context, published state-of-the-art on DEAP emotion classification from EEG achieves 55–65% accuracy with classical ML. Our NeuMa model's 70.9% accuracy on a neuromarketing-specific task suggests that domain-specific training data matters.

Validation Roadmap

  • NeuMa EEG purchase intent validation (AUC 0.746)
  • GSR cognitive workload validation (AUC 0.705)
  • Research-grounded synthetic models (5 models, R² 0.76–0.85)
  • Tufts fNIRS cognitive load model (68 participants, in progress)
  • Saliency heatmap vs MIT eye-tracking benchmark (planned)
  • Customer outcome validation: prediction vs actual campaign performance (collecting data)
  • Hardware validation: Emotiv EEG + Tobii eye-tracking comparison (Month 5–6)

Data Transparency

All validation datasets are publicly available:

  • NeuMa: figshare.com (Georgiadis et al., 2023)
  • DEAP: Queen Mary University of London
  • GSR Workload: Kaggle open dataset
  • Our model code and training scripts are documented on our methodology page.