Model Validation
Transparent performance metrics from real neuroscience data
Our Models vs Real Lab Data
We don't just claim our models work — we test them against real neuroscience data. Below are our current validation results from published datasets. As we expand our training data and collect customer outcome data, these numbers will improve.
Validation Results
AUC 0.746, Accuracy 70.9%
- Dataset: NeuMa (Georgiadis et al., 2023, Nature Scientific Data)
- What: 41 participants viewing real supermarket brochures with EEG + eye-tracking. Participants selected products they intended to buy.
- Task: Predict Buy vs NoBuy from EEG activity during ad viewing
- Segments: 405 labeled EEG segments (251 Buy, 154 NoBuy)
- Published: Nature Scientific Data, 2023
AUC 0.705, Accuracy 60.8%
- Dataset: GSR Mental Workload Collection
- What: 44 real galvanic skin response recordings during high vs low cognitive workload tasks
- Task: Distinguish high vs low mental workload from physiological arousal
R² 0.842
- Dataset: Synthetic (grounded in Pieters & Wedel 2004, Itti & Koch 1998)
- What: 2,000 synthetic samples with feature weights derived from published eye-tracking research
- Task: Predict attention capture score from visual properties
R² 0.781
- Dataset: Synthetic (grounded in Cahill & McGaugh 1998, Elliot & Maier 2014)

R² 0.758
- Dataset: Synthetic (grounded in Paivio 1973, Hunt 1995)

R² 0.853
- Dataset: Synthetic (grounded in Mayer & Moreno 2003, Reber et al. 2004)
What These Numbers Mean
AUC (Area Under the ROC Curve) measures how well a model separates two classes: 0.5 is random chance, 1.0 is perfect. Our NeuMa purchase intent model's AUC of 0.746 means that, given a randomly chosen Buy segment and a randomly chosen NoBuy segment, it ranks the Buy segment higher roughly 75% of the time. The model was trained on real brain recordings captured during real advertising viewing.
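That pairwise-ranking interpretation of AUC can be computed directly. Here is a minimal pure-Python sketch; the scores below are toy values chosen for illustration, not outputs of our actual NeuMa model:

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win; equivalent to the area under the ROC curve.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for Buy and NoBuy segments
buy = [0.9, 0.8, 0.7, 0.4]
nobuy = [0.6, 0.3, 0.2]
print(round(auc(buy, nobuy), 3))  # 0.917 (11 of 12 pairs ranked correctly)
```

The double loop is O(n·m) and fine for a sketch; production metric libraries compute the same quantity from sorted ranks.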
R² (coefficient of determination) measures how much of the variance in the target score a model explains. Our attention model's R² of 0.842 means it accounts for 84% of the variance in attention capture scores on our synthetic benchmark.
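The R² calculation itself is short. A minimal sketch with made-up numbers (not our benchmark data):

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)       # total variance around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained residual
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]   # toy target attention scores
y_pred = [1.1, 1.9, 3.2, 3.8]   # toy model predictions
print(round(r_squared(y_true, y_pred), 2))  # 0.98
```

An R² of 1.0 means predictions match targets exactly; 0.0 means the model does no better than always predicting the mean.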
For context, classical machine-learning approaches to DEAP emotion classification from EEG typically report 55–65% accuracy. Our NeuMa model reaches 70.9% accuracy on a neuromarketing-specific task, against a 62% majority-class baseline (251 of 405 segments are Buy), suggesting that domain-specific training data matters.
Validation Roadmap
- NeuMa EEG purchase intent validation (AUC 0.746)
- GSR cognitive workload validation (AUC 0.705)
- Research-grounded synthetic models (5 models, R² 0.76–0.85)
- Tufts fNIRS cognitive load model (68 participants, in progress)
- Saliency heatmap vs MIT eye-tracking benchmark (planned)
- Customer outcome validation: prediction vs actual campaign performance (collecting data)
- Hardware validation: Emotiv EEG + Tobii eye-tracking comparison (Month 5–6)
Data Transparency
All validation datasets are publicly available:
- NeuMa: figshare.com (Georgiadis et al., 2023)
- DEAP: Queen Mary University of London
- GSR Workload: Kaggle open dataset
Our model code and training scripts are documented on our methodology page.