Decoding human thoughts from EEG signals is a complex task that requires capturing intricate spatial and temporal patterns in the brain's electrical activity.
Recent advancements in AI, particularly in Transformer architectures and Large Language Models (LLMs), have shown remarkable capabilities in modelling sequential and complex data patterns.
In this exposition, I present a mathematically detailed proof that human thoughts can be decoded from EEG signals using a sophisticated Transformer-based AI model. I incorporate elements from Graph Neural Networks (GNNs), expert models, and agentic models to enhance the model's specialization and accuracy.
Let:

- $\mathbf{X} \in \mathbb{R}^{N \times T}$ denote the EEG data matrix, where $N$ is the number of electrodes (channels) and $T$ is the number of time steps.
- $\mathbf{y} \in \mathbb{R}^C$ represent the target thought or cognitive state, encoded as a one-hot vector over $C$ possible classes.

Our goal is to find a function

$$f: \mathbb{R}^{N \times T} \rightarrow \mathbb{R}^{C}, \qquad \hat{\mathbf{y}} = f(\mathbf{X}),$$

where $\hat{\mathbf{y}}$ is the model's prediction of the true label $\mathbf{y}$.
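To make the setup concrete, here is a minimal sketch of the data shapes in PyTorch; the specific values of $N$, $T$, and $C$ are illustrative assumptions, not values from the text.

```python
import torch

# Hypothetical dimensions, chosen only for illustration (not specified in the text).
N, T, C = 64, 512, 5           # electrodes, time steps, thought classes

X = torch.randn(N, T)          # one EEG trial: N channels x T time steps
y = torch.nn.functional.one_hot(torch.tensor(2), num_classes=C).float()  # one-hot target

# The decoding problem is to learn f: R^{N x T} -> R^C such that f(X) approximates y.
```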
The proposed AI model integrates several advanced components:
- Transformer Encoder for Temporal Dynamics
- Graph Neural Network for Spatial Relationships
- Mixture of Experts for Specialization
- Agentic Learning for Dynamic Adaptation
Each EEG channel signal is embedded into a higher-dimensional space:

$$\mathbf{E} = \mathbf{X}^{\top} \mathbf{W}_e + \mathbf{b}_e,$$

where $\mathbf{W}_e \in \mathbb{R}^{N \times d}$ and $\mathbf{b}_e \in \mathbb{R}^{d}$ are learnable parameters, so that $\mathbf{E} \in \mathbb{R}^{T \times d}$ holds one $d$-dimensional embedding per time step.

To incorporate temporal information, positional encodings are added:

$$\mathbf{E}_{\text{pos}} = \mathbf{E} + \mathbf{P},$$

where $\mathbf{P} \in \mathbb{R}^{T \times d}$ is a (learned or sinusoidal) positional-encoding matrix.
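A minimal PyTorch sketch of this embedding and positional-encoding step might look as follows; the module name `EEGEmbedding`, the chosen dimensions, and the use of a learned (rather than sinusoidal) positional encoding are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EEGEmbedding(nn.Module):
    """Project each time step (an N-dimensional channel vector) into d dimensions
    and add a learned positional encoding over the T time steps."""
    def __init__(self, n_channels: int, d_model: int, max_len: int):
        super().__init__()
        self.proj = nn.Linear(n_channels, d_model)               # W_e, b_e
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))   # P (learned here, as an assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, T) -> (batch, T, N) so the sequence dimension is time
        e = self.proj(x.transpose(1, 2))                          # E = X^T W_e + b_e
        return e + self.pos[: e.size(1)]                          # E_pos = E + P

# Example: embed a batch of 8 trials with 64 channels and 512 time steps
emb = EEGEmbedding(n_channels=64, d_model=128, max_len=512)
e_pos = emb(torch.randn(8, 64, 512))   # -> (8, 512, 128)
```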
For each attention head $h = 1, \dots, H$ at layer $l$:

- Query: $\mathbf{Q}_h^{(l)} = \mathbf{E}_{\text{pos}}^{(l)} \mathbf{W}_h^{Q(l)}$
- Key: $\mathbf{K}_h^{(l)} = \mathbf{E}_{\text{pos}}^{(l)} \mathbf{W}_h^{K(l)}$
- Value: $\mathbf{V}_h^{(l)} = \mathbf{E}_{\text{pos}}^{(l)} \mathbf{W}_h^{V(l)}$
Compute the attention weights:

$$\mathbf{A}_h^{(l)} = \operatorname{softmax}\!\left( \frac{\mathbf{Q}_h^{(l)} \mathbf{K}_h^{(l)\top}}{\sqrt{d_k}} \right).$$

Update the embeddings:

$$\mathbf{Z}_h^{(l)} = \mathbf{A}_h^{(l)} \mathbf{V}_h^{(l)}.$$

Concatenate the heads and apply a linear transformation:

$$\mathbf{Z}^{(l)} = \operatorname{Concat}\!\left( \mathbf{Z}_1^{(l)}, \mathbf{Z}_2^{(l)}, \dots, \mathbf{Z}_H^{(l)} \right) \mathbf{W}^{O(l)},$$

where:

- $\mathbf{Z}^{(l)}$: the output of the multi-head attention mechanism at layer $l$.
- $\operatorname{Concat}(\dots)$: concatenation of the outputs from all $H$ attention heads.
- $\mathbf{Z}_1^{(l)}, \mathbf{Z}_2^{(l)}, \dots, \mathbf{Z}_H^{(l)}$: the outputs of the $H$ attention heads at layer $l$.
- $\mathbf{W}^{O(l)}$: the output weight matrix at layer $l$.

Apply a position-wise feed-forward network:

$$\operatorname{FFN}\!\left( \mathbf{Z} \right) = \operatorname{ReLU}\!\left( \mathbf{Z} \mathbf{W}_1^{(l)} + \mathbf{b}_1^{(l)} \right) \mathbf{W}_2^{(l)} + \mathbf{b}_2^{(l)}.$$

Each sub-layer includes a residual connection and layer normalization:

$$\tilde{\mathbf{Z}}^{(l)} = \operatorname{LayerNorm}\!\left( \mathbf{E}_{\text{pos}}^{(l)} + \mathbf{Z}^{(l)} \right), \qquad \mathbf{E}_{\text{pos}}^{(l+1)} = \operatorname{LayerNorm}\!\left( \tilde{\mathbf{Z}}^{(l)} + \operatorname{FFN}\!\left( \tilde{\mathbf{Z}}^{(l)} \right) \right).$$
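These attention, feed-forward, residual, and normalization equations correspond to a standard Transformer encoder layer, so a sketch can lean on PyTorch's built-in module; the hyperparameter values below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A stack of standard encoder layers implementing multi-head attention,
# the position-wise FFN, residual connections, and layer normalization.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=128,          # embedding dimension d
    nhead=8,              # number of attention heads H
    dim_feedforward=512,  # hidden size of the FFN
    batch_first=True,     # inputs shaped (batch, T, d)
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

e_pos = torch.randn(8, 512, 128)   # batch of positionally encoded embeddings
z = encoder(e_pos)                 # -> (8, 512, 128), temporal features
```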
Construct a graph $\mathcal{G} = (V, E)$ where:

- $V$ represents the set of EEG electrodes.
- $E$ represents edges based on physical proximity or functional connectivity.

Compute the normalized graph Laplacian:

$$\tilde{\mathbf{L}} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2},$$

where $\mathbf{A}$ is the adjacency matrix of $\mathcal{G}$ and $\mathbf{D}$ is the corresponding degree matrix.

Apply a GNN to capture spatial dependencies:

$$\mathbf{H}^{(k+1)} = \sigma\!\left( \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2} \, \mathbf{H}^{(k)} \mathbf{W}^{(k)} \right),$$

with $\mathbf{H}^{(0)}$ initialized from the Transformer encoder output, $\mathbf{W}^{(k)}$ the learnable weights of GNN layer $k$, and $\sigma$ a non-linearity such as ReLU.
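A minimal sketch of one such graph convolution over the electrode graph is given below; the toy adjacency matrix and the way per-electrode features are obtained from the Transformer output are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution H' = sigma(D^{-1/2} A D^{-1/2} H W) over the electrode graph."""
    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor):
        super().__init__()
        deg = adj.sum(dim=1)                                    # node degrees
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-8).pow(-0.5))
        self.register_buffer("a_norm", d_inv_sqrt @ adj @ d_inv_sqrt)   # D^{-1/2} A D^{-1/2}
        self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^{(k)}

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_electrodes, in_dim) -> (batch, n_electrodes, out_dim)
        return torch.relu(self.a_norm @ self.weight(h))

# Toy symmetric adjacency with self-loops, standing in for proximity/connectivity edges.
adj = (torch.rand(64, 64) > 0.8).float()
adj = torch.clamp(adj + adj.t() + torch.eye(64), max=1.0)
gcn = GraphConvLayer(in_dim=128, out_dim=128, adj=adj)
h1 = gcn(torch.randn(8, 64, 128))   # per-electrode spatial features
```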
Define a set of $K$ expert networks $f_1, f_2, \dots, f_K$, each specializing in part of the decoding task.

Learn gating functions $g_m(\mathbf{X})$ that weight the experts' outputs:

$$\hat{\mathbf{y}} = \sum_{m=1}^{K} g_m(\mathbf{X}) \, f_m(\mathbf{X}),$$

subject to

$$\sum_{m=1}^{K} g_m(\mathbf{X}) = 1, \qquad g_m(\mathbf{X}) \geq 0.$$
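A softmax gate satisfies both constraints by construction, as in this minimal sketch (the expert and gate architectures are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Softmax gating over K expert classifiers: y_hat = sum_m g_m(x) * f_m(x)."""
    def __init__(self, in_dim: int, n_classes: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, n_classes) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax enforces the constraints g_m >= 0 and sum_m g_m = 1.
        g = torch.softmax(self.gate(x), dim=-1)                        # (batch, K)
        expert_out = torch.stack([f(x) for f in self.experts], dim=1)  # (batch, K, C)
        return (g.unsqueeze(-1) * expert_out).sum(dim=1)               # (batch, C)

moe = MixtureOfExperts(in_dim=128, n_classes=5, n_experts=4)
logits = moe(torch.randn(8, 128))   # pooled features -> class scores
```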
Incorporate an agent that interacts with the environment and adapts based on feedback.

Define a policy $\pi_\theta(a \mid s)$ that selects an action $a$ (e.g., an adjustment to the model's parameters or predictions) given the current state $s$.

Define a reward $r$ that reflects decoding performance, e.g., $r = 1$ for a correct prediction and $r = 0$ otherwise.

Maximize the expected reward:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t} r_t \right].$$

Update the parameters using policy gradients:

$$\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta), \qquad \nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, R \right],$$

where $\alpha$ is the learning rate and $R$ is the cumulative reward.
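A minimal REINFORCE-style sketch of this update is shown below; the state representation, the discrete action space, and the reward function are illustrative assumptions, since the text leaves them abstract.

```python
import torch
import torch.nn as nn

# Toy policy: given a state vector, choose one of a few discrete adaptation actions.
policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(state: torch.Tensor, reward_fn) -> None:
    """One REINFORCE update: theta <- theta + alpha * grad log pi(a|s) * R."""
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = reward_fn(action)                 # e.g. 1.0 if the adapted model decodes correctly
    loss = -dist.log_prob(action) * reward     # negative sign: the optimizer minimizes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example with a dummy reward that favours action 0
reinforce_step(torch.randn(16), lambda a: 1.0 if a.item() == 0 else 0.0)
```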
- Loss Function: Use the cross-entropy loss over $M$ training examples:
  $$L = -\frac{1}{M} \sum_{i=1}^{M} \mathbf{y}^{(i)\top} \log \hat{\mathbf{y}}^{(i)}.$$
- Regularization: Include regularization terms to prevent overfitting:
  $$L_{\text{total}} = L + \lambda \left( \|\theta\|^2 + \sum_{m=1}^{K} \|g_m\|^2 \right),$$
  where $K$ is the number of experts.
- Optimization Algorithm: Use the Adam optimizer, with gradients computed via backpropagation.
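A minimal training-step sketch under these choices might look as follows; the stand-in model, the value of $\lambda$, and the omission of the per-expert gating penalty are simplifications for illustration.

```python
import torch
import torch.nn as nn

# Stand-in model mapping an EEG batch (batch, N, T) to class logits (batch, C).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 512, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()   # takes integer class indices, equivalent to the one-hot form above
lam = 1e-4                          # regularization strength lambda (assumed value)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    logits = model(x)
    loss = criterion(logits, y)                               # cross-entropy term L
    l2 = sum((p ** 2).sum() for p in model.parameters())      # ||theta||^2
    total = loss + lam * l2                                   # L_total = L + lambda * ||theta||^2
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()

loss_value = train_step(torch.randn(8, 64, 512), torch.randint(0, 5, (8,)))
```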
Universal-approximation results suggest that Transformers can approximate continuous sequence-to-sequence functions arbitrarily well, given sufficient model capacity.
- Existence of Function $f$: There exists a function $f$ such that
  $$\mathbf{y} = f(\mathbf{X}) = f_{\text{agent}} \left( f_{\text{experts}} \left( f_{\text{GNN}} \left( f_{\text{Transformer}}(\mathbf{X}) \right) \right) \right).$$
- Approximation by the Model: Given sufficient model capacity and proper training, $f$ can be approximated arbitrarily well.
- Transformer Encoder: Captures temporal dependencies, approximating temporal mappings in the EEG data.
- Graph Neural Network: Models spatial relationships, capturing the spatial structure of the EEG electrodes.
- Mixture of Experts: Enhances specialization, allowing the model to approximate complex functions by combining simpler ones.
- Agentic Model: Adapts dynamically, refining the approximation based on feedback.
- Cross-Validation: Implement k-fold cross-validation to assess generalization.
- Performance Metrics:
  - Accuracy: $\text{Accuracy} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\{\hat{\mathbf{y}}^{(i)} = \mathbf{y}^{(i)}\}$.
  - Precision, Recall, F1-Score: Calculated per class.
- Statistical Significance: Perform hypothesis testing (e.g., permutation tests) to confirm that decoding performance is significantly better than chance.
- Ablation Studies: Evaluate the impact of each component (Transformer, GNN, experts, agentic learning) by systematically removing it and observing performance changes.
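As an illustration of this evaluation protocol, the sketch below runs stratified k-fold cross-validation together with a permutation test using scikit-learn; the toy features and the logistic-regression placeholder stand in for the full decoding model.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.linear_model import LogisticRegression

# Toy stand-in data: 100 trials with pre-extracted features, 5 classes (illustration only).
rng = np.random.default_rng(0)
X_feats = rng.normal(size=(100, 64))
labels = rng.integers(0, 5, size=100)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)   # placeholder for the full decoding model

# permutation_test_score refits the model on label-shuffled data to estimate the
# chance-level distribution and a p-value for the true decoding accuracy.
score, perm_scores, p_value = permutation_test_score(
    clf, X_feats, labels, cv=cv, scoring="accuracy", n_permutations=200, random_state=0
)
print(f"accuracy={score:.3f}, chance~{perm_scores.mean():.3f}, p={p_value:.4f}")
```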
You are very welcome to modify and use this material in your own projects.
Please keep a link to the original repository. If you have made a fork with substantial modifications that you feel may be useful, please open a new issue on GitHub with a link and a short description.
This project is released under the MIT License, which allows very broad use for both private and commercial purposes.
A few of the images used for demonstration purposes may be under copyright. These images are included under fair use.