Decoding human thoughts from EEG signals is a complex task that requires capturing intricate spatial and temporal patterns in the brain's electrical activity.
Recent advancements in AI, particularly in Transformer architectures and Large Language Models (LLMs), have shown remarkable capabilities in modelling sequential and complex data patterns.
In this exposition, I present a mathematically detailed proof that human thoughts can be decoded from EEG signals using a sophisticated Transformer-based AI model. I incorporate elements from Graph Neural Networks (GNNs), expert models, and agentic models to enhance the model's specialization and accuracy.
Let:

- $\mathbf{X} \in \mathbb{R}^{N \times T}$ denote the EEG data matrix, where $N$ is the number of electrodes (channels) and $T$ is the number of time steps.
- $\mathbf{y} \in \mathbb{R}^C$ represent the target thought or cognitive state, encoded as a one-hot vector over $C$ possible classes.

Our goal is to find a function

$$f: \mathbb{R}^{N \times T} \rightarrow \mathbb{R}^{C}, \qquad \hat{\mathbf{y}} = f(\mathbf{X}),$$

where $\hat{\mathbf{y}}$ is the model's prediction of the true label $\mathbf{y}$.
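To make the setup concrete, here is a minimal sketch of the data shapes in PyTorch; the specific values of $N$, $T$, and $C$ are illustrative assumptions, not values from the text.

```python
import torch

# Hypothetical dimensions, chosen only for illustration (not specified in the text).
N, T, C = 64, 512, 5           # electrodes, time steps, thought classes

X = torch.randn(N, T)          # one EEG trial: N channels x T time steps
y = torch.nn.functional.one_hot(torch.tensor(2), num_classes=C).float()  # one-hot target

# The decoding problem is to learn f: R^{N x T} -> R^C such that f(X) approximates y.
```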
The proposed AI model integrates several advanced components:
- Transformer Encoder for Temporal Dynamics
- Graph Neural Network for Spatial Relationships
- Mixture of Experts for Specialization
- Agentic Learning for Dynamic Adaptation
Each EEG channel signal is embedded into a higher-dimensional space:

$$\mathbf{E} = \mathbf{X}^{\top} \mathbf{W}_e + \mathbf{b}_e,$$

where $\mathbf{W}_e \in \mathbb{R}^{N \times d}$ and $\mathbf{b}_e \in \mathbb{R}^{d}$ are learnable parameters, so that $\mathbf{E} \in \mathbb{R}^{T \times d}$ holds one $d$-dimensional embedding per time step.

To incorporate temporal information, positional encodings are added:

$$\mathbf{E}_{\text{pos}} = \mathbf{E} + \mathbf{P},$$

where $\mathbf{P} \in \mathbb{R}^{T \times d}$ is a (learned or sinusoidal) positional-encoding matrix.
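A minimal PyTorch sketch of this embedding and positional-encoding step might look as follows; the module name `EEGEmbedding`, the chosen dimensions, and the use of a learned (rather than sinusoidal) positional encoding are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EEGEmbedding(nn.Module):
    """Project each time step (an N-dimensional channel vector) into d dimensions
    and add a learned positional encoding over the T time steps."""
    def __init__(self, n_channels: int, d_model: int, max_len: int):
        super().__init__()
        self.proj = nn.Linear(n_channels, d_model)               # W_e, b_e
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))   # P (learned here, as an assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, T) -> (batch, T, N) so the sequence dimension is time
        e = self.proj(x.transpose(1, 2))                          # E = X^T W_e + b_e
        return e + self.pos[: e.size(1)]                          # E_pos = E + P

# Example: embed a batch of 8 trials with 64 channels and 512 time steps
emb = EEGEmbedding(n_channels=64, d_model=128, max_len=512)
e_pos = emb(torch.randn(8, 64, 512))   # -> (8, 512, 128)
```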
For each attention head $h = 1, \dots, H$ at layer $l$:

- Query: $\mathbf{Q}_h^{(l)} = \mathbf{E}_{\text{pos}}^{(l)} \mathbf{W}_h^{Q(l)}$
- Key: $\mathbf{K}_h^{(l)} = \mathbf{E}_{\text{pos}}^{(l)} \mathbf{W}_h^{K(l)}$
- Value: $\mathbf{V}_h^{(l)} = \mathbf{E}_{\text{pos}}^{(l)} \mathbf{W}_h^{V(l)}$
Compute the attention weights:

$$\mathbf{A}_h^{(l)} = \operatorname{softmax}\!\left( \frac{\mathbf{Q}_h^{(l)} \mathbf{K}_h^{(l)\top}}{\sqrt{d_k}} \right).$$

Update the embeddings:

$$\mathbf{Z}_h^{(l)} = \mathbf{A}_h^{(l)} \mathbf{V}_h^{(l)}.$$

Concatenate the heads and apply a linear transformation:

$$\mathbf{Z}^{(l)} = \operatorname{Concat}\!\left( \mathbf{Z}_1^{(l)}, \mathbf{Z}_2^{(l)}, \dots, \mathbf{Z}_H^{(l)} \right) \mathbf{W}^{O(l)},$$

where:

- $\mathbf{Z}^{(l)}$: the output of the multi-head attention mechanism at layer $l$.
- $\operatorname{Concat}(\dots)$: concatenation of the outputs from all $H$ attention heads.
- $\mathbf{Z}_1^{(l)}, \mathbf{Z}_2^{(l)}, \dots, \mathbf{Z}_H^{(l)}$: the outputs of the $H$ attention heads at layer $l$.
- $\mathbf{W}^{O(l)}$: the output weight matrix at layer $l$.

Apply a position-wise feed-forward network:

$$\operatorname{FFN}\!\left( \mathbf{Z} \right) = \operatorname{ReLU}\!\left( \mathbf{Z} \mathbf{W}_1^{(l)} + \mathbf{b}_1^{(l)} \right) \mathbf{W}_2^{(l)} + \mathbf{b}_2^{(l)}.$$

Each sub-layer includes a residual connection and layer normalization:

$$\tilde{\mathbf{Z}}^{(l)} = \operatorname{LayerNorm}\!\left( \mathbf{E}_{\text{pos}}^{(l)} + \mathbf{Z}^{(l)} \right), \qquad \mathbf{E}_{\text{pos}}^{(l+1)} = \operatorname{LayerNorm}\!\left( \tilde{\mathbf{Z}}^{(l)} + \operatorname{FFN}\!\left( \tilde{\mathbf{Z}}^{(l)} \right) \right).$$
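These attention, feed-forward, residual, and normalization equations correspond to a standard Transformer encoder layer, so a sketch can lean on PyTorch's built-in module; the hyperparameter values below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A stack of standard encoder layers implementing multi-head attention,
# the position-wise FFN, residual connections, and layer normalization.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=128,          # embedding dimension d
    nhead=8,              # number of attention heads H
    dim_feedforward=512,  # hidden size of the FFN
    batch_first=True,     # inputs shaped (batch, T, d)
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

e_pos = torch.randn(8, 512, 128)   # batch of positionally encoded embeddings
z = encoder(e_pos)                 # -> (8, 512, 128), temporal features
```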
Construct a graph $\mathcal{G} = (V, E)$ where:

- $V$ represents the set of EEG electrodes.
- $E$ represents edges based on physical proximity or functional connectivity.

Compute the normalized graph Laplacian:

$$\tilde{\mathbf{L}} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2},$$

where $\mathbf{A}$ is the adjacency matrix of $\mathcal{G}$ and $\mathbf{D}$ is the corresponding degree matrix.

Apply a GNN to capture spatial dependencies:

$$\mathbf{H}^{(k+1)} = \sigma\!\left( \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2} \, \mathbf{H}^{(k)} \mathbf{W}^{(k)} \right),$$

with $\mathbf{H}^{(0)}$ initialized from the Transformer encoder output, $\mathbf{W}^{(k)}$ the learnable weights of GNN layer $k$, and $\sigma$ a non-linearity such as ReLU.
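A minimal sketch of one such graph convolution over the electrode graph is given below; the toy adjacency matrix and the way per-electrode features are obtained from the Transformer output are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution H' = sigma(D^{-1/2} A D^{-1/2} H W) over the electrode graph."""
    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor):
        super().__init__()
        deg = adj.sum(dim=1)                                    # node degrees
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-8).pow(-0.5))
        self.register_buffer("a_norm", d_inv_sqrt @ adj @ d_inv_sqrt)   # D^{-1/2} A D^{-1/2}
        self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^{(k)}

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_electrodes, in_dim) -> (batch, n_electrodes, out_dim)
        return torch.relu(self.a_norm @ self.weight(h))

# Toy symmetric adjacency with self-loops, standing in for proximity/connectivity edges.
adj = (torch.rand(64, 64) > 0.8).float()
adj = torch.clamp(adj + adj.t() + torch.eye(64), max=1.0)
gcn = GraphConvLayer(in_dim=128, out_dim=128, adj=adj)
h1 = gcn(torch.randn(8, 64, 128))   # per-electrode spatial features
```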
Define a set of $K$ expert networks $f_1, f_2, \dots, f_K$, each specializing in part of the decoding task.

Learn gating functions $g_m(\mathbf{X})$ that weight the experts' outputs:

$$\hat{\mathbf{y}} = \sum_{m=1}^{K} g_m(\mathbf{X}) \, f_m(\mathbf{X}),$$

subject to

$$\sum_{m=1}^{K} g_m(\mathbf{X}) = 1, \qquad g_m(\mathbf{X}) \geq 0.$$
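A softmax gate satisfies both constraints by construction, as in this minimal sketch (the expert and gate architectures are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Softmax gating over K expert classifiers: y_hat = sum_m g_m(x) * f_m(x)."""
    def __init__(self, in_dim: int, n_classes: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, n_classes) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(in_dim, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax enforces the constraints g_m >= 0 and sum_m g_m = 1.
        g = torch.softmax(self.gate(x), dim=-1)                        # (batch, K)
        expert_out = torch.stack([f(x) for f in self.experts], dim=1)  # (batch, K, C)
        return (g.unsqueeze(-1) * expert_out).sum(dim=1)               # (batch, C)

moe = MixtureOfExperts(in_dim=128, n_classes=5, n_experts=4)
logits = moe(torch.randn(8, 128))   # pooled features -> class scores
```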
Incorporate an agent that interacts with the environment and adapts based on feedback.

Define a policy $\pi_\theta(a \mid s)$ that selects an action $a$ (e.g., an adjustment to the model's parameters or predictions) given the current state $s$.

Define a reward $r$ that reflects decoding performance, e.g., $r = 1$ for a correct prediction and $r = 0$ otherwise.

Maximize the expected reward:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t} r_t \right].$$

Update the parameters using policy gradients:

$$\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta), \qquad \nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, R \right],$$

where $\alpha$ is the learning rate and $R$ is the cumulative reward.
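A minimal REINFORCE-style sketch of this update is shown below; the state representation, the discrete action space, and the reward function are illustrative assumptions, since the text leaves them abstract.

```python
import torch
import torch.nn as nn

# Toy policy: given a state vector, choose one of a few discrete adaptation actions.
policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(state: torch.Tensor, reward_fn) -> None:
    """One REINFORCE update: theta <- theta + alpha * grad log pi(a|s) * R."""
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = reward_fn(action)                 # e.g. 1.0 if the adapted model decodes correctly
    loss = -dist.log_prob(action) * reward     # negative sign: the optimizer minimizes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example with a dummy reward that favours action 0
reinforce_step(torch.randn(16), lambda a: 1.0 if a.item() == 0 else 0.0)
```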
- Loss Function: Use the cross-entropy loss over $M$ training examples:
  $$L = -\frac{1}{M} \sum_{i=1}^{M} \mathbf{y}^{(i)\top} \log \hat{\mathbf{y}}^{(i)}.$$
- Regularization: Include regularization terms to prevent overfitting:
  $$L_{\text{total}} = L + \lambda \left( \|\theta\|^2 + \sum_{m=1}^{K} \|g_m\|^2 \right),$$
  where $K$ is the number of experts.
- Optimization Algorithm: Use the Adam optimizer, with gradients computed via backpropagation.
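A minimal training-step sketch under these choices might look as follows; the stand-in model, the value of $\lambda$, and the omission of the per-expert gating penalty are simplifications for illustration.

```python
import torch
import torch.nn as nn

# Stand-in model mapping an EEG batch (batch, N, T) to class logits (batch, C).
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 512, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()   # takes integer class indices, equivalent to the one-hot form above
lam = 1e-4                          # regularization strength lambda (assumed value)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    logits = model(x)
    loss = criterion(logits, y)                               # cross-entropy term L
    l2 = sum((p ** 2).sum() for p in model.parameters())      # ||theta||^2
    total = loss + lam * l2                                   # L_total = L + lambda * ||theta||^2
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()

loss_value = train_step(torch.randn(8, 64, 512), torch.randint(0, 5, (8,)))
```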
Universal-approximation results suggest that Transformers can approximate continuous sequence-to-sequence functions arbitrarily well, given sufficient model capacity.
- Existence of Function $f$: There exists a function $f$ such that
  $$\mathbf{y} = f(\mathbf{X}) = f_{\text{agent}} \left( f_{\text{experts}} \left( f_{\text{GNN}} \left( f_{\text{Transformer}}(\mathbf{X}) \right) \right) \right).$$
- Approximation by the Model: Given sufficient model capacity and proper training, $f$ can be approximated arbitrarily well.
- Transformer Encoder: Captures temporal dependencies, approximating temporal mappings in the EEG data.
- Graph Neural Network: Models spatial relationships, capturing the spatial structure of the EEG electrodes.
- Mixture of Experts: Enhances specialization, allowing the model to approximate complex functions by combining simpler ones.
- Agentic Model: Adapts dynamically, refining the approximation based on feedback.
- Cross-Validation: Implement k-fold cross-validation to assess generalization.
- Performance Metrics:
  - Accuracy: $\text{Accuracy} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\{\hat{\mathbf{y}}^{(i)} = \mathbf{y}^{(i)}\}$.
  - Precision, Recall, F1-Score: Calculated per class.
- Statistical Significance: Perform hypothesis testing (e.g., permutation tests) to confirm that decoding performance is significantly better than chance.
- Ablation Studies: Evaluate the impact of each component (Transformer, GNN, experts, agentic learning) by systematically removing it and observing performance changes.
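As an illustration of this evaluation protocol, the sketch below runs stratified k-fold cross-validation together with a permutation test using scikit-learn; the toy features and the logistic-regression placeholder stand in for the full decoding model.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.linear_model import LogisticRegression

# Toy stand-in data: 100 trials with pre-extracted features, 5 classes (illustration only).
rng = np.random.default_rng(0)
X_feats = rng.normal(size=(100, 64))
labels = rng.integers(0, 5, size=100)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)   # placeholder for the full decoding model

# permutation_test_score refits the model on label-shuffled data to estimate the
# chance-level distribution and a p-value for the true decoding accuracy.
score, perm_scores, p_value = permutation_test_score(
    clf, X_feats, labels, cv=cv, scoring="accuracy", n_permutations=200, random_state=0
)
print(f"accuracy={score:.3f}, chance~{perm_scores.mean():.3f}, p={p_value:.4f}")
```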
You are very welcome to modify and use this material in your own projects.
Please keep a link to the original repository. If you have made a fork with substantial modifications that you feel may be useful, please open a new issue on GitHub with a link and a short description.
This project is released under the MIT License, which allows very broad use for both private and commercial purposes.
A few of the images used for demonstration purposes may be under copyright. These images are included under fair use.