Before using this script, you should have:
- Familiarity with Python syntax: a basic understanding of Python programming concepts and structures.
- Knowledge of the MNE-Python library (suggested): experience in using MNE-Python for EEG data processing and analysis.
- Understanding of EEG paradigms (suggested): familiarity with common EEG experimental paradigms and related concepts.
- Time required: about 30 minutes.
This tutorial shows developers how to create a `make_fif_xxxx` script (referred to below as the standard script) using the EEGUnity library. The script standardizes raw EEG datasets into the FIF format so that any unified processing can follow. It ensures that:
- Events are stored as MNE annotations, with descriptions consistent with the original dataset.
- Subject information is embedded in the `mne.io.raw.info['description']` field.
- Channel information is updated if the original data does not sufficiently reflect electrode positions.

This document is detailed and beginner-friendly, making it accessible even to readers without prior knowledge of EEG or programming. Below are the steps to create the standard script.
- Create a Python environment with Python version ≥ 3.6. You can use `venv` or `conda` to manage the environment.
- Install the EEGUnity library by running the following command in your terminal: `pip install eegunity` (a quick sanity check of the installation is shown after this list).
- Create a Python project and name it `standard_script_projects`.
- Create a Python package named `setting` in the root directory of the project, and create a Python file named `path_variable.py` inside the `setting` package.
- Create a folder named `stage1_locator` in the root directory of the project, for storing locator files.
- Create a folder named `original_raweeg` in any accessible path, to save the original EEG datasets (the download location for datasets).
- Create a folder named `standard_raweeg` in any accessible path, to save the processed standard EEG datasets.
- Download any dataset, such as BCI Competition IV 2a, and unzip it into `original_raweeg`.
- Define the path variables in `path_variable.py`, for example:

  ```python
  # Input path for each dataset
  bcic_iv_2a_path = r"path/to/original_raweeg/bcic_iv_2a"
  bcic_iv_2b_path = r"path/to/original_raweeg/bcic_iv_2b"
  # add more paths if needed

  # Output directories for stage 1
  stage1_locator = r"path/to/stage1_locator/"      # folder to save stage 1 locator files
  stage1_output_dir = r"path/to/standard_raweeg"   # folder to save processed datasets
  ```
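To confirm that the installation and project layout work, you can run a quick sanity check from the project root. This is a minimal sketch; it only verifies that EEGUnity is importable and that the `setting` package exposes your path variables:

```python
# Quick sanity check, run from the project root of standard_script_projects
from eegunity.unifieddataset import UnifiedDataset  # class used later by the standard script
import setting.path_variable as pv

print("EEGUnity import OK:", UnifiedDataset.__name__)
print(pv.bcic_iv_2a_path)  # should print the dataset path you configured above
```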
Create a Python file named `make_fif_xxxx.py` in the root directory of the project, then write it by following the instructions below.
The script performs the following tasks:
- Parameter Settings:
  Define the input and output paths, the domain tag, and cache usage. Note: avoid absolute paths and specify only these basic parameters. For example:

  ```python
  # Parameter settings
  import os
  import setting.path_variable as pv  # the additional Python file that stores all path variables

  input_path = pv.bcic_iv_2a_path  # Dataset directory
  domain_tag = 'bcic-iv-2a'  # Domain tag for marking the dataset
  output_path = os.path.join(pv.stage1_output_dir, 'bcic_iv_2a')  # Output path
  use_cache = False
  ```
- Loading the Dataset:
  The script uses the `UnifiedDataset` class to load the dataset. If a locator file exists and caching is enabled, it reuses that file; otherwise, it creates a new locator file.

  ```python
  locator_path = os.path.join(pv.stage1_locator, os.path.basename(input_path) + ".csv")
  if os.path.exists(locator_path) and use_cache:
      unified_dataset = UnifiedDataset(locator_path=locator_path)
  else:
      unified_dataset = UnifiedDataset(dataset_path=input_path, domain_tag=domain_tag)
      unified_dataset.save_locator(locator_path)
  ```
- Processing Each Data Row:
  A function `app_func` is defined to:
  - Load the raw EEG data using `get_data_row`, a key function in EEGUnity.
  - Extract the subject ID (e.g., from the file path) and update the subject information (such as age and gender) in `mne_raw.info['description']`. For convenience, you can check the dataset folder beforehand and store the information in a Python dictionary to simplify loading it within the code.
  - Rename channels to match a standard system (e.g., the 10-20 system).
  - Events Handling:
    - ⚡ Extract events from the dataset folder and convert them back to annotations using `mne.annotations_from_events` (a minimal generic sketch of this conversion is shown after this task list).
      Note: This step is the most important and time-consuming part of creating the standard script. 🚨 For more details on MNE annotations, please read the MNE-Python Annotations documentation carefully.
    - Handle irregularities, such as nonstandard data (e.g., file names and event annotations), directly within the script. This ensures that users can run the standard script immediately after downloading the datasets, without additional modifications.
- Saving the Processed Data:
  After processing, the EEG data is saved as a FIF file. The output filename is modified (by appending `_raw.fif`) to conform to MNE naming conventions.
- Batch Processing:
  Finally, the script applies `app_func` to all data rows (filtered to only those marked as `Completed`), processing the entire dataset in batch.
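The event-handling pattern is generic enough to sketch on its own. The following minimal example uses hypothetical event codes and descriptions (not tied to any particular dataset) to show how an MNE events array is converted back to annotations; the full script below shows the dataset-specific version:

```python
import numpy as np
import mne

# Hypothetical raw object: one EEG channel, 10 s of zeros at 250 Hz
sfreq = 250.0
info = mne.create_info(ch_names=['Cz'], sfreq=sfreq, ch_types='eeg')
raw = mne.io.RawArray(np.zeros((1, int(sfreq * 10))), info)

# Events array in MNE format: [sample index, previous value, event code]
events = np.array([
    [int(1.0 * sfreq), 0, 1],  # hypothetical 'left' cue at t = 1 s
    [int(5.0 * sfreq), 0, 2],  # hypothetical 'right' cue at t = 5 s
])

# Map event codes back to human-readable descriptions and attach as annotations
event_desc = {1: 'Cue onset left', 2: 'Cue onset right'}
annotations = mne.annotations_from_events(events=events, sfreq=sfreq, event_desc=event_desc)
raw.set_annotations(annotations)
print(raw.annotations)
```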
Below is the full sample script for standardizing the `bcic_iv_2a` dataset. You can copy it into your project and adapt it to your own dataset:

```python
import scipy.io as scio
from eegunity.unifieddataset import UnifiedDataset
from eegunity.modules.parser import get_data_row
import numpy as np
import os
import mne
import setting.path_variable as pv
import json
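# Subject metadata (gender, age) for each BCIC IV 2a subject,
# keyed by the subject ID prefix of the file name (e.g., 'A01')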
subject_dict = {
    'A01': {'gender': 'female', 'age': 22},
    'A02': {'gender': 'female', 'age': 24},
    'A03': {'gender': 'male', 'age': 26},
    'A04': {'gender': 'female', 'age': 24},
    'A05': {'gender': 'male', 'age': 24},
    'A06': {'gender': 'female', 'age': 23},
    'A07': {'gender': 'male', 'age': 25},
    'A08': {'gender': 'male', 'age': 23},
    'A09': {'gender': 'male', 'age': 17}
}
# Parameter settings
input_path = pv.bcic_iv_2a_path # Dataset directory
domain_tag = "bcic-iv-2a" # Domain tag for marking the dataset
output_path = os.path.join(pv.stage1_output_dir, "bcic_iv_2a") # Output path
use_cache = False
def app_func(row, output_dir):
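    # Process one locator row: load the raw data, embed subject info in
    # info['description'], rename channels, rebuild annotations, and save as FIF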
    # Load the MNE raw data
    mne_raw = get_data_row(row)
    subject_id = os.path.basename(row['File Path'])[:3]
    description_dict = {
        "original_description": mne_raw.info['description'],
        "eegunity_description": {
            "amplifier": "unknown",
            "cap": "Ag/AgCl",
            "age": subject_dict[subject_id]['age'],
            "sex": subject_dict[subject_id]['gender'],
            "handedness": "unknown"
        }
    }
    mne_raw.info['description'] = json.dumps(description_dict)
    # Rename the channels to standard 10-20 system names
    mne_raw.rename_channels({
        'EEG-Fz': 'Fz', 'EEG-0': 'FC3', 'EEG-1': 'FC1', 'EEG-2': 'FCz',
        'EEG-3': 'FC2', 'EEG-4': 'FC4', 'EEG-5': 'C5', 'EEG-C3': 'C3',
        'EEG-6': 'C1', 'EEG-Cz': 'Cz', 'EEG-7': 'C2', 'EEG-C4': 'C4',
        'EEG-8': 'C6', 'EEG-9': 'CP3', 'EEG-10': 'CP1', 'EEG-11': 'CPz',
        'EEG-12': 'CP2', 'EEG-13': 'CP4', 'EEG-14': 'P1', 'EEG-15': 'Pz',
        'EEG-16': 'P2', 'EEG-Pz': 'POz'
    })
    montage = mne.channels.make_standard_montage('standard_1020')
    mne_raw.info.set_montage(montage, on_missing='ignore')
    mne_raw.set_channel_types({'EOG-left': 'eog', 'EOG-central': 'eog', 'EOG-right': 'eog'})
    # Event ID mapping
    event_id = {
        'Rejected trial': 1,
        'Eye movements': 2,
        'Idling EEG (eyes open)': 3,
        'Idling EEG (eyes closed)': 4,
        'Start of a new run': 5,
        'Start of a trial': 6,
        'Cue onset left (class 1)': 7,
        'Cue onset right (class 2)': 8,
        'Cue onset foot (class 3)': 9,
        'Cue onset tongue (class 4)': 10
    }
    # Extract events and original event IDs
    events, original_event_id = mne.events_from_annotations(mne_raw)
    # Update events based on the new event_id mapping
    for event_desc, new_id in event_id.items():
        if event_desc in original_event_id:
            events[events[:, 2] == original_event_id[event_desc], 2] = new_id
    # Check if the file name ends with 'E' and construct the .mat file path
    file_base, file_ext = os.path.splitext(row['File Path'])
    if file_base.endswith('E') and file_ext == '.gdf':
        mat_filepath = f"{file_base}.mat"
        if os.path.exists(mat_filepath):
            mat_data = scio.loadmat(mat_filepath)
            # 'classlabel' holds the true labels (1-4); shift them to event codes 7-10
            values_from_mat = mat_data['classlabel'].flatten() + 6
            # Replace events where the last column is 7
            replacement_indices = np.where(events[:, -1] == 7)[0]
            if len(replacement_indices) >= len(values_from_mat):
                events[replacement_indices[:len(values_from_mat)], 2] = values_from_mat
            else:
                print(f"Warning: {mat_filepath} contains fewer values than needed for replacement.")
    # Convert modified events back to annotations
    event_desc = {value: key for key, value in event_id.items()}  # Convert event IDs back to descriptions
    annotations = mne.annotations_from_events(
        events=events,
        sfreq=mne_raw.info['sfreq'],
        event_desc=event_desc  # Mapping event codes to descriptions
    )
    mne_raw.set_annotations(annotations)  # Set the new annotations on the raw data
    # Save the processed EEG data to the output directory
    filename = os.path.basename(row['File Path'])  # Extract the file name
    # Remove the '.gdf' extension and append '_raw.fif' to conform to MNE naming conventions
    output_filename = f"{filename[:-4]}_raw.fif"
    output_path = os.path.join(output_dir, output_filename)  # Define the output path
    mne_raw.save(output_path, overwrite=True)  # Save the file, overwriting if necessary
    return None
# 1. Load the dataset directory
locator_path = os.path.join(pv.stage1_locator, os.path.basename(input_path)+".csv")
if os.path.exists(locator_path) and use_cache:
    unified_dataset = UnifiedDataset(locator_path=locator_path)
else:
    unified_dataset = UnifiedDataset(dataset_path=input_path, domain_tag=domain_tag)
    unified_dataset.save_locator(locator_path)
# 2. Batch process EEG data
unified_dataset.eeg_batch.batch_process(
    con_func=lambda row: row['Completeness Check'] == 'Completed',  # Keep only rows marked as 'Completed'
    app_func=lambda row: app_func(row, output_path),  # Call the processing function with the output directory
    is_patch=False,  # No patching needed
    result_type=None  # No return type needed
)
```
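After running the script, you can verify the standardized output by reading one of the generated FIF files back and checking the annotations and the embedded subject information. The sketch below is minimal and the file name is hypothetical; adjust it to a file actually produced in your output folder:

```python
import json
import os

import mne

import setting.path_variable as pv

# Hypothetical output file; adjust to a file produced by the script
fif_path = os.path.join(pv.stage1_output_dir, 'bcic_iv_2a', 'A01T_raw.fif')

raw = mne.io.read_raw_fif(fif_path, preload=False)
print(raw.annotations)  # events stored as MNE annotations

# Subject information embedded by the standard script
description = json.loads(raw.info['description'])
print(description['eegunity_description'])
```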
- Annotations: All events are stored as MNE annotations, ensuring consistency in event descriptions. For more details, refer to the MNE-Python Annotations documentation.
- Subject Information: Subject details (e.g., age, gender) are embedded in the `info['description']` field.
- Channel Information: Renaming channels and setting a standard montage ensures accurate electrode positioning.
- Robustness: Exception handling is built into the script (e.g., handling irregular file names and time annotations) so that users can process the dataset without manual directory modifications.
- Parameter Settings: Only basic parameters (input_path, domain_tag, output_path, use_cache) need to be specified, with no absolute paths used.