Pre-trained motor-imagery models

Collection of pre-trained neural networks for motor-imagery decoding.

Twelve neural networks were trained on twelve different motor imagery datasets. Those models are available for download and can easily be re-used and fine-tuned.

The training procedure and the benchmarking results of these models are described in the poster presented at the 10th BCI Meeting in Brussels (link above).

Here is a minimal example that loads EEGNet pre-trained on Lee2019_MI:

import torch
from huggingface_hub import hf_hub_download
from braindecode.models import EEGNetv4

path = hf_hub_download(repo_id='PierreGtch/EEGNetv4', filename='EEGNetv4_Lee2019_MI/model-params.pkl')
net = EEGNetv4(3, 2, 385).eval()
net.load_state_dict(torch.load(path, map_location='cpu'))

The notebook linked above explains in detail how to train and share a model, as well as how to download and fine-tune pre-trained models. You can also find a static preview of this notebook below:

Notebook - fast calibration in BCI with deep learning

This notebook is designed as a tutorial and is composed of two independent parts. The first one focuses on (pre-)training neural networks for EEG decoding. In this part, you will see how to select a neural network architecture from Braindecode, train it, and share the resulting pre-trained network on the HuggingFace Hub. The second part focuses on re-using pre-trained models. There, you will learn how to download pre-trained models from the HuggingFace Hub, fine-tune them or use them in scikit-learn pipelines, and finally benchmark them on BCI datasets with MOABB. These two parts are meant to be independent, so if you are only interested in re-using my pre-trained models, you can jump directly to Part 2.

To run this notebook, you will need the following libraries installed in your Python environment:

  • numpy
  • torch
  • scikit-learn
  • skorch
  • braindecode
  • huggingface_hub
  • moabb
  • (notebook / jupyterlab)

Part 1 - (Pre-)Training neural networks

This part is dedicated to pre-training neural networks. In this tutorial, we will use PyTorch to implement and train those networks, but other deep learning frameworks exist such as JAX or TensorFlow.

1.1. Braindecode architecture

The first step in training a neural network is to define its architecture. Thankfully, many architectures have already been implemented, so we only have to select one. For this, we will use the Braindecode library, in which many neural networks designed for EEG processing and BCI decoding are implemented in PyTorch. In particular, we will use the EEGNetv4 model introduced in Lawhern et al., 2018 because it is rather lightweight and has shown good classification performance on various BCI paradigms.

In this demo, we will use fake meaningless data that has 3 EEG channels, two different classes and 200 samples per trial. Here, we create a batch of 50 such trials:

import torch

X_torch = torch.randn(size=(50, 3, 200))  # size: (batch, in_chans, input_window_samples)
y_torch = torch.randint(low=0, high=2, size=(50,))  # size: (batch), values: 0 or 1

Then, to get our neural network, we simply have to instantiate the EEGNetv4 class with the right parameters:

from braindecode.models import EEGNetv4

module = EEGNetv4(in_chans=3, n_classes=2, input_window_samples=200)

Finally, we can use the forward method of our neural network to get the predictions for our batch of trials:

y_pred = module(X_torch)
print('y_pred.shape =', y_pred.shape)  # size: (batch, n_classes)
# print(y_pred.exp().sum(dim=1)) # y_pred are log-probabilities, so the exp of the outputs sum to 1 for each trial

The network predicts the log-probability for each class, so we will have to use a negative log-likelihood loss to train it on classification tasks.
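
To see why a negative log-likelihood loss matches log-probability outputs, here is a minimal sketch. It uses made-up scores rather than a Braindecode model, and all variable names are illustrative. It checks that nn.NLLLoss applied to log-softmax outputs gives the same value as nn.CrossEntropyLoss applied to the raw scores:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(50, 2)            # made-up raw scores: (batch, n_classes)
targets = torch.randint(0, 2, (50,))   # made-up labels: 0 or 1

# Log-probabilities, i.e. the same kind of output as the network above
log_probs = torch.log_softmax(logits, dim=1)

# The exponentials of the log-probabilities sum to 1 for every trial
sums = log_probs.exp().sum(dim=1)

nll = nn.NLLLoss()(log_probs, targets)        # expects log-probabilities
ce = nn.CrossEntropyLoss()(logits, targets)   # expects raw scores, applies log-softmax itself

# NLL is the mean of minus the log-probability assigned to the true class
manual = -log_probs[torch.arange(50), targets].mean()
print(float(nll), float(ce), float(manual))
```

In other words, a network that already ends with a log-softmax layer should be paired with NLLLoss. Using CrossEntropyLoss on top of it would apply the log-softmax a second time.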

1.2. Training a PyTorch model

Now that we have our neural network, we can train it on our fake data. Multiple methods exist to train PyTorch models. Under the hood, they all rely on PyTorch, but they reduce the amount of boilerplate code needed to train a model.

Option 1 - Pure PyTorch training

The first option at our disposal is to use pure PyTorch. For that, we first need to define an optimizer and a loss function:

from copy import deepcopy
from torch import nn

torch_module = deepcopy(module)  # we copy the architecture instantiated earlier
optimizer = torch.optim.SGD(params=torch_module.parameters(), lr=0.001)
criterion = nn.NLLLoss()

Then, the training loop is as follows:

  1. First, compute the predictions of the model on the input data;
  2. Then, compute the loss between the predictions and the targets;
  3. Then, compute the gradients of the loss with respect to the model parameters;
  4. Finally, update the model parameters according to the gradients and using the optimizer.

This translates into the following code:

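for epoch in range(1, 11):
    print('epoch', epoch)
    optimizer.zero_grad()  # reset the gradients left over from the previous iteration
    y_pred = torch_module(X_torch)  # 1. predictions
    loss = criterion(y_pred, y_torch)  # 2. loss
    loss.backward()  # 3. gradients of the loss w.r.t. the parameters
    optimizer.step()  # 4. parameter update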
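
The loop above processes all 50 trials at once. With larger datasets, the same four steps are usually applied to mini-batches. The sketch below shows one way to do this with torch.utils.data.DataLoader. The small linear model is a toy stand-in, not the EEGNetv4 model, and the batch size of 10 is an arbitrary choice:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(50, 3, 200)            # fake data: (batch, in_chans, input_window_samples)
y = torch.randint(0, 2, (50,))         # fake labels: 0 or 1

# Toy stand-in network that outputs log-probabilities, like the model above
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 200, 2), nn.LogSoftmax(dim=1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()

loader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)
n_updates = 0
for epoch in range(1, 4):
    for X_batch, y_batch in loader:    # 5 mini-batches of 10 trials per epoch
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
        n_updates += 1
print(n_updates)
```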

Option 2 - Training using Skorch

The second option is to use the Skorch library. This library will take care of some of the boilerplate code for us but it also allows us to use Pytorch models as if they were scikit-learn models. This means that we can use them in scikit-learn pipelines, grid-searches, etc.

For this, we simply have to wrap our PyTorch model in a skorch.NeuralNetClassifier object:

from braindecode import EEGClassifier  # EEGClassifier is a subclass of skorch.NeuralNetClassifier

skorch_module = deepcopy(module)  # we copy the architecture instantiated earlier
skorch_classifier = EEGClassifier(skorch_module, max_epochs=10)

Then, as with any scikit-learn model, we can use simple numpy arrays as training data:

import numpy as np

X_np = np.random.randn(50, 3, 200)  # size: (batch, in_chans, input_window_samples)
y_np = np.random.randint(low=0, high=2, size=50)  # size: (batch), values: 0 or 1

Finally, we can train our model in one line, using the fit method:

_ = skorch_classifier.fit(X_np, y_np)

Among the boilerplate code that Skorch handles for us, we can see that it automatically logs the epoch number, the validation loss, and the duration of the epoch.

Option 3 - Other libraries

Skorch is not the only library you can use to handle the boilerplate code for training Pytorch models. The other options include:

Those libraries are particularly relevant when training models on GPUs, TPUs, or in a distributed manner because they can automatically place the data and model on the right device and distribute the computations. They also provide more advanced features such as automatic mixed-precision training, gradient accumulation, etc.
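
To illustrate one of these features, gradient accumulation can also be written by hand in a few lines of PyTorch. The sketch below uses a toy linear model and made-up data, both of which are assumptions for illustration. It checks that summing the rescaled gradients of four micro-batches reproduces the gradient of the full batch:

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(8, 4)                  # made-up data: (batch, features)
y = torch.randint(0, 2, (8,))
model = nn.Linear(4, 2)                # toy stand-in for a real network
criterion = nn.CrossEntropyLoss()

# Reference: gradient of the mean loss over the full batch of 8 trials
model.zero_grad()
criterion(model(X), y).backward()
full_grad = model.weight.grad.clone()

# Gradient accumulation: 4 micro-batches of 2 trials each
n_chunks = 4
model.zero_grad()
for X_chunk, y_chunk in zip(X.chunk(n_chunks), y.chunk(n_chunks)):
    loss = criterion(model(X_chunk), y_chunk) / n_chunks  # rescale so the sum is a mean
    loss.backward()                                       # gradients add up in .grad
accumulated_grad = model.weight.grad.clone()
# an optimizer.step() would go here, once every n_chunks micro-batches

print(torch.allclose(full_grad, accumulated_grad, atol=1e-6))
```

This lets a model be trained with a large effective batch size even when only a few trials fit in memory at once.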

1.3. Sharing models on HuggingFace Hub

The HuggingFace Hub is a platform for sharing and downloading pre-trained models. Uploading models to the HuggingFace Hub is done in two steps: you first have to save the pre-trained weights of your models on your local machine (i.e. in a file), then you can upload those files to the Hub.

For this, we will first create a temporary local directory in order to not pollute our current directory:

from pathlib import Path
from tempfile import mkdtemp

save_dir = Path(mkdtemp())

Then, we can save the pre-trained models in this temporary folder. Here we save both the weights of the PyTorch model and those of the Skorch model, but they are independent of each other:

skorch_classifier.save_params(
    f_params=save_dir / 'skorch_params.pkl',
    f_optimizer=save_dir / 'skorch_opt.pkl',
    f_history=save_dir / 'skorch_history.json',
)
torch.save(torch_module.state_dict(), save_dir / 'torch_params.pkl')
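
A saved state dict contains only the weights, not the architecture. The following sketch shows the full round trip on a toy model. The make_model helper and the file name toy_params.pkl are hypothetical and only serve the illustration:

```python
from pathlib import Path
from shutil import rmtree
from tempfile import mkdtemp

import torch
from torch import nn

def make_model():
    # Toy stand-in architecture; a Braindecode model would be handled the same way
    return nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))

torch.manual_seed(0)
trained = make_model()
tmp_dir = Path(mkdtemp())
path = tmp_dir / 'toy_params.pkl'           # hypothetical file name
torch.save(trained.state_dict(), path)      # writes the weights only, not the class

restored = make_model()                     # fresh instance with different random weights
restored.load_state_dict(torch.load(path, map_location='cpu'))
rmtree(tmp_dir)                             # clean up the temporary folder

x = torch.randn(5, 3)
same_output = torch.allclose(trained(x), restored(x))
print(same_output)
```

This is why anyone who downloads the weights also needs to know which class and which constructor arguments were used to create the model.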

Now, we can upload all the files in the temporary folder using the function huggingface_hub.upload_folder. Those files will be uploaded to the repository PierreGtch/EEGNetv4, which also contains the pre-trained models presented at the 10th BCI Meeting (cf. poster). In this repository, we put them in the folder named toy:

from huggingface_hub import upload_folder

_ = upload_folder(
    repo_id='PierreGtch/EEGNetv4', folder_path=save_dir, path_in_repo='toy',
)

Finally, we can remove our temporary local folder:

from shutil import rmtree

rmtree(save_dir)
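
As an alternative to creating and removing the folder by hand, the standard library's tempfile.TemporaryDirectory can be used as a context manager. A minimal sketch follows, in which the file example.txt is only a placeholder for the saved weights:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    save_dir = Path(tmp)
    (save_dir / 'example.txt').write_text('placeholder')  # hypothetical file
    existed_inside = save_dir.exists()
    # saving and uploading would happen here, while the folder still exists

# The directory and its contents are deleted when the with-block exits
exists_after = save_dir.exists()
print(existed_inside, exists_after)
```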


Part 2 - Re-using pre-trained neural networks

This part is focused on re-using pre-trained BCI models. It is designed to stand alone, so some imports are repeated from the previous part.

2.1. Loading models from HuggingFace Hub

The first step to re-using a pre-trained model is to download it from the HuggingFace Hub. For this, we will use the function huggingface_hub.hf_hub_download. This function takes as input the repository ID and the name of the file to download. It returns the local path to the downloaded file. The nice thing is that if this file is already present on your local machine, it will not be downloaded again.

from huggingface_hub import hf_hub_download

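file_names = dict(  # keys are arbitrary labels for the files saved in Part 1
    torch_params='torch_params.pkl',
    skorch_params='skorch_params.pkl',
    skorch_opt='skorch_opt.pkl',
    skorch_history='skorch_history.json',
)
local_paths = {
    k: hf_hub_download(repo_id='PierreGtch/EEGNetv4', filename='toy/' + name)
    for k, name in file_names.items()
}

Now that we have downloaded the pre-trained weights of our models from Part 1 and collected their local paths, we can load them into memory. For this, we first have to instantiate the model architecture, then we can load the weights using the load_state_dict method (for PyTorch) or the load_params method (for Skorch):

import torch
from braindecode import EEGClassifier
from braindecode.models import EEGNetv4

# load the pure pytorch module:
torch_module = EEGNetv4(in_chans=3, n_classes=2, input_window_samples=200)
torch_module.load_state_dict(torch.load(local_paths['torch_params'], map_location='cpu'))

# load the skorch classifier:
skorch_module = EEGNetv4(in_chans=3, n_classes=2, input_window_samples=200)
skorch_classifier = EEGClassifier(skorch_module, max_epochs=5)
skorch_classifier.initialize()  # required before calling load_params
skorch_classifier.load_params(
    f_params=local_paths['skorch_params'],
    f_optimizer=local_paths['skorch_opt'],
    f_history=local_paths['skorch_history'],
)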

2.2. Re-using a pre-trained neural network

Once a pre-trained model is loaded, we can use it in different ways.

Option 1 - Simple prediction

The first option is to simply use the model to make predictions on new data. For this, we will first create again some fake data:

import numpy as np

X_np = np.random.randn(20, 3, 200)  # size: (batch, in_chans, input_window_samples)
y_np = np.random.randint(low=0, high=2, size=20)  # size: (batch), values: 0 or 1

Then, like any other scikit-learn estimator, we can use the predict method of the model:

skorch_classifier.predict(X_np)

or its score method to get the accuracy:

skorch_classifier.score(X_np, y_np)

Option 2 - Fine-tuning using Skorch

We also have the possibility to fine-tune the model using Skorch. For this, we will use the partial_fit method of the Skorch classifier. Here, we will use it to train the model for a few additional epochs on our fake data:

_ = skorch_classifier.partial_fit(X_np, y_np)

As you can see, the training did not start from scratch: it resumed after the 10th epoch, where the model was saved in Part 1. Please also note that the state of the optimizer was restored. This is particularly useful for optimizers that keep internal state, such as Adam with its running moment estimates.
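
To make the notion of optimizer state concrete, the sketch below inspects what torch.optim.Adam stores after a single update and restores it into a fresh optimizer. It is written in plain PyTorch with a toy model and made-up data, so it only illustrates the kind of information that gets saved and restored, not what Skorch does internally:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 2)                                   # toy stand-in model
X, y = torch.randn(10, 3), torch.randint(0, 2, (10,))    # made-up data

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
state_before = optimizer.state_dict()['state']            # empty: no update made yet

optimizer.zero_grad()
nn.CrossEntropyLoss()(model(X), y).backward()
optimizer.step()
state_after = optimizer.state_dict()['state']             # one entry per parameter tensor
print(len(state_before), sorted(state_after[0].keys()))

# Restoring the state into a fresh optimizer, as done when resuming training
new_optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
new_optimizer.load_state_dict(optimizer.state_dict())
restored = new_optimizer.state_dict()['state'][0]['exp_avg']
print(torch.equal(restored, state_after[0]['exp_avg']))
```

Without restoring this state, Adam's moving averages would restart from zero and the first fine-tuning updates would behave differently from a continued training run.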

Option 3 - Frozen embedding in a Scikit-learn pipeline

Finally, here is the method we used to obtain the results presented at the 10th BCI Meeting (cf. poster): using the pre-trained model as a frozen feature extractor in a scikit-learn classification pipeline. For this, we first have to get a frozen feature extractor from our classification neural network. To do that, we define two functions: one that discards the classification layers and only keeps the embedding part of the model, and one that freezes the model to avoid computing gradients unnecessarily.

from collections import OrderedDict
from torch import nn

def remove_clf_layers(model: nn.Sequential):
    """
    Remove the classification layers from braindecode models.
    Tested on EEGNetv4, Deep4Net (i.e. DeepConvNet), and EEGResNet.
    """
    new_layers = []
    for name, layer in model.named_children():
        if 'classif' in name:
            continue  # skip the classification layer
        if 'softmax' in name:
            continue  # skip the final (log-)softmax layer
        new_layers.append((name, layer))
    return nn.Sequential(OrderedDict(new_layers))

def freeze_model(model):
    for param in model.parameters():
        param.requires_grad = False
    return model

embedding = freeze_model(remove_clf_layers(torch_module)).double()
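
The effect of these two functions can be checked on a small toy network. In the sketch below, the network and its layer names are made up. They merely follow the 'classif' / 'softmax' naming pattern that the filtering relies on, and the filtering and freezing are written inline so that the example is self-contained:

```python
from collections import OrderedDict

import torch
from torch import nn

# Toy classifier with made-up layer names following the 'classif' / 'softmax' pattern
toy = nn.Sequential(OrderedDict([
    ('conv', nn.Conv1d(3, 4, kernel_size=5)),
    ('pool', nn.AdaptiveAvgPool1d(8)),
    ('flatten', nn.Flatten()),
    ('conv_classifier', nn.Linear(32, 2)),
    ('softmax', nn.LogSoftmax(dim=1)),
]))

# Same idea as remove_clf_layers and freeze_model, written inline
kept = [(name, layer) for name, layer in toy.named_children()
        if 'classif' not in name and 'softmax' not in name]
embedding = nn.Sequential(OrderedDict(kept))
for param in embedding.parameters():
    param.requires_grad = False

X = torch.randn(20, 3, 200)          # fake trials: (batch, in_chans, samples)
features = embedding(X)
print([name for name, _ in kept])    # names of the remaining layers
print(features.shape)                # embedding features instead of class log-probabilities
print(features.requires_grad)        # no gradient graph is built for frozen parameters
```

Note that the new nn.Sequential holds references to the original layers rather than copies, so freezing the embedding also freezes the corresponding layers of the original model.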

Now that we have a frozen PyTorch model that acts as an embedding function, we want to integrate it into a scikit-learn pipeline. For this, we need to wrap it in a sklearn.base.TransformerMixin. Unfortunately, Skorch does not implement this kind of estimator, so we have to define it ourselves:

from skorch import NeuralNet
from skorch.utils import to_numpy
from sklearn.base import TransformerMixin

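class FrozenNeuralNetTransformer(NeuralNet, TransformerMixin):
    def __init__(
            self,
            *args,
            criterion=nn.MSELoss,  # should be unused
            unique_name=None,  # needed for a unique digest in MOABB
            **kwargs
    ):
        super().__init__(*args, criterion=criterion, **kwargs)
        self.initialize()  # fit() does nothing, so the module is initialized here
        self.unique_name = unique_name

    def fit(self, X, y=None, **fit_params):
        return self  # do nothing

    def transform(self, X):
        X = self.infer(X)
        return to_numpy(X)

    def __repr__(self):
        # unique_name may be None, so fall back to an empty string
        return super().__repr__() + (self.unique_name or '')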

And finally, we define a function that flattens all the dimensions of its input except for the first one (i.e. the batch dimension). This way, we will be able to pass those frozen features to Scikit-learn classifiers.

def flatten_batched(X):
    return X.reshape(X.shape[0], -1)
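
As a quick check, here is what this function does to an array with the layout of a convolutional feature map. The shape (20, 16, 1, 6) is made up for the illustration and is not the actual output shape of the embedding:

```python
import numpy as np

def flatten_batched(X):
    return X.reshape(X.shape[0], -1)

features = np.zeros((20, 16, 1, 6))   # made-up embedding output: (batch, filters, 1, time)
flat = flatten_batched(features)
print(flat.shape)                     # (20, 96): one feature vector of length 16*1*6 per trial
```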

Now, we can combine all the pieces together into a Scikit-learn pipeline:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import FunctionTransformer

sklearn_pipeline = Pipeline([
    ('embedding', FrozenNeuralNetTransformer(embedding)),
    ('flatten', FunctionTransformer(flatten_batched)),
    ('classifier', LogisticRegression()),
])

And if you call the fit method of this pipeline, only the logistic regression will be trained, the embedding will remain frozen:

_ = sklearn_pipeline.fit(X_np, y_np)
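
The same principle can be reproduced without any neural network. In the sketch below, a fixed random projection (an assumption made purely for illustration) stands in for the frozen embedding. After fitting, the projection matrix is unchanged and only the logistic regression has learned weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)
# Fixed random projection standing in for a frozen pre-trained network
W = rng.standard_normal((3 * 200, 8)) / np.sqrt(3 * 200)

def frozen_embedding(X):
    return np.tanh(X.reshape(X.shape[0], -1) @ W)

X_np = rng.standard_normal((20, 3, 200))   # fake trials: (batch, in_chans, samples)
y_np = np.tile([0, 1], 10)                 # 10 trials per class

pipeline = Pipeline([
    ('embedding', FunctionTransformer(frozen_embedding)),
    ('classifier', LogisticRegression()),
])
W_before = W.copy()
pipeline.fit(X_np, y_np)

print(np.array_equal(W, W_before))                      # the embedding is untouched by fit
print(pipeline.named_steps['classifier'].coef_.shape)   # only these weights were learned
```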

2.3. Benchmarking with MOABB

Now that we know how to integrate our pre-trained models in Scikit-learn pipelines and train them, we can benchmark them using the MOABB library. In order to get interesting results, we will load a model that was properly trained on a real dataset. In particular, we will load the model that was pre-trained on the dataset Lee2019_MI because it showed good transfer results for the left hand vs right hand classification task. Here, we simply repeat what was done in sections 2.1. and 2.2. option 3, but with the real model:

import pickle

# download the model from the hub:
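path_kwargs = hf_hub_download(
    repo_id='PierreGtch/EEGNetv4',
    filename=...,  # name of the pickled kwargs file in the repository (not shown in this preview)
)
path_params = hf_hub_download(
    repo_id='PierreGtch/EEGNetv4',
    filename='EEGNetv4_Lee2019_MI/model-params.pkl',
)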
with open(path_kwargs, 'rb') as f:
    kwargs = pickle.load(f)
module_cls = kwargs['module_cls']
module_kwargs = kwargs['module_kwargs']

# load the model with pre-trained weights:
torch_module = module_cls(**module_kwargs)
torch_module.load_state_dict(torch.load(path_params, map_location='cpu'))
embedding = freeze_model(remove_clf_layers(torch_module)).double()

# Integrate the model in a Scikit-learn pipeline:
sklearn_pipeline = Pipeline([
    ('embedding', FrozenNeuralNetTransformer(embedding, unique_name='pretrained_Lee2019')),
    ('flatten', FunctionTransformer(flatten_batched)),
    ('classifier', LogisticRegression()),
])

To benchmark a pipeline with MOABB, we have to define three components:

  1. The paradigm, i.e. the pre-processing and epoching steps that will be applied to the data;
  2. The datasets on which the pipeline will be evaluated;
  3. And the evaluation procedure, which can be within-session, cross-session or cross-subject.
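
To make the difference between these evaluation procedures concrete, here is a rough sketch written with scikit-learn only. It assumes that within-session evaluation amounts to a stratified k-fold cross-validation inside each session. MOABB's own implementation may differ in its details (number of folds, scoring metric), and the features and labels below are random placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Made-up features and labels for one subject recorded in two sessions
sessions = {
    'session_0': (rng.standard_normal((40, 8)), np.tile([0, 1], 20)),
    'session_1': (rng.standard_normal((40, 8)), np.tile([0, 1], 20)),
}

# Within-session: training and test folds both come from the same session
within_session = {}
for name, (X, y) in sessions.items():
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    fold_scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring='accuracy')
    within_session[name] = fold_scores.mean()

# Cross-session, for comparison: fit on one session, score on the other
(X0, y0), (X1, y1) = sessions['session_0'], sessions['session_1']
cross_session = LogisticRegression().fit(X0, y0).score(X1, y1)

print(within_session, cross_session)
```

Cross-subject evaluation follows the same logic as the cross-session case, except that the held-out data comes from a different subject.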

We will use the same paradigm and evaluation parameters as those we used to obtain the results in the poster. However, we only test on the Zhou2016 dataset because it is relatively small and lightweight.

from moabb.paradigms import MotorImagery
from moabb.datasets import Zhou2016
from moabb.evaluations import WithinSessionEvaluation

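paradigm = MotorImagery(
    channels=['C3', 'Cz', 'C4'],  # Same as the ones used to pre-train the embedding
    events=['left_hand', 'right_hand', 'feet'],
    # only the arguments visible in this preview are listed here
)
datasets = [Zhou2016()]
evaluation = WithinSessionEvaluation(
    paradigm=paradigm,
    datasets=datasets,
    # only the required arguments are listed here
)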

Now that the paradigm, datasets and evaluation procedure are defined, benchmarking the pipeline is done in one line:

results = evaluation.process(pipelines=dict(demo_pipeline=sklearn_pipeline))

And the results are returned as a pandas dataframe: