Note
Go to the end to download the full example code.
Quick Start¶
End-to-end walkthrough: initialize, query, download, and load data.
This example touches every step of a typical LAION-fMRI workflow:
Initialize a local data directory
Query what is available in the dataset
Download data for a single subject
Load and inspect the data
For deeper dives into each step, see the focused examples on initialization, querying, and loading.
Initialize the data directory¶
Examples 1, 2, and 4 share one data directory so the licenses accepted here, and the data downloaded below, are reused by the other examples without re-prompting or re-downloading.
See also Dataset Initialization for first-time setup and license-acceptance details.
import os
from laion_fmri.config import dataset_initialize, get_data_dir
data_dir = os.environ.get(
"LAION_FMRI_EXAMPLE_DATA_DIR",
os.path.join(os.getcwd(), "laion_fmri_quickstart"),
)
os.makedirs(data_dir, exist_ok=True)
dataset_initialize(data_dir)
print(f"Data directory: {get_data_dir()}")
Query the dataset¶
The bucket is public, so no AWS credentials are needed. Discovery functions read directly from the S3 bucket, so you can see what is available before downloading anything.
See also Querying the Dataset for the full discovery API: subjects, ROIs, splits, OOD partitions, bucket inspection.
from laion_fmri.discovery import describe, get_subjects
print(f"Available subjects: {get_subjects()}")
describe()
Download one subject – but just one session, in parallel¶
A full subject is several tens of GB. download accepts BIDS
entities (ses, task, space, desc, stat,
suffix, extension) as filters, so you can grab just the
slice you need.
About ses: when set to a session ID, only that session’s
files are pulled – subject-level aggregate maps (the per-subject
noise-ceiling variants, the mean-R^2 summaries, etc.) are
excluded. To pull only those aggregates, use the special value
ses="averages"; combine with a list to pull both. The brain
mask is the one exception – it’s automatically included with any
ses filter, since the loader needs it to mask voxels.
n_jobs runs that many aws s3 cp workers in parallel. The
call is also idempotent – re-running after an interrupted
transfer skips files that already match the bucket size, so you
only fetch what’s missing.
The neuroimaging data and the stimuli are covered by two separate
licenses. On the first download you will be prompted twice –
once for each – and you must type I AGREE each time:
Neuroimaging data (CC0 1.0) – unrestricted use.
Stimuli (closed, research-only) – no redistribution, no commercial or AI/ML-training use.
The acceptances are persisted, so the prompts only appear on the first download into a given data directory.
See also Loading Data for the full set of brain-mask, ROI, and noise-ceiling kwargs plus surface ROI loading.
from laion_fmri.download import download
subject_id = "sub-01"
session_id = "ses-01"
print(f"Downloading {subject_id} / {session_id}")
# Most workflows only need the files the loaders read directly --
# trial info, statmaps, and ROI masks. Asking for the suffix
# subset below keeps a session pull around a few hundred MB
# instead of the multi-GB you'd get pulling everything; drop
# ``suffix`` if you also want the raw GLMsingle model dump or
# the JSON sidecars. ``include_anatomical=True`` brings in the
# anatomical T1w used as the backdrop for the visualizations
# below; ``include_freesurfer=True`` pulls the per-subject
# FreeSurfer recon needed by surface / template projections.
download(
subject=subject_id,
ses=session_id,
suffix=["statmap", "trials", "mask"],
include_stimuli=True,
include_freesurfer=True,
include_anatomical=True,
n_jobs=4,
)
Load the subject¶
Once data is on disk, load a Subject
and inspect its sessions and available ROIs. The brain mask
defaults to the anatomically-derived mask shipped under
derivatives/anatomical/. Pass source="rsquare" to
switch to the functional mean-R^2 derived mask instead
(every voxel with a non-zero GLMsingle fit; see
Loading Data for the full cascading-kwarg story).
from laion_fmri.subject import load_subject
sub = load_subject(subject_id)
print(f"Subject: {sub.subject_id}")
print(f"Sessions: {sub.get_sessions()}")
print(
f"Voxels: {sub.get_n_voxels()} "
f"(rsquare: {sub.get_n_voxels(source='rsquare')})"
)
print(f"ROIs: {sub.get_available_rois()}")
Single-trial betas¶
get_betas returns (n_trials, n_voxels) within the brain
mask. For real subjects the brain-mask voxel count is ~270k;
multiplied by ~1000 trials per session, that’s ~1 GB per call.
In practice you should always pass an roi= filter to keep the
array small (face-area ROI, e.g. ~1000 voxels, drops the call to
a few MB).
session = "ses-01"
# Without ROI: heavier, so pass streaming=True to keep peak
# memory at ~50 MB instead of materializing the full 4-D file.
betas_all = sub.get_betas(session=session, streaming=True)
print(f"{session} betas (full mask): {betas_all.shape}")
# Recommended: use an ROI filter. ``streaming=True`` keeps peak
# memory low even when an ROI is set, since the underlying nii.gz
# is otherwise materialized in full before masking.
rois_face = sub.get_available_rois(category="face")
if rois_face:
betas_face = sub.get_betas(
session=session, roi="face", streaming=True,
)
print(f"{session} betas (face ROIs): {betas_face.shape}")
Save a derived map back to NIfTI¶
get_betas returns a 1-D (n_trials, n_voxels) array
masked to brain-mask voxels. To round-trip any per-voxel
statistic you compute back into a 3-D NIfTI on disk (so
external tools can read it), pass the array to
Subject.to_nifti: it scatters the values into a
(X, Y, Z) volume and writes the file.
mean_betas = betas_all.mean(axis=0) # (n_voxels,)
mean_path = f"/tmp/{subject_id}_{session}_trial_mean.nii.gz"
sub.to_nifti(mean_betas, mean_path)
print(f"Saved {mean_path}")
Visualize the first three trials¶
Single-trial betas are inherently noisier than the contrast maps you may be used to: each panel below captures one stimulus presentation rather than an average over many, so don’t expect crisp activation patterns. The view is most useful as a quick sanity check – amplitudes in a reasonable range, signal concentrated in cortex rather than at edges or in white matter, and the three trials looking distinct from each other rather than suspiciously similar.
import warnings
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import Normalize
from nilearn import plotting
# Nilearn warns about NaN / inf voxels from GLMsingle non-fits;
# they're outside the brain mask and don't affect the rendering.
warnings.filterwarnings(
"ignore",
message="Non-finite values detected",
category=UserWarning,
)
bg_img = str(sub.get_t1w())
stat_cmap = sns.diverging_palette(220, 20, as_cmap=True)
trial_paths = []
for i in range(3):
p = f"/tmp/{subject_id}_{session}_trial{i}.nii.gz"
sub.to_nifti(betas_all[i], p)
trial_paths.append(p)
vmax = float(np.nanpercentile(np.abs(betas_all[:3]), 99))
fig = plt.figure(figsize=(16, 4.3))
gs = fig.add_gridspec(2, 1, height_ratios=[1, 0.05], hspace=0.1)
strip_gs = gs[0].subgridspec(1, 3, wspace=0.05)
axes = [fig.add_subplot(strip_gs[0, i]) for i in range(3)]
cbar_ax = fig.add_subplot(gs[1])
for i, ax in enumerate(axes):
plotting.plot_stat_map(
trial_paths[i], bg_img=bg_img, axes=ax,
display_mode="z", cut_coords=[-17],
cmap=stat_cmap, vmax=vmax, colorbar=False,
black_bg=False, threshold=0.5,
)
ax.set_title(f"Trial {i}")
sm = plt.cm.ScalarMappable(
cmap=stat_cmap, norm=Normalize(vmin=-vmax, vmax=vmax),
)
fig.colorbar(
sm, cax=cbar_ax, orientation="horizontal",
label=f"{session} trial β (% signal change)",
)
plt.show()
Three category-selective ROIs¶
Face, body, and place ROIs sit in fairly stereotyped parts of ventral temporal cortex, but each subject’s exact localization differs. Glance at this panel to confirm the masks landed where you’d expect – FFA1 in fusiform gyrus, EBA in lateral occipitotemporal cortex, PPA in parahippocampal cortex – before relying on them to filter downstream analyses.
See also Loading Data for the multi-format ROI
accessor (volume .nii.gz / surface .func.gii /
FreeSurfer .label).
import nibabel as nib
from nilearn.plotting import find_xyz_cut_coords
palette = sns.color_palette("colorblind")
roi_specs = [
("FFA1", palette[0]),
("EBA", palette[1]),
("PPA", palette[2]),
]
fig, axes = plt.subplots(1, 3, figsize=(16, 3.7))
for ax, (roi, color) in zip(axes, roi_specs):
roi_path = f"/tmp/{subject_id}_roi_{roi}.nii.gz"
sub.to_nifti(
sub.get_roi_mask(roi).astype("float32"), roi_path,
)
_, _, z = find_xyz_cut_coords(nib.load(roi_path))
display = plotting.plot_anat(
bg_img, axes=ax,
display_mode="z", cut_coords=[z],
black_bg=False, threshold=0.1, colorbar=False,
)
display.add_contours(
roi_path, levels=[0.5],
colors=[color], linewidths=1.5,
)
ax.set_title(roi)
plt.show()
Per-session noise ceiling¶
nc = sub.get_noise_ceiling(session=session)
print(
f"NC: shape={nc.shape}, "
f"range=[{nc.min():.3f}, {nc.max():.3f}]"
)
Visualize the noise-ceiling map¶
The noise ceiling tells you which voxels are reliably driven by the stimulus set: high values mark voxels where repeated presentations produce consistent responses, low values mark noise. Looking at this map before any decoding or RSA work helps you decide whether to threshold by NC, restrict to high-NC voxels, or stay with ROI-based analyses.
mako_cmap = sns.color_palette("mako", as_cmap=True)
nc_path = f"/tmp/{subject_id}_{session}_nc.nii.gz"
sub.to_nifti(nc, nc_path)
nc_vmax = float(nc.max())
cuts = [-17, -5, 8]
fig = plt.figure(figsize=(16, 4.5))
gs = fig.add_gridspec(2, 1, height_ratios=[1, 0.05], hspace=0.1)
strip_gs = gs[0].subgridspec(1, 3, wspace=0.05)
axes = [fig.add_subplot(strip_gs[0, i]) for i in range(3)]
cbar_ax = fig.add_subplot(gs[1])
for ax, z in zip(axes, cuts):
plotting.plot_stat_map(
nc_path, bg_img=bg_img, axes=ax,
display_mode="z", cut_coords=[z],
cmap=mako_cmap, vmax=nc_vmax, colorbar=False,
black_bg=False, threshold=0.1,
)
sm = plt.cm.ScalarMappable(
cmap=mako_cmap, norm=Normalize(vmin=0, vmax=nc_vmax),
)
fig.colorbar(
sm, cax=cbar_ax, orientation="horizontal",
label=f"{session} noise ceiling (% var. expl.)",
)
plt.show()
Stimulus images¶
include_stimuli=True on the download above already pulled the
stimulus images; sub.images exposes them by global trial
index (0 .. n_total_trials-1, matching the rows of
sub.metadata). The returned object is a PIL.Image.
The matplotlib render below is commented out so the gallery doesn’t redistribute stimulus content – uncomment it to inspect the image locally.
See also Object Segmentations for per-image object-level segmentation masks.
img = sub.images.get(0)
print(f"First trial image: {img.size}")
# fig, ax = plt.subplots(figsize=(4, 4))
# ax.imshow(img)
# ax.set_title("First trial stimulus")
# ax.axis("off")
# plt.show()