laion_fmri.download

Download logic for the LAION-fMRI dataset.

Functions

accept_license()

Walk through the CC0 dataset-license acceptance without downloading.

accept_licenses([include_stimuli])

Deprecated.

download(subject[, ses, task, space, desc, ...])

Download fMRI dataset files for a subject, narrowed by BIDS entities.

download_captions([data_dir])

Download the per-stimulus captions CSV from the public S3 bucket.

download_embeddings([models, data_dir, n_jobs])

Download stimulus embedding HDF5 files from the public S3 bucket.

download_segmentations([data_dir])

Download the per-stimulus segmentation masks from the public S3 bucket.

download_stimuli([data_dir, server_url])

Download the stimuli (HDF5 + metadata CSV).

request_stimulus_access([server_url])

Walk the user through the form and persist the returned request_id.

laion_fmri.download.accept_license()[source]

Walk through the CC0 dataset-license acceptance without downloading.

Stimulus terms are no longer accepted locally — they’re handled by the access service. Use request_stimulus_access() (or laion-fmri request-access) when you need stimulus images.

laion_fmri.download.accept_licenses(include_stimuli=False)[source]

Deprecated. Use accept_license() or request_stimulus_access() instead.

laion_fmri.download.download(subject, ses=None, task=None, space=None, desc=None, stat=None, suffix=None, extension=None, include_stimuli=False, include_embeddings=False, include_freesurfer=False, include_anatomical=False, n_jobs=1)[source]

Download fMRI dataset files for a subject, narrowed by BIDS entities.

The download is idempotent: a file whose local size already matches the S3 size is skipped, so re-running after an interrupted transfer only fetches what’s missing.
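The size-based skip rule can be pictured with a short sketch. This is not the package's implementation: `needs_download`, `plan_transfers`, and the `remote_sizes` mapping are hypothetical names chosen for illustration, and the remote sizes are assumed to have been listed from S3 beforehand.

```python
from pathlib import Path


def needs_download(local_path: Path, remote_size: int) -> bool:
    """Return True when the file is absent or its size differs from the remote size."""
    if not local_path.is_file():
        return True
    return local_path.stat().st_size != remote_size


def plan_transfers(remote_sizes: dict[str, int], root: Path) -> list[str]:
    """List the relative keys that still have to be fetched into `root`."""
    return [
        key
        for key, size in sorted(remote_sizes.items())
        if needs_download(root / key, size)
    ]
```

A size comparison is cheap but does not detect a corrupted file of the right length, which is presumably why the stimulus download compares sha256 digests instead.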

The stimuli are dataset-wide (one HDF5 for all subjects), so they are not subject-keyed. For stimulus-only downloads, use the standalone download_stimuli() function. The include_stimuli=True flag here is a convenience that calls download_stimuli() after the fMRI fetch completes.

Parameters:
  • subject (str or "all") – Subject identifier (BIDS ID, e.g. "sub-01" / "01", or "all" to iterate every subject).

  • ses, task, space, desc, stat (str or list[str], optional) – BIDS-entity filters. Each accepts a bare value (ses="04") or the full BIDS token (ses="ses-04"). A list narrows to multiple values. Files that don’t carry an entity are not excluded by a filter on it (so subject-level summaries survive a ses= filter).

  • suffix (str or list[str], optional) – BIDS suffix filter ("statmap", "events", …).

  • extension (str or list[str], optional) – File extension filter ("nii.gz", "tsv", …).

  • include_stimuli (bool) – After the fMRI fetch, also call download_stimuli() to pull the dataset-wide stimuli. Useful when you want both in a single call. Use download_stimuli() directly if you only need the stimuli.

  • include_embeddings (bool or str or list[str]) – After the fMRI fetch, also call download_embeddings(). Pass True for all four models, or a model label / list of labels to narrow. False (default) skips the embeddings.

  • include_freesurfer (bool) – If True, also pull the per-subject FreeSurfer recon under derivatives/freesurfer/{subject}/ (a few hundred MB per subject). The recon enables Subject.to_template – the chain that projects T1w-volume data onto fsaverage / fsLR / MNI templates without external tools.

  • include_anatomical (bool) – If True, also pull the per-subject anatomical derivatives under derivatives/anatomical/{subject}/ses-PrismaAnat/anat/ (T1w, T2w, brain mask – full-res plus res-1pt8 copies aligned with the functional grid). Tens of MB per subject. Unlocks Subject.get_t1w, get_t2w, get_anatomical_brain_mask, and mask_source="anatomical" on the voxel-axis accessors.

  • n_jobs (int) – Number of parallel download workers for fMRI data (AWS CLI copy subprocesses). 1 (default) is sequential. Does not affect stimulus downloads.
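The entity-filter semantics described for ses, task, space, desc, and stat can be sketched as follows. This is an illustration under stated assumptions, not the package's matching code: `normalize`, `parse_entities`, and `matches` are hypothetical helpers, and the file names in the usage note are made up.

```python
def normalize(entity: str, value: str) -> str:
    """Strip an optional 'entity-' prefix so 'ses-04' and '04' compare equal."""
    prefix = f"{entity}-"
    return value[len(prefix):] if value.startswith(prefix) else value


def parse_entities(filename: str) -> dict[str, str]:
    """Collect key-value entities from a BIDS-style file name."""
    stem = filename.split(".", 1)[0]
    return dict(part.split("-", 1) for part in stem.split("_") if "-" in part)


def matches(filename: str, **filters) -> bool:
    """Apply entity filters; a file lacking an entity is never excluded by it."""
    entities = parse_entities(filename)
    for entity, wanted in filters.items():
        if wanted is None or entity not in entities:
            continue  # unset filter, or the file does not carry this entity
        values = [wanted] if isinstance(wanted, str) else list(wanted)
        if entities[entity] not in {normalize(entity, v) for v in values}:
            return False
    return True
```

With these helpers, `matches("sub-01_ses-04_task-images_bold.nii.gz", ses="ses-04")` and the bare form `ses="04"` both accept the file, while a hypothetical subject-level file such as `sub-01_desc-summary_statmap.nii.gz` passes any ses filter because it carries no ses entity.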

Raises:
  • SubjectNotFoundError – If the subject identifier is invalid.

  • LicenseNotAcceptedError – If the CC0 dataset license is declined.

  • AccessServiceError – If include_stimuli=True and the stimulus access service rejects the request or a download fails.

  • TermsOutdatedError – If include_stimuli=True and the server’s current Terms of Use version differs from the version on the cached request_id.
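A call that combines several of the filters above might look like the sketch below. The filter values are the examples quoted in the parameter descriptions; `DOWNLOAD_FILTERS` and `fetch_subject` are hypothetical names, and the import sits inside the function so the snippet loads even where the package is not installed.

```python
# Example filter values taken from the parameter descriptions above.
DOWNLOAD_FILTERS = {
    "ses": ["04", "ses-05"],   # bare value and full BIDS token are both accepted
    "suffix": "statmap",
    "extension": "nii.gz",
    "n_jobs": 4,               # parallel workers for the fMRI files only
}


def fetch_subject(subject: str = "sub-01") -> None:
    """Fetch the filtered fMRI files for one subject (requires laion_fmri)."""
    # Imported lazily: this sketch assumes the laion_fmri package is installed.
    from laion_fmri.download import download

    download(subject, **DOWNLOAD_FILTERS)
```

Because the download is idempotent, calling `fetch_subject()` again after an interruption should only transfer the files that are still missing.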

laion_fmri.download.download_captions(data_dir=None)[source]

Download the per-stimulus captions CSV from the public S3 bucket.

Pulls task-images_desc-captions.csv into <data_dir>/stimuli/. The file is a dataset-wide stimulus metadata derivative: shared images have five human captions, shared non-OOD images have one AI caption, and unique images have three human captions and no AI caption.

The download is idempotent: a file whose local size matches the S3 size is skipped, so re-running an interrupted transfer only fetches what’s missing.

Parameters:

data_dir (str or Path, optional) – Override the configured data directory.

Returns:

Local path to task-images_desc-captions.csv.

Return type:

pathlib.Path

laion_fmri.download.download_embeddings(models='all', data_dir=None, n_jobs=1)[source]

Download stimulus embedding HDF5 files from the public S3 bucket.

The embeddings are dataset-wide derivatives (one set of files for all subjects), shipped under the same CC0 license as the rest of the fMRI data — no Data Use Agreement, no signed URLs.

The download is idempotent: files whose local size matches the S3 size are skipped, so re-running an interrupted transfer only fetches what’s missing.

Parameters:
  • models (str or list[str]) – "all" (default) for every model, or a model label or list of labels to narrow the download.

  • data_dir (str or Path, optional) – Override the configured data directory.

  • n_jobs (int) – Number of parallel AWS CLI copy workers. 1 (default) is sequential.

Returns:

Mapping of model label to local file path for each requested model.

Return type:

dict[str, pathlib.Path]

laion_fmri.download.download_segmentations(data_dir=None)[source]

Download the per-stimulus segmentation masks from the public S3 bucket.

Pulls two sibling files into <data_dir>/stimuli/:

  • task-images_desc-segmentations.h5 — stacked (N, H, W) uint8 masks

  • task-images_desc-segmentations_metadata.csv — one row per mask

These are dataset-wide derivatives (one set of files for all subjects), shipped under the same CC0 license as the rest of the fMRI data — no Data Use Agreement, no signed URLs.

The download is idempotent: files whose local size matches the S3 size are skipped, so re-running an interrupted transfer only fetches what’s missing.

Parameters:

data_dir (str or Path, optional) – Override the configured data directory.

Returns:

Mapping of {"h5": ..., "metadata": ...} to local file paths.

Return type:

dict[str, pathlib.Path]

laion_fmri.download.download_stimuli(data_dir=None, server_url='https://laion-fmri.hebartlab.com')[source]

Download the stimuli (HDF5 + metadata CSV).

The stimuli ship as a single HDF5 covering all subjects (dataset-wide, not per-subject), so this function takes no subject argument.

Network behaviour:

  • Always starts with the public manifest endpoint to find out what the current files are and their sha256s. No authentication involved.

  • If the local files already match the manifest, the function returns immediately. No access-service call, no auth state needed. This is why a cluster job can just rsync the data dir from your laptop and call download_stimuli() without ever copying auth.json — the package sees the files are correct and short-circuits.

  • Only when at least one file is missing or has the wrong sha256 does the function reach for the access service: if no cached request_id is present, it walks the user through the Data Use Agreement form; otherwise it re-mints URLs via /api/v1/refresh and downloads what’s missing.
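The three steps above amount to a small decision procedure, sketched here with the standard library only. The manifest is assumed to be a mapping of file name to sha256 hex digest, which is a guess at its shape; `sha256_of`, `stale_files`, `next_step`, and the returned labels are hypothetical and do not mirror the package's internals.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large HDF5 never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_files(manifest: dict[str, str], stimuli_dir: Path) -> list[str]:
    """Names from the manifest that are missing locally or have the wrong digest."""
    stale = []
    for name, expected in sorted(manifest.items()):
        path = stimuli_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            stale.append(name)
    return stale


def next_step(manifest: dict[str, str], stimuli_dir: Path, has_request_id: bool) -> str:
    """Pick the branch described above: short-circuit, refresh URLs, or run the form."""
    if not stale_files(manifest, stimuli_dir):
        return "return-early"  # no access-service call, no auth state needed
    return "refresh-urls" if has_request_id else "run-access-form"
```

The first branch is what lets a copied data directory work without auth.json: when every digest matches, the access service is never contacted.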

Parameters:
  • data_dir (str or Path, optional) – Override the configured data directory.

  • server_url (str) – Override the access service URL (default: production).

Returns:

Mapping of file name to local pathlib.Path for the downloaded files.

Return type:

dict

Raises:
  • AccessServiceError – If the access service rejects the request or a download fails.

  • TermsOutdatedError – If the cached request_id needs to re-accept an updated ToU.

laion_fmri.download.request_stimulus_access(server_url='https://laion-fmri.hebartlab.com')[source]

Walk the user through the form and persist the returned request_id.

Returns the response dict (request_id, expires_at, files).