laion_fmri.download
Download logic for the LAION-fMRI dataset.
Functions
| Function | Description |
|---|---|
| accept_license() | Walk through the CC0 dataset-license acceptance without downloading. |
| accept_licenses() | Deprecated. |
| download() | Download fMRI dataset files for a subject, narrowed by BIDS entities. |
| download_captions() | Download the per-stimulus captions CSV from the public S3 bucket. |
| download_embeddings() | Download stimulus embedding HDF5 files from the public S3 bucket. |
| download_segmentations() | Download the per-stimulus segmentation masks from the public S3 bucket. |
| download_stimuli() | Download the stimuli (HDF5 + metadata CSV). |
| request_stimulus_access() | Walk the user through the form and persist the returned request_id. |
- laion_fmri.download.accept_license()
Walk through the CC0 dataset-license acceptance without downloading.
Stimulus terms are no longer accepted locally; the access service handles them. Use request_stimulus_access() (or laion-fmri request-access) when you need stimulus images.
- laion_fmri.download.accept_licenses(include_stimuli=False)
Deprecated. Use accept_license() or request_stimulus_access() instead.
- laion_fmri.download.download(subject, ses=None, task=None, space=None, desc=None, stat=None, suffix=None, extension=None, include_stimuli=False, include_embeddings=False, include_freesurfer=False, include_anatomical=False, n_jobs=1)
Download fMRI dataset files for a subject, narrowed by BIDS entities.
The download is idempotent: a file whose local size already matches the S3 size is skipped, so re-running after an interrupted transfer only fetches what’s missing.
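The size check behind this idempotence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's implementation: remote_sizes stands in for an S3 listing (object key to size in bytes), and files_to_fetch is a hypothetical helper name.

```python
from pathlib import Path


def files_to_fetch(remote_sizes: dict[str, int], local_root: Path) -> list[str]:
    """Return the keys whose local copy is missing or has a different size.

    Hypothetical helper: remote_sizes maps a relative object key to its size
    in bytes, as an S3 listing would report it.
    """
    pending = []
    for key, size in sorted(remote_sizes.items()):
        local = local_root / key
        # Skip files that already exist locally with exactly the remote size.
        if local.is_file() and local.stat().st_size == size:
            continue
        # Missing or size-mismatched (e.g. truncated) files are re-fetched.
        pending.append(key)
    return pending
```

A truncated file left by an interrupted transfer is smaller than the remote object, so it lands back in the pending list; a size check alone cannot detect a same-size corrupted file, which is the case a content hash would cover.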
The stimulus set is dataset-wide (one HDF5 file for all subjects), so it is not subject-keyed. For stimulus-only downloads, use the standalone download_stimuli() function. The include_stimuli=True flag here is a convenience that calls download_stimuli() after the fMRI fetch completes.
- Parameters:
subject (str or "all") – Subject identifier (BIDS ID, e.g. "sub-01" or "01"), or "all" to iterate over every subject.
ses, task, space, desc, stat (str or list[str], optional) – BIDS-entity filters. Each accepts a bare value (ses="04") or the full BIDS token (ses="ses-04"). A list narrows to multiple values. Files that don't carry an entity are not excluded by a filter on it, so subject-level summaries survive a ses= filter.
suffix (str or list[str], optional) – BIDS suffix filter ("statmap", "events", …).
extension (str or list[str], optional) – File extension filter ("nii.gz", "tsv", …).
include_stimuli (bool) – After the fMRI fetch, also call download_stimuli() to pull the dataset-wide stimuli. Useful when you want both in a single call; use download_stimuli() directly if you only need the stimuli.
include_embeddings (bool or str or list[str]) – After the fMRI fetch, also call download_embeddings(). Pass True for all four models, or a model label or list of labels to narrow the selection. False (default) skips the embeddings.
include_freesurfer (bool) – If True, also pull the per-subject FreeSurfer recon under derivatives/freesurfer/{subject}/ (a few hundred MB per subject). The recon enables Subject.to_template, the chain that projects T1w-volume data onto fsaverage / fsLR / MNI templates without external tools.
include_anatomical (bool) – If True, also pull the per-subject anatomical derivatives under derivatives/anatomical/{subject}/ses-PrismaAnat/anat/ (T1w, T2w, brain mask; full-res plus res-1pt8 copies aligned with the functional grid). Tens of MB per subject. Unlocks Subject.get_t1w, get_t2w, get_anatomical_brain_mask, and mask_source="anatomical" on the voxel-axis accessors.
n_jobs (int) – Number of parallel download workers for fMRI data (AWS CLI copy subprocesses). 1 (default) is sequential. Does not affect stimulus downloads.
- Raises:
SubjectNotFoundError – If the subject identifier is invalid.
LicenseNotAcceptedError – If the CC0 dataset license is declined.
AccessServiceError – If include_stimuli=True and the stimulus access service rejects the request or a download fails.
TermsOutdatedError – If include_stimuli=True and the server's current Terms of Use version differs from the version on the cached request_id.
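The entity-filter rules for download() (bare value or full token, a list as alternatives, files without the entity passing through) can be modelled with a small standalone matcher. This is a hedged sketch, not the package's code: it assumes standard BIDS names of the form sub-01_ses-04_..._suffix.ext, handles only the key-value entities (not suffix or extension), and its helper names are hypothetical.

```python
def _normalise(entity: str, value) -> set[str]:
    """Accept a bare value ("04"), a full token ("ses-04"), or a list of either."""
    values = [value] if isinstance(value, str) else list(value)
    prefix = f"{entity}-"
    # Strip the "ses-" style prefix so both spellings compare equal.
    return {v[len(prefix):] if v.startswith(prefix) else v for v in values}


def parse_entities(filename: str) -> dict[str, str]:
    """Extract key-value entities (ses-04, task-images, ...) from a BIDS name."""
    stem = filename.split(".", 1)[0]  # drop ".nii.gz", ".tsv", ...
    entities = {}
    for part in stem.split("_"):
        if "-" in part:  # the trailing suffix (e.g. "statmap") has no "-"
            key, _, val = part.partition("-")
            entities[key] = val
    return entities


def matches(filename: str, **filters) -> bool:
    """True if the file passes every non-None entity filter.

    A file that does not carry an entity at all is not excluded by a filter
    on that entity, mirroring the documented behaviour.
    """
    entities = parse_entities(filename)
    for entity, value in filters.items():
        if value is None or entity not in entities:
            continue
        if entities[entity] not in _normalise(entity, value):
            return False
    return True
```

For example, matches("sub-01_ses-04_task-images_statmap.nii.gz", ses=["ses-04", "05"]) accepts the file, while a subject-level file with no ses entity passes any ses= filter.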
- laion_fmri.download.download_captions(data_dir=None)
Download the per-stimulus captions CSV from the public S3 bucket.
Pulls task-images_desc-captions.csv into <data_dir>/stimuli/. The file is a dataset-wide stimulus metadata derivative: shared images have five human captions, shared non-OOD images have one AI caption, and unique images have three human captions and no AI caption.
The download is idempotent: a file whose local size matches the S3 size is skipped, so re-running an interrupted transfer only fetches what's missing.
- Parameters:
data_dir (str or Path, optional) – Override the configured data directory.
- Returns:
Local path to task-images_desc-captions.csv.
- Return type:
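The caption layout of download_captions() can be restated as a lookup rule, which is handy for sanity-checking a loaded copy of the CSV. The function name is hypothetical, and two cases are assumptions because the text does not state them: shared OOD images are taken to have no AI caption, and unique images are treated the same regardless of OOD status.

```python
def expected_caption_counts(shared: bool, ood: bool) -> tuple[int, int]:
    """Return (human, ai) caption counts per image (hypothetical helper).

    Assumptions: shared OOD images carry no AI caption (only shared non-OOD
    images are documented as having one), and OOD status does not change the
    counts for unique images.
    """
    if shared:
        # Shared images: five human captions; one AI caption unless OOD.
        return (5, 0 if ood else 1)
    # Unique images: three human captions and no AI caption.
    return (3, 0)
```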
- laion_fmri.download.download_embeddings(models='all', data_dir=None, n_jobs=1)
Download stimulus embedding HDF5 files from the public S3 bucket.
The embeddings are dataset-wide derivatives (one set of files for all subjects), shipped under the same CC0 license as the rest of the fMRI data — no Data Use Agreement, no signed URLs.
The download is idempotent: files whose local size matches the S3 size are skipped, so re-running an interrupted transfer only fetches what’s missing.
- Parameters:
models (str or list[str]) – Which embedding models to download. One of:
  - "all" (default): download every model in laion_fmri.embeddings.AVAILABLE_MODELS.
  - a single label, e.g. "CLIP".
  - a list of labels, e.g. ["CLIP", "DINOv2"].
data_dir (str or Path, optional) – Override the configured data directory.
n_jobs (int) – Number of parallel AWS CLI copy workers. 1 (default) is sequential.
- Returns:
Mapping of model label to local file path for each requested model.
- Return type:
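How a models value (and download()'s include_embeddings flag) might be resolved into a concrete list of labels is sketched below. Assumptions: the real AVAILABLE_MODELS holds four labels that this page does not list in full, so the sketch takes the available labels as an argument; rejecting unknown labels with ValueError is this sketch's choice, not documented behaviour; both helper names are hypothetical.

```python
def resolve_models(models, available: tuple[str, ...]) -> list[str]:
    """Normalise "all", a single label, or a list of labels (hypothetical helper)."""
    if isinstance(models, str):
        requested = list(available) if models == "all" else [models]
    else:
        requested = list(models)
    # Sketch-only validation: the package's real error handling may differ.
    unknown = [m for m in requested if m not in available]
    if unknown:
        raise ValueError(f"unknown embedding model(s): {unknown}")
    return requested


def resolve_include_embeddings(flag, available: tuple[str, ...]) -> list[str]:
    """Map download()'s include_embeddings onto a model list (empty means skip)."""
    if flag is False:
        return []
    if flag is True:
        return resolve_models("all", available)
    return resolve_models(flag, available)
```

Usage with a placeholder tuple standing in for AVAILABLE_MODELS: resolve_include_embeddings("CLIP", ("CLIP", "DINOv2")) yields ["CLIP"], and True expands to every available label.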
- laion_fmri.download.download_segmentations(data_dir=None)
Download the per-stimulus segmentation masks from the public S3 bucket.
Pulls two sibling files into <data_dir>/stimuli/:
  - task-images_desc-segmentations.h5: stacked (N, H, W) uint8 masks
  - task-images_desc-segmentations_metadata.csv: one row per mask
These are dataset-wide derivatives (one set of files for all subjects), shipped under the same CC0 license as the rest of the fMRI data — no Data Use Agreement, no signed URLs.
The download is idempotent: files whose local size matches the S3 size are skipped, so re-running an interrupted transfer only fetches what’s missing.
- Parameters:
data_dir (str or Path, optional) – Override the configured data directory.
- Returns:
Mapping of {"h5": ..., "metadata": ...} to local file paths.
- Return type:
- laion_fmri.download.download_stimuli(data_dir=None, server_url='https://laion-fmri.hebartlab.com')
Download the stimuli (HDF5 + metadata CSV).
The stimuli ship as a single HDF5 file covering all subjects. Because they are dataset-wide rather than per-subject, this function takes no subject argument.
Network behaviour:
1. The function always starts with the public manifest endpoint to find out what the current files are and their sha256 hashes. No authentication is involved.
2. If the local files already match the manifest, the function returns immediately: no access-service call, no auth state needed. This is why a cluster job can rsync the data dir from your laptop and call download_stimuli() without ever copying auth.json; the package sees that the files are correct and short-circuits.
3. Only when at least one file is missing or has the wrong sha256 does the function reach for the access service. If no cached request_id is present, it walks the user through the Data Use Agreement form; otherwise it re-mints URLs via /api/v1/refresh and downloads what's missing.
- Parameters:
- Returns:
Mapping of file name to local pathlib.Path for the downloaded files.
- Return type:
- Raises:
AccessServiceError – If the access service rejects the request or a download fails.
TermsOutdatedError – If the Terms of Use have been updated since the cached request_id was issued and must be re-accepted.
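The manifest comparison that lets download_stimuli() return without contacting the access service can be sketched with hashlib. The sketch assumes the manifest has already been fetched and reduced to a {file name: sha256 hex digest} mapping; the real endpoint's response format and the helper names below are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks (large HDF5 files)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_files(manifest: dict[str, str], stimuli_dir: Path) -> list[str]:
    """Manifest entries that are missing locally or fail the hash check.

    Hypothetical helper; manifest maps file name -> expected sha256 hex digest.
    """
    return [
        name
        for name, expected in sorted(manifest.items())
        # Missing files short-circuit before any hashing happens.
        if not (stimuli_dir / name).is_file()
        or sha256_of(stimuli_dir / name) != expected
    ]


def needs_access_service(manifest: dict[str, str], stimuli_dir: Path) -> bool:
    """Contact the access service only when something must be fetched."""
    return bool(stale_files(manifest, stimuli_dir))
```

Because the decision depends only on file contents, a data directory copied from another machine passes the check without any cached credentials, which matches the rsync scenario described for download_stimuli().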