laion_fmri.subject

Subject class for accessing per-subject data files.

Every accessor maps to exactly one file in the bucket layout: no averaging, concatenation, or rebinning across sessions, with one exception – Subject.metadata aggregates the per-session trial TSVs into one trial table for convenience.

Functions

laion_fmri.subject.load_subject(subject)

Load a subject by BIDS ID or integer index.

Parameters:

subject (int or str) – BIDS subject ID (e.g. "sub-03") or integer index.

Return type:

Subject

Raises:
  • SubjectNotFoundError – If the subject identifier is invalid.

  • DataNotDownloadedError – If the subject’s data has not been downloaded.

Classes

class laion_fmri.subject.Subject(subject_id, data_dir)

Bases: object

Access loaded data files for a single subject.

Parameters:
  • subject_id (str) – BIDS subject ID.

  • data_dir (str) – Path to the local data directory.

property captions

Per-trial human captions, plus shared non-OOD AI captions.

property embeddings

Per-trial pretrained embeddings (CLIP, DINOv2, …).

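get_anatomical_brain_mask(*, res=None)

Return the path to the anatomically-derived brain mask.

res=None returns the full-resolution mask; res="1pt8" returns the variant on the functional grid.

Distinct from get_brain_mask(), which loads the mask itself (anatomical by default, or rsquare-derived with source="rsquare") and returns it as a flat boolean array over the full image grid rather than as a path.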

get_anatomical_dir()

Return the path to this subject’s anatomical derivatives.

Raises:

DataNotDownloadedError – If the anatomical directory does not exist on disk.

get_available_categories()

Return sorted list of ROI category directory names.

get_available_rois(category=None)

Return sorted list of ROI names available on disk.

Parameters:

category (str or None) – Restrict to ROIs in this category subdirectory.

Returns:

Sorted ROI names (BIDS-clean form).

Return type:

list[str]

get_betas(session, roi=None, mask=None, nc_threshold=None, stimuli=None, streaming=False, mask_source='anatomical')

Load single-trial betas for one or more sessions.

Parameters:
  • session (str or list of str) – BIDS session ID(s). A list returns a dict keyed by session ID, since trial counts may differ per session. Single-trial betas live per session in the bucket, so the caller must pick which sessions to load.

  • roi (str, list[str], or None) – Named ROI(s) for voxel selection (union if list).

  • mask (np.ndarray[bool] or None) – Custom boolean mask over brain-mask voxels.

  • nc_threshold (float or None) – Minimum per-session noise ceiling to keep a voxel.

  • stimuli ("shared", "unique", or None) – Trial-level filter using the stimulus-metadata shared flag.

  • mask_source ("anatomical" (default) | "rsquare") – Which brain mask to filter the voxel axis on; see get_brain_mask() for the difference.

  • streaming (bool) – If False (default), the full 4-D NIfTI is materialized in RAM and then masked. Decompresses any .nii.gz once; peak memory is the full file (~12 GB for a real session) plus the masked output. Best when you have plenty of RAM. If True, the file is streamed volume-by-volume and the combined brain + ROI + NC mask is applied inline: peak memory is one volume (~10-50 MB) plus the masked output. Use this on memory-constrained machines like Colab. Works on both .nii (nibabel-managed per-volume reads) and .nii.gz (a custom gzip pipeline that never re-decompresses).
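The memory behaviour of streaming=True can be illustrated with plain NumPy. This is a sketch, not the library's implementation: iter_volumes is a hypothetical stand-in for a per-volume NIfTI reader, the combined mask is assumed to be already projected onto the full image grid as a flat boolean array, and C-order flattening is an assumption about the layout. Each volume is reduced to the selected voxels as it arrives, so only one full volume is handled at a time.

```python
import numpy as np


def iter_volumes(data_4d):
    """Hypothetical per-volume reader: yields one 3-D volume at a time."""
    for t in range(data_4d.shape[-1]):
        yield data_4d[..., t]


def stream_masked(volumes, combined_mask):
    """Reduce each incoming volume to the voxels selected by a flat boolean mask.

    combined_mask: 1-D bool array over the full image grid (brain, ROI and NC
    selections already combined; assumed). Only the selected voxels of each
    volume are kept, so the output is (n_trials, n_selected_voxels).
    """
    rows = [vol.reshape(-1)[combined_mask] for vol in volumes]
    return np.stack(rows)


# Toy run: a 4 x 5 x 6 grid with 10 trials and 3 selected voxels.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 5, 6, 10))
combined_mask = np.zeros(4 * 5 * 6, dtype=bool)
combined_mask[[3, 17, 42]] = True
betas = stream_masked(iter_volumes(data), combined_mask)
print(betas.shape)  # (10, 3)
```

The non-streaming path would load all of data first and mask it in one step; the result is the same, only the peak memory differs.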

Returns:

Array of shape (n_trials, n_selected_voxels) for a single session; a {session: array} dict for a list. Values are GLMsingle single-trial β estimates in percent-signal-change units. Voxels that GLMsingle did not model (failed fit) arrive as NaN, which signals “no estimate available” as distinct from “estimate is 0”. Handle them in the caller’s analysis layer.

Return type:

np.ndarray or dict[str, np.ndarray]
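Because failed fits arrive as NaN rather than 0, reductions over trials need either NaN-aware functions or an explicit drop of the failed voxels. A minimal sketch on a synthetic array standing in for a get_betas() result (the values are invented; column 2 plays the failed voxel, column 3 a genuine zero estimate):

```python
import numpy as np

# Synthetic (n_trials, n_voxels) betas: column 2 is a failed fit, column 3 is truly 0.
betas = np.array([
    [1.0, 2.0, np.nan, 0.0],
    [3.0, 4.0, np.nan, 0.0],
])

# Voxels with no estimate on any trial are columns that are entirely NaN.
failed = np.isnan(betas).all(axis=0)

# Option 1: drop failed voxels before analysis.
valid_betas = betas[:, ~failed]

# Option 2: keep the voxel axis aligned and leave NaN where there is no estimate.
# Averaging only the valid columns avoids the empty-slice warning from np.nanmean.
mean_per_voxel = np.full(betas.shape[1], np.nan)
mean_per_voxel[~failed] = np.nanmean(betas[:, ~failed], axis=0)
```

Option 2 keeps the array aligned with the voxel axis, which matters if the result is later written back with to_nifti().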

get_brain_mask(source='anatomical', res='1pt8')

Load the subject’s brain mask as a flat boolean array.

Parameters:
  • source ("anatomical" (default) | "rsquare") – "anatomical" uses the brain mask in derivatives/anatomical/.../desc-brain_mask.nii.gz – a wider, anatomically-derived mask. Pull it with download(include_anatomical=True). "rsquare" derives the mask from the subject-level mean-R^2 map (voxels with any non-zero GLMsingle fit; the bucket ships R2mean as ..._stat-rsquare_desc-R2mean_statmap.nii.gz).

  • res ("1pt8" (default) | None) – Anatomical-mask resolution. "1pt8" matches the functional grid, so the returned mask aligns with the voxel axis of get_betas / get_noise_ceiling and with the rsquare-derived mask. None loads the full-resolution anatomical mask; the returned 1-D array is larger and will not align with the loader cascade. Ignored for source="rsquare" (the rsquare-derived mask is published at one resolution only).

Returns:

1-D boolean array over the full image grid.

Return type:

np.ndarray
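Two index spaces recur in this class: the full image grid (what get_brain_mask() returns) and the brain-mask voxels (what get_roi_mask() and the mask argument of get_betas() are indexed over). The NumPy sketch below relates them on toy arrays; C-order flattening and the 1.8 mm affine are assumptions for illustration, and get_voxel_coordinates() may compute coordinates differently.

```python
import numpy as np

# Toy 3 x 3 x 2 grid (18 voxels) with a 5-voxel brain mask, flattened in C order (assumed).
grid_shape = (3, 3, 2)
brain_mask = np.zeros(int(np.prod(grid_shape)), dtype=bool)
brain_mask[[1, 4, 5, 9, 12]] = True

n_voxels = int(brain_mask.sum())  # the count get_n_voxels() reports: 5

# ROI masks and the `mask` argument live on the brain-mask axis, not the grid.
roi_mask = np.array([True, False, True, True, False])
assert roi_mask.shape == (n_voxels,)

# Map selected brain-mask voxels back to flat grid indices, then to (i, j, k).
brain_idx = np.flatnonzero(brain_mask)  # [1, 4, 5, 9, 12]
roi_grid_idx = brain_idx[roi_mask]      # [1, 5, 9]
ijk = np.column_stack(np.unravel_index(roi_grid_idx, grid_shape))

# World coordinates need the image affine; a toy 1.8 mm isotropic affine here.
affine = np.diag([1.8, 1.8, 1.8, 1.0])
xyz = ijk @ affine[:3, :3].T + affine[:3, 3]
```

This is also why res=None breaks alignment: a full-resolution mask has a different grid, so its flat indices no longer correspond to the functional voxel axis.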

get_freesurfer_dir()

Return the path to this subject’s FreeSurfer recon.

Raises:

DataNotDownloadedError – If the recon directory does not exist on disk.

get_n_stimuli(stimuli=None)

Return the number of stimuli described in the metadata CSV.

Parameters:

stimuli ("shared", "unique", or None) – Count only shared or only unique stimuli; None counts all of them.

get_n_voxels(source='anatomical', res='1pt8')

Number of voxels in the subject’s brain mask.

source and res mirror get_brain_mask(); see its docstring for the available values.

get_noise_ceiling(session=None, desc=None, roi=None, mask=None, mask_source='anatomical')

Load a noise-ceiling map.

Exactly one of session or desc must be set:

  • session="ses-01" -> per-session NC NIfTI.

  • desc="noiseceiling33ses" -> the subject-level aggregate NC NIfTI with the given desc-... token.

Either argument also accepts a list, in which case the return value is a dict keyed by session ID / desc token.

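Parameters:
  • session (str, list of str, or None) – BIDS session ID(s) for per-session NC maps.

  • desc (str, list of str, or None) – desc-... token(s) of subject-level aggregate NC maps.

  • roi (str, list[str], or None) – Named ROI(s) for voxel selection (union if list), as in get_betas().

  • mask (np.ndarray[bool] or None) – Custom boolean mask over brain-mask voxels, as in get_betas().

  • mask_source ("anatomical" (default) | "rsquare") – Which brain mask to filter the voxel axis on; see get_brain_mask().

Returns: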

Noise ceiling in percent variance explained (0-100, GLMsingle convention). Thresholding at roughly 10-20 % keeps the reliably driven voxels.

Return type:

np.ndarray or dict[str, np.ndarray]
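The exactly-one-of rule and the thresholding advice can be sketched as follows. select_nc_source is a hypothetical helper that mirrors the documented argument rule (it is not part of the library), the NC values are synthetic, and 15 % is just one point inside the suggested 10-20 % range.

```python
import numpy as np


def select_nc_source(session=None, desc=None):
    """Hypothetical mirror of get_noise_ceiling's rule: exactly one of session/desc."""
    if (session is None) == (desc is None):
        raise ValueError("pass exactly one of session= or desc=")
    return ("session", session) if session is not None else ("desc", desc)


# Synthetic per-voxel noise ceiling in percent variance explained (0-100).
nc = np.array([2.0, 12.5, 35.0, 9.9, 18.0])

threshold = 15.0            # inside the suggested 10-20 % range
reliable = nc >= threshold  # boolean mask over the same voxel axis as nc
```

A mask built this way is on the same voxel axis as the NC array, so it can be combined with ROI masks or passed on as a custom mask, provided all of them were loaded with the same mask_source.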

get_roi_data(query, format=None, hemi=None, mask_source='anatomical')

Load multi-format ROI data: volume, surface, FreeSurfer label.

Parameters:
  • query (str or list[str]) – Multi-level ROI query (see get_roi_mask).

  • format (str or None) – One of "all", "volume" / "nii.gz" (synonyms), "gii" (per-hemi func.gii + label), "func.gii" (per-hemi surface mask only), "label" (per-hemi FreeSurfer label only). None means "all".

  • hemi (str or None) – One of "L", "R", or "all"; None (the default) is treated as "all". Ignored when format resolves to volume only.

Returns:

Top-level dict keyed by ROI name. Each value is a nested dict shaped:

{
    "volume": <1-D bool ndarray>,
    "gii": {
        "hemi-L": {"func.gii": ..., "label": ...},
        "hemi-R": {...},
    },
}

Format/hemi filters prune this tree.

Return type:

dict
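How the format and hemi filters prune the returned tree can be sketched on a hand-built dict of the documented shape. prune_roi_entry is hypothetical, handles only the "all", "volume", and "gii" formats, and uses placeholder strings in place of the real arrays and label objects.

```python
def prune_roi_entry(entry, fmt=None, hemi=None):
    """Hypothetical pruning of one ROI's nested dict by format and hemisphere."""
    fmt = "all" if fmt is None else fmt
    hemi = "all" if hemi is None else hemi
    out = {}
    if fmt in ("all", "volume") and "volume" in entry:
        out["volume"] = entry["volume"]
    if fmt in ("all", "gii") and "gii" in entry:
        keep = ("hemi-L", "hemi-R") if hemi == "all" else (f"hemi-{hemi}",)
        out["gii"] = {h: v for h, v in entry["gii"].items() if h in keep}
    return out


entry = {
    "volume": "<1-D bool ndarray>",
    "gii": {
        "hemi-L": {"func.gii": "<L surface mask>", "label": "<L label>"},
        "hemi-R": {"func.gii": "<R surface mask>", "label": "<R label>"},
    },
}
left_surface_only = prune_roi_entry(entry, fmt="gii", hemi="L")
# {"gii": {"hemi-L": {"func.gii": ..., "label": ...}}}
```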

get_roi_mask(query, mask_source='anatomical')

Load one or more ROI masks, restricted to brain-mask voxels.

query accepts the multi-level grammar:

  • a specific ROI name ("FFA1");

  • a category name ("face") – expands to every ROI in that category;

  • "all" – expands to every ROI on disk;

  • a list mixing any of the above.

Multi-element resolutions are unioned voxel-wise. Always returns one 1-D bool array within the brain mask.

mask_source selects which brain mask the result is indexed within; see get_brain_mask().
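The query grammar and the voxel-wise union can be sketched with plain Python and NumPy. The category table and masks are invented toy data (the real ones come from the on-disk ROI directories), and resolve_query / union_mask are hypothetical helpers, not library functions.

```python
import numpy as np

# Toy category -> ROI table and per-ROI masks over 6 brain-mask voxels (invented).
CATEGORIES = {"face": ["FFA1", "OFA"], "place": ["PPA"]}
ROI_MASKS = {
    "FFA1": np.array([1, 1, 0, 0, 0, 0], dtype=bool),
    "OFA": np.array([0, 1, 1, 0, 0, 0], dtype=bool),
    "PPA": np.array([0, 0, 0, 0, 1, 1], dtype=bool),
}


def resolve_query(query):
    """Expand an ROI name, a category name, "all", or a list of these into ROI names."""
    if isinstance(query, str):
        query = [query]
    names = []
    for q in query:
        if q == "all":
            names.extend(ROI_MASKS)
        elif q in CATEGORIES:
            names.extend(CATEGORIES[q])
        elif q in ROI_MASKS:
            names.append(q)
        else:
            raise KeyError(f"unknown ROI or category: {q!r}")
    return sorted(set(names))


def union_mask(query):
    """Voxel-wise union (logical OR) of every ROI the query resolves to."""
    return np.logical_or.reduce([ROI_MASKS[n] for n in resolve_query(query)])


face = union_mask("face")            # FFA1 | OFA
mixed = union_mask(["face", "PPA"])  # category and ROI name mixed in one list
```

Overlapping ROIs (voxel 1 above) are counted once, which is the practical consequence of taking a union rather than concatenating masks.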

get_roi_masks(queries, mask_source='anatomical')

Load several ROI masks at once.

queries is a list (or single string). Each element is passed verbatim to get_roi_mask; the returned dict is keyed by the user’s strings, so categories and “all” appear as their original keys with a union mask as value.

mask_source is forwarded to get_roi_mask().

get_sessions()

Return sorted list of available session IDs.

get_t1w(*, res=None)

Return the path to this subject’s anatomical T1w volume.

res=None returns the full-resolution image; res="1pt8" returns the variant on the functional grid.

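get_t2w(*, res=None)

Return the path to this subject’s anatomical T2w volume.

res has the same meaning as for get_t1w(): None returns the full-resolution image, "1pt8" the variant on the functional grid.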

get_trial_info(session)

Load the events TSV for one or more sessions.

Parameters:

session (str or list of str) – Required – events live per session in the bucket. A list returns a dict keyed by session ID.

Return type:

pd.DataFrame or dict[str, pd.DataFrame]

get_voxel_coordinates(roi=None, mask=None, mask_source='anatomical')

Return MNI/T1w coordinates for the selected voxels.

mask_source picks which brain mask defines “selected voxels”; see get_brain_mask().

has_anatomical()

Return True if this subject’s anatomical derivatives are on disk.

Anatomical files live under derivatives/anatomical/{subject}/ses-PrismaAnat/anat/ and ship T1w / T2w volumes plus a brain mask at two resolutions (full and res-1pt8).

has_freesurfer()

Return True if the per-subject FreeSurfer recon is on disk.

The recon ships under derivatives/freesurfer/{subject}/; pull it with download(..., include_freesurfer=True). Required by to_template() to project T1w-volume data onto fsaverage / fsLR / MNI templates.

has_stimuli()

Return True if the stimuli (HDF5 + CSV) are on disk.

Useful as a guard before touching stimulus-side data (metadata, images, embeddings, segmentations, captions, to_torch_dataset()) when the archive hasn’t been downloaded yet.

property images

Per-trial stimulus images (PIL + raw bytes).

property metadata

Trial table for this subject, concatenated across all sessions.

One row per single-trial beta. Columns include everything from the per-session events TSV plus the derived columns session, session_trial, image_name, stim_idx, unique_or_shared, and dataset.

Returns:

Indexed 0..n_total_trials-1. Each row’s index is the “global trial index” used by images, embeddings, segmentations, and captions.

Return type:

pandas.DataFrame
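The concatenation behind metadata can be sketched with pandas. The per-session frames are invented toy events tables with only BIDS-style onset and duration columns, and session_trial being a 0-based position within the session is an assumption; the library also derives image_name, stim_idx, unique_or_shared, and dataset, which this sketch omits.

```python
import pandas as pd

# Toy per-session events tables (invented values; real TSVs carry more columns).
events = {
    "ses-01": pd.DataFrame({"onset": [0.0, 4.0, 8.0], "duration": [3.0, 3.0, 3.0]}),
    "ses-02": pd.DataFrame({"onset": [0.0, 4.0], "duration": [3.0, 3.0]}),
}

frames = []
for session in sorted(events):  # sessions in sorted order, as get_sessions() returns them
    df = events[session].copy()
    df["session"] = session
    df["session_trial"] = list(range(len(df)))  # 0-based within the session (assumed)
    frames.append(df)

# ignore_index=True renumbers rows 0..n_total_trials-1: the "global trial index".
metadata = pd.concat(frames, ignore_index=True)
```

Selecting rows by session (e.g. metadata[metadata["session"] == "ses-02"]) recovers the per-session view, while the global index stays usable for images, embeddings, segmentations, and captions.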

property segmentations

Per-trial object-segmentation masks (shared images only).

property subject_id

Return the BIDS subject ID (e.g. "sub-03").

surface_to_template(values, target='fsaverage', **kwargs)

fsnative-surface input → surface target.

Accepts a single hemi array (with hemi="L"/"R") or a {"L": ..., "R": ...} dict; returns the same shape.
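The "same shape in, same shape out" contract can be sketched as a small dispatcher. project_hemi is a hypothetical placeholder for the per-hemisphere resampling (here it only returns a float copy), so the sketch shows the input/output handling, not the actual fsnative-to-template mapping.

```python
import numpy as np


def project_hemi(values, hemi, target):
    """Hypothetical per-hemisphere projection; a real one would resample onto `target`."""
    return np.asarray(values, dtype=float).copy()


def surface_to_template_sketch(values, target="fsaverage", hemi=None):
    """Accept one hemisphere array (with hemi="L"/"R") or a {"L": ..., "R": ...} dict."""
    if isinstance(values, dict):
        return {h: project_hemi(v, h, target) for h, v in values.items()}
    if hemi not in ("L", "R"):
        raise ValueError('a single array needs hemi="L" or hemi="R"')
    return project_hemi(values, hemi, target)


both = surface_to_template_sketch({"L": [1.0, 2.0], "R": [3.0]})  # dict in, dict out
left = surface_to_template_sketch([1.0, 2.0], hemi="L")           # array in, array out
```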

to_nifti(values, output_path, roi=None, mask=None, mask_source='anatomical')

Write a per-voxel array to a 3-D NIfTI volume.

values is sized to the brain mask selected by mask_source (default anatomical-derived).
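Writing a per-voxel array back to a volume amounts to scattering it into the full grid at the brain-mask positions. A NumPy sketch of that step, assuming a flat C-order mask and NaN for voxels outside the mask (the library's fill value is not documented here); unmask is a hypothetical helper. Saving would then pair the volume with the subject's affine, e.g. via nibabel's Nifti1Image and nibabel.save, which is omitted.

```python
import numpy as np


def unmask(values, brain_mask, grid_shape, fill=np.nan):
    """Scatter a per-voxel vector into a 3-D volume (flat C-order mask assumed)."""
    values = np.asarray(values, dtype=float)
    if values.shape != (int(brain_mask.sum()),):
        raise ValueError("values must have one entry per brain-mask voxel")
    flat = np.full(brain_mask.shape, fill, dtype=float)
    flat[brain_mask] = values
    return flat.reshape(grid_shape)


grid_shape = (2, 2, 2)
brain_mask = np.array([0, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
volume = unmask([10.0, 20.0, 30.0], brain_mask, grid_shape)
```

The length check is the practical reason values must match the mask chosen by mask_source: an array sized to the rsquare mask cannot be scattered into the anatomical one.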

to_template(values, target, **kwargs)

Project T1w-space values into a template / reference space.

Forwards to laion_fmri.templates.to_template(); see that function’s docstring for the full kwargs surface (hemi, route, surface, fsaverage_density, interpolation, output_dir, desc, session).

Requires the optional [template] extra; ImportError is raised at call time if any of nilearn / nitransforms / templateflow is missing.

to_torch_dataset(**kwargs)

Wrap this subject as a torch.utils.data.Dataset.

volume_to_surface(values, target='fsaverage', **kwargs)

Volume input → surface target (currently "fsaverage").

volume_to_template(values, target, **kwargs)

Volume input → volume target (MNI variants).