laion_fmri.subject

Subject class for accessing per-subject data files.

Every accessor maps to exactly one file in the bucket layout: no averaging, concatenation, or rebinning across sessions.

Functions

laion_fmri.subject.load_subject(subject)

Load a subject by BIDS ID or integer index.

Parameters:

subject (int or str)

Return type:

Subject

Raises:
  • SubjectNotFoundError – If the subject identifier is invalid.

  • DataNotDownloadedError – If the subject’s data has not been downloaded.
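
A minimal usage sketch follows. The helper `describe_subject` is hypothetical and not part of the package; it takes the loader as an argument so the same code runs against `laion_fmri.subject.load_subject` or against a stand-in object. It relies only on the `subject_id`, `get_sessions()`, and `get_n_voxels()` members documented on this page.

```python
def describe_subject(load_subject, subject):
    """Load a subject and collect a few basic facts about it.

    `load_subject` is expected to behave like laion_fmri.subject.load_subject.
    `describe_subject` is a hypothetical helper used for illustration only.
    """
    # Accepts a BIDS ID (e.g. "sub-03") or an integer index.
    sub = load_subject(subject)
    return {
        "subject_id": sub.subject_id,
        "sessions": sub.get_sessions(),
        "n_voxels": sub.get_n_voxels(),
    }
```

With the package installed and the data downloaded, the call would look like `describe_subject(load_subject, "sub-03")` after `from laion_fmri.subject import load_subject`. An invalid identifier or missing download surfaces as one of the exceptions listed above.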

Classes

class laion_fmri.subject.Subject(subject_id, data_dir)

Bases: object

Access loaded data files for a single subject.

Parameters:
  • subject_id (str) – BIDS subject ID.

  • data_dir (str) – Path to the local data directory.

get_available_categories()

Return sorted list of ROI category directory names.

get_available_rois(category=None)

Return sorted list of ROI names available on disk.

Parameters:

category (str or None) – Restrict to ROIs in this category subdirectory.

Returns:

Sorted ROI names (BIDS-clean form).

Return type:

list[str]

get_betas(session, roi=None, mask=None, nc_threshold=None, stimuli=None, streaming=False)

Load single-trial betas for one or more sessions.

Parameters:
  • session (str or list of str) – BIDS session ID. Single-trial betas live per session in the bucket, so the caller must pick which sessions to load. A list returns a dict keyed by session ID, since trial counts may differ per session.

  • roi (str, list[str], or None) – Named ROI(s) for voxel selection (union if list).

  • mask (np.ndarray[bool] or None) – Custom boolean mask over brain-mask voxels.

  • nc_threshold (float or None) – Minimum per-session noise ceiling to keep a voxel.

  • stimuli ("shared", "unique", or None) – Trial-level filter using the stimulus-metadata shared flag.

  • streaming (bool) – If False (default), materialize the full 4-D NIfTI up front and mask per volume. Decompresses any .nii.gz once and is the right choice for the bucket’s compressed files; peak memory is the full file plus the masked output (~12 GB for a real session). If True, read one volume at a time – peak memory stays at one volume plus the masked output. Streaming is only fast on raw uncompressed .nii files; on .nii.gz it re-decompresses on every slice and slows to a crawl.

Returns:

Array of shape (n_trials, n_selected_voxels) for a single session; a {session: array} dict for a list of sessions.

Return type:

np.ndarray or dict[str, np.ndarray]
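
To make the return shape concrete, the sketch below mimics the non-streaming path on synthetic data: a 4-D array stands in for the beta NIfTI, and every volume is flattened and masked. The axis layout (trials on the last axis), the C-order flattening, and the helper name `mask_volumes` are assumptions for illustration, not the package's implementation.

```python
import numpy as np

def mask_volumes(data_4d, voxel_mask):
    """Return an (n_trials, n_selected_voxels) array from (x, y, z, n_trials) data.

    voxel_mask is a 1-D boolean array over the flattened (x, y, z) grid.
    """
    n_trials = data_4d.shape[-1]
    # Flatten the three spatial axes, then put trials on the first axis.
    flat = data_4d.reshape(-1, n_trials).T
    return flat[:, voxel_mask]

rng = np.random.default_rng(0)
betas_4d = rng.standard_normal((4, 5, 3, 7))  # synthetic: 60 voxels, 7 trials
voxel_mask = np.zeros(60, dtype=bool)
voxel_mask[[2, 10, 11, 59]] = True            # keep four voxels

betas = mask_volumes(betas_4d, voxel_mask)
print(betas.shape)  # (7, 4)
```

With a real subject, `sub.get_betas("ses-01", roi="FFA1")` would return one such array, and passing a list of session IDs would return a dict with one array per session.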

get_brain_mask()

Load the subject’s brain mask as a flat boolean array.

Derived from the subject-level mean-R^2 map: every voxel where the GLMsingle model has any non-zero fit. The bucket ships R2mean as ..._stat-rsquare_desc-R2mean_statmap.nii.gz rather than a pre-computed mask file.

Returns:

1-D boolean array over the full image grid.

Return type:

np.ndarray
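
The derivation can be sketched on a synthetic R^2 volume. The helper name `brain_mask_from_r2`, the C-order flattening, and the choice to treat NaN as "no fit" are assumptions made for this example; the package may handle those details differently.

```python
import numpy as np

def brain_mask_from_r2(r2_volume):
    """Flat boolean mask: True where the R^2 map is finite and non-zero."""
    flat = np.asarray(r2_volume, dtype=float).ravel()
    return np.isfinite(flat) & (flat != 0)

r2 = np.zeros((2, 3, 2))
r2[0, 1, 0] = 12.5
r2[1, 2, 1] = 0.3
r2[1, 0, 0] = np.nan  # treated as "no fit" in this sketch

mask = brain_mask_from_r2(r2)
print(mask.shape, int(mask.sum()))  # (12,) 2
```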

get_image(idx)

Load a single stimulus image by index.

get_images(stimuli=None, format='pil')

Load stimulus images (when stimuli/ is populated).

get_n_stimuli(stimuli=None)

Return number of stimuli described in stimuli.tsv.

Parameters:

stimuli ("shared", "unique", or None)

get_n_voxels()

Number of voxels in the subject’s brain mask.

get_noise_ceiling(session=None, desc=None, roi=None, mask=None)

Load a noise-ceiling map.

Exactly one of session or desc must be set:

  • session="ses-01" -> per-session NC NIfTI.

  • desc="noiseceiling33ses" -> the subject-level aggregate NC NIfTI with the given desc-... token.

Either argument also accepts a list, in which case the return value is a dict keyed by session ID / desc token.

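Parameters:
  • session (str, list of str, or None) – BIDS session ID(s) selecting per-session NC maps.

  • desc (str, list of str, or None) – desc-... token(s) selecting subject-level aggregate NC maps.

  • roi, mask – Optional voxel selection; both default to None.
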
Return type:

np.ndarray or dict[str, np.ndarray]
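
The "exactly one of session or desc" rule and the list-to-dict behaviour can be written out as a small validator. This is a sketch only: `resolve_nc_request` is a hypothetical name, and raising `ValueError` is an assumption about how a violation might be reported.

```python
def resolve_nc_request(session=None, desc=None):
    """Validate the session/desc pair and describe the expected result.

    Returns (kind, keys, as_dict): kind is "session" or "desc", keys is the
    list of requested IDs or tokens, and as_dict is True when a list was given.
    """
    # Both set or both unset violates the "exactly one" rule.
    if (session is None) == (desc is None):
        raise ValueError("set exactly one of `session` or `desc`")
    kind, value = ("session", session) if session is not None else ("desc", desc)
    as_dict = isinstance(value, (list, tuple))
    keys = list(value) if as_dict else [value]
    return kind, keys, as_dict
```

For example, `resolve_nc_request(session="ses-01")` describes a single array, while `resolve_nc_request(desc=["noiseceiling33ses"])` describes a dict keyed by desc token.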

get_roi_data(query, format=None, hemi=None)

Load multi-format ROI data: volume, surface, FreeSurfer label.

Parameters:
  • query (str or list[str]) – Multi-level ROI query (see get_roi_mask).

  • format (str or None) – One of "all", "volume" / "nii.gz" (synonyms), "gii" (per-hemi func.gii + label), "func.gii" (per-hemi surface mask only), "label" (per-hemi FreeSurfer label only). None means "all".

  • hemi (str or None) – One of "L", "R", or "all". None means "all". Ignored when format resolves to volume only.

Returns:

Top-level dict keyed by ROI name. Each value is a nested dict shaped:

{
    "volume": <1-D bool ndarray>,
    "gii": {
        "hemi-L": {"func.gii": ..., "label": ...},
        "hemi-R": {...},
    },
}

Format/hemi filters prune this tree.

Return type:

dict
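
One way to read "format/hemi filters prune this tree" is sketched below for a single ROI entry. `prune_roi_tree` is a hypothetical helper, and the exact shape of the pruned result (for instance, dropping the "gii" key when no surface data is requested) is an assumption rather than documented behaviour. The parameter name `format` mirrors the documented signature.

```python
def prune_roi_tree(entry, format=None, hemi=None):
    """Prune one ROI's nested dict according to the format/hemi filters."""
    fmt = "all" if format is None else format
    side = "all" if hemi is None else hemi
    if fmt not in ("all", "volume", "nii.gz", "gii", "func.gii", "label"):
        raise ValueError(f"unknown format: {fmt!r}")
    if side not in ("all", "L", "R"):
        raise ValueError(f"unknown hemi: {side!r}")

    # Which surface files to keep per hemisphere.
    if fmt in ("all", "gii"):
        surface_keys = ("func.gii", "label")
    elif fmt in ("func.gii", "label"):
        surface_keys = (fmt,)
    else:
        surface_keys = ()  # volume only: hemi is ignored

    pruned = {}
    if fmt in ("all", "volume", "nii.gz") and "volume" in entry:
        pruned["volume"] = entry["volume"]
    if surface_keys:
        hemis = ("hemi-L", "hemi-R") if side == "all" else (f"hemi-{side}",)
        gii = {
            h: {k: entry["gii"][h][k] for k in surface_keys if k in entry["gii"][h]}
            for h in hemis
            if h in entry.get("gii", {})
        }
        if gii:
            pruned["gii"] = gii
    return pruned
```

Under these assumptions, `format="label", hemi="R"` leaves only `{"gii": {"hemi-R": {"label": ...}}}`, and `format="volume"` leaves only the "volume" key regardless of `hemi`.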

get_roi_mask(query)

Load one or more ROI masks, restricted to brain-mask voxels.

query accepts the multi-level grammar:

  • a specific ROI name ("FFA1");

  • a category name ("face") – expands to every ROI in that category;

  • "all" – expands to every ROI on disk;

  • a list mixing any of the above.

Multi-element resolutions are unioned voxel-wise. Always returns one 1-D bool array within the brain mask.
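
The query grammar and the voxel-wise union can be sketched with a toy ROI inventory. Only "FFA1" and "face" come from this page; "OFA", "place", "PPA", the helper names, and the `KeyError` for unknown names are made up for illustration. How the package resolves a name that is both an ROI and a category is not specified here, so this sketch simply checks categories first.

```python
import numpy as np

def resolve_query(query, rois_by_category):
    """Expand a query (string or list) into a sorted list of ROI names."""
    items = [query] if isinstance(query, str) else list(query)
    all_rois = {name for names in rois_by_category.values() for name in names}
    resolved = set()
    for item in items:
        if item == "all":
            resolved |= all_rois
        elif item in rois_by_category:   # category name
            resolved |= set(rois_by_category[item])
        elif item in all_rois:           # specific ROI name
            resolved.add(item)
        else:
            raise KeyError(f"unknown ROI or category: {item!r}")
    return sorted(resolved)

def union_mask(names, masks):
    """Voxel-wise OR over the 1-D boolean masks of the named ROIs."""
    return np.logical_or.reduce([masks[name] for name in names])

rois_by_category = {"face": ["FFA1", "OFA"], "place": ["PPA"]}  # toy inventory
masks = {
    "FFA1": np.array([True, False, False, False]),
    "OFA": np.array([False, True, False, False]),
    "PPA": np.array([False, False, False, True]),
}

names = resolve_query(["face", "PPA"], rois_by_category)
combined = union_mask(names, masks)
print(names)              # ['FFA1', 'OFA', 'PPA']
print(combined.tolist())  # [True, True, False, True]
```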

get_roi_masks(queries)

Load several ROI masks at once.

queries is a list (or single string). Each element is passed verbatim to get_roi_mask; the returned dict is keyed by the user’s strings, so categories and “all” appear as their original keys with a union mask as value.

get_sessions()

Return sorted list of available session IDs.

get_stimulus_metadata()

Load the dataset-wide stimulus metadata TSV.

get_trial_info(session)

Load the events TSV for one or more sessions.

Parameters:

session (str or list of str) – Required – events live per session in the bucket. A list returns a dict keyed by session ID.

Return type:

pd.DataFrame or dict[str, pd.DataFrame]

get_trial_stimulus_indices(session)

Map each trial to its stimulus-metadata row index.

Parameters:

session (str or list of str) – A list returns a dict keyed by session ID.

Return type:

np.ndarray or dict[str, np.ndarray]
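
Because the accessors never average across trials, a caller who wants repeats of the same stimulus has to group trials by index. The sketch below does that for a plain list standing in for the array returned for one session; `group_trials_by_stimulus` is a hypothetical helper.

```python
from collections import defaultdict

def group_trials_by_stimulus(stimulus_indices):
    """Map each stimulus-metadata row index to the trial positions showing it."""
    groups = defaultdict(list)
    for trial, stim in enumerate(stimulus_indices):
        groups[int(stim)].append(trial)
    return dict(groups)

indices = [4, 0, 4, 2, 0, 4]  # synthetic: six trials, three distinct stimuli
groups = group_trials_by_stimulus(indices)
print(groups)  # {4: [0, 2, 5], 0: [1, 4], 2: [3]}
```

The trial positions in each group index the first axis of the betas for the same session, so `betas[groups[4]]` would select the three repeats of stimulus row 4.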

get_voxel_coordinates(roi=None, mask=None)

Return MNI/T1w coordinates for the selected voxels.

has_stimuli()

Return True if both the stimulus metadata and the stimulus images are on disk.

Useful as a guard before calling stimulus-dependent methods (get_n_stimuli, get_stimulus_metadata, get_images, get_trial_stimulus_indices, to_torch_dataset) when the bucket’s stimuli/ prefix is not yet populated.
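
A guard of this kind might look like the following sketch. `count_stimuli` is a hypothetical wrapper; it calls only `has_stimuli()` and `get_n_stimuli()` as documented on this page, and returning `None` when stimuli are absent is a choice made for the example.

```python
def count_stimuli(sub, stimuli=None):
    """Return sub.get_n_stimuli(stimuli), or None when stimuli are not on disk."""
    if not sub.has_stimuli():
        # The stimuli/ prefix is not populated; skip the stimulus-dependent call.
        return None
    return sub.get_n_stimuli(stimuli=stimuli)
```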

property subject_id

Return the BIDS subject ID (e.g. "sub-03").

to_nifti(values, output_path, roi=None, mask=None)

Write a per-voxel array to a 3-D NIfTI volume.

to_torch_dataset(**kwargs)

Wrap this subject as a torch.utils.data.Dataset.