laion_fmri.subject
Subject class for accessing per-subject data files.
Every accessor maps to exactly one file in the bucket layout: no
averaging, concatenation, or rebinning across sessions, with one
exception – Subject.metadata aggregates the per-session trial
TSVs into one trial table for convenience.
Functions
Load a subject by BIDS ID or integer index.
Classes
Access loaded data files for a single subject.
- class laion_fmri.subject.Subject(subject_id, data_dir)
Bases: object
Access loaded data files for a single subject.
- property captions
Per-trial human captions, plus shared non-OOD AI captions.
- property embeddings
Per-trial pretrained embeddings (CLIP, DINOv2, …).
- get_anatomical_brain_mask(*, res=None)
Return the path to the anatomically-derived brain mask.
Distinct from get_brain_mask(), which loads a brain mask (anatomical by default, or rsquare-derived) as a flat boolean array instead of returning a path.
- get_anatomical_dir()
Return the path to this subject’s anatomical derivatives.
- Raises:
DataNotDownloadedError – If the anatomical directory does not exist on disk.
- get_betas(session, roi=None, mask=None, nc_threshold=None, stimuli=None, streaming=False, mask_source='anatomical')
Load single-trial betas for one or more sessions.
- Parameters:
session (str, list of str) – BIDS session ID. A list returns a dict keyed by session ID, since trial counts may differ per session. Single-trial betas live per session in the bucket, so the caller must pick which sessions to load.
roi (str, list[str], or None) – Named ROI(s) for voxel selection (union if list).
mask (np.ndarray[bool] or None) – Custom boolean mask over brain-mask voxels.
nc_threshold (float or None) – Minimum per-session noise ceiling to keep a voxel.
stimuli ("shared", "unique", or None) – Trial-level filter using the stimulus-metadata shared flag.
mask_source ("anatomical" (default) | "rsquare") – Which brain mask to filter the voxel axis on; see get_brain_mask() for the difference.
streaming (bool) – If False (default), the full 4-D NIfTI is materialized in RAM and then masked. Decompresses any .nii.gz once; peak memory is the full file (~12 GB for a real session) plus the masked output. Best when you have plenty of RAM. If True, the file is streamed volume-by-volume and the combined brain + ROI + NC mask is applied inline: peak memory is one volume (~10-50 MB) plus the masked output. Use this on memory-constrained machines like Colab. Works on both .nii (nibabel-managed per-volume reads) and .nii.gz (a custom gzip pipeline that never re-decompresses).
- Returns:
(n_trials, n_selected_voxels) for a single session; a {session: array} dict for a list. Values are GLMsingle single-trial β estimates in percent-signal-change units. Voxels that GLMsingle did not model (failed fit) arrive as NaN – that’s the signal for “no estimate available”, distinct from “estimate is 0”. Handle them at the caller’s analysis layer.
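The two loading modes and the NaN convention can be sketched on synthetic data. This is not the library's loader: a small in-memory NumPy array stands in for the 4-D NIfTI (the axis order x, y, z, trial is an assumption), and load_eager / load_streaming are hypothetical helper names. The sketch only illustrates that masking one volume at a time produces the same (n_trials, n_selected_voxels) matrix as masking the fully materialized array, and how NaN columns might be dropped afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 4-D beta file: (x, y, z, n_trials). Assumed layout.
data = rng.normal(size=(4, 5, 3, 6)).astype(np.float32)
data[1, 2, 0, :] = np.nan  # one voxel without an estimate (simulated failed fit)

# Combined boolean mask over the 3-D grid (stands in for brain + ROI + NC).
mask3d = np.zeros((4, 5, 3), dtype=bool)
mask3d[1:3, 1:4, :2] = True  # 2 * 3 * 2 = 12 selected voxels


def load_eager(data4d, mask):
    """Mask the whole array at once; the full 4-D array must be in memory."""
    # Boolean indexing gives (n_selected_voxels, n_trials); transpose it.
    return data4d[mask].T


def load_streaming(data4d, mask):
    """Mask one 3-D volume per step and fill the output row by row."""
    n_trials = data4d.shape[-1]
    out = np.empty((n_trials, int(mask.sum())), dtype=data4d.dtype)
    for t in range(n_trials):
        out[t] = data4d[..., t][mask]
    return out


eager = load_eager(data, mask3d)
streamed = load_streaming(data, mask3d)

# NaN marks "no estimate available": drop voxel columns that contain any NaN.
unmodeled = np.isnan(streamed).any(axis=0)
clean = streamed[:, ~unmodeled]
```

Dropping columns is only one possible policy; masked arrays or NaN-aware reductions such as np.nanmean are alternatives when the voxel axis has to stay aligned with other maps.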
- get_brain_mask(source='anatomical', res='1pt8')
Load the subject’s brain mask as a flat boolean array.
- Parameters:
source ("anatomical" (default) | "rsquare") – "anatomical" uses the brain mask in derivatives/anatomical/.../desc-brain_mask.nii.gz – a wider, anatomically-derived mask. Pull it with download(include_anatomical=True). "rsquare" derives the mask from the subject-level mean-R^2 map (voxels with any non-zero GLMsingle fit; the bucket ships R2mean as ..._stat-rsquare_desc-R2mean_statmap.nii.gz).
res ("1pt8" (default) | None) – Anatomical-mask resolution. "1pt8" matches the functional grid, so the returned mask aligns with the voxel axis of get_betas / get_noise_ceiling and with the rsquare-derived mask. None loads the full-resolution anatomical mask; the returned 1-D array is larger and will not align with the loader cascade. Ignored for source="rsquare" (the rsquare-derived mask is published at one resolution only).
- Returns:
1-D boolean array over the full image grid.
- Return type:
np.ndarray
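The "rsquare" derivation and the flat layout of the result can be illustrated on a toy volume. This is a sketch under assumptions, not the library's code: the R² map is synthetic, NaN is assumed to mean "no fit", and the flattening is assumed to be C-order.

```python
import numpy as np

# Synthetic stand-in for the subject-level mean-R^2 volume on a 3-D grid.
r2_mean = np.zeros((3, 4, 2), dtype=np.float32)
r2_mean[1, 1:3, :] = [[12.5, 3.0], [0.7, 40.0]]  # four voxels with a non-zero fit
r2_mean[0, 0, 0] = np.nan  # assumed to mean "no fit"

# Voxels with a non-zero fit. NaN is excluded explicitly, since NaN != 0 is True.
mask3d = np.isfinite(r2_mean) & (r2_mean != 0)

# 1-D boolean array over the full image grid (C-order flattening is an assumption).
flat_mask = mask3d.ravel()
n_voxels = int(flat_mask.sum())

# The flat mask folds back onto the grid without losing alignment.
restored = flat_mask.reshape(r2_mean.shape)
```

Because the flat array covers the full grid, two masks can only be combined element-wise when they were produced at the same resolution, which is why res="1pt8" matters for alignment.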
- get_freesurfer_dir()
Return the path to this subject’s FreeSurfer recon.
- Raises:
DataNotDownloadedError – If the recon directory does not exist on disk.
- get_n_stimuli(stimuli=None)
Return the number of stimuli described in the metadata CSV.
- Parameters:
stimuli ("shared", "unique", or None)
- get_n_voxels(source='anatomical', res='1pt8')
Number of voxels in the subject’s brain mask.
source and res mirror get_brain_mask(); see its docstring for the available values.
- get_noise_ceiling(session=None, desc=None, roi=None, mask=None, mask_source='anatomical')
Load a noise-ceiling map.
Exactly one of session or desc must be set:
session="ses-01" -> per-session NC NIfTI.
desc="noiseceiling33ses" -> the subject-level aggregate NC NIfTI with the given desc-... token.
Either argument also accepts a list, in which case the return value is a dict keyed by session ID / desc token.
- Returns:
Noise ceiling in percent variance explained (0-100, GLMsingle convention). Threshold near 10-20 % keeps reliably driven voxels.
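Thresholding a noise-ceiling map to select voxels might look like the following. The arrays are synthetic stand-ins for what get_noise_ceiling and get_betas return, and the 15 % cut-off is just one value inside the 10-20 % range mentioned above.

```python
import numpy as np

# Synthetic noise-ceiling values (percent variance explained, 0-100), one per voxel.
noise_ceiling = np.array([2.0, 15.0, 48.5, 9.9, 20.0, 0.0])

# Synthetic betas with a matching voxel axis: (n_trials, n_voxels).
betas = np.arange(18, dtype=float).reshape(3, 6)

threshold = 15.0  # illustrative choice, not a library default

# Boolean selector over the voxel axis, then column selection on the betas.
keep = noise_ceiling >= threshold
selected = betas[:, keep]
```

The selector only makes sense when both arrays were loaded with the same mask_source, so that their voxel axes line up.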
- get_roi_data(query, format=None, hemi=None, mask_source='anatomical')
Load multi-format ROI data: volume, surface, FreeSurfer label.
- Parameters:
query (str or list[str]) – Multi-level ROI query (see get_roi_mask).
format (str or None) – One of "all", "volume" / "nii.gz" (synonyms), "gii" (per-hemi func.gii + label), "func.gii" (per-hemi surface mask only), "label" (per-hemi FreeSurfer label only). None means "all".
hemi (str or None) – One of "L", "R", or "all" (default). Ignored when format resolves to volume only.
- Returns:
Top-level dict keyed by ROI name. Each value is a nested dict shaped:
    {
        "volume": <1-D bool ndarray>,
        "gii": {
            "hemi-L": {"func.gii": ..., "label": ...},
            "hemi-R": {...},
        },
    }
Format/hemi filters prune this tree.
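Since filters prune the returned tree, calling code is safer when it checks which branches are present instead of indexing them blindly. The sketch below walks a hand-built dict shaped like the description above; the placeholder values, the "PPA" entry, and the list_entries helper are all hypothetical.

```python
# Synthetic result shaped like the tree described above; values are placeholders.
roi_data = {
    "FFA1": {
        "volume": [True, False, True],
        "gii": {
            "hemi-L": {"func.gii": "L-mask", "label": "L-label"},
            "hemi-R": {"func.gii": "R-mask", "label": "R-label"},
        },
    },
    # What a volume-only request might leave behind (assumption).
    "PPA": {"volume": [False, True, True]},
}


def list_entries(tree):
    """Flatten the nested dict into (roi, format, hemi) tuples."""
    entries = []
    for roi, formats in tree.items():
        if "volume" in formats:
            entries.append((roi, "volume", None))
        # .get() tolerates a pruned "gii" branch.
        for hemi, files in formats.get("gii", {}).items():
            for fmt in files:
                entries.append((roi, fmt, hemi))
    return entries


entries = list_entries(roi_data)
```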
- get_roi_mask(query, mask_source='anatomical')
Load one or more ROI masks, restricted to brain-mask voxels.
query accepts the multi-level grammar:
a specific ROI name ("FFA1");
a category name ("face") – expands to every ROI in that category;
"all" – expands to every ROI on disk;
a list mixing any of the above.
Multi-element resolutions are unioned voxel-wise. Always returns one 1-D bool array within the brain mask.
mask_source selects which brain mask the result is indexed within; see get_brain_mask().
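The query grammar and the voxel-wise union can be mimicked with a small in-memory catalog. This is an illustrative sketch, not the library's resolver: the category table, the "OFA" and "PPA" names, the "place" category, and the resolve / roi_mask helpers are hypothetical, and the masks are toy arrays.

```python
import numpy as np

# Hypothetical catalog: category -> ROI names, and ROI name -> flat boolean mask.
categories = {"face": ["FFA1", "OFA"], "place": ["PPA"]}
masks = {
    "FFA1": np.array([1, 0, 0, 0, 1], dtype=bool),
    "OFA": np.array([0, 1, 0, 0, 1], dtype=bool),
    "PPA": np.array([0, 0, 0, 1, 0], dtype=bool),
}


def resolve(query):
    """Expand a query (ROI name, category, "all", or list) into ROI names."""
    if isinstance(query, (list, tuple)):
        names = []
        for item in query:
            names.extend(resolve(item))
        return list(dict.fromkeys(names))  # de-duplicate, keep first-seen order
    if query == "all":
        return list(masks)
    if query in categories:
        return list(categories[query])
    if query in masks:
        return [query]
    raise KeyError(f"unknown ROI query: {query!r}")


def roi_mask(query):
    """Union the resolved masks voxel-wise into one 1-D boolean array."""
    return np.logical_or.reduce([masks[name] for name in resolve(query)])
```

For example, roi_mask(["FFA1", "place"]) marks every voxel that belongs to FFA1 or to any ROI in the hypothetical "place" category.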
- get_roi_masks(queries, mask_source='anatomical')
Load several ROI masks at once.
queries is a list (or single string). Each element is passed verbatim to get_roi_mask; the returned dict is keyed by the user’s strings, so categories and “all” appear as their original keys with a union mask as value.
mask_source is forwarded to get_roi_mask().
- get_t1w(*, res=None)
Return the path to this subject’s anatomical T1w volume.
res=None returns the full-resolution image; res="1pt8" returns the variant on the functional grid.
- get_voxel_coordinates(roi=None, mask=None, mask_source='anatomical')
Return MNI/T1w coordinates for the selected voxels.
mask_source picks which brain mask defines “selected voxels”; see get_brain_mask().
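How voxel indices relate to world coordinates can be sketched with the standard NIfTI voxel-to-world affine. The affine below is made up (1.8 mm isotropic spacing and the origin offset are assumptions), and the sketch does not claim to reproduce what get_voxel_coordinates does internally.

```python
import numpy as np

# Hypothetical voxel-to-world affine: 1.8 mm isotropic voxels plus an origin offset.
affine = np.array([
    [1.8, 0.0, 0.0, -90.0],
    [0.0, 1.8, 0.0, -126.0],
    [0.0, 0.0, 1.8, -72.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Toy selection mask on the 3-D grid.
mask3d = np.zeros((4, 4, 4), dtype=bool)
mask3d[0, 0, 0] = True
mask3d[1, 2, 3] = True

# (n_selected, 3) integer voxel indices, in the same C order as mask3d.ravel().
ijk = np.argwhere(mask3d)

# Apply the affine row-wise: world = A[:3, :3] @ ijk + A[:3, 3].
xyz = ijk @ affine[:3, :3].T + affine[:3, 3]
```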
- has_anatomical()
Return True if this subject’s anatomical derivatives are on disk.
Anatomical files live under derivatives/anatomical/{subject}/ses-PrismaAnat/anat/ and ship T1w / T2w volumes plus a brain mask at two resolutions (full and res-1pt8).
- has_freesurfer()
Return True if the per-subject FreeSurfer recon is on disk.
The recon ships under derivatives/freesurfer/{subject}/; pull it with download(..., include_freesurfer=True). Required by to_template() to project T1w-volume data onto fsaverage / fsLR / MNI templates.
- has_stimuli()
Return True if the stimuli (HDF5 + CSV) are on disk.
Useful as a guard before touching stimulus-side data (metadata, images, embeddings, segmentations, captions, to_torch_dataset()) when the archive hasn’t been downloaded yet.
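The guard pattern can be shown with a stand-in object. FakeSubject below is a hypothetical stub that exposes only has_stimuli() and metadata; the FileNotFoundError it raises is a placeholder, since the exception the real class raises for missing stimuli is not stated here.

```python
class FakeSubject:
    """Minimal stand-in exposing only the two members used below."""

    def __init__(self, stimuli_on_disk, metadata=None):
        self._stimuli_on_disk = stimuli_on_disk
        self._metadata = metadata

    def has_stimuli(self):
        return self._stimuli_on_disk

    @property
    def metadata(self):
        # Placeholder error type; the real class may raise something else.
        if not self._stimuli_on_disk:
            raise FileNotFoundError("stimuli archive not downloaded")
        return self._metadata


def count_trials(subject):
    """Return the number of trial rows, or None when stimuli are missing."""
    if not subject.has_stimuli():
        return None
    return len(subject.metadata)
```

Checking first keeps the "not downloaded yet" case out of exception handling and lets the caller decide whether to trigger a download.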
- property images
Per-trial stimulus images (PIL + raw bytes).
- property metadata
Trial table for this subject, concatenated across all sessions.
One row per single-trial beta. Columns include everything from the per-session events TSV plus the derived columns session, session_trial, image_name, stim_idx, unique_or_shared, and dataset.
- Returns:
Indexed 0..n_total_trials-1. Each row’s index is the “global trial index” used by images, embeddings, segmentations, and captions.
- Return type:
pandas.DataFrame
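The idea of a global trial index over concatenated sessions can be sketched without any dependencies. Plain lists of dicts stand in for the pandas.DataFrame here; the file names, the two sessions, and the 0-based session_trial numbering are assumptions made for the illustration.

```python
# Hypothetical per-session trial rows, standing in for the per-session events TSVs.
per_session = {
    "ses-01": [{"image_name": "a.jpg"}, {"image_name": "b.jpg"}],
    "ses-02": [{"image_name": "a.jpg"}, {"image_name": "c.jpg"}, {"image_name": "d.jpg"}],
}

# Concatenate in session order; a row's position is its global trial index.
trial_table = []
for session, rows in per_session.items():
    for session_trial, row in enumerate(rows):  # 0-based numbering is an assumption
        trial_table.append({**row, "session": session, "session_trial": session_trial})


def global_indices(table, session):
    """Global trial indices of every row that belongs to one session."""
    return [i for i, row in enumerate(table) if row["session"] == session]


ses02 = global_indices(trial_table, "ses-02")
```

With the real table, the same lookup would let a caller map rows of a single-session beta matrix back to entries in images or embeddings through the shared global index.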
- property segmentations
Per-trial object-segmentation masks (shared images only).
- property subject_id
Return the BIDS subject ID (e.g. "sub-03").
- surface_to_template(values, target='fsaverage', **kwargs)
fsnative-surface input → surface target.
Accepts a single hemi array (with hemi="L" / "R") or a {"L": ..., "R": ...} dict; returns the same shape.
- to_nifti(values, output_path, roi=None, mask=None, mask_source='anatomical')
Write a per-voxel array to a 3-D NIfTI volume.
values is sized to the brain mask selected by mask_source (default anatomical-derived).
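Turning a per-voxel vector back into a volume amounts to scattering the values into the grid positions marked by the mask. The sketch below shows only that scatter step on toy arrays; the NaN fill for voxels outside the mask is an assumption, and actually writing the file (for example with nibabel) is left out.

```python
import numpy as np

grid_shape = (3, 3, 2)

# Toy brain mask on the 3-D grid and its flat form.
brain3d = np.zeros(grid_shape, dtype=bool)
brain3d[1, :, :] = True  # 6 brain voxels
brain_flat = brain3d.ravel()

# One value per brain-mask voxel, e.g. a per-voxel statistic.
values = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

# Scatter into a full-grid volume; voxels outside the mask stay NaN (assumed fill).
flat_volume = np.full(brain_flat.size, np.nan)
flat_volume[brain_flat] = values
volume = flat_volume.reshape(grid_shape)
```

The scatter only works when values has exactly as many entries as the mask has True voxels, which is why values must be sized to the mask chosen by mask_source.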
- to_template(values, target, **kwargs)
Project T1w-space values into a template / reference space.
Forwards to laion_fmri.templates.to_template(); see that function’s docstring for the full kwargs surface (hemi, route, surface, fsaverage_density, interpolation, output_dir, desc, session).
Requires the optional [template] extra; ImportError is raised at call time if any of nilearn / nitransforms / templateflow is missing.
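Raising ImportError at call time rather than at import time is a common pattern for optional extras. A minimal sketch of that pattern follows; require_modules and project are hypothetical helpers written for this illustration and are not taken from the library.

```python
import importlib.util


def require_modules(names, extra):
    """Raise ImportError listing every missing optional dependency."""
    missing = [name for name in names if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            f"missing optional dependencies {missing}; install the [{extra}] extra"
        )


def project(values):
    # Checked when the function is called, not when the module is imported,
    # so users without the extra can still import everything else.
    require_modules(("nilearn", "nitransforms", "templateflow"), extra="template")
    return values  # placeholder for the real projection
```

Collecting all missing names before raising means the user sees the complete list in one error instead of fixing dependencies one at a time.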