Stimulus Derivatives
LAION-fMRI includes dataset-wide files derived from the stimulus images:
pretrained image embeddings, object segmentations, and natural-language
captions. They are stored under stimuli/ and use the same
image_name keys as the stimulus metadata described in
Stimulus Set.
stimuli/
├── task-images_desc-CLIP_embeddings.h5
├── task-images_desc-DINOv2_embeddings.h5
├── task-images_desc-PEcore_embeddings.h5
├── task-images_desc-SigLIP2_embeddings.h5
├── task-images_desc-segmentations.h5
├── task-images_desc-segmentations_metadata.csv
└── task-images_desc-captions.csv
Download the derived files independently of the packed image HDF5:
import laion_fmri
laion_fmri.download_embeddings("CLIP")
laion_fmri.download_segmentations()
laion_fmri.download_captions()
Embeddings can be loaded directly with
load_embeddings(). Captions and segmentations can be
opened directly with Captions and
Segmentations. If the raw stimulus set is also
installed, the same files are available through the dataset-wide
Stimuli handle and through trial-aligned
Subject accessors.
Stimulus Embeddings
For every stimulus image in the dataset (25,052 in total, including the OOD set), pretrained image embeddings from four widely used vision models are provided as a convenience for downstream analyses. The embeddings are stored as four HDF5 files, one per model:
stimuli/
├── task-images_desc-CLIP_embeddings.h5
├── task-images_desc-DINOv2_embeddings.h5
├── task-images_desc-PEcore_embeddings.h5
└── task-images_desc-SigLIP2_embeddings.h5
The four models are:
| Key | Model | Feature dim | Notes |
|---|---|---|---|
| CLIP | OpenCLIP LAION ViT-H/14 | 1024 | L2-normalised |
| DINOv2 | DINOv2 ViT-L/14 | 1024 | Mean-pooled patch tokens from layer 23; not normalised |
| PEcore | PE Core L/14, 336 px | 1024 | L2-normalised |
| SigLIP2 | SigLIP2 SO400M Patch14, 384 px | 1152 | L2-normalised |
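Because the DINOv2 features are not normalised while the other three models are, it can help to put every model on the unit sphere before comparing them with cosine similarity. The following is a minimal sketch using NumPy on a small random stand-in array, not the real files. The helper name `l2_normalise` is illustrative and is not part of laion_fmri. It upcasts from float16 to float32 first to limit rounding error.

```python
import numpy as np


def l2_normalise(x, eps=1e-12):
    """Return a float32 copy of x with unit-length rows (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float32)  # upcast: float16 sums lose precision
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)  # eps guards against all-zero rows


# Random stand-in for an un-normalised (n_images, feature_dim) float16 array.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 1024)).astype(np.float16)

unit = l2_normalise(features)
cosine = unit @ unit.T  # (5, 5) cosine-similarity matrix
```

Applying the same helper to already-normalised features is harmless, so one code path can serve all four models.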
Each file has the same flat HDF5 layout, with three datasets of length 25,052:
embedding # (25052, feature_dim) float16
image_ids # (25052,) variable-length strings; image filenames
valid # (25052,) bool; True for every image in release-main
Rows in embedding correspond one-to-one to entries in image_ids,
and all four files share the same image_ids order. To work with a
specific subject’s stimuli, intersect image_ids with the
image_name column of the stimulus metadata or use the subject-level
accessors below.
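The intersection described above amounts to a filename-to-row lookup. Here is a minimal sketch, assuming `image_ids` and `embedding` have already been read into memory (replaced here by tiny made-up arrays) and that `wanted` stands in for a subject's `image_name` column. `rows_for` is a hypothetical helper, not a laion_fmri function.

```python
import numpy as np

# Made-up stand-ins for the image_ids and embedding datasets.
image_ids = np.array(["a.jpg", "b.jpg", "c.jpg", "d.jpg"])
embedding = np.arange(4 * 3, dtype=np.float16).reshape(4, 3)


def rows_for(wanted, image_ids):
    """Map each wanted filename to its row in image_ids; -1 if absent."""
    lookup = {name: i for i, name in enumerate(image_ids.tolist())}
    return np.array([lookup.get(name, -1) for name in wanted], dtype=np.int64)


# Stand-in for a subject's image_name column; repeats are allowed.
wanted = ["c.jpg", "a.jpg", "c.jpg", "missing.jpg"]
rows = rows_for(wanted, image_ids)
found = rows >= 0
aligned = embedding[rows[found]]  # one row per matched entry, in wanted order
```

When the datasets are read with h5py, variable-length strings typically come back as bytes, so they may need decoding before being used as lookup keys.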
Loading embeddings with the standalone loader:
import laion_fmri
emb = laion_fmri.load_embeddings("CLIP")
emb["CLIP"][0] # (1024,) float16
emb.get(
    "CLIP", "shared_12rep_LAION_cluster_1003_i0.jpg",
)  # one vector
If the full stimulus set is installed, embeddings are also reachable
from the Stimuli handle:
stim = laion_fmri.load_stimuli()
stim.embeddings.models # ['CLIP', 'DINOv2', 'PEcore', 'SigLIP2']
stim.embeddings["CLIP"] # (25052, 1024) float16 array
stim.embeddings.get(
    "CLIP", "shared_12rep_LAION_cluster_1003_i0.jpg",
)  # one vector
Object Segmentations
Every shared stimulus image carries object-level segmentation
masks: for each noun the detector found in the image, one binary mask
per detected instance (e.g. four hand masks for an image with four
visible hands). These are useful for asking questions like “did the
subject see a face on this trial?” or for spatially restricting
analyses to image regions.
Note
Segmentations are provided for the shared stimulus set only
(1,492 images viewed by every subject). Subject-unique images do not
carry masks. For images without masks, the listing methods (nouns, for_image)
return empty results rather than raising errors.
Files on disk:
stimuli/
├── task-images_desc-segmentations.h5 # (24011, 1000, 1000) uint8
└── task-images_desc-segmentations_metadata.csv # one row per mask
The HDF5 holds a single masks dataset, gzip-compressed with the
byte-shuffle filter so the file ships at about 68 MB despite more than
24,000 masks. The sidecar CSV maps each mask_row to an
(image_name, noun, instance_id) triple, with detection score,
bounding box, and a localized flag that is 0 when the detector
flagged a concept but could not bound it spatially. Those rare rows are
safe to filter out with localized == 1.
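As a sketch of how the sidecar can be used without the reader class, the snippet below parses a few invented rows with the standard `csv` module, keeps only rows with localized == 1, and groups the remaining mask_row indices by image and noun. The sample values are made up, and the exact header spelling and column order of the real CSV are assumptions based on the names mentioned above.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the real file also carries score and bounding-box columns.
sample = io.StringIO(
    "mask_row,image_name,noun,instance_id,localized\n"
    "0,img_a.jpg,hand,0,1\n"
    "1,img_a.jpg,hand,1,1\n"
    "2,img_a.jpg,sky,0,0\n"
    "3,img_b.jpg,dog,0,1\n"
)

masks_by_key = defaultdict(list)
for row in csv.DictReader(sample):
    if int(row["localized"]) != 1:
        continue  # concept flagged but not spatially bounded
    key = (row["image_name"], row["noun"])
    masks_by_key[key].append(int(row["mask_row"]))

masks_by_key = dict(masks_by_key)
```

Each list then holds row indices into the masks dataset, one per detected instance of that noun.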
Loading segmentations with the standalone reader:
import laion_fmri
seg = laion_fmri.Segmentations()
# What nouns are present in this image?
seg.nouns(
    "shared_12rep_LAION_cluster_1003_i0.jpg",
)  # ['fingers', 'hand', ...]
# Fetch one mask:
mask = seg.get(
    "shared_12rep_LAION_cluster_1003_i0.jpg",
    "fingers",
    instance=0,
)  # (1000, 1000) uint8
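To ask whether a noun appears anywhere in an image, the per-instance masks can be merged into one. Below is a minimal sketch with NumPy on small synthetic arrays. It treats any non-zero value as foreground, which is an assumption because the value convention of the uint8 masks is not stated here, and `union_mask` is a hypothetical helper rather than a laion_fmri function.

```python
import numpy as np


def union_mask(instance_masks):
    """Merge per-instance masks into a single boolean mask (hypothetical helper)."""
    stacked = np.stack([np.asarray(m) > 0 for m in instance_masks])
    return stacked.any(axis=0)


# Two synthetic 4x4 instance masks standing in for (1000, 1000) uint8 masks.
first = np.zeros((4, 4), dtype=np.uint8)
first[0, :2] = 1
second = np.zeros((4, 4), dtype=np.uint8)
second[3, 2:] = 1

merged = union_mask([first, second])
coverage = merged.mean()  # fraction of pixels covered by the noun
```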
Stimulus Captions
Each stimulus carries a small set of short human captions, and shared non-OOD stimuli additionally carry one AI caption. The target counts are:
- shared images (seen by every participant) get five human captions and, for non-OOD images, one AI caption
- unique images (presented to one participant only) get three human captions and no AI caption
- OOD images get their target human captions and no AI caption
The human captions were written by crowdworkers on CloudResearch Connect, each of whom was shown one image at a time and asked to describe it in a single sentence. The AI captions were generated by GPT-5.1 and are included for shared non-OOD images only. Together they give you a small set of independent natural-language descriptions per image, useful for caption-conditioned modelling, retrieval, or quick qualitative checks.
See Metadata Acquisition for the collection procedure (CloudResearch Connect batches, quality screening, AI prompt design).
Captions live in a single CSV that sits next to the stimulus images:
stimuli/
└── task-images_desc-captions.csv
The CSV is in long form: one row per caption. A shared non-OOD image with five human captions and one AI caption contributes six rows; a shared OOD image contributes five rows; a unique image contributes three rows.
| Column | Meaning |
|---|---|
| | Stimulus filename. Join key against |
| | Position within the image. Rank |
| | |
| | The caption text. |
| | Where the caption came from - a CloudResearch Connect batch label like |
| | CloudResearch Connect participant identifier. Empty for AI rows. |
| | Model name. Empty for human rows. |
All images have their target human-caption count (three for unique images, five for shared images). AI captions are present for shared non-OOD images only.
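The long-form layout makes per-image caption counts easy to check. The sketch below tallies human and AI rows per image with the standard library, relying on the rule that the model field is empty for human rows. The rows are invented, and the header names used (`image_name`, `caption`, `model`) are placeholders chosen for illustration; check the real CSV header before adapting it.

```python
import csv
import io
from collections import Counter

# Invented rows. The header names here are placeholders, not the real ones.
sample = io.StringIO(
    "image_name,caption,model\n"
    "shared_a.jpg,a hand with painted nails,\n"
    "shared_a.jpg,close-up of a manicured hand,\n"
    "shared_a.jpg,A hand with floral nail art.,example-model\n"
    "unique_b.jpg,a dog on a beach,\n"
)

human_counts = Counter()
ai_counts = Counter()
for row in csv.DictReader(sample):
    # The model field is empty for human rows and filled for AI rows.
    target = ai_counts if row["model"] else human_counts
    target[row["image_name"]] += 1
```

On the real table, the same tally could be compared against the targets of five, three, and one to spot any image that deviates.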
Loading captions with the standalone reader:
import laion_fmri
caps = laion_fmri.Captions()
# All human captions for one image (rank-ordered, up to five):
caps.human("shared_12rep_LAION_cluster_1003_i0.jpg")
# ['a hand with light pink painted nails with flower designs',
# 'A hand with finger painted nails with flowers in them',
# ...]
# The AI caption, or None if no AI caption is available:
caps.ai("shared_12rep_LAION_cluster_1003_i0.jpg")
# Or grab everything for an image as a DataFrame:
caps.get("shared_12rep_LAION_cluster_1003_i0.jpg")
# And the full long-form table:
caps.metadata.head()
Subject-Level Access
Subject-level accessors provide trial-aligned views of derived stimulus
files once the subject files and the full stimulus set are installed.
Use Subject.metadata for a concatenated trial table with derived
image_name, stim_idx, unique_or_shared, and dataset
columns. The row index of that table is the global trial index accepted
by sub.embeddings, sub.segmentations, and sub.captions. See
GLMsingle Beta Estimates for the beta-to-stimulus mapping convention.
import laion_fmri
sub = laion_fmri.load_subject("sub-01")
trials = sub.metadata
trials[[
    "session", "session_trial", "image_name",
    "unique_or_shared", "dataset",
]].head()
trial = 42 # global row index in sub.metadata
# Pretrained features aligned to the same trial rows:
x_one = sub.embeddings.get("CLIP", trial) # (1024,)
x_all = sub.embeddings.all("CLIP") # (n_trials, 1024)
x_ses = sub.embeddings.all("CLIP", session="ses-01")
# Object masks for shared-image trials:
if sub.segmentations.has_image(trial):
    nouns = sub.segmentations.nouns(trial)
    mask = sub.segmentations.get(trial, nouns[0])  # (1000, 1000) uint8
# Captions for the stimulus shown on this trial:
human = sub.captions.human(trial)
ai = sub.captions.ai(trial) # None for unique-image and OOD trials