laion_fmri.captions
Access per-stimulus captions for the LAION-fMRI images.
Each stimulus carries a small set of short human captions (collected on CloudResearch Connect). Shared non-OOD stimuli additionally carry one AI caption (a GPT-generated description). The target counts are:
- shared images (seen by every participant) get 5 human captions and, for non-OOD images, 1 AI caption
- unique images (seen by one participant only) get 3 human captions and no AI caption
- OOD images get their target human captions and no AI caption
Together they give you a small set of independent natural-language descriptions per image, useful for caption-conditioned modelling, retrieval, or quick qualitative checks.
Files on disk:
stimuli/
    task-images_desc-captions.csv
The CSV is long-form (one row per caption) with columns:
| image_name | Stimulus filename. Join key against … |
| caption_idx | Position within the image. Rank … |
| source | |
| caption | The caption text. |
| origin_collection | Which collection the caption came from (CloudResearch Connect batch labels for humans, model name like …) |
| participant_id | CloudResearch Connect participant identifier (NaN for AI). |
| ai_model | Model name (NaN for human captions). |
All images have their target human-caption count. AI captions are provided for shared non-OOD images only.
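Because the file is long-form, per-image caption counts fall out of a simple group-by. The sketch below assumes pandas is installed and uses a tiny in-memory CSV with made-up filenames, captions, and collection labels that only mimic the documented column layout; it does not read the real file.

```python
import io

import pandas as pd

# Hypothetical rows that follow the documented columns; every value here
# is invented for illustration and is not taken from the dataset.
csv_text = """image_name,caption_idx,source,caption,origin_collection,participant_id,ai_model
img_a.jpg,0,ai,an example AI caption,example-model,,example-model
img_a.jpg,1,human,first human caption,batch-1,p01,
img_a.jpg,2,human,second human caption,batch-1,p02,
img_b.jpg,1,human,only human caption,batch-2,p03,
"""

captions = pd.read_csv(io.StringIO(csv_text))

# One row per caption, so counting rows per (image, source) pair gives
# the number of captions each image received from each source.
counts = (
    captions.groupby(["image_name", "source"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

The same group-by on the real CSV is one way to check which images have reached their target caption counts.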
You normally reach Captions through the Stimuli hub:
>>> import laion_fmri
>>> stim = laion_fmri.load_stimuli()
>>> stim.captions.human("shared_12rep_LAION_cluster_1003_i0.jpg")
['a hand with light pink painted nails with flower designs',
'A hand with finger painted nails with flowers in them',
...]
>>> stim.captions.ai("shared_12rep_LAION_cluster_1003_i0.jpg")
'A hand with short, pale pink polished nails features delicate floral nail art on two fingers.'
For a single row-level DataFrame of every caption attached to an image:
>>> stim.captions.get("shared_12rep_LAION_cluster_1003_i0.jpg")
Classes
| Captions | Lazy reader for the per-stimulus captions CSV. |
- class laion_fmri.captions.Captions(data_dir=None)
Bases: object
Lazy reader for the per-stimulus captions CSV.
Loads the CSV on first access and caches a per-image lookup.
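The load-on-first-access behaviour can be illustrated with a generic lazy-reader pattern. This is a minimal sketch, not the library's implementation: `LazyCaptions`, `_by_image`, and `captions_for` are hypothetical names, and the temporary CSV holds a made-up row.

```python
import csv
import tempfile
from collections import defaultdict
from functools import cached_property
from pathlib import Path


class LazyCaptions:
    """Hypothetical stand-in that illustrates lazy loading; not the real class."""

    def __init__(self, csv_path):
        # Only the path is stored here; the file is not opened yet.
        self.csv_path = Path(csv_path)

    @cached_property
    def _by_image(self):
        # Runs once, on first access; the result is cached on the instance.
        lookup = defaultdict(list)
        with self.csv_path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                lookup[row["image_name"]].append(row)
        return dict(lookup)

    def captions_for(self, image_name):
        # Unknown images yield an empty list rather than an error.
        return [row["caption"] for row in self._by_image.get(image_name, [])]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "captions.csv"
    path.write_text(
        "image_name,caption_idx,source,caption\n"
        "img_a.jpg,1,human,a made-up caption\n",
        encoding="utf-8",
    )
    reader = LazyCaptions(path)                 # no file access yet
    first = reader.captions_for("img_a.jpg")    # triggers the single read
    path.unlink()                               # the file is gone ...
    second = reader.captions_for("img_a.jpg")   # ... but the cache still answers
```

Deleting the file between the two calls shows that only the first lookup touches the disk.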
- Parameters:
data_dir (str or Path, optional) – Override the configured data directory. Defaults to laion_fmri.config.get_data_dir().
- Raises:
FileNotFoundError – If stimuli/task-images_desc-captions.csv is not present. Captions are a public stimulus-side metadata file; run laion-fmri download-captions (or laion_fmri.download.download_captions()) to fetch them.
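If you would rather check for the file up front than catch FileNotFoundError, a pre-flight check along these lines may help. `captions_available` is a hypothetical helper, not part of laion_fmri; it relies only on the relative path named above and is exercised here against a temporary directory.

```python
import tempfile
from pathlib import Path

# Relative location of the captions file, as given in the docs above.
CAPTIONS_RELPATH = Path("stimuli") / "task-images_desc-captions.csv"


def captions_available(data_dir):
    """Hypothetical helper: True if the captions CSV exists under data_dir."""
    return (Path(data_dir) / CAPTIONS_RELPATH).is_file()


with tempfile.TemporaryDirectory() as tmp:
    before = captions_available(tmp)    # nothing has been downloaded yet
    target = Path(tmp) / CAPTIONS_RELPATH
    target.parent.mkdir(parents=True)
    target.write_text("image_name,caption_idx,source,caption\n", encoding="utf-8")
    after = captions_available(tmp)     # the file is now in place
```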
- ai(image_name: str) -> str | None
AI caption for image_name, or None if not available.
AI captions are present for shared non-OOD images only.
- get(image_name: str) -> DataFrame
Return all captions for one image as a DataFrame.
Returns an empty DataFrame (not an error) when the image has no captions. Rows are ordered by caption_idx: AI first (idx=0), then humans in rank order.
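The ordering and empty-result behaviour described for get can be mimicked on a plain DataFrame. This is a sketch under assumptions, not the library's code: `get_rows` is a hypothetical function, pandas is assumed to be installed, and the rows are invented.

```python
import pandas as pd

COLUMNS = ["image_name", "caption_idx", "source", "caption"]

# Made-up rows, deliberately out of order.
table = pd.DataFrame(
    [
        ("img_a.jpg", 2, "human", "second human caption"),
        ("img_a.jpg", 0, "ai", "an AI caption"),
        ("img_a.jpg", 1, "human", "first human caption"),
    ],
    columns=COLUMNS,
)


def get_rows(table, image_name):
    """Hypothetical stand-in: rows for one image, sorted by caption_idx."""
    rows = table[table["image_name"] == image_name]
    # An unknown image leaves an empty frame that keeps the same columns.
    return rows.sort_values("caption_idx").reset_index(drop=True)


ordered = get_rows(table, "img_a.jpg")
missing = get_rows(table, "not_in_table.jpg")
```

Callers can therefore test `missing.empty` instead of wrapping the call in a try/except.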
- human(image_name: str, limit: int | None = None) -> list[str]
Human captions for image_name in rank order.
- list(image_name: str, source: str | None = None) -> list[str]
Captions for image_name as a list of strings.
- Parameters:
image_name (str)
source ({"human", "ai"}, optional) – Restrict to one source. None (default) returns all available captions in caption_idx order.
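The effect of the source argument can be sketched with plain Python dictionaries. `list_captions` is a hypothetical stand-in, the rows are invented, and raising ValueError for an unrecognised source is an assumption of this sketch rather than documented behaviour.

```python
# Made-up rows for a single image, deliberately out of order.
rows = [
    {"caption_idx": 2, "source": "human", "caption": "second human caption"},
    {"caption_idx": 0, "source": "ai", "caption": "an AI caption"},
    {"caption_idx": 1, "source": "human", "caption": "first human caption"},
]


def list_captions(rows, source=None):
    """Hypothetical stand-in for the source filtering described above."""
    if source not in (None, "human", "ai"):
        # Assumption: the real method may handle unknown sources differently.
        raise ValueError(f"unknown source: {source!r}")
    ordered = sorted(rows, key=lambda row: row["caption_idx"])
    return [
        row["caption"]
        for row in ordered
        if source is None or row["source"] == source
    ]


everything = list_captions(rows)
humans_only = list_captions(rows, source="human")
ai_only = list_captions(rows, source="ai")
```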
- property metadata: DataFrame
The captions CSV as a DataFrame (one row per caption).
Columns: image_name, caption_idx, source, caption, origin_collection, participant_id, ai_model.