laion_fmri.embeddings

Access pretrained image embeddings for the LAION-fMRI stimuli.

Use this module after downloading the embeddings with laion_fmri.download.download_embeddings() or the equivalent CLI command, laion-fmri download-embeddings.

The embeddings are stored as one HDF5 file per model, alongside the stimulus images:

stimuli/
  task-images_stimuli.h5
  task-images_metadata.csv
  task-images_desc-CLIP_embeddings.h5
  task-images_desc-DINOv2_embeddings.h5
  task-images_desc-PEcore_embeddings.h5
  task-images_desc-SigLIP2_embeddings.h5

Each file contains three datasets of length 25,052: embedding (an (N, feature_dim) float16 matrix), image_ids (image filenames), and valid (a per-row validity flag). All four files list image_ids in the same order, so row i refers to the same image in every model.
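Because the row order is shared, features from different models can be combined column-wise and filtered with a single mask. The sketch below uses small in-memory NumPy arrays as stand-ins for the HDF5 datasets; the toy sizes are illustrative, and treating valid as a flag where a true value marks a usable row is an assumption, not something this page specifies:

```python
import numpy as np

# Toy stand-ins for two embedding files (the real ones have 25,052 rows).
image_ids = np.array(["a.jpg", "b.jpg", "c.jpg"])
clip = np.arange(3 * 4, dtype=np.float16).reshape(3, 4)  # (N, 4)
dino = np.ones((3, 2), dtype=np.float16)                 # (N, 2)
valid = np.array([1, 0, 1])  # assumed: nonzero marks a usable row

# Shared row order: the i-th row of each matrix describes the same image.
assert clip.shape[0] == dino.shape[0] == image_ids.shape[0]
combined = np.concatenate([clip, dino], axis=1)          # (N, 6)

# Cast to bool first: an integer array would trigger fancy indexing,
# not masking.
mask = valid.astype(bool)
kept_ids = image_ids[mask]
kept_features = combined[mask]
print(kept_ids.tolist(), kept_features.shape)  # ['a.jpg', 'c.jpg'] (2, 6)
```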

You normally do not construct Embeddings directly; access it through the Stimuli hub:

>>> import laion_fmri
>>> stim = laion_fmri.load_stimuli()
>>> stim.embeddings["CLIP"].shape          # (25052, 1024)
>>> stim.embeddings.get("CLIP", "shared_12rep_LAION_cluster_1003_i0.jpg")

For subject-aligned arrays, use the Subject namespace:

>>> sub = laion_fmri.load_subject("sub-01")
>>> features = sub.embeddings.all("CLIP")  # (n_trials, D)
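A subject-aligned array has one row per trial, so an image shown several times contributes the same embedding row several times. The following is a minimal sketch of that alignment, assuming each trial is identified by its image filename; align_to_trials and the toy data are hypothetical and do not reflect how laion_fmri builds the array internally:

```python
import numpy as np

def align_to_trials(embedding, image_ids, trial_images):
    """Hypothetical: return (n_trials, D) with row t = embedding of trial t's image."""
    # Map each filename to its row in the embedding matrix.
    row_of = {name: i for i, name in enumerate(image_ids)}
    # Repeated presentations reuse the same row index.
    rows = np.array([row_of[name] for name in trial_images], dtype=np.intp)
    return embedding[rows]

image_ids = ["a.jpg", "b.jpg", "c.jpg"]
embedding = np.eye(3, dtype=np.float16)        # toy (N, D) matrix
trials = ["c.jpg", "a.jpg", "c.jpg", "b.jpg"]  # "c.jpg" is shown twice
features = align_to_trials(embedding, image_ids, trials)
print(features.shape)  # (4, 3)
```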

Module Attributes

AVAILABLE_MODELS – Models shipped with the LAION-fMRI release.

laion_fmri.embeddings.AVAILABLE_MODELS = ('CLIP', 'DINOv2', 'PEcore', 'SigLIP2')

Models shipped with the LAION-fMRI release. Each label is the BIDS desc- value used in the filename (for example, desc-CLIP in task-images_desc-CLIP_embeddings.h5).
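Following the naming in the stimuli/ listing above, a model label maps to its file as task-images_desc-<label>_embeddings.h5. The helper below is a hypothetical sketch of that mapping; it assumes the stimuli/ folder sits directly under the data directory, which this page does not state:

```python
from pathlib import Path

AVAILABLE_MODELS = ("CLIP", "DINOv2", "PEcore", "SigLIP2")

def embedding_path(data_dir, model):
    """Hypothetical: build the expected HDF5 path for one model label."""
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {AVAILABLE_MODELS}")
    # Assumption: <data_dir>/stimuli/ holds the embedding files.
    return Path(data_dir) / "stimuli" / f"task-images_desc-{model}_embeddings.h5"

print(embedding_path("/data/laion-fmri", "DINOv2"))
```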

Functions

load_embeddings([models, data_dir]) – Return a lazy embedding reader for one or more models.

laion_fmri.embeddings.load_embeddings(models='all', data_dir=None) → Embeddings

Return a lazy embedding reader for one or more models.

Parameters:
  • models ("all", str, or iterable[str]) – "all" (the default) loads every available embedding model. A single model label such as "CLIP", or an iterable of labels, restricts the reader to those models.

  • data_dir (str or Path, optional) – Override the configured data directory.
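The models argument accepts three kinds of input. One plausible way to normalize them into a list of labels is sketched below; normalize_models is a hypothetical helper, and rejecting unknown labels with ValueError is an assumption rather than documented behavior:

```python
AVAILABLE_MODELS = ("CLIP", "DINOv2", "PEcore", "SigLIP2")

def normalize_models(models="all"):
    """Hypothetical: turn "all", one label, or an iterable of labels into a list."""
    if isinstance(models, str):
        # A bare string is one label (or "all"), never a sequence of characters.
        labels = list(AVAILABLE_MODELS) if models == "all" else [models]
    else:
        # Keep the caller's order.
        labels = list(models)
    unknown = [m for m in labels if m not in AVAILABLE_MODELS]
    if unknown:
        raise ValueError(f"unknown model label(s): {unknown}")
    return labels

print(normalize_models("CLIP"))               # ['CLIP']
print(normalize_models(("SigLIP2", "CLIP")))  # ['SigLIP2', 'CLIP']
```

Checking isinstance(models, str) first matters because strings are themselves iterable: list("CLIP") would otherwise yield four one-character labels.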

Classes

Embeddings(models[, data_dir]) – Lazy reader for one or more model embedding files.

class laion_fmri.embeddings.Embeddings(models, data_dir=None)

Bases: object

Lazy reader for one or more model embedding files.

Opens each model’s HDF5 file on first access and keeps the handle open for the lifetime of the instance. Use it as a context manager to release the handles explicitly:

with Stimuli() as stim:
    v = stim.embeddings.get("CLIP", "img.jpg")

Parameters:
  • models (str or iterable[str]) – Model labels this handle covers (subset of AVAILABLE_MODELS). A single string such as "CLIP" is accepted.

  • data_dir (str or Path, optional) – Override the configured data directory.
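The open-on-first-access, close-on-exit behavior described above is a common pattern for wrapping file handles. The class below is a minimal sketch of that pattern, not the library's implementation: LazyHandles and FakeFile are hypothetical, and the opener is injected so the example runs without any HDF5 files (in practice it would open the model's file read-only, for example with h5py.File(path, "r")):

```python
class LazyHandles:
    """Hypothetical sketch: open one handle per model on first use, close all on exit."""

    def __init__(self, models, opener):
        self.models = list(models)
        self._opener = opener  # callable: model label -> open file-like object
        self._handles = {}     # model label -> open handle

    def handle(self, model):
        if model not in self.models:
            raise KeyError(model)
        # Open on first access only; later calls reuse the cached handle.
        if model not in self._handles:
            self._handles[model] = self._opener(model)
        return self._handles[model]

    def close(self):
        # Release every handle opened so far.
        for h in self._handles.values():
            h.close()
        self._handles.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


class FakeFile:
    """Stand-in for an HDF5 file handle that counts how often it is opened."""
    opened = 0

    def __init__(self, name):
        self.name = name
        self.closed = False
        FakeFile.opened += 1

    def close(self):
        self.closed = True


with LazyHandles(["CLIP", "DINOv2"], FakeFile) as lazy:
    first = lazy.handle("CLIP")
    again = lazy.handle("CLIP")  # cached, no second open
print(FakeFile.opened, first is again, first.closed)  # 1 True True
```

Models that are never accessed (DINOv2 here) are never opened, which keeps a reader covering all four models cheap when only one is used.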

close() → None

Release every open HDF5 handle.

get(model: str, image_name) → ndarray

Return embedding row(s) for one or many image names.

Parameters:
  • model (str) – One of AVAILABLE_MODELS.

  • image_name (str or sequence of str) – One image filename or a list/array of filenames.

Returns:

An array of shape (feature_dim,) if a single name was passed; otherwise an array of shape (n, feature_dim) with rows in the requested order.

Return type:

np.ndarray
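The return shape of get depends on whether one name or several were passed. A minimal NumPy sketch of that dispatch, assuming a name-to-row dictionary built from image_ids (get_rows is hypothetical, not the library's code), looks like this:

```python
import numpy as np

def get_rows(embedding, row_of, image_name):
    """Hypothetical: mimic the documented return shapes of Embeddings.get."""
    if isinstance(image_name, str):
        # Single filename (np.str_ is a str subclass) -> 1-D vector.
        return embedding[row_of[image_name]]
    # List or array of filenames -> (n, feature_dim), rows in the requested order.
    rows = np.array([row_of[str(n)] for n in image_name], dtype=np.intp)
    return embedding[rows]

image_ids = np.array(["a.jpg", "b.jpg", "c.jpg"])
row_of = {name: i for i, name in enumerate(image_ids.tolist())}
embedding = np.arange(6, dtype=np.float16).reshape(3, 2)

print(get_rows(embedding, row_of, "b.jpg").shape)                          # (2,)
print(get_rows(embedding, row_of, np.array(["c.jpg", "a.jpg"])).tolist())  # [[4.0, 5.0], [0.0, 1.0]]
```

In-memory NumPy arrays accept indices in any order. An h5py dataset, by contrast, requires fancy-index coordinates in increasing order, so a reader that indexes the HDF5 dataset directly would likely need to sort the requested rows and then restore the caller's order.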

property image_ids: ndarray

Image filenames in embedding row order (shared across models).

property models: list[str]

Model labels this handle covers, in user-supplied order.