laion_fmri.splits¶
Predefined train / test splits for the re:vision generalization framework.
LAION-fMRI ships with 12 train/test splits per pool, designed to test
generalization across the per-subject shared + unique stimulus pool
or across just the cross-subject shared pool.
Pools¶
"shared"— the 1,121 LAION images shared across every subject (non-OOD subset of the shared block)."sub-01","sub-03","sub-05","sub-06","sub-07"— the 5,833-image per-subject pools (1,121 shared + 4,712 unique).
Splits¶
The same 12 split names exist in every pool:
random_0…random_4— five seeded uniform-random partitions (re:vision baseline).cluster_k5_0…cluster_k5_4— five hold-out-cluster partitions (CLIP-feature k-means; one cluster held out as test). re:vision Method 2.tau— the MMD-matched 80/20 nearest-neighbour-distance split. re:vision Method 1.ood— train = the pool’s regular images, test = all 371 OOD shared images. re:vision Method 3. Test set is identical across pools; train varies with pool.
Loading¶
The natural entry point is get_train_test_ids() (or
get_split_masks() for trial-table masks):
>>> from laion_fmri.splits import get_train_test_ids
>>> train_ids, test_ids = get_train_test_ids("tau", pool="shared")
>>> len(train_ids), len(test_ids)
(897, 224)
OOD type filter¶
The ood split’s test set spans 9 OOD categories. Pass
ood_types= to restrict the test set to a subset:
>>> from laion_fmri.splits import list_ood_types
>>> list_ood_types()
['cropped', 'gabor', 'gaudy', 'illusion-classic', 'illusion-natural', 'relations', 'selfmade', 'shape', 'unusual']
>>> _, test_shape = get_train_test_ids(
... "ood", pool="shared", ood_types=["shape"],
... )
>>> len(test_shape)
82
The returned image_ids match the label column of every session’s
events TSV. Slice betas with the existing get_betas filter system —
see Load for details.
Functions
|
Build |
|
Convenience: return |
Return the 9 OOD categories present in the |
|
Return every pool that has bundled splits. |
|
Return the 11 split names available in every pool. |
|
|
Load every split for |
|
Load one bundled split. |
- laion_fmri.splits.get_split_masks(trials, name: str, pool: str, variant_id: int = 0, ood_types: str | Iterable[str] | None = None) Tuple[ndarray, ndarray][source]¶
Build
(train_mask, test_mask)over rows of a trial table.The masks are derived by matching
trials["label"]against the train/test image-id lists of the requested split. They line up one-to-one with the trials passed in, so you can apply them to any trial-aligned array (betas, features, decoded labels, …).- Parameters:
trials (pandas.DataFrame, pandas.Series, np.ndarray or list) – Trial-level labels. If a
DataFrame, the"label"column is used; otherwise the input is treated as label values directly.name (str) – One of the 12 split names (see
list_splits()).pool (str) –
"shared"or a subject id like"sub-01"(seelist_pools()).variant_id (int, default 0) – Variant within the split. Almost always 0.
ood_types (str, list[str], or None) – Only meaningful when
name == "ood". Seeget_train_test_ids().
- Returns:
(train_mask, test_mask) – Boolean arrays, both of length
len(trials).- Return type:
tuple of np.ndarray
Examples
>>> sub = laion_fmri.load_subject("sub-01") >>> trials = pd.concat( ... sub.get_trial_info(session=sub.get_sessions()).values(), ... ignore_index=True, ... ) >>> train_mask, test_mask = get_split_masks( ... trials, "tau", pool="shared", ... ) >>> betas[train_mask], betas[test_mask]
- laion_fmri.splits.get_train_test_ids(name: str, pool: str, variant_id: int = 0, ood_types: str | Iterable[str] | None = None) Tuple[List[str], List[str]][source]¶
Convenience: return
(train_ids, test_ids)for one variant.Most splits have a single
variant_id=0. The fiverandom_*and the fivecluster_k5_*splits each ARE the variants — pick the split name;variant_idstays 0.- Parameters:
ood_types (str, list[str], or None) – Only meaningful when
name == "ood". Restricts the test set to image_ids of these OOD categories (seelist_ood_types()).Nonekeeps all 9 categories.
- laion_fmri.splits.list_ood_types() List[str][source]¶
Return the 9 OOD categories present in the
oodsplit.
- laion_fmri.splits.list_splits() List[str][source]¶
Return the 11 split names available in every pool.
- laion_fmri.splits.load_all_splits(pool: str) Dict[str, Split][source]¶
Load every split for
pool→{name: Split}.
- laion_fmri.splits.load_split(name: str, pool: str) Split[source]¶
Load one bundled split.
- Parameters:
name (str) – One of the 11 split names (see
list_splits()).pool (str) –
"shared"or a subject id like"sub-01"(seelist_pools()).
- Return type:
Classes
|
One named split (e.g. |
|
A single train/test image-id partition within a Split. |