Initialize the data directory¶
dataset_initialize(path) records path as the local
mirror of the bucket. The choice is persisted across Python
sessions, so subsequent calls on the same machine are
unnecessary – Subject / Group loaders pick the
configured directory up automatically.
from laion_fmri.config import dataset_initialize, get_data_dir
dataset_initialize("./laion_fmri_data")
get_data_dir() # → "./laion_fmri_data"
This step writes nothing to S3 and does not require AWS
credentials. It only sets up the local destination for future
download(...) calls and makes Subject / Group
loaders aware of where the data lives.
Requirements¶
The directory must already exist.
dataset_initializeraisesFileNotFoundErrorif the path is missing – it doesn’t create the directory for you, so a typo can’t silently scatter dataset files in the wrong place. Create it yourself first (e.g.mkdir -p ./laion_fmri_data).The path argument must be a string –
TypeErrorotherwise.
What gets created¶
The package records your license acceptance privately inside
the data directory so subsequent download(...) calls don’t
re-prompt. If you want to start over with a fresh acceptance
flow (e.g. on a new dataset clone), delete the data directory
or its hidden configuration folder. The bucket mirror itself
is laid out under derivatives/, stimuli/ etc. directly
in the data directory once download(...) runs.
Switching data directories¶
Calling dataset_initialize again with a different path
overwrites the persisted choice and points all subsequent
loaders at the new location. The old directory is left on
disk untouched – laion_fmri never deletes data for you.
dataset_initialize("./scratch_run") # active until next call
dataset_initialize("./project_run") # now active; first dir untouched
If no data directory has been initialized yet,
get_data_dir() raises DataDirNotSetError with a hint to
run dataset_initialize.